QOrderedAttention
Description
Quantized version of simplified Multi-Head Self Attention (using int8 with a specific matrix layout). The attention can be either unidirectional (like GPT-2) or bidirectional (like BERT). The mask_index input is optional. Besides a raw attention mask with shape (batch_size, past_sequence_length + sequence_length) or (batch_size, sequence_length, past_sequence_length + sequence_length), with value 0 for masked positions and 1 otherwise, two other formats are supported. When the input has right-side padding, mask_index is one-dimensional with shape (batch_size), where each element is the end position, that is, the valid length of the actual sequence excluding padding. When the input has left-side padding, mask_index has shape (2 * batch_size), where the values are the exclusive end positions followed by the inclusive start positions. When unidirectional is 1, each token attends only to previous tokens. For GPT-2, both past and present state are optional, and present state can appear in the output even when past state is not in the input. The current version does not support past/present, attention_bias and qkv_hidden_sizes.
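The two index-style mask_index formats described above can be illustrated with a small sketch. It derives them from a raw 0/1 mask in plain Python. The function names are hypothetical and not part of the operator or any library, and the sketch assumes the unmasked positions in each row are contiguous.

```python
from typing import List


def right_padding_mask_index(raw_mask: List[List[int]]) -> List[int]:
    """Return one valid length per sequence, shape (batch_size).

    Assumes padding is on the right, so the number of 1s in a row
    equals the end position of the actual sequence.
    """
    return [sum(row) for row in raw_mask]


def left_padding_mask_index(raw_mask: List[List[int]]) -> List[int]:
    """Return exclusive end positions followed by inclusive start
    positions, shape (2 * batch_size).

    Assumes padding is on the left, so each sequence ends at the last
    column and starts at the first position holding a 1.
    """
    ends = [len(row) for row in raw_mask]
    # A fully masked row gets start == end (empty range); this is a
    # choice made for the sketch, not documented operator behaviour.
    starts = [row.index(1) if 1 in row else len(row) for row in raw_mask]
    return ends + starts


if __name__ == "__main__":
    right_padded = [[1, 1, 1, 0], [1, 1, 0, 0]]
    left_padded = [[0, 1, 1, 1], [0, 0, 1, 1]]
    print(right_padding_mask_index(right_padded))  # [3, 2]
    print(left_padding_mask_index(left_padded))    # [4, 4, 1, 2]
```

In the left-padding result the first batch_size values are the end positions and the remaining batch_size values are the start positions, matching the ordering stated in the description.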

Input parameters
 specified_outputs_name : array, this parameter lets you manually assign custom names to the output tensors of a node.
 Graphs in : cluster, ONNX model architecture.
 input (heterogeneous) – Q : object, 3D input tensor with shape (batch_size, sequence_length, input_hidden_size).
 scale_input (heterogeneous) – S : object, scale of the input, scalar value (per tensor) currently.
 scale_Q_gemm (heterogeneous) – S : object, scale of the gemm – scalar (per-tensor quantization).
 scale_K_gemm (heterogeneous) – S : object, scale of the gemm – scalar (per-tensor quantization).
 scale_V_gemm (heterogeneous) – S : object, scale of the gemm – scalar (per-tensor quantization).
 Q_weight (heterogeneous) – Q : object, 2D input tensor with shape (input_hidden_size, hidden_size), where hidden_size = num_heads * head_size.
 K_weight (heterogeneous) – Q : object, 2D input tensor with shape (input_hidden_size, hidden_size), where hidden_size = num_heads * head_size.
 V_weight (heterogeneous) – Q : object, 2D input tensor with shape (input_hidden_size, hidden_size), where hidden_size = num_heads * head_size.
 scale_Q_weight (heterogeneous) – S : object, scale of the weight (scalar for per-tensor quantization or 1-D of dims [hidden_size] for per-channel quantization).
 scale_K_weight (heterogeneous) – S : object, scale of the weight (scalar for per-tensor quantization or 1-D of dims [hidden_size] for per-channel quantization).
 scale_V_weight (heterogeneous) – S : object, scale of the weight (scalar for per-tensor quantization or 1-D of dims [hidden_size] for per-channel quantization).
 Q_bias (heterogeneous) – S : object, 1D input tensor with shape (hidden_size).
 K_bias (heterogeneous) – S : object, 1D input tensor with shape (hidden_size).
 V_bias (heterogeneous) – S : object, 1D input tensor with shape (hidden_size).
 scale_QKT_gemm (optional, heterogeneous) – S : object, scale of the gemm – scalar (per-tensor quantization).
 scale_QKT_softmax (optional, heterogeneous) – S : object, scale of the softmax result – scalar (per-tensor quantization).
 scale_values_gemm (heterogeneous) – S : object, scale of the gemm – scalar (per-tensor quantization). Also this is the output scale for the operator.
 mask_index (optional, heterogeneous) – G : object, attention mask with shape (batch_size, 1, max_sequence_length, max_sequence_length), (batch_size, past_sequence_length + sequence_length) or (batch_size, sequence_length, past_sequence_length + sequence_length), or index with shape (batch_size) or (2 * batch_size).
 past (optional, heterogeneous) – Q : object, past state for key and value with shape (2, batch_size, num_heads, past_sequence_length, head_size).
 relative_position_bias (optional, heterogeneous) – S : object, additional add to QxK’ with shape (batch_size or 1, num_heads or 1, sequence_length, total_sequence_length).
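The scale inputs above distinguish per-tensor quantization (one scalar) from per-channel quantization (one scale per hidden_size column of a weight). The page does not spell out the quantization formula, so the following is only a minimal sketch that assumes the common symmetric int8 scheme, q = clamp(round(x / scale), -128, 127), with no zero point. The helper names are hypothetical.

```python
from typing import List


def _to_int8(value: float, scale: float) -> int:
    # Assumed symmetric int8 scheme: divide by the scale, round,
    # then clamp to the int8 range. Python's round() rounds ties to even.
    return max(-128, min(127, round(value / scale)))


def quantize_per_tensor(values: List[float], scale: float) -> List[int]:
    """Quantize with a single scalar scale shared by the whole tensor."""
    return [_to_int8(v, scale) for v in values]


def quantize_per_channel(weight: List[List[float]],
                         scales: List[float]) -> List[List[int]]:
    """Quantize a (input_hidden_size, hidden_size) weight with one
    scale per column, i.e. a 1-D scale of dims [hidden_size]."""
    return [[_to_int8(w, s) for w, s in zip(row, scales)] for row in weight]


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Map int8 values back to floats with the same scale."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    print(quantize_per_tensor([0.5, -1.0, 100.0], 0.25))   # [2, -4, 127]
    print(quantize_per_channel([[1.0, 1.0], [-2.0, 0.5]],
                               [0.5, 0.25]))               # [[2, 4], [-4, 2]]
    print(dequantize([2, -4], 0.25))                        # [0.5, -1.0]
```

Note how 100.0 saturates at 127: a scale that is too small for the value range clips large values, which is why each gemm stage carries its own scale.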
 
 Parameters : cluster,
 num_heads : integer, number of attention heads.
 Default value “0”.
 order_input : integer, cublasLt order of input matrix. See the schema of QuantizeWithOrder for order definition.
 Default value “0”.
 order_output : integer, cublasLt order of global bias.
 Default value “0”.
 order_weight : integer, cublasLt order of weight matrix.
 Default value “0”.
 qkv_hidden_sizes : array, hidden layer sizes of Q, K, V paths in Attention.
 Default value “empty”.
 unidirectional : boolean, whether every token can only attend to previous tokens.
 Default value “False”.
 training? : boolean, whether the layer is in training mode (can store data for backward).
 Default value “True”.
 lda coeff : float, defines the coefficient by which the loss derivative will be multiplied before being sent to the previous layer (since during the backward run we go backwards).
 Default value “1”.
 name (optional) : string, name of the node.
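As a rough illustration of how num_heads and unidirectional shape the computation, the sketch below derives head_size from hidden_size = num_heads * head_size and builds the table of positions each token may attend to. The function names are hypothetical. The sketch also assumes the usual causal convention in which a token can attend to itself as well as to earlier tokens, which the page does not state explicitly.

```python
from typing import List


def head_size(hidden_size: int, num_heads: int) -> int:
    """Derive head_size from hidden_size = num_heads * head_size."""
    if num_heads <= 0 or hidden_size % num_heads != 0:
        raise ValueError("hidden_size must be a positive multiple of num_heads")
    return hidden_size // num_heads


def attention_allowed(sequence_length: int,
                      unidirectional: bool) -> List[List[bool]]:
    """Entry [q][k] is True when query position q may attend to key k.

    Bidirectional: every position is visible.
    Unidirectional: only positions k <= q are visible (assumption:
    the current token is included, as in standard causal attention).
    """
    return [[(not unidirectional) or k <= q for k in range(sequence_length)]
            for q in range(sequence_length)]


if __name__ == "__main__":
    print(head_size(768, 12))  # 64
    for row in attention_allowed(3, unidirectional=True):
        print(row)
```

With the default num_heads of 0 the helper raises an error, reflecting that a usable configuration needs a positive head count chosen so that it divides hidden_size.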
 
Output parameters
 output (heterogeneous) – Q : object, 3D output tensor with shape (batch_size, sequence_length, hidden_size).
Type Constraints
Q in (tensor(int8)) : Constrain input and output types to int8 tensors.
S in (tensor(float)) : Constrain scales to float32 tensors.
G in (tensor(int32)) : Constrain to integer types.

