
BatchNormalization

Description

Carries out batch normalization as described in the paper https://arxiv.org/abs/1502.03167.

There are five required inputs: 'X', 'scale', 'B', 'input_mean' and 'input_var'. Note that 'input_mean' and 'input_var' are expected to be the estimated statistics in inference mode (training_mode=False, the default) and the running statistics in training mode (training_mode=True). Depending on the mode the operator is run in, the number of outputs differs; the cases are listed below:

  • Output case #1: Y, running_mean, running_var (training_mode=True)
  • Output case #2: Y (training_mode=False)

When training_mode=False, the extra outputs are invalid. When training_mode=True, the outputs are updated as follows:

running_mean = input_mean * momentum + current_mean * (1 - momentum)
running_var = input_var * momentum + current_var * (1 - momentum)

Y = (X - current_mean) / sqrt(current_var + epsilon) * scale + B

where:

current_mean = ReduceMean(X, axis=all_except_channel_index)
current_var =  ReduceVar(X, axis=all_except_channel_index)

Note that ReduceVar refers to the population variance, i.e. sum((x_i - x_avg)^2) / N, where N is the population size (this formula does not use the sample size N - 1).

The computation of ReduceMean and ReduceVar uses float to avoid overflow for float16 inputs.
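
As a cross-check of the training-mode computation above, here is a minimal NumPy sketch (the function name batchnorm_train is illustrative and not part of the library; epsilon and momentum are passed in explicitly, with the defaults listed under Parameters below):

import numpy as np

# Sketch of training mode: statistics are reduced over every axis
# except the channel axis (axis 1), exactly as in the formulas above.
def batchnorm_train(X, scale, B, input_mean, input_var, epsilon, momentum):
    axes = (0,) + tuple(range(2, X.ndim))
    current_mean = X.mean(axis=axes)
    current_var = X.var(axis=axes)  # population variance: divides by N, not N - 1

    # Running statistics, updated exactly as in the formulas above.
    running_mean = input_mean * momentum + current_mean * (1 - momentum)
    running_var = input_var * momentum + current_var * (1 - momentum)

    # Reshape the per-channel (C,) tensors so they broadcast over X.
    shape = (1, -1) + (1,) * (X.ndim - 2)
    Y = ((X - current_mean.reshape(shape))
         / np.sqrt(current_var.reshape(shape) + epsilon)
         * scale.reshape(shape) + B.reshape(shape))
    return Y, running_mean, running_var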

When training_mode=False:

Y = (X - input_mean) / sqrt(input_var + epsilon) * scale + B
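
The inference-mode formula reduces to a single broadcasted expression; a matching NumPy sketch (again with an illustrative function name):

import numpy as np

# Inference mode: only the stored statistics are used, so there is a
# single output Y and no running statistics are produced.
def batchnorm_infer(X, scale, B, input_mean, input_var, epsilon):
    shape = (1, -1) + (1,) * (X.ndim - 2)
    return ((X - input_mean.reshape(shape))
            / np.sqrt(input_var.reshape(shape) + epsilon)
            * scale.reshape(shape) + B.reshape(shape))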

For the previous (deprecated) non-spatial cases, implementations are advised to flatten the input shape to (N x C * D1 * D2 * … * Dn) before the BatchNormalization op. This operator has optional inputs/outputs. See the ONNX IR for more details about the representation of optional arguments. An empty string may be used in place of an actual argument's name to indicate a missing argument. Trailing optional arguments (those not followed by an argument that is present) may also simply be omitted.
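
For illustration, such a flattening amounts to a single reshape (the shapes shown here are arbitrary example values):

import numpy as np

# Example: flatten (N, C, D1, D2) to (N, C*D1*D2) so the flattened
# second dimension is treated as the channel axis by BatchNormalization.
X = np.random.randn(8, 3, 4, 4).astype(np.float32)  # N=8, C=3, D1=D2=4
X_flat = X.reshape(X.shape[0], -1)                   # shape (8, 48)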

Input parameters

specified_outputs_name : array, this parameter lets you manually assign custom names to the output tensors of a node.

 Graphs in : cluster, ONNX model architecture.

X (heterogeneous) – T : object, input data tensor from the previous operator; dimensions are of the form (N x C x D1 x D2 … Dn), where N is the batch size and C is the number of channels. Statistics are computed for every channel of C over the N and D1 to Dn dimensions. For image data, the input dimensions become (N x C x H x W). The op also accepts a single-dimension input of size N, in which case C is assumed to be 1.
scale (heterogeneous) – T1 : object, scale tensor of shape (C).
B (heterogeneous) – T1 : object, bias tensor of shape (C).
input_mean (heterogeneous) – T2 : object, running (training) or estimated (testing) mean tensor of shape (C).
input_var (heterogeneous) – T2 : object, running (training) or estimated (testing) variance tensor of shape (C).

 Parameters : cluster,

epsilon : float, the epsilon value to use to avoid division by zero.
Default value “1e-05”.
momentum : float, factor used in computing the running mean and variance, e.g. running_mean = running_mean * momentum + mean * (1 - momentum).
Default value “1”.
 training_mode : boolean, whether the layer is in training mode (can store data for backward).
Default value “False”.
 lda coeff : float, coefficient by which the loss derivative is multiplied before being propagated to the previous layer during the backward pass.
Default value “1”.

 name (optional) : string, name of the node.

Output parameters

 Graphs out : cluster, ONNX model architecture.

Y (heterogeneous) – T : object, the output tensor of the same shape as X.
running_mean (optional, heterogeneous) – T2 : object, the running mean after the BatchNormalization operator.
running_var (optional, heterogeneous) – T2 : object, the running variance after the BatchNormalization operator. This op uses the population size (N) for calculating variance, and not the sample size N-1.

Type Constraints

T in (tensor(bfloat16), tensor(double), tensor(float), tensor(float16)) : Constrain input and output types to float tensors.

T1 in (tensor(bfloat16), tensor(double), tensor(float), tensor(float16)) : Constrain scale and bias types to float tensors.

T2 in (tensor(bfloat16), tensor(double), tensor(float), tensor(float16)) : Constrain mean and variance types to float tensors.

Example

All these examples are PNG snippets; you can drop a snippet onto the block diagram and the depicted code will be added to your VI (do not forget to install the Deep Learning library to run it).
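
For reference outside of LabVIEW, the underlying operator can also be built directly with the onnx Python helper API; a minimal sketch, assuming the onnx package is installed:

from onnx import helper

# Single-output form (training_mode=False, the default), with the
# epsilon default listed above.
node = helper.make_node(
    "BatchNormalization",
    inputs=["X", "scale", "B", "input_mean", "input_var"],
    outputs=["Y"],
    epsilon=1e-5,
)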