Batch Normalization

Batch Normalization is a technique used in neural networks to improve training speed, stability, and performance. It was introduced to address the issue of internal covariate shift, which occurs when the distributions of each layer's inputs change during training, making it difficult for the network to converge.

How Batch Normalization Works

  • Normalization Process: Batch Normalization standardizes the inputs to a layer for each mini-batch, adjusting them to have a mean of zero and a standard deviation of one. Mathematically, the batch mean is subtracted and the result is divided by the batch standard deviation (see the numerical sketch after this list).

  • Learnable Parameters: After normalization, each normalized value is scaled and shifted using the learnable parameters γ (scale) and β (shift). This step ensures that the network retains the ability to represent the identity transformation.

  • Position in Layers: Batch Normalization is typically applied after the linear part of the layer (e.g., after a convolutional or fully connected layer) but before the non-linear activation function (like ReLU).

  • Improvements in Training: By normalizing the inputs, Batch Normalization reduces internal covariate shift, allowing for higher learning rates and reducing the sensitivity to weight initialization. This typically results in faster convergence during training.

  • Regularization: Although its primary purpose is not regularization, Batch Normalization can indirectly reduce overfitting by stabilizing the learning process. Because the batch statistics differ slightly from one mini-batch to the next, a small amount of noise is added to the activations of each layer, which has a mild regularizing effect similar to other regularization techniques.
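
The two steps above can be written out directly. Below is a small numerical sketch in NumPy; the batch values, γ, β, and ε are made-up illustrative numbers, not taken from the original.

import numpy as np

# A mini-batch of activations for a single neuron (made-up values)
x = np.array([2.0, 4.0, 6.0, 8.0])

# Step 1: normalize to zero mean and unit variance within the batch
eps = 1e-5                      # small constant for numerical stability
mu = x.mean()
var = x.var()
x_hat = (x - mu) / np.sqrt(var + eps)

# Step 2: scale and shift with the learnable parameters gamma and beta
gamma, beta = 1.5, 0.5          # in practice these are learned per neuron
y = gamma * x_hat + beta

print(x_hat)  # approximately [-1.34, -0.45, 0.45, 1.34]
print(y)      # approximately [-1.51, -0.17, 1.17, 2.51]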

Note

  • The Batch Normalization layer normalizes the input distribution of each neuron individually.
  • Every neuron has its own γ and β parameters, which are learned during training.
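
One quick way to see these per-neuron parameters in Keras is to inspect a BatchNormalization layer's weights; in the sketch below, the input size of 4 features is an arbitrary assumption.

import tensorflow as tf

# Build a BatchNormalization layer for a 4-feature input:
# it keeps one gamma and one beta per feature (i.e., per neuron).
bn = tf.keras.layers.BatchNormalization()
bn.build(input_shape=(None, 4))

print(bn.gamma.shape)  # (4,)
print(bn.beta.shape)   # (4,)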

In a deep neural network without Batch Normalization, layers deeper in the network have to adapt to the constantly changing distribution of inputs, which can slow down training and make it harder for the network to converge. With Batch Normalization, these input distributions are more stable, which makes it easier for each layer to learn.

Implementing BatchNorm in Keras

Adding Batch Normalization to a TensorFlow model is straightforward using Keras:
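A minimal sketch of the pattern (the input size, layer widths, and the binary-classification output below are illustrative assumptions, not from the original):

import tensorflow as tf
from tensorflow.keras import layers

# Each hidden block follows the order Dense -> BatchNormalization -> Activation
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),           # 20 input features (assumed)
    layers.Dense(64),                      # linear part, no activation yet
    layers.BatchNormalization(),           # normalize the pre-activations
    layers.Activation('relu'),             # non-linearity applied after BatchNorm
    layers.Dense(32),
    layers.BatchNormalization(),
    layers.Activation('relu'),
    layers.Dense(1, activation='sigmoid')  # output layer for a binary task (assumed)
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()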

Note

The activation function should be applied after Batch Normalization, so the correct order is Dense - BatchNormalization - Activation.

1. What primary issues does Batch Normalization address in neural networks?
2. What are the roles of γ (scale) and β (shift) in Batch Normalization?
