Weight Initialization Techniques | Neural Network

To get better, well-optimized results from a neural network, it is very important to start each neuron with suitable weights. Weight selection plays a major role during the training of a neural network. If the initial weights are too small, signals and gradients shrink layer by layer, which causes the vanishing gradient problem; if the weights are too large, they grow layer by layer, which causes the exploding gradient problem.
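
The effect of the weight scale can be seen in a small experiment. The following is a minimal sketch, not a full training setup: it assumes a stack of purely linear layers (no activation function) so that the scaling is easy to see, and the function name `linear_stack_std` and the layer sizes are chosen only for illustration. In such a linear stack the backward pass multiplies by the same matrices, so gradients are scaled in a similar way.

```python
import numpy as np

def linear_stack_std(weight_std, n_layers=20, width=256, seed=0):
    """Push random inputs through a stack of linear layers and return
    the standard deviation of the final layer's outputs."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((512, width))  # inputs with std close to 1
    for _ in range(n_layers):
        w = rng.normal(0.0, weight_std, size=(width, width))
        x = x @ w  # each layer rescales the signal by about weight_std * sqrt(width)
    return float(x.std())

tiny = linear_stack_std(0.001)                  # weights too small: signal vanishes
large = linear_stack_std(1.0)                   # weights too large: signal explodes
scaled = linear_stack_std(np.sqrt(1.0 / 256))   # variance 1 / width: scale is preserved

print(f"too small: {tiny:.3e}  too large: {large:.3e}  scaled: {scaled:.3f}")
```

Only the third choice, where the variance is tied to the layer width, keeps the signal at roughly its original scale after 20 layers.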

In this blog, we will study the different weight initialization techniques used in neural networks. This article assumes that the reader is already familiar with the concepts of neural networks, weights, biases, activation functions, forward and backward propagation, etc.

Key points we should consider before initializing weights:

  1. Weights should be small
  2. Weights should not all be the same
  3. Weights should have a suitable variance (neither too small nor too large)
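
The second point can be illustrated with a short NumPy sketch. The input values, the constant 0.1, and the layer size below are arbitrary choices for illustration: when every weight has the same value, every hidden neuron computes the same output, so the neurons cannot learn different features.

```python
import numpy as np

x = np.array([[0.5, -1.2, 2.0]])  # one sample with three features

# Every weight gets the same constant value (a deliberately bad choice).
w_same = np.full((3, 4), 0.1)
hidden_same = np.tanh(x @ w_same)

# Small random weights break the symmetry between the neurons.
rng = np.random.default_rng(0)
w_rand = rng.normal(0.0, 0.1, size=(3, 4))
hidden_rand = np.tanh(x @ w_rand)

# True when all four hidden neurons produce the same activation.
same_is_symmetric = bool(np.allclose(hidden_same, hidden_same[0, 0]))
rand_is_symmetric = bool(np.allclose(hidden_rand, hidden_rand[0, 0]))

print(hidden_same, same_is_symmetric)
print(hidden_rand, rand_is_symmetric)
```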

Glorot Initialization

Glorot initialization is also known as Xavier initialization. The idea is to draw each weight from a Gaussian distribution with mean = 0.0 and a small variance based on the fan-in and fan-out of the layer (the number of inputs and outputs of the layer the weight belongs to).

Looking at the logistic activation function, you can see that when the inputs become large (negative or positive), the function saturates at 0 or 1, with a derivative extremely close to 0. Backpropagation then has almost no gradient to pass back through the layer.
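
A quick numerical check of this saturation, as a minimal sketch using only the Python standard library (the helper names `sigmoid` and `sigmoid_grad` are illustrative):

```python
import math

def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    """Derivative of the logistic function: s * (1 - s)."""
    s = sigmoid(z)
    return s * (1.0 - s)

# The derivative peaks at z = 0 and collapses for large |z|.
grads = {z: sigmoid_grad(z) for z in (-10.0, -2.0, 0.0, 2.0, 10.0)}
for z, g in grads.items():
    print(f"z = {z:6.1f}  derivative = {g:.6f}")
```

The derivative is at most 0.25 (at z = 0) and drops below 0.0001 by |z| = 10.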

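Glorot and Bengio proposed a solution that has proven to work very well in practice: the connection weights of each layer must be initialized randomly, using a normal distribution with mean 0 and variance:

$$\sigma^2 = \frac{1}{\mathrm{fan}_{\mathrm{avg}}}, \qquad \mathrm{fan}_{\mathrm{avg}} = \frac{\mathrm{fan}_{\mathrm{in}} + \mathrm{fan}_{\mathrm{out}}}{2}$$

or a uniform distribution between -r and +r, with:

$$r = \sqrt{\frac{3}{\mathrm{fan}_{\mathrm{avg}}}}$$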
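
Glorot initialization can be sketched in a few lines of NumPy. This is a minimal sketch assuming the usual Glorot formulas, variance 1 / fan_avg with fan_avg = (fan_in + fan_out) / 2 for the normal version and r = sqrt(3 / fan_avg) for the uniform version; the function names and the 300 x 100 layer shape are illustrative and not taken from any library.

```python
import numpy as np

def glorot_normal(fan_in, fan_out, rng):
    """Normal distribution with mean 0 and variance 1 / fan_avg."""
    fan_avg = (fan_in + fan_out) / 2.0
    return rng.normal(0.0, np.sqrt(1.0 / fan_avg), size=(fan_in, fan_out))

def glorot_uniform(fan_in, fan_out, rng):
    """Uniform distribution on [-r, r] with r = sqrt(3 / fan_avg)."""
    fan_avg = (fan_in + fan_out) / 2.0
    r = np.sqrt(3.0 / fan_avg)
    return rng.uniform(-r, r, size=(fan_in, fan_out))

rng = np.random.default_rng(42)
w_normal = glorot_normal(300, 100, rng)
w_uniform = glorot_uniform(300, 100, rng)

# Both samples should have a variance close to 1 / fan_avg = 1 / 200.
print(w_normal.var(), w_uniform.var(), 1.0 / 200)
```

Both versions target the same variance, because a uniform distribution on [-r, r] has variance r^2 / 3.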


LeCun Initialization

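LeCun initialization uses only the fan-in of the layer. It is equivalent to Glorot initialization when,

$$\mathrm{fan}_{\mathrm{in}} = \mathrm{fan}_{\mathrm{out}}$$

So, for the normal distribution with mean 0, the variance becomes,

$$\sigma^2 = \frac{1}{\mathrm{fan}_{\mathrm{in}}}$$

And the uniform distribution ranges between -r and +r where,

$$r = \sqrt{\frac{3}{\mathrm{fan}_{\mathrm{in}}}}$$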

Initializing the weights this way can speed up training considerably, and it is one of the tricks that contributed to the success of Deep Learning.
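
A minimal NumPy sketch of LeCun initialization, assuming the usual formulas: variance 1 / fan_in for the normal version and r = sqrt(3 / fan_in) for the uniform version. The function names and layer shapes are illustrative. The last lines check that, for a square layer, the LeCun and Glorot variances coincide.

```python
import numpy as np

def lecun_normal(fan_in, fan_out, rng):
    """Normal distribution with mean 0 and variance 1 / fan_in."""
    return rng.normal(0.0, np.sqrt(1.0 / fan_in), size=(fan_in, fan_out))

def lecun_uniform(fan_in, fan_out, rng):
    """Uniform distribution on [-r, r] with r = sqrt(3 / fan_in)."""
    r = np.sqrt(3.0 / fan_in)
    return rng.uniform(-r, r, size=(fan_in, fan_out))

rng = np.random.default_rng(7)
w_normal = lecun_normal(400, 50, rng)
w_uniform = lecun_uniform(400, 50, rng)
print(w_normal.var(), w_uniform.var(), 1.0 / 400)

# For a square layer, fan_avg equals fan_in, so both rules give the same variance.
fan_in = fan_out = 128
lecun_var = 1.0 / fan_in
glorot_var = 1.0 / ((fan_in + fan_out) / 2.0)
print(lecun_var == glorot_var)
```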

He Initialization

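He initialization is an initialization method for neural networks that takes into account the non-linearity of the activation function, and it is designed for ReLU and its variants. Because ReLU outputs zero for negative inputs, roughly half of the signal is lost at each layer, and He initialization compensates by using a larger weight variance.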

So, for the normal distribution with mean 0, the variance will be,

$$\sigma^2 = \frac{2}{\mathrm{fan}_{\mathrm{in}}}$$
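
The factor of 2 can be checked with a small experiment. This is a minimal sketch assuming the usual He formula, variance 2 / fan_in; the function names, the depth of 20 layers, and the layer width are illustrative choices. It pushes random inputs through a stack of ReLU layers and compares He initialization against a variance of 1 / fan_in.

```python
import numpy as np

def he_normal(fan_in, fan_out, rng):
    """Normal distribution with mean 0 and variance 2 / fan_in."""
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

def one_over_fan_in_normal(fan_in, fan_out, rng):
    """Comparison rule: variance 1 / fan_in (no correction for ReLU)."""
    return rng.normal(0.0, np.sqrt(1.0 / fan_in), size=(fan_in, fan_out))

def relu_stack_rms(init, n_layers=20, width=512, seed=0):
    """Root-mean-square activation after a stack of ReLU layers."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((256, width))
    for _ in range(n_layers):
        x = np.maximum(0.0, x @ init(width, width, rng))  # ReLU layer
    return float(np.sqrt(np.mean(x ** 2)))

rms_he = relu_stack_rms(he_normal)
rms_plain = relu_stack_rms(one_over_fan_in_normal)
print(f"He: {rms_he:.3f}  variance 1/fan_in: {rms_plain:.6f}")
```

With variance 1 / fan_in, the mean squared activation is roughly halved at every ReLU layer, so the signal fades with depth; with variance 2 / fan_in it stays at about the same scale.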

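Initialization parameters for each type of activation function

Initialization    Activation functions               Variance (Normal)
Glorot            None, tanh, logistic, softmax      1 / fan_avg
He                ReLU and its variants              2 / fan_in
LeCun             SELU                               1 / fan_in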
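
The choice of initializer per activation function can be written as a small lookup. This is a hypothetical helper, not a library API: the dictionary names, the activation labels, and the pairing itself (Glorot for no activation, tanh, logistic and softmax; He for ReLU and its variants; LeCun for SELU) are assumptions based on the commonly used pairing.

```python
import math

# Variance rules for the normal distribution (mean 0).
VARIANCE_RULES = {
    "glorot": lambda fan_in, fan_out: 2.0 / (fan_in + fan_out),  # 1 / fan_avg
    "he": lambda fan_in, fan_out: 2.0 / fan_in,
    "lecun": lambda fan_in, fan_out: 1.0 / fan_in,
}

# Assumed pairing of activation functions and initializers.
ACTIVATION_TO_INIT = {
    "linear": "glorot",
    "tanh": "glorot",
    "logistic": "glorot",
    "softmax": "glorot",
    "relu": "he",
    "leaky_relu": "he",
    "elu": "he",
    "selu": "lecun",
}

def init_std(activation, fan_in, fan_out):
    """Standard deviation to use for a layer with the given activation."""
    rule = ACTIVATION_TO_INIT[activation]
    return math.sqrt(VARIANCE_RULES[rule](fan_in, fan_out))

for name in ("tanh", "relu", "selu"):
    print(name, ACTIVATION_TO_INIT[name], init_std(name, 256, 128))
```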


In this tutorial, you discovered weight initialization techniques for deep learning neural networks.

Do you have any questions?

You can email me at imrk2k17@gmail.com