The vanishing gradient problem is a long-standing issue in training deep neural networks.
As the gradient propagates backwards through the network during the backward pass, it is scaled at each layer by a multiplicative factor, such as the derivative of a sigmoid activation. Applied repeatedly across many layers, these factors compound exponentially, causing the gradient to either explode or shrink toward zero.
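To make the exponential compounding concrete, here is a minimal sketch in Python (NumPy only) with a hypothetical 20-layer stack where every pre-activation is assumed to be 0, the best case for the sigmoid. Since the sigmoid derivative is at most 0.25, the gradient shrinks by at least a factor of 4 per layer.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # maximum value is 0.25, reached at x = 0

# Start with a gradient of 1.0 and propagate it back through 20 layers.
# At each layer the gradient is multiplied by the local sigmoid derivative,
# evaluated here at a pre-activation of 0 (the most favorable case).
grad = 1.0
for layer in range(20):
    grad *= sigmoid_derivative(0.0)

print(grad)  # 0.25 ** 20, roughly 9.1e-13: the gradient has effectively vanished
```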
Gradient Norm Clipping
Gradient norm clipping is one way to mitigate this problem. Essentially, it rescales the gradient whenever its norm grows too large, which keeps the size of each update bounded and guards against the exploding side of the problem.
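A minimal sketch of the idea, assuming the gradients are stored as a list of NumPy arrays and that max_norm is a threshold chosen by the user (both are illustrative assumptions, not part of the original text):

```python
import numpy as np

def clip_grad_norm(grads, max_norm):
    """Rescale a list of gradient arrays so their global L2 norm is at most max_norm."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]  # shrink every gradient by the same factor
    return grads

# Example: an unusually large gradient (norm 50) gets rescaled to norm 1.0.
grads = [np.array([30.0, 40.0])]
clipped = clip_grad_norm(grads, max_norm=1.0)
print(np.linalg.norm(clipped[0]))  # -> 1.0
```

In practice, deep learning frameworks ship this as a utility, for example PyTorch's torch.nn.utils.clip_grad_norm_, which applies the same rescaling across a model's parameters just before the optimizer step.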