While training a deep neural network, you notice that the gradients are becoming extremely small, making the weights of the initial layers change very slowly. What might be the primary cause of this issue?
- Overfitting
- Vanishing gradients due to the use of saturating activation functions
- Underfitting due to a small learning rate
- Excessive learning rate causing divergence
The primary cause of extremely small gradients in deep neural networks is the vanishing gradient problem, often brought on by saturating activation functions such as sigmoid or tanh. During backpropagation, the gradient at each layer is multiplied by the derivative of the activation, which for these functions is always less than 1 (at most 0.25 for sigmoid). Repeated over many layers, these multiplications drive the gradient toward zero, so the weights of the earliest layers barely change. Proper weight initialization and non-saturating activations such as ReLU help mitigate this issue.
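To make the effect concrete, here is a minimal sketch (not part of the original quiz) that pushes a gradient backward through a stack of randomly initialized layers and compares sigmoid against ReLU. The depth, layer width, and weight scales are arbitrary assumptions chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
depth, width = 30, 64  # assumed values for demonstration only

def earliest_layer_grad_norm(activation_grad, weight_scale):
    """Propagate a gradient backward through `depth` layers; return its final norm."""
    grad = rng.normal(size=width)                       # gradient arriving from the loss
    for _ in range(depth):
        W = rng.normal(scale=weight_scale, size=(width, width))
        pre_act = rng.normal(size=width)                # stand-in pre-activation values
        grad = (W.T @ grad) * activation_grad(pre_act)  # one chain-rule step
    return np.linalg.norm(grad)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
sigmoid_grad = lambda z: sigmoid(z) * (1.0 - sigmoid(z))  # never exceeds 0.25
relu_grad = lambda z: (z > 0).astype(float)               # exactly 0 or 1

# Xavier-style scale for sigmoid, He-style scale for ReLU
print("sigmoid, earliest-layer gradient norm:",
      earliest_layer_grad_norm(sigmoid_grad, np.sqrt(1.0 / width)))
print("relu,    earliest-layer gradient norm:",
      earliest_layer_grad_norm(relu_grad, np.sqrt(2.0 / width)))
```

Running this, the sigmoid gradient norm collapses by many orders of magnitude because each layer multiplies it by a derivative well below 1, while the ReLU path with He-style initialization keeps the gradient at a usable scale, which is the mitigation the explanation above refers to.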