What is one major drawback of using the sigmoid activation function in deep networks?

  • Prone to vanishing gradient
  • Limited to binary classification
  • Efficiently handles negative values
  • Non-smooth gradient behavior
One major drawback of the sigmoid activation function in deep networks is its susceptibility to the vanishing gradient problem. The sigmoid saturates for inputs of large magnitude, so its derivative (which never exceeds 0.25) becomes nearly zero in those regions. During backpropagation these small factors are multiplied across layers, so gradients shrink rapidly with depth, slowing or stalling learning in the early layers.
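As a quick illustration, the NumPy sketch below (illustrative only, not part of the original question) computes the sigmoid's derivative at a few inputs to show the saturation effect, and then shows how fast even the best-case per-layer factor of 0.25 decays when multiplied across many layers:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of sigmoid: s * (1 - s); it peaks at 0.25 when x = 0.
    s = sigmoid(x)
    return s * (1.0 - s)

# The derivative is tiny for extreme inputs (the saturation regions):
for x in [0.0, 2.0, 5.0, 10.0]:
    print(f"x = {x:5.1f}   sigmoid'(x) = {sigmoid_grad(x):.6f}")

# By the chain rule, backprop multiplies one such factor per layer.
# Even in the best case (0.25 per layer), the product decays fast:
depth = 20  # hypothetical network depth for illustration
print(f"Upper bound on gradient after {depth} layers: {0.25 ** depth:.2e}")
```

Running this shows the derivative dropping from 0.25 at x = 0 to effectively zero by x = 10, and a 20-layer upper bound on the order of 1e-12, which is why deep sigmoid networks train so slowly.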