Going Deeper into Back-Propagation
1. Gradient descent optimization

Gradient-based methods use gradient information to adjust the parameters. Among them, gradient descent is perhaps the simplest: it moves the parameters a small step in the direction of the negative gradient,

$$ \boldsymbol{w}^{\tau + 1} = \boldsymbol{w}^{\tau} - \eta \nabla_{\boldsymbol{w}^{\tau}} E \tag{1.1} $$

where $\eta$, $\tau$, and $E$ denote the learning rate ($\eta > 0$), the iteration step, and the loss function, respectively. Wait!...
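To make the update rule concrete, here is a minimal sketch of Eq. (1.1) in Python. The toy quadratic loss, the target `w_star`, and the helper `gradient_descent` with its parameters are illustrative assumptions, not part of the article:

```python
import numpy as np

def gradient_descent(grad_E, w0, eta=0.1, num_steps=100):
    """Iterate Eq. (1.1): w <- w - eta * grad_E(w) for num_steps steps."""
    w = np.asarray(w0, dtype=float)
    for _ in range(num_steps):
        w = w - eta * grad_E(w)  # step along the negative gradient
    return w

# Toy example (an assumption for illustration):
# E(w) = ||w - w_star||^2, so grad E(w) = 2 * (w - w_star).
w_star = np.array([1.0, -2.0])
grad_E = lambda w: 2.0 * (w - w_star)

w_final = gradient_descent(grad_E, w0=np.zeros(2), eta=0.1)
print(w_final)  # approaches w_star = [1.0, -2.0]
```

On this convex toy loss the iterates converge to the minimizer for any sufficiently small $\eta$; the choice of learning rate is exactly the issue hinted at next.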