Optimization

1. Gradient descent optimization

Gradient-based methods use gradient information to adjust the parameters. Among them, gradient descent is perhaps the simplest: at each iteration it moves the parameters a small step in the direction of the negative gradient,

$$ \mathbf{w}^{\tau + 1} = \mathbf{w}^{\tau} - \eta \nabla_{\mathbf{w}^{\tau}} E \tag{1.1} $$

where $\eta$, $\tau$, and $E$ denote the learning rate ($\eta > 0$), the iteration step, and the loss function, respectively....
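
To make the update rule (1.1) concrete, here is a minimal sketch in plain Python. The function name `gradient_descent`, the quadratic loss $E(\mathbf{w}) = \tfrac{1}{2} \sum_i (w_i - t_i)^2$, and the values of `TARGET`, `eta`, and `num_steps` are assumptions chosen only for illustration; none of them come from the text above.

```python
from typing import Callable, List

Vector = List[float]


def gradient_descent(
    grad: Callable[[Vector], Vector],
    w0: Vector,
    eta: float = 0.1,
    num_steps: int = 100,
) -> Vector:
    """Repeatedly apply w <- w - eta * grad(w), i.e. the update in Eq. (1.1)."""
    w = list(w0)  # copy so the caller's starting point is not modified
    for _ in range(num_steps):
        g = grad(w)
        # One small step in the direction of the negative gradient.
        w = [w_i - eta * g_i for w_i, g_i in zip(w, g)]
    return w


# Hypothetical loss for illustration: E(w) = 0.5 * sum((w_i - t_i)^2),
# whose gradient with respect to w is simply (w - t).
TARGET = [3.0, -1.0]


def quadratic_grad(w: Vector) -> Vector:
    return [w_i - t_i for w_i, t_i in zip(w, TARGET)]


w_one_step = gradient_descent(quadratic_grad, w0=[0.0, 0.0], eta=0.1, num_steps=1)
w_final = gradient_descent(quadratic_grad, w0=[0.0, 0.0], eta=0.1, num_steps=200)
```

For this particular loss, each step shrinks the distance to `TARGET` by a factor of $1 - \eta$, so with $\eta = 0.1$ the iterate approaches the minimizer geometrically. A learning rate that is too large (here, $\eta > 2$) would make the iterates diverge instead, which is why $\eta$ is kept small.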