gradient descent


Note

  • Normalizing the data speeds up convergence because it makes the surface of the function we optimize more rounded (better conditioned), so the gradient points more directly toward the minimum
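A minimal sketch of such normalization (standardizing each feature to zero mean and unit variance; the sample matrix is made up for illustration):

```python
import numpy as np

# Features on very different scales stretch the loss surface into a
# narrow valley; standardizing rounds it out.
X = np.array([[1.0, 1000.0],
              [2.0, 3000.0],
              [3.0, 2000.0]])

# Zero mean, unit variance per feature (column)
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)
```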

Steps of the algorithm

  1. Initialize trainable parameters
  2. Compute the gradient of the loss function
  3. Update the parameters: $\theta \leftarrow \theta - \alpha \nabla_\theta L(\theta)$, where $\alpha$ is the learning rate hyperparameter
  4. Decide if it is time to stop or continue
    • Stopping can be decided
      • when a limited computational budget (number of iterations or time allowed) is exhausted
      • when the value of the selected ML metric on the validation set has stabilized and is no longer changing much
  5. If continuing, go to step 2 (the gradient must be recomputed at the new parameters)
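The steps above can be sketched on least-squares linear regression (the synthetic data, the names `lr` and `tol`, and the MSE loss are assumptions for illustration, not part of the note):

```python
import numpy as np

# Synthetic regression problem (assumed for the sketch)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w

w = np.zeros(3)                          # step 1: initialize trainable parameters
lr = 0.1                                 # learning rate hyperparameter
tol = 1e-12
prev_loss = np.inf
for step in range(1000):                 # step 4: budget-based stop
    residual = X @ w - y
    grad = 2 * X.T @ residual / len(y)   # step 2: gradient of the MSE loss
    w -= lr * grad                       # step 3: update w <- w - lr * grad
    loss = np.mean(residual ** 2)
    if abs(prev_loss - loss) < tol:      # step 4: metric has stabilized
        break
    prev_loss = loss                     # step 5: continue from step 2
```

Both stopping rules from step 4 appear here: the `range(1000)` iteration budget and the check that the loss has stopped changing.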

Resources


```dataview
table file.inlinks, file.outlinks from [[]] and !outgoing([[]])  AND -"Changelog"
```