gradient descent


Note

  • Normalizing the data speeds up convergence because it makes the surface of the function we optimize more rounded (better conditioned), so the gradient points more directly toward the minimum
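A minimal sketch of such normalization (standardizing each feature to zero mean and unit variance; the sample matrix is made up for illustration):

```python
import numpy as np

# Features on very different scales stretch the loss surface into a
# narrow valley; standardizing rounds it out.
X = np.array([[1.0, 1000.0],
              [2.0, 3000.0],
              [3.0, 2000.0]])

# Zero mean, unit variance per feature (column)
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)
```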

Steps of the algorithm

  1. Initialize trainable parameters
  2. Compute the gradient of the loss function
  3. Update the parameters: $\theta \leftarrow \theta - \alpha \nabla_\theta L(\theta)$, where $\alpha$ is the learning rate hyperparameter
  4. Decide if it is time to stop or continue
    • Stopping can be decided
      • when a limited computational budget (number of iterations or time allowed) is exhausted
      • when the value of the selected ML metric on the validation set has stabilized and is no longer changing much
  5. If continuing, go to step 2 (the gradient must be recomputed at the new parameters)
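The steps above can be sketched on least-squares linear regression (the synthetic data, the names `lr` and `tol`, and the MSE loss are assumptions for illustration, not part of the note):

```python
import numpy as np

# Synthetic regression problem (assumed for the sketch)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w

w = np.zeros(3)                          # step 1: initialize trainable parameters
lr = 0.1                                 # learning rate hyperparameter
tol = 1e-12
prev_loss = np.inf
for step in range(1000):                 # step 4: budget-based stop
    residual = X @ w - y
    grad = 2 * X.T @ residual / len(y)   # step 2: gradient of the MSE loss
    w -= lr * grad                       # step 3: update w <- w - lr * grad
    loss = np.mean(residual ** 2)
    if abs(prev_loss - loss) < tol:      # step 4: metric has stabilized
        break
    prev_loss = loss                     # step 5: continue from step 2
```

Both stopping rules from step 4 appear here: the `range(1000)` iteration budget and the check that the loss has stopped changing.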

Resources


```dataview
table file.inlinks, file.outlinks from [[]] and !outgoing([[]])  AND -"Changelog"
```