This performs a gradient descent update for each individual sample in the dataset, rather than one update per pass over the whole dataset.


  1. $(x^{(i)}, y^{(i)})$ = the i-th input sample and its label in the dataset
  2. all other equations and parameters are as explained in gradient descent
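As a rough illustration of the per-sample update, here is a minimal sketch in plain Python. It assumes a one-dimensional linear model with a squared-error loss; the function name `sgd_linear_regression`, the toy data, and the hyperparameter values are all hypothetical choices made for this example, not something defined earlier in the text.

```python
import random


def sgd_linear_regression(samples, lr=0.05, epochs=200, seed=0):
    """Fit y ~ w * x + b, updating the parameters after every single sample."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)  # visit the samples in a fresh random order each epoch
        for x_i, y_i in data:
            error = (w * x_i + b) - y_i  # residual of the i-th sample
            # Gradient of 0.5 * error**2 with respect to w and b,
            # computed from this one sample only.
            grad_w = error * x_i
            grad_b = error
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Toy, noise-free data generated from y = 2x + 1 (an assumption for the demo).
samples = [(x, 2.0 * x + 1.0) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]
w, b = sgd_linear_regression(samples)
```

The parameters change once per sample, so a single pass over five samples performs five updates, where batch gradient descent would perform one.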


  1. Each update is fast to compute, since it uses only a single sample
  2. Has convergence guarantees similar to those of batch gradient descent, provided the learning rate is gradually annealed over time.
  3. Supports online learning
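The annealing and online-learning points can be sketched together. The example below assumes an inverse-time decay schedule, which is only one of several common choices, and a hypothetical generator `noisy_stream` standing in for data that arrives one sample at a time. The names `annealed_lr` and `online_sgd` and all constants are illustrative assumptions.

```python
import random


def annealed_lr(lr0, decay, step):
    """Inverse-time decay: the step size shrinks as more samples are seen."""
    return lr0 / (1.0 + decay * step)


def online_sgd(stream, lr0=0.1, decay=0.01):
    """Consume (x, y) pairs one at a time and update w, b after each pair."""
    w, b = 0.0, 0.0
    for step, (x_i, y_i) in enumerate(stream):
        lr = annealed_lr(lr0, decay, step)
        error = (w * x_i + b) - y_i
        w -= lr * error * x_i
        b -= lr * error
    return w, b


def noisy_stream(n, seed=0):
    """Hypothetical data source: y = 3x - 0.5 plus Gaussian noise."""
    rng = random.Random(seed)
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        yield x, 3.0 * x - 0.5 + rng.gauss(0.0, 0.1)


w, b = online_sgd(noisy_stream(5000))
```

Because `online_sgd` never stores the stream, it can keep learning as new samples arrive, and the shrinking step size damps the noise of the later updates.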


  1. Produces high-variance parameter updates, so the objective can fluctuate heavily from one step to the next