RMSProp
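Same as Adagrad, except that the accumulated sum of all past squared gradients is replaced by an exponentially decaying running average, so only a recent window of gradients effectively contributes. This keeps the effective learning rate from shrinking towards zero as it does in Adagrad. Developed around the same time as AdaDelta.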
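The update rule is commonly written as below. This is a sketch in assumed notation, since the Adagrad notes are not reproduced here: g_t is the gradient at step t, \eta the learning rate, \epsilon a small smoothing term, and \gamma the decay rate of the running average (values around 0.9 are often suggested).

```latex
\begin{aligned}
E[g^2]_t &= \gamma \, E[g^2]_{t-1} + (1 - \gamma) \, g_t^2 \\
\theta_{t+1} &= \theta_t - \frac{\eta}{\sqrt{E[g^2]_t + \epsilon}} \, g_t
\end{aligned}
```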
Where:
- all other equations and parameters are as explained in Adagrad
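A minimal NumPy sketch of one RMSProp step may make the difference from Adagrad concrete. The function names, the default values, and the placement of `eps` inside the square root are assumptions made for illustration, not something these notes specify.

```python
import numpy as np


def rmsprop_step(theta, grad, avg_sq, lr=0.001, decay=0.9, eps=1e-8):
    """One RMSProp update; returns the new parameters and running average."""
    # Exponentially decaying average of squared gradients.
    # Adagrad would instead keep adding grad ** 2 to a growing sum.
    avg_sq = decay * avg_sq + (1.0 - decay) * grad ** 2
    # Per-parameter step, scaled by the root of the running average.
    # Assumption: eps sits inside the square root.
    theta = theta - lr * grad / np.sqrt(avg_sq + eps)
    return theta, avg_sq


def minimize_quadratic(steps=500, lr=0.01):
    """Toy usage: minimize f(theta) = sum(theta ** 2), with gradient 2 * theta."""
    theta = np.array([1.0, -2.0])
    avg_sq = np.zeros_like(theta)
    for _ in range(steps):
        grad = 2.0 * theta
        theta, avg_sq = rmsprop_step(theta, grad, avg_sq, lr=lr)
    return theta
```

Calling `minimize_quadratic()` drives both coordinates close to zero. Because the step is normalized by the recent gradient magnitude, the two coordinates initially move at a similar rate despite starting at different distances from the minimum.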
See also: gradient-descent nesterov-accelerated-gradient adagrad
References: