Same as Adagrad, except that the running average of squared gradients is limited to a fixed window rather than accumulated over the entire history. It was developed around the same time as AdaDelta.

Where:

  1. All other equations and parameters are as explained for Adagrad.
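
The update described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the fixed window is approximated by an exponentially decaying average of squared gradients (the form used by optimizers such as RMSProp), and the names `windowed_adagrad_step`, `decay`, `lr` and `eps`, along with their default values, are hypothetical and do not come from the text.

```python
import math


def windowed_adagrad_step(theta, grad, avg_sq, lr=0.01, decay=0.9, eps=1e-8):
    """Apply one update and return (new_theta, new_avg_sq).

    Assumption: the "fixed window" is approximated by an exponentially
    decaying average of squared gradients, so old gradients fade out
    instead of accumulating forever as they do in Adagrad.
    """
    new_theta, new_avg_sq = [], []
    for t, g, v in zip(theta, grad, avg_sq):
        # Decaying average of the squared gradient for this parameter.
        v = decay * v + (1.0 - decay) * g * g
        # Adagrad-style step: scale the gradient by the root of the average.
        t = t - lr * g / (math.sqrt(v) + eps)
        new_theta.append(t)
        new_avg_sq.append(v)
    return new_theta, new_avg_sq


def minimize_quadratic(steps=500):
    """Toy usage: run the update on f(x, y) = x**2 + y**2."""
    theta = [5.0, -3.0]
    avg_sq = [0.0, 0.0]
    for _ in range(steps):
        grad = [2.0 * t for t in theta]  # gradient of x**2 + y**2
        theta, avg_sq = windowed_adagrad_step(theta, grad, avg_sq)
    return theta
```

Because the average forgets old gradients, the effective step size does not shrink toward zero over a long run, which is the practical difference from Adagrad's ever-growing accumulator.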