AdaDelta is the same as Adagrad, except that the running average is limited to a fixed window of past gradients.

The moving-window average is defined as:
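
In the usual formulation of AdaDelta, the window is approximated by an exponentially decaying average of squared gradients rather than by storing past gradients explicitly. The symbols below are assumptions chosen for illustration and may differ from the Adagrad notation: $g_t$ is the gradient at step $t$ and $\rho \in (0, 1)$ is the decay rate.

$$
E[g^2]_t = \rho \, E[g^2]_{t-1} + (1 - \rho) \, g_t^2
$$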

The updates for AdaDelta will then be as follows:
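
A commonly stated form of the update is sketched here under assumed notation: $\theta_t$ are the parameters, $g_t$ is the gradient, $E[g^2]_t$ is the decaying average of squared gradients, $\rho$ is the decay rate, and $\epsilon$ is a small constant for numerical stability.

$$
\begin{aligned}
\mathrm{RMS}[g]_t &= \sqrt{E[g^2]_t + \epsilon} \\
\Delta\theta_t &= -\frac{\mathrm{RMS}[\Delta\theta]_{t-1}}{\mathrm{RMS}[g]_t} \, g_t \\
E[\Delta\theta^2]_t &= \rho \, E[\Delta\theta^2]_{t-1} + (1 - \rho) \, \Delta\theta_t^2 \\
\theta_{t+1} &= \theta_t + \Delta\theta_t
\end{aligned}
$$

Here $\mathrm{RMS}[\Delta\theta]_{t-1} = \sqrt{E[\Delta\theta^2]_{t-1} + \epsilon}$, so no global learning rate appears in the update.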

Where:

  1. all other equations and parameters are as explained in Adagrad
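
As a rough illustration of how these pieces fit together, here is a minimal NumPy sketch of one AdaDelta step. The function names (`adadelta_step`, `run_quadratic`), the argument names, and the defaults `rho=0.95` and `eps=1e-6` are assumptions made for this example, not values taken from this document.

```python
import numpy as np


def adadelta_step(theta, grad, avg_sq_grad, avg_sq_delta, rho=0.95, eps=1e-6):
    """Apply one AdaDelta update and return the new parameters and state.

    All names and default values here are illustrative assumptions.
    """
    # Decaying average of squared gradients (the "fixed window").
    avg_sq_grad = rho * avg_sq_grad + (1.0 - rho) * grad ** 2
    # Step scaled by the ratio RMS[delta]_{t-1} / RMS[g]_t.
    delta = -np.sqrt(avg_sq_delta + eps) / np.sqrt(avg_sq_grad + eps) * grad
    # Decaying average of squared updates, used by the next step.
    avg_sq_delta = rho * avg_sq_delta + (1.0 - rho) * delta ** 2
    return theta + delta, avg_sq_grad, avg_sq_delta


def run_quadratic(steps=50):
    """Run AdaDelta on f(theta) = sum(theta**2), whose gradient is 2 * theta."""
    theta = np.array([3.0, -2.0])
    avg_sq_grad = np.zeros_like(theta)
    avg_sq_delta = np.zeros_like(theta)
    for _ in range(steps):
        grad = 2.0 * theta
        theta, avg_sq_grad, avg_sq_delta = adadelta_step(
            theta, grad, avg_sq_grad, avg_sq_delta
        )
    return theta
```

Both running averages start at zero, so the first steps are driven mainly by `eps` and are small. The state arrays must be carried from one call to the next, one entry per parameter.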