AdaDelta
Same as Adagrad, but the running average of squared gradients is limited to a fixed window instead of accumulating over all past steps.
Moving window average is defined as:
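A common way to write this average, assuming the usual exponentially decaying form (the symbols are assumptions, not taken from this note: $\rho$ is a decay rate such as 0.95 and $g_t$ is the gradient at step $t$):

```latex
E[g^2]_t = \rho \, E[g^2]_{t-1} + (1 - \rho) \, g_t^2
```

With this form, no explicit window of past gradients needs to be stored; older gradients simply fade out geometrically.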
The updates for AdaDelta will then be as follows:
Where:
- all other equations and parameters are as explained in Adagrad
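As a rough illustration, one update step might be sketched in plain Python as below. This follows the commonly published form of AdaDelta, which also keeps a decaying average of squared updates in place of a learning rate; it may differ from the exact form intended in this note. The names `adadelta_step`, `avg_sq_grad`, `avg_sq_delta`, `rho`, and `eps`, as well as the default values, are hypothetical.

```python
import math


def adadelta_step(theta, grad, avg_sq_grad, avg_sq_delta, rho=0.95, eps=1e-6):
    """Apply one AdaDelta-style update to a list of parameters.

    All names and defaults here are illustrative assumptions.
    Returns the new parameters and the two updated running averages.
    """
    new_theta, new_sq_grad, new_sq_delta = [], [], []
    for p, g, eg2, ed2 in zip(theta, grad, avg_sq_grad, avg_sq_delta):
        # Decaying average of squared gradients (the "moving window").
        eg2 = rho * eg2 + (1.0 - rho) * g * g
        # Step scaled by RMS of past updates over RMS of gradients.
        delta = -math.sqrt(ed2 + eps) / math.sqrt(eg2 + eps) * g
        # Decaying average of squared updates, used by the next step.
        ed2 = rho * ed2 + (1.0 - rho) * delta * delta
        new_theta.append(p + delta)
        new_sq_grad.append(eg2)
        new_sq_delta.append(ed2)
    return new_theta, new_sq_grad, new_sq_delta


if __name__ == "__main__":
    # Toy problem: minimise f(x) = x^2, whose gradient is 2x.
    theta, sq_grad, sq_delta = [3.0], [0.0], [0.0]
    for _ in range(5):
        grad = [2.0 * x for x in theta]
        theta, sq_grad, sq_delta = adadelta_step(theta, grad, sq_grad, sq_delta)
        print(theta[0])
```

Both running averages start at zero, so the first steps are small (on the order of `sqrt(eps)`) and grow as the average of squared updates builds up.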
See also: gradient-descent, nesterov-accelerated-gradient, adagrad
References: