rmsprop¶
This module provides an implementation of rmsprop.
class climin.rmsprop.RmsProp(wrt, fprime, step_rate, decay=0.9, momentum=0, step_adapt=False, step_rate_min=0, step_rate_max=inf, args=None)¶

RmsProp optimizer.
RmsProp [tieleman2012rmsprop] is an optimizer that uses the magnitude of recent gradients to normalize the current gradient. It keeps a moving average of the squared gradients and divides the current gradient by its square root (hence Rms). Let \(f'(\theta_t)\) be the derivative of the loss with respect to the parameters at time step \(t\). In its basic form, given a step rate \(\alpha\) and a decay term \(\gamma\), we perform the following updates:

\[\begin{split}r_t &=& (1 - \gamma)~f'(\theta_t)^2 + \gamma r_{t-1} , \\ v_{t+1} &=& {\alpha \over \sqrt{r_t}} f'(\theta_t), \\ \theta_{t+1} &=& \theta_t - v_{t+1}.\end{split}\]

In some cases, adding a momentum term \(\beta\) is beneficial. Here, Nesterov momentum is used:
\[\begin{split}\theta_{t+{1 \over 2}} &=& \theta_t - \beta v_t, \\ r_t &=& (1 - \gamma)~f'(\theta_{t + {1 \over 2}})^2 + \gamma r_{t-1}, \\ v_{t+1} &=& \beta v_t + {\alpha \over \sqrt{r_t}} f'(\theta_{t + {1 \over 2}}), \\ \theta_{t+1} &=& \theta_t - v_{t+1}\end{split}\]

Additionally, this implementation has adaptable step rates. As soon as the components of the step and the momentum point into the same direction (i.e. have the same sign), the step rate for that parameter is multiplied with 1 + step_adapt. Otherwise, it is multiplied with 1 - step_adapt. In any case, the minimum and maximum step rates step_rate_min and step_rate_max are respected, and exceeding values are truncated to them.

RmsProp has several advantages; for one, it is a very robust optimizer which has pseudo curvature information. Additionally, it can deal with stochastic objectives very nicely, making it applicable to mini batch learning.
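The update rules above, including Nesterov momentum and the sign-based step rate adaptation, can be sketched in plain NumPy. This is an illustrative re-implementation, not climin's actual code: the function name `rmsprop_step` is made up here, and comparing the sign of the step against the sign of the momentum term reflects one reading of the description above.

```python
import numpy as np

def rmsprop_step(theta, fprime, r, v, step_rates, decay=0.9, momentum=0.0,
                 step_adapt=0.0, step_rate_min=0.0, step_rate_max=np.inf):
    """One RmsProp step with Nesterov momentum and step rate adaptation."""
    theta_half = theta - momentum * v         # Nesterov look-ahead half step
    grad = fprime(theta_half)
    r = (1 - decay) * grad ** 2 + decay * r   # moving average of squared gradients
    step = step_rates / np.sqrt(r) * grad     # gradient normalized by sqrt of average
    v_new = momentum * v + step
    if step_adapt:
        # Grow the step rate where step and momentum term agree in sign,
        # shrink it where they disagree, then clip to the allowed range.
        agree = np.sign(step) == np.sign(momentum * v)
        step_rates = np.where(agree,
                              step_rates * (1 + step_adapt),
                              step_rates * (1 - step_adapt))
        step_rates = np.clip(step_rates, step_rate_min, step_rate_max)
    return theta - v_new, r, v_new, step_rates

# Minimize f(theta) = 0.5 * ||theta||^2, whose derivative is the identity.
theta = np.array([5.0, -3.0])
r, v = np.zeros(2), np.zeros(2)
rates = np.full(2, 0.1)
for _ in range(500):
    theta, r, v, rates = rmsprop_step(theta, lambda x: x, r, v, rates,
                                      momentum=0.5, step_adapt=0.01)
```

Note how per parameter step rates fall out naturally: `step_rates` is an array, so the adaptation grows or shrinks each component independently.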
Note
Works with gnumpy.
[tieleman2012rmsprop] Tieleman, T. and Hinton, G. (2012), Lecture 6.5 - rmsprop, COURSERA: Neural Networks for Machine Learning

Attributes
wrt : array_like
Current solution to the problem. Can be given as a first argument to .fprime.
fprime : callable
First derivative of the objective function. Returns an array of the same shape as .wrt.
step_rate : float or array_like
Step rate of the optimizer. If an array, per parameter step rates are used.
momentum : float or array_like
Momentum of the optimizer. If an array, per parameter momentums are used.
step_adapt : float or bool
Constant to adapt step rates. If False, step rate adaption is not done.
step_rate_min : float, optional, default 0
When adapting step rates, do not move below this value.
step_rate_max : float, optional, default inf
When adapting step rates, do not move above this value.

Methods
__init__(wrt, fprime, step_rate, decay=0.9, momentum=0, step_adapt=False, step_rate_min=0, step_rate_max=inf, args=None)¶

Create an RmsProp object.
Parameters: wrt : array_like
Array that represents the solution. Will be operated upon in place. fprime should accept this array as a first argument.
fprime : callable
First derivative of the objective function. Returns an array of the same shape as wrt.
step_rate : float or array_like
Step rate to use during optimization. Can be given as a single scalar value or as an array for a different step rate of each parameter of the problem.
decay : float
Decay parameter for the moving average. Must lie in [0, 1), where lower numbers mean a shorter “memory”.
momentum : float or array_like
Momentum to use during optimization. Can be specified analogously to (but independently of) the step rate.
step_adapt : float or bool
Constant to adapt step rates. If False, step rate adaption is not done.
step_rate_min : float, optional, default 0
When adapting step rates, do not move below this value.
step_rate_max : float, optional, default inf
When adapting step rates, do not move above this value.
args : iterable
Iterator over arguments which fprime will be called with.
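To illustrate how the documented arguments fit together for mini batch learning, the following standalone sketch cycles an iterator of mini batches, mimicking the role of args. The data, batch layout, and loop body are hypothetical (the loop re-implements the basic update in plain NumPy rather than calling climin).

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_wrt = np.array([1.0, -2.0, 0.5])
y = X @ true_wrt

def fprime(wrt, batch):
    """Gradient of the mean squared error on a single mini batch."""
    Xb, yb = batch
    return 2.0 / len(Xb) * Xb.T @ (Xb @ wrt - yb)

# The args-style iterator: each iteration draws one element and passes it
# to fprime, here cycling endlessly over mini batches of ten rows each.
batches = [(X[i:i + 10], y[i:i + 10]) for i in range(0, 100, 10)]
args = itertools.cycle(batches)

wrt = np.zeros(3)   # current solution
r = np.zeros(3)     # moving average of squared gradients
decay, step_rate = 0.9, 0.05
for batch in itertools.islice(args, 1000):
    grad = fprime(wrt, batch)
    r = (1 - decay) * grad ** 2 + decay * r
    wrt = wrt - step_rate / np.sqrt(r) * grad
```

Because the squared-gradient average is taken over recent mini batches, the normalization also smooths out the stochasticity of the per-batch gradients, which is why the method works well for stochastic objectives.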