bgd.optimizers module
This module contains all the implemented optimizers. Any new optimizer must
inherit from bgd.optimizers.Optimizer and implement its abstract method
(update).
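As an illustration, a subclass might look like the sketch below. The exact contract expected from update is not spelled out on this page, so the assumption here (mirroring the base-class docstring further down) is that it receives the batch gradient information and returns the delta to apply to the parameters; SignSGD is a hypothetical example, not part of bgd.

    import numpy as np

    from bgd.optimizers import Optimizer


    class SignSGD(Optimizer):
        # Hypothetical optimizer used only as an example: steps against the
        # sign of the gradient with a constant steplength.

        def __init__(self, learning_rate=0.01):
            Optimizer.__init__(self)
            self.learning_rate = learning_rate

        def update(self, F):
            # Assumption: F carries the flattened batch gradient and the
            # return value is the delta added to the parameters (see
            # Optimizer.update below for how fragments are flattened and
            # split back per layer).
            return -self.learning_rate * np.sign(F)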
- class bgd.optimizers.AdamOptimizer(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08)[source]
Bases:
bgd.optimizers.Optimizer
- Parameters
learning_rate (float, optional) – Constant steplength.
beta_1 (float, optional) – Exponential decay rate of the moving average of the gradient.
beta_2 (float, optional) – Exponential decay rate of the moving average of the squared gradient.
epsilon (float, optional) – Constant for numeric stability.
- step
Current iteration.
- Type
int
- moment_1
Last 1st moment vector.
- Type
np.ndarray
- moment_2
Last 2nd moment vector.
- Type
np.ndarray
References
- Adam: A Method for Stochastic Optimization
Diederik P. Kingma and Jimmy Lei Ba. https://arxiv.org/pdf/1412.6980.pdf
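The class internals are not reproduced on this page; the following is a minimal standalone sketch of the update rule Adam implements (Algorithm 1 in the paper above), written with the documented parameter and attribute names. adam_delta and the state dictionary are illustrative, not part of the bgd API.

    import numpy as np


    def adam_delta(grad, state, learning_rate=0.001, beta_1=0.9,
                   beta_2=0.999, epsilon=1e-08):
        # One Adam step; `state` carries step, moment_1 and moment_2 between calls.
        state["step"] += 1
        t = state["step"]

        # Moving averages of the gradient and of the squared gradient.
        state["moment_1"] = beta_1 * state["moment_1"] + (1.0 - beta_1) * grad
        state["moment_2"] = beta_2 * state["moment_2"] + (1.0 - beta_2) * grad ** 2

        # Bias-corrected moment estimates.
        m_hat = state["moment_1"] / (1.0 - beta_1 ** t)
        v_hat = state["moment_2"] / (1.0 - beta_2 ** t)

        # Delta to add to the parameters (negative: descent direction).
        return -learning_rate * m_hat / (np.sqrt(v_hat) + epsilon)


    state = {"step": 0, "moment_1": 0.0, "moment_2": 0.0}
    delta = adam_delta(np.array([0.5, -1.2, 3.0]), state)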
- class bgd.optimizers.LBFGS(m=10, epsilon=0.01, first_order_optimizer=<bgd.optimizers.AdamOptimizer object>)[source]
Bases:
bgd.optimizers.Optimizer
Quasi-Newton optimizer with limited memory.
- Parameters
m (int, optional) – Memory size.
epsilon (float, optional) – Constant for numeric stability.
first_order_optimizer (Optimizer) – First order optimizer used to approximate the Hessian product.
- k
Current iteration of L-BFGS.
- Type
int
- previous_grad
Gradient vector at iteration k-1.
- Type
np.ndarray
- y
List of m last gradient differences. y_t = grad_{t+1} - grad_t
- Type
list
- s
List of m last update vectors. s_t = H * grad * steplength, where H approximates the inverse Hessian.
- Type
list
- alpha
List of m last alpha coefficients alpha_i = rho_i * s_i.T * grad, where rho_i = 1. / (s_i.T * y_i).
- Type
list
References
- Updating Quasi-Newton Matrices with Limited Storage
Nocedal, J. (1980). Mathematics of Computation, 35(151): 773–782.
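For reference, the s, y and alpha quantities above are exactly the ingredients of the standard L-BFGS two-loop recursion from Nocedal (1980). The sketch below shows that recursion in isolation; it is not the class's actual code, and the use of epsilon as a division guard is an assumption.

    import numpy as np


    def two_loop_recursion(grad, s_list, y_list, epsilon=0.01):
        # Returns an approximation of H * grad, where H is the inverse Hessian
        # reconstructed from the m last (s, y) pairs; descend along -result.
        q = np.asarray(grad, dtype=float).copy()
        alphas = []

        # First loop: walk the history backwards, most recent pair first.
        for s, y in zip(reversed(s_list), reversed(y_list)):
            rho = 1.0 / (s @ y)
            alpha = rho * (s @ q)
            alphas.append(alpha)
            q -= alpha * y

        # Initial scaling gamma = s^T y / y^T y; epsilon used as a division
        # guard (assumption about how the class uses it).
        if s_list:
            gamma = (s_list[-1] @ y_list[-1]) / (y_list[-1] @ y_list[-1] + epsilon)
        else:
            gamma = 1.0
        r = gamma * q

        # Second loop: walk the history forwards, oldest pair first.
        for s, y, alpha in zip(s_list, y_list, reversed(alphas)):
            rho = 1.0 / (s @ y)
            beta = rho * (y @ r)
            r += (alpha - beta) * s

        return r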
- class bgd.optimizers.MomentumOptimizer(learning_rate=0.005, momentum=0.9)[source]
Bases:
bgd.optimizers.Optimizer
Simple first order optimizer with momentum support.
- Parameters
learning_rate (float, optional) – Constant steplength.
momentum (float, optional) – Persistence of previous gradient vectors. Old search directions are reused, weighted by the momentum value, to compute the new search direction.
- previous_grad
Gradient vector at previous iteration.
- Type
np.ndarray
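A minimal sketch of the step this describes, assuming the classical momentum formulation where the previous direction is scaled by momentum and blended with the new gradient; momentum_delta is illustrative, not the class's method.

    import numpy as np


    def momentum_delta(grad, previous_grad, learning_rate=0.005, momentum=0.9):
        # Blend the new gradient with the previous search direction, weighted
        # by `momentum`; `direction` is kept as previous_grad for the next call.
        direction = momentum * previous_grad + grad
        return -learning_rate * direction, direction


    delta, prev = momentum_delta(np.array([1.0, -2.0]), np.zeros(2))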
- class bgd.optimizers.Optimizer[source]
Bases:
object
Base class for first order and second order optimizers.
- gradient_fragments
List of tuples of NumPy arrays, where the number of tuples is equal to the number of learnable layers in the network.
- Type
list
- update(F)[source]
Computes the best move in parameter space at the current iteration. All gradient fragments added to gradient_fragments are flattened and concatenated into the batch gradient vector of the whole network. The optimized delta vector is then split back into fragments of the original shapes, and each delta fragment is used to update the parameters of its layer individually.
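The flatten/split mechanics described above can be pictured with the standalone sketch below; flatten_fragments and split_delta are hypothetical helper names used only for illustration.

    import numpy as np


    def flatten_fragments(gradient_fragments):
        # Concatenate every per-layer array into one flat batch gradient
        # vector, remembering the original shapes for the reverse operation.
        arrays = [a for fragments in gradient_fragments for a in fragments]
        shapes = [a.shape for a in arrays]
        return np.concatenate([a.ravel() for a in arrays]), shapes


    def split_delta(delta, shapes):
        # Cut the optimized delta vector back into fragments of the original shapes.
        fragments, offset = [], 0
        for shape in shapes:
            size = int(np.prod(shape))
            fragments.append(delta[offset:offset + size].reshape(shape))
            offset += size
        return fragments

Round-tripping a delta of the same total length through split_delta yields one array per original fragment shape, which is what allows each layer to be updated individually.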