bgd.optimizers module
This module contains all the implemented optimizers. Any new optimizer must
inherit from bgd.optimizers.Optimizer and implement its abstract method
(update).
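As an illustration, a subclass might look like the sketch below. The exact contract expected from update is not spelled out on this page, so the assumption here (mirroring the base-class docstring further down) is that it receives the batch gradient information and returns the delta to apply to the parameters; SignSGD is a hypothetical example, not part of bgd.

    import numpy as np

    from bgd.optimizers import Optimizer


    class SignSGD(Optimizer):
        # Hypothetical optimizer used only as an example: steps against the
        # sign of the gradient with a constant steplength.

        def __init__(self, learning_rate=0.01):
            Optimizer.__init__(self)
            self.learning_rate = learning_rate

        def update(self, F):
            # Assumption: F carries the flattened batch gradient and the
            # return value is the delta added to the parameters (see
            # Optimizer.update below for how fragments are flattened and
            # split back per layer).
            return -self.learning_rate * np.sign(F)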
- class bgd.optimizers.AdamOptimizer(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08)[source]
Bases:
bgd.optimizers.Optimizer
- Parameters
learning_rate (float, optional) – Constant steplength.
beta_1 (float, optional) – Exponential decay rate of the moving average of the gradient.
beta_2 (float, optional) – Exponential decay rate of the moving average of the squared gradient.
epsilon (float, optional) – Constant for numeric stability.
- step
Current iteration.
- Type
int
- moment_1
Last 1st moment vector.
- Type
np.ndarray
- moment_2
Last 2nd moment vector.
- Type
np.ndarray
References
- Adam: A Method for Stochastic Optimization
Diederik P. Kingma and Jimmy Lei Ba. https://arxiv.org/pdf/1412.6980.pdf
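The class internals are not reproduced on this page; the following is a minimal standalone sketch of the update rule Adam implements (Algorithm 1 in the paper above), written with the documented parameter and attribute names. adam_delta and the state dictionary are illustrative, not part of the bgd API.

    import numpy as np


    def adam_delta(grad, state, learning_rate=0.001, beta_1=0.9,
                   beta_2=0.999, epsilon=1e-08):
        # One Adam step; `state` carries step, moment_1 and moment_2 between calls.
        state["step"] += 1
        t = state["step"]

        # Moving averages of the gradient and of the squared gradient.
        state["moment_1"] = beta_1 * state["moment_1"] + (1.0 - beta_1) * grad
        state["moment_2"] = beta_2 * state["moment_2"] + (1.0 - beta_2) * grad ** 2

        # Bias-corrected moment estimates.
        m_hat = state["moment_1"] / (1.0 - beta_1 ** t)
        v_hat = state["moment_2"] / (1.0 - beta_2 ** t)

        # Delta to add to the parameters (negative: descent direction).
        return -learning_rate * m_hat / (np.sqrt(v_hat) + epsilon)


    state = {"step": 0, "moment_1": 0.0, "moment_2": 0.0}
    delta = adam_delta(np.array([0.5, -1.2, 3.0]), state)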
- class bgd.optimizers.LBFGS(m=10, epsilon=0.01, first_order_optimizer=<bgd.optimizers.AdamOptimizer object>)[source]
Bases:
bgd.optimizers.Optimizer
Quasi-Newton optimizer with limited memory.
- Parameters
m (int, optional) – Memory size.
epsilon (float, optional) – Constant for numeric stability.
first_order_optimizer (Optimizer) – First order optimizer used to approximate the Hessian product.
- k
Current iteration of L-BFGS.
- Type
int
- previous_grad
Gradient vector at iteration k-1.
- Type
np.ndarray
- y
List of m last gradient differences. y_t = grad_{t+1} - grad_t
- Type
list
- s
List of m last update vectors. s_t = H * grad * steplength, where H approximates the inverse Hessian.
- Type
list
- alpha
List of m last alpha coefficients alpha_i = rho_i * s_i.T * grad, where rho_i = 1. / (s_i.T * y_i).
- Type
list
References
- Updating Quasi-Newton Matrices with Limited Storage
Nocedal, J. (1980). Mathematics of Computation, 35(151): 773–782.
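For reference, the s, y and alpha quantities above are exactly the ingredients of the standard L-BFGS two-loop recursion from Nocedal (1980). The sketch below shows that recursion in isolation; it is not the class's actual code, and the use of epsilon as a division guard is an assumption.

    import numpy as np


    def two_loop_recursion(grad, s_list, y_list, epsilon=0.01):
        # Returns an approximation of H * grad, where H is the inverse Hessian
        # reconstructed from the m last (s, y) pairs; descend along -result.
        q = np.asarray(grad, dtype=float).copy()
        alphas = []

        # First loop: walk the history backwards, most recent pair first.
        for s, y in zip(reversed(s_list), reversed(y_list)):
            rho = 1.0 / (s @ y)
            alpha = rho * (s @ q)
            alphas.append(alpha)
            q -= alpha * y

        # Initial scaling gamma = s^T y / y^T y; epsilon used as a division
        # guard (assumption about how the class uses it).
        if s_list:
            gamma = (s_list[-1] @ y_list[-1]) / (y_list[-1] @ y_list[-1] + epsilon)
        else:
            gamma = 1.0
        r = gamma * q

        # Second loop: walk the history forwards, oldest pair first.
        for s, y, alpha in zip(s_list, y_list, reversed(alphas)):
            rho = 1.0 / (s @ y)
            beta = rho * (y @ r)
            r += (alpha - beta) * s

        return r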
- class bgd.optimizers.MomentumOptimizer(learning_rate=0.005, momentum=0.9)[source]
Bases:
bgd.optimizers.Optimizer
Simple first order optimizer with momentum support.
- Parameters
learning_rate (float, optional) – Constant steplength.
momentum (float, optional) – Persistence of previous gradient vectors. Old search directions are reused, weighted by the momentum value, to compute the new search direction.
- previous_grad
Gradient vector at previous iteration.
- Type
np.ndarray
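A minimal sketch of the step this describes, assuming the classical momentum formulation where the previous direction is scaled by momentum and blended with the new gradient; momentum_delta is illustrative, not the class's method.

    import numpy as np


    def momentum_delta(grad, previous_grad, learning_rate=0.005, momentum=0.9):
        # Blend the new gradient with the previous search direction, weighted
        # by `momentum`; `direction` is kept as previous_grad for the next call.
        direction = momentum * previous_grad + grad
        return -learning_rate * direction, direction


    delta, prev = momentum_delta(np.array([1.0, -2.0]), np.zeros(2))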
- class bgd.optimizers.Optimizer[source]
Bases:
object
Base class for first order and second order optimizers.
- gradient_fragments
List of tuples of NumPy arrays, where the number of tuples is equal to the number of learnable layers in the network.
- Type
list
- update(F)[source]
Computes the best move in parameter space at the current iteration. All gradient fragments added to gradient_fragments are flattened and concatenated into the batch gradient vector of the whole network. The optimized delta vector is then split back into fragments of the original shapes, and each delta fragment is used to update the parameters of its layer individually.
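The flatten/split mechanics described above can be pictured with the standalone sketch below; flatten_fragments and split_delta are hypothetical helper names used only for illustration.

    import numpy as np


    def flatten_fragments(gradient_fragments):
        # Concatenate every per-layer array into one flat batch gradient
        # vector, remembering the original shapes for the reverse operation.
        arrays = [a for fragments in gradient_fragments for a in fragments]
        shapes = [a.shape for a in arrays]
        return np.concatenate([a.ravel() for a in arrays]), shapes


    def split_delta(delta, shapes):
        # Cut the optimized delta vector back into fragments of the original shapes.
        fragments, offset = [], 0
        for shape in shapes:
            size = int(np.prod(shape))
            fragments.append(delta[offset:offset + size].reshape(shape))
            offset += size
        return fragments

Round-tripping a delta of the same total length through split_delta yields one array per original fragment shape, which is what allows each layer to be updated individually.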