bgd.optimizers module

This module contains all implemented optimizers. Any new optimizer must inherit from bgd.optimizers.Optimizer and implement its abstract method, update.
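
As an illustration only, a custom optimizer might look like the sketch below. The exact contract of update (its argument F and its return value) is not specified in this documentation, so the signature, the flat-gradient assumption, and the PlainSGD name are hypothetical and should be checked against bgd.optimizers.Optimizer.

    # Hypothetical sketch of a custom optimizer. The update() contract assumed
    # here (receive a flat gradient vector, return the delta to apply) is an
    # illustration and should be checked against bgd.optimizers.Optimizer.
    import numpy as np
    from bgd.optimizers import Optimizer

    class PlainSGD(Optimizer):
        """Minimal constant-steplength gradient descent (illustrative only)."""

        def __init__(self, learning_rate=0.01):
            super().__init__()
            self.learning_rate = learning_rate

        def update(self, F):
            # F is assumed to be the concatenated batch gradient vector.
            return -self.learning_rate * np.asarray(F)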

class bgd.optimizers.AdamOptimizer(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08)[source]

Bases: bgd.optimizers.Optimizer

First order optimizer implementing adaptive moment estimation (Adam).

Parameters
  • learning_rate (float, optional) – Constant steplength.

  • beta_1 (float, optional) – Exponential decay rate of the moving average of the gradient.

  • beta_2 (float, optional) – Exponential decay rate of the moving average of the squared gradient.

  • epsilon (float, optional) – Constant for numeric stability.

Attributes
  • step (int) – Current iteration.

  • moment_1 (np.ndarray) – Last 1st moment vector.

  • moment_2 (np.ndarray) – Last 2nd moment vector.

References

Adam: A Method for Stochastic Optimization

Kingma, D. P. and Ba, J. L. https://arxiv.org/pdf/1412.6980.pdf
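
For reference, the update rule from the cited paper can be sketched as follows. Variable names are mapped loosely onto the attributes above (step, moment_1, moment_2 correspond to t, m, v); this is a sketch of the published algorithm, not the class's actual implementation.

    # Sketch of the Adam update rule from the cited paper (Kingma & Ba).
    # step/moment_1/moment_2 above correspond to t/m/v below; this is not
    # the class's actual implementation.
    import numpy as np

    def adam_step(theta, grad, m, v, t,
                  learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-8):
        t += 1
        m = beta_1 * m + (1.0 - beta_1) * grad        # 1st moment estimate
        v = beta_2 * v + (1.0 - beta_2) * grad ** 2   # 2nd moment estimate
        m_hat = m / (1.0 - beta_1 ** t)               # bias correction
        v_hat = v / (1.0 - beta_2 ** t)
        theta = theta - learning_rate * m_hat / (np.sqrt(v_hat) + epsilon)
        return theta, m, v, t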

class bgd.optimizers.LBFGS(m=10, epsilon=0.01, first_order_optimizer=<bgd.optimizers.AdamOptimizer object>)[source]

Bases: bgd.optimizers.Optimizer

Limited-memory quasi-Newton optimizer (L-BFGS).

Parameters
  • m (int, optional) – Memory size.

  • epsilon (float, optional) – Constant for numeric stability.

  • first_order_optimizer (Optimizer) – First order optimizer used to approximate the Hessian product.

Attributes
  • k (int) – Current iteration of L-BFGS.

  • previous_grad (np.ndarray) – Gradient vector at iteration k-1.

  • y (list) – List of the m last gradient differences: y_t = grad_{t+1} - grad_t.

  • s (list) – List of the m last update vectors: s_t = H * grad * steplength, where H is the approximate inverse Hessian.

  • alpha (list) – List of the m last alpha coefficients: alpha_i = rho_i * s_i.T * grad, where rho_i = 1. / (s_i.T * y_i).

References

Updating Quasi-Newton Matrices with Limited Storage

Nocedal, J. (1980) Mathematics of Computation. 35 (151): 773–782
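
The s, y and alpha attributes above are the ingredients of the classic two-loop recursion from the cited reference, sketched below. The use of epsilon to guard the scaling denominator is an assumption made for illustration; this is not necessarily how bgd implements the recursion.

    # Sketch of the two-loop recursion (Nocedal, 1980) built from the s, y and
    # alpha quantities documented above. The epsilon guard on the scaling
    # denominator is an assumption; this is not necessarily bgd's implementation.
    import numpy as np

    def two_loop_recursion(grad, s_list, y_list, epsilon=0.01):
        q = grad.copy()
        alphas = []
        # First loop: newest to oldest curvature pair (s_i, y_i).
        for s, y in zip(reversed(s_list), reversed(y_list)):
            rho = 1.0 / np.dot(y, s)
            alpha = rho * np.dot(s, q)
            alphas.append(alpha)
            q = q - alpha * y
        # Scale by an initial inverse-Hessian estimate gamma * I.
        if s_list:
            gamma = np.dot(s_list[-1], y_list[-1]) / (np.dot(y_list[-1], y_list[-1]) + epsilon)
        else:
            gamma = 1.0
        r = gamma * q
        # Second loop: oldest to newest pair, reusing the stored alpha values.
        for (s, y), alpha in zip(zip(s_list, y_list), reversed(alphas)):
            rho = 1.0 / np.dot(y, s)
            beta = rho * np.dot(y, r)
            r = r + s * (alpha - beta)
        return r  # approximates (inverse Hessian) * grad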

class bgd.optimizers.MomentumOptimizer(learning_rate=0.005, momentum=0.9)[source]

Bases: bgd.optimizers.Optimizer

Simple first order optimizer with momentum support.

Parameters
  • learning_rate (float, optional) – Constant steplength.

  • momentum (float, optional) – Persistence of previous gradient vectors: old gradients are reused to compute the new search direction, weighted by the momentum value.

Attributes
  • previous_grad (np.ndarray) – Gradient vector at the previous iteration.
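
A common formulation of such a momentum update is sketched below. The exact sign and accumulation conventions used by bgd (for instance whether previous_grad stores the raw gradient or the accumulated direction) are not specified here, so this is illustrative only.

    # Illustrative momentum update; the conventions are assumptions, not bgd's code.
    import numpy as np

    def momentum_step(theta, grad, direction, learning_rate=0.005, momentum=0.9):
        # Blend the previous search direction into the new one.
        direction = momentum * direction - learning_rate * grad
        return theta + direction, direction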

class bgd.optimizers.Optimizer[source]

Bases: object

Base class for first order and second order optimizers.

Attributes
  • gradient_fragments (list) – List of tuples of NumPy arrays, where the number of tuples is equal to the number of learnable layers in the network.

update(F)[source]

Computes best move in the parameter space at current iteration using optimization techniques. All gradient fragments added to gradient_fragments are flattened and concatenated to get the batch gradient vector of the whole network. The optimized delta vector is then split into several fragments of original shapes. Finally, those delta fragments are used to update the parameters of each layer, individually.