Algorithms¶
Euler-Maruyama (SGLD)¶
The classic Euler-Maruyama scheme, also known as Stochastic Gradient Langevin Dynamics (SGLD) when a stochastic approximation to the gradient is used.
This scheme uses the gradient of the posterior to move through configuration space, without Metropolis correction. It produces a biased trajectory (though convergent as the learning rate decreases), but is a good first choice in high dimensions for general distributions.
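As an illustrative sketch (not the library's internal code), one Euler-Maruyama/SGLD step and a short run on a standard Gaussian target, whose log-posterior gradient is -x:

```python
import numpy as np

def sgld_step(x, grad_fn, h, rng):
    # Euler-Maruyama / SGLD update:
    #   x' = x + h * grad(log posterior)(x) + sqrt(2 h) * N(0, I)
    return x + h * grad_fn(x) + np.sqrt(2.0 * h) * rng.standard_normal(x.shape)

# Example: sample a 2D standard Gaussian (gradient of the log posterior is -x).
rng = np.random.default_rng(0)
x = np.zeros(2)
samples = np.empty((5000, 2))
for n in range(5000):
    x = sgld_step(x, lambda y: -y, 0.05, rng)
    samples[n] = x
```

The trajectory is biased at finite learning rate; the empirical statistics approach the true values as h is decreased.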
Usage¶
Set algo="sgld"
in The sampler module.
Requires¶
The llh function needs to output the following dictionary keys:
grad :: The (potentially approximate) gradient of the log posterior.
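The exact llh call signature is specified in the sampler module; as a hypothetical illustration, a log-posterior function for a standard Gaussian target that returns the required grad key could look like:

```python
import numpy as np

def llh(x):
    # Hypothetical log-posterior callable for a standard Gaussian target.
    # Returns the dictionary keys the schemes consume; SGLD only needs 'grad'.
    return {"llh": -0.5 * float(x @ x), "grad": -x}

out = llh(np.array([1.0, 2.0]))
```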
Params¶
The behavior of the sampler can be customized by including the following arguments in the sampler’s params dict.
This scheme takes no additional parameters.
References
This scheme is extensively studied and is covered in most good texts on stochastic numerical methods, e.g.:
Numerical Solution of Stochastic Differential Equations by Peter Kloeden and Eckhard Platen, Springer (1992)
Welling, Max, and Yee W. Teh. “Bayesian learning via stochastic gradient Langevin dynamics.” Proceedings of the 28th international conference on machine learning (ICML-11). 2011 https://www.ics.uci.edu/~welling/publications/papers/stoclangevin_v6.pdf
Leimkuhler-Matthews (LM)¶
A biased MCMC sampling method that provides significant accuracy improvements over the Euler-Maruyama scheme, without extra gradient evaluations. Gives exact sampling for Gaussian target distributions.
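A minimal sketch of the Leimkuhler-Matthews update (illustrative, not the library's code): it is the Euler-Maruyama step with the noise replaced by the average of the current and next Gaussian draws, which are reused between steps:

```python
import numpy as np

def lm_trajectory(x0, grad_fn, h, n_steps, rng):
    x = np.array(x0, dtype=float)
    out = np.empty((n_steps, x.size))
    r_old = rng.standard_normal(x.size)
    for n in range(n_steps):
        r_new = rng.standard_normal(x.size)
        # Same cost as Euler-Maruyama: one gradient, one fresh Gaussian draw.
        x = x + h * grad_fn(x) + np.sqrt(2.0 * h) * 0.5 * (r_old + r_new)
        r_old = r_new
        out[n] = x
    return out

# For a 1D standard Gaussian target the stationary variance is exactly 1.
rng = np.random.default_rng(1)
traj = lm_trajectory(np.zeros(1), lambda y: -y, 0.1, 20000, rng)
```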
Usage¶
Set algo="LM"
in The sampler module.
Requires¶
The llh function needs to output the following dictionary keys:
grad :: The gradient of the log posterior.
Params¶
The behavior of the sampler can be customized by including the following arguments in the sampler’s params dict.
This scheme takes no additional parameters.
References
Molecular Dynamics: With Deterministic and Stochastic Numerical Methods by Benedict Leimkuhler and Charles Matthews, Springer (2015)
Random Walk Metropolis¶
One of the simplest and most widely used MCMC sampling algorithms, the Metropolis algorithm proposes a new point from an isotropic Gaussian distribution and accepts it with a probability that preserves the detailed balance condition.
The Metropolis condition ensures this method is unbiased, but it can be extremely slow to converge (particularly in high dimension).
Note
The acceptance rate for the scheme can be checked by calling the acceptance_rate function on the sampler. A good rule of thumb is to choose a learning rate such that the acceptance rate is around 0.234.
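A self-contained sketch of the algorithm (the library's own interface differs), including the acceptance-rate bookkeeping used for tuning:

```python
import numpy as np

def rw_metropolis(x0, logpost, step, n_steps, rng):
    x = np.array(x0, dtype=float)
    lp = logpost(x)
    accepted = 0
    samples = np.empty((n_steps, x.size))
    for n in range(n_steps):
        # Isotropic Gaussian proposal
        prop = x + step * rng.standard_normal(x.size)
        lp_prop = logpost(prop)
        # Metropolis test: accept with probability min(1, pi(prop) / pi(x))
        if np.log(rng.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
            accepted += 1
        samples[n] = x
    return samples, accepted / n_steps

rng = np.random.default_rng(2)
samples, acc = rw_metropolis(np.zeros(2), lambda y: -0.5 * y @ y, 1.0, 5000, rng)
```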
Usage¶
Set algo="RWMetropolis"
in The sampler module.
Requires¶
The llh function needs to output the following dictionary keys:
llh :: The value of the log posterior.
Params¶
The behavior of the sampler can be customized by including the following arguments in the sampler’s params dict.
This scheme takes no additional parameters.
BAOAB¶
A biased Langevin sampling routine that incorporates momentum to expedite sampling over barriers. A damping parameter can be chosen to customize the mixing with the heat bath. In the limit of infinite damping this method reduces to the LM scheme and, like it, is exact for Gaussian distributions.
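One BAOAB step can be sketched as follows (a minimal illustration assuming unit mass and temperature, not the packaged implementation), with friction parameter g:

```python
import numpy as np

def baoab_step(x, p, grad_fn, h, g, rng):
    p = p + 0.5 * h * grad_fn(x)   # B: half momentum kick
    x = x + 0.5 * h * p            # A: half position drift
    # O: exact Ornstein-Uhlenbeck solve on the momentum
    c = np.exp(-g * h)
    p = c * p + np.sqrt(1.0 - c * c) * rng.standard_normal(x.shape)
    x = x + 0.5 * h * p            # A: second half drift
    p = p + 0.5 * h * grad_fn(x)   # B: second half kick
    return x, p

# Sample a 1D standard Gaussian target (gradient of the log posterior is -x).
rng = np.random.default_rng(3)
x, p = np.zeros(1), np.zeros(1)
xs = np.empty(20000)
for n in range(20000):
    x, p = baoab_step(x, p, lambda y: -y, 0.2, 1.0, rng)
    xs[n] = x[0]
```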
Usage¶
Set algo="baoab"
in The sampler module.
Requires¶
The llh function needs to output the following dictionary keys:
grad :: The gradient of the log posterior.
Params¶
The behavior of the sampler can be customized by including the following arguments in the sampler’s params dict.
g :: positive float (Default 1.0). The isotropic friction constant (gamma) used in the Langevin dynamics.
References
Molecular Dynamics: With Deterministic and Stochastic Numerical Methods by Benedict Leimkuhler and Charles Matthews, Springer (2015)
Hybrid Monte Carlo (HMC)¶
Hybrid (or Hamiltonian) Monte Carlo takes steps of constant-energy dynamics after redrawing the kinetic energy. The symplectic scheme used for the constant-energy integration gives very favorable scaling with dimension, in spite of the Metropolization step.
Note
The acceptance rate for the scheme can be checked by calling the acceptance_rate function on the sampler. A good rule of thumb is to choose a learning rate such that the acceptance rate is between 0.7 and 0.8.
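The steps above can be sketched as a self-contained illustration (unit mass, not the library's code): redraw the momentum, take T leapfrog substeps, then apply the Metropolis test to the total-energy change:

```python
import numpy as np

def hmc_step(x, llh_fn, grad_fn, h, T, rng):
    # Redraw the momentum from N(0, I), then integrate with leapfrog.
    p = rng.standard_normal(x.shape)
    x_new, p_new = x.copy(), p.copy()
    p_new = p_new + 0.5 * h * grad_fn(x_new)
    for t in range(T):
        x_new = x_new + h * p_new
        kick = h if t < T - 1 else 0.5 * h
        p_new = p_new + kick * grad_fn(x_new)
    # Metropolis test on the change in total energy H = -llh + |p|^2 / 2
    dH = (llh_fn(x_new) - 0.5 * p_new @ p_new) - (llh_fn(x) - 0.5 * p @ p)
    if np.log(rng.random()) < dH:
        return x_new, True
    return x, False

rng = np.random.default_rng(4)
x = np.zeros(2)
xs = np.empty((2000, 2))
accepts = 0
for n in range(2000):
    x, ok = hmc_step(x, lambda y: -0.5 * y @ y, lambda y: -y, 0.2, 5, rng)
    accepts += ok
    xs[n] = x
rate = accepts / 2000
```

The symplectic leapfrog integrator keeps the energy error small, so the acceptance rate here stays high even with a fairly large step size.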
Usage¶
Set algo="hmc"
in The sampler module.
Requires¶
The llh function needs to output the following dictionary keys:
llh :: The value of the log posterior.
grad :: The gradient of the log posterior.
Params¶
The behavior of the sampler can be customized by including the following arguments in the sampler’s params dict.
T :: positive int (Default 5). The number of constant-energy substeps to take between Metropolis tests.
References
Duane, Simon, A. D. Kennedy, Brian J. Pendleton, and Duncan Roweth. “Hybrid Monte Carlo.” Physics Letters B 195.2 (1987) https://doi.org/10.1016/0370-2693(87)91197-X
BADODAB¶
A scheme that adaptively learns a constant damping term required to correct the distribution when using stochastic gradients.
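One step of the BADODAB splitting can be sketched as follows, under stated assumptions (unit mass and temperature; the library's implementation may differ). The adaptive variable xi acts as a friction that equilibrates to absorb the injected noise:

```python
import numpy as np

def badodab_step(x, p, xi, grad_fn, h, g, mu, rng):
    n = x.size
    p = p + 0.5 * h * grad_fn(x)           # B: half momentum kick
    x = x + 0.5 * h * p                    # A: half position drift
    xi = xi + 0.5 * h * (p @ p - n) / mu   # D: adapt the friction
    # O: exact solve of dp = -xi p dt + sqrt(2 g) dW for fixed xi
    if abs(xi) > 1e-12:
        c = np.exp(-xi * h)
        p = c * p + np.sqrt(g * (1.0 - c * c) / xi) * rng.standard_normal(n)
    else:
        p = p + np.sqrt(2.0 * g * h) * rng.standard_normal(n)
    xi = xi + 0.5 * h * (p @ p - n) / mu   # D
    x = x + 0.5 * h * p                    # A
    p = p + 0.5 * h * grad_fn(x)           # B
    return x, p, xi

# With an exact gradient, xi settles so that the injected noise is dissipated.
rng = np.random.default_rng(5)
x, p, xi = np.zeros(2), np.zeros(2), 1.0
xs = np.empty((10000, 2))
for n in range(10000):
    x, p, xi = badodab_step(x, p, xi, lambda y: -y, 0.1, 1.0, 1.0, rng)
    xs[n] = x
```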
Usage¶
Set algo="badodab"
in The sampler module.
Requires¶
The llh function needs to output the following dictionary keys:
grad :: The gradient of the log posterior.
Params¶
The behavior of the sampler can be customized by including the following arguments in the sampler’s params dict.
g :: positive float (Default 1.0). The damping parameter used in the Langevin dynamics.
mu :: positive float (Default 1.0). The precision (reciprocal variance) of the auxiliary variables' distribution.
References
Leimkuhler, Benedict, and Xiaocheng Shang. “Adaptive thermostats for noisy gradient systems.” SIAM Journal on Scientific Computing 38.2 (2016)
Stochastic Gradient HMC¶
A biased scheme for use with stochastic gradients. Damps the dynamics to compensate for additional noise coming from a stochastic gradient.
The covariance of the stochastic gradient term, denoted B, is calculated at every step using a user-supplied function that takes as input the current value of B and the grad_data terms, if supplied.
For reasons of ergodicity (i.e. to ensure proper mixing behavior in the method) a matrix C >= B*h/2 must be given, where h is the learning rate or step size. In this implementation the identity matrix scaled by a parameter g is used as the C matrix.
The large damping parameter required means that this scheme can easily become unstable and slow to converge if the gradient noise is large.
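Under the notation above (C = g*I, with B the gradient-noise covariance, taken scalar here for simplicity), one step of the scheme can be sketched as follows; this is an illustration, not the packaged implementation:

```python
import numpy as np

def sghmc_step(x, v, grad_fn, h, g, B, rng):
    # Position drift, then a kicked, damped momentum update whose injected
    # noise has variance 2 * (C - B h / 2) * h with C = g * I (scalar here).
    x = x + h * v
    noise_var = 2.0 * (g - 0.5 * h * B) * h
    v = (v + h * grad_fn(x) - h * g * v
         + np.sqrt(noise_var) * rng.standard_normal(x.shape))
    return x, v

# With an exact gradient (B = 0) this reduces to damped, discretized
# Hamiltonian dynamics; here on a 2D standard Gaussian target.
rng = np.random.default_rng(6)
x, v = np.zeros(2), np.zeros(2)
xs = np.empty((10000, 2))
for n in range(10000):
    x, v = sghmc_step(x, v, lambda y: -y, 0.1, 1.0, 0.0, rng)
    xs[n] = x
```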
Usage¶
Set algo="sghmc"
in The sampler module.
Requires¶
The llh function needs to output the following dictionary keys:
grad :: The gradient of the log posterior.
grad_data :: The gradient of the log likelihood with respect to the current position, itemized over the data batch. Expects an (N, D) array, where the position space is N-dimensional and the batch size is D.
Params¶
The behavior of the sampler can be customized by including the following arguments in the sampler’s params dict.
g :: positive float (Default 1.0). The damping parameter used in the Langevin dynamics. Must be large enough to dominate the gradient noise.
auto_friction :: bool (Default True). Automatically increase the friction g if the target covariance matrix is not positive-definite.
B :: numpy array (Default 0). The initial value to use as the damping matrix for the stochastic gradient.
grad_cov :: function. A function estimating the covariance of the stochastic gradient, with signature grad_cov(B, grad_data). The current covariance estimate B and the grad_data term are supplied to the function, and it outputs an (N, N) numpy array as the new B matrix. If a function is not specified, the default behavior is to use the last estimate of B without any change.
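For illustration, a grad_cov function matching the stated signature could blend the running estimate with the empirical covariance of the itemized gradients (the smoothing factor alpha is an arbitrary choice here, not a library parameter):

```python
import numpy as np

def grad_cov(B, grad_data, alpha=0.1):
    # Exponential moving average of the empirical covariance of the
    # per-datum gradients; grad_data is (N, D): N parameters, D batch items.
    centered = grad_data - grad_data.mean(axis=1, keepdims=True)
    emp = centered @ centered.T / max(grad_data.shape[1] - 1, 1)
    return (1.0 - alpha) * np.asarray(B) + alpha * emp

rng = np.random.default_rng(7)
B_new = grad_cov(np.zeros((3, 3)), rng.standard_normal((3, 50)))
```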
References
Chen, Tianqi, Emily Fox, and Carlos Guestrin. “Stochastic Gradient Hamiltonian Monte Carlo.” International Conference on Machine Learning. (2014)
Racecar¶
A general purpose sampling algorithm, specifically designed for efficient sampling in systems with a stochastic gradient.
Usage¶
Set algo="racecar"
in The sampler module.
Requires¶
The llh function needs to output the following dictionary keys:
grad :: The gradient of the log posterior.
grad_data :: (optional) The gradient of the log likelihood with respect to the current position, itemized over the data batch. Expects an (N, D) array, where the position space is N-dimensional and the batch size is D.
Params¶
The behavior of the sampler can be customized by including the following arguments in the sampler’s params dict.
g :: positive float (Default 1.0). The damping parameter used in the Langevin dynamics. Must be large enough to dominate the force noise.
mu :: positive float (Default 1.0). The precision (reciprocal variance) of the auxiliary variables' distribution.
estimate_basis :: bool (Default True). If set to True and grad_data is given, then the method will approximate the dominant directions for the gradient noise.
estimate_time :: float (Default 0.5). The time in seconds to spend initially estimating the basis vectors by generating new grad_data. By default 0.5 s is spent building this estimate at setup. Set it to 0 to use only the initial grad_data for the guess.
basis_size :: positive int (Default Ndim). The number of basis vectors to use. A smaller number can improve efficiency in high dimensions, at a potential cost in accuracy.
basis :: N x K numpy array (optional). The rank-K array of basis vectors to use for damping the noisy gradient.
Note
The Racecar scheme damps the system in directions specified by basis vectors. If a basis is not given and estimate_basis is True, then these vectors are estimated from the grad_data value if it is output by the llh function. If there is no grad_data, the standard (diagonal) basis is used instead.
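The library's exact estimation procedure is not documented here; as one plausible illustration of the idea, the dominant noise directions of an (N, D) grad_data array can be extracted from an SVD of the centered per-datum gradients:

```python
import numpy as np

def estimate_basis(grad_data, k):
    # Top-k left singular vectors of the centered per-datum gradients,
    # returned as an (N, k) orthonormal basis for the noise directions.
    centered = grad_data - grad_data.mean(axis=1, keepdims=True)
    u, _, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k]

rng = np.random.default_rng(8)
basis = estimate_basis(rng.standard_normal((5, 40)), 2)
```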