Welcome to margarine’s documentation!
margarine: Posterior Sampling and Marginal Bayesian Statistics
Introduction
- margarine:
Marginal Bayesian Statistics
- Authors:
Harry T.J. Bevins
- Version:
1.2.8
- Homepage:
https://github.com/htjb/margarine
- Documentation:
https://margarine.readthedocs.io/
Installation
The software can be installed from the git repository by running the following commands in the terminal
git clone https://github.com/htjb/margarine.git # or the equivalent using ssh keys
cd margarine
python setup.py install --user
or via a pip install with
pip install margarine
Note that the pip-installed version is not always the most up-to-date version of the code.
Details/Examples
margarine is designed to make the calculation of marginal Bayesian statistics feasible given a set of samples from an MCMC or nested sampling run.
An example of how to use the code can be found on GitHub in the Jupyter notebook notebook/Tutorial.ipynb, or alternatively in the documentation.
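As a minimal sketch of the typical workflow (the sample and weight file names below are purely illustrative), a masked autoregressive flow can be trained on a set of weighted samples and then used to draw new samples and evaluate log-probabilities:

import numpy as np
from margarine.maf import MAF

# illustrative posterior samples and weights from an MCMC or nested sampling run
theta = np.loadtxt('path/to/samples.txt')
weights = np.loadtxt('path/to/weights.txt')

bij = MAF(theta, weights=weights)   # initialise the flow on the weighted samples
bij.train(epochs=100)               # fit the flow

samples = bij.sample(5000)                   # draw samples from the learnt density
log_prob = bij.log_prob(np.array(samples))   # evaluate the corrected log-probability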
Documentation
The documentation is available at: https://margarine.readthedocs.io/
To compile it locally you can run
cd docs
sphinx-build source html-build
after cloning the repo and installing the relevant packages with
pip install sphinx numpydoc sphinx_rtd_theme
Licence and Citation
The software is available under the MIT licence.
If you use the code for academic purposes we request that you cite the following paper and the MaxEnt22 proceedings. If you use the clustering implementation please also cite the following preprint. You can use the following BibTeX entries
@ARTICLE{2023MNRAS.526.4613B,
author = {{Bevins}, Harry T.~J. and {Handley}, William J. and {Lemos}, Pablo and {Sims}, Peter H. and {de Lera Acedo}, Eloy and {Fialkov}, Anastasia and {Alsing}, Justin},
title = "{Marginal post-processing of Bayesian inference products with normalizing flows and kernel density estimators}",
journal = {\mnras},
keywords = {methods: data analysis, methods: statistical, cosmic background radiation, dark ages, reionization, first stars, Astrophysics - Instrumentation and Methods for Astrophysics, Astrophysics - Cosmology and Nongalactic Astrophysics, Computer Science - Machine Learning},
year = 2023,
month = dec,
volume = {526},
number = {3},
pages = {4613-4626},
doi = {10.1093/mnras/stad2997},
archivePrefix = {arXiv},
eprint = {2205.12841},
primaryClass = {astro-ph.IM},
adsurl = {https://ui.adsabs.harvard.edu/abs/2023MNRAS.526.4613B},
adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}
and
@ARTICLE{2022arXiv220711457B,
author = {{Bevins}, Harry and {Handley}, Will and {Lemos}, Pablo and {Sims}, Peter and {de Lera Acedo}, Eloy and {Fialkov}, Anastasia},
title = "{Marginal Bayesian Statistics Using Masked Autoregressive Flows and Kernel Density Estimators with Examples in Cosmology}",
journal = {arXiv e-prints},
keywords = {Astrophysics - Cosmology and Nongalactic Astrophysics, Astrophysics - Instrumentation and Methods for Astrophysics},
year = 2022,
month = jul,
eid = {arXiv:2207.11457},
pages = {arXiv:2207.11457},
archivePrefix = {arXiv},
eprint = {2207.11457},
primaryClass = {astro-ph.CO},
adsurl = {https://ui.adsabs.harvard.edu/abs/2022arXiv220711457B},
adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}
and
@ARTICLE{2023arXiv230502930B,
author = {{Bevins}, Harry and {Handley}, Will},
title = "{Piecewise Normalizing Flows}",
journal = {arXiv e-prints},
keywords = {Statistics - Machine Learning, Computer Science - Machine Learning},
year = 2023,
month = may,
eid = {arXiv:2305.02930},
pages = {arXiv:2305.02930},
doi = {10.48550/arXiv.2305.02930},
archivePrefix = {arXiv},
eprint = {2305.02930},
primaryClass = {stat.ML},
adsurl = {https://ui.adsabs.harvard.edu/abs/2023arXiv230502930B},
adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}
Requirements
The code requires the following packages to run:
To compile the documentation locally you will need:
To run the test suite you will need:
Contributing
Contributions and suggestions for areas of development are welcome and can be made by opening an issue to report a bug or propose a new feature for discussion.
Tutorial
A tutorial can be found on GitHub in the Jupyter notebook notebook/Tutorial.ipynb, or alternatively in the documentation.
margarine Functions
- class margarine.maf.MAF(theta, **kwargs)[source]
This class is used to train, load and call instances of a bijector built from a series of autoregressive neural networks.
Parameters:
- theta: numpy array or anesthetic.samples
- The samples from the probability distribution that we require the MAF to learn. This can either be a numpy array or an anesthetic NestedSamples or MCMCSamples object.
kwargs:
- weights: numpy array / default=np.ones(len(theta))
- The weights associated with the samples above. If an anesthetic NestedSamples or MCMCSamples object is passed the code draws the weights from this.
- number_networks: int / default = 6
- The bijector is built by chaining a series of autoregressive neural networks together and this parameter is used to determine how many networks there are in the chain.
- learning_rate: float / default = 1e-3
- The learning rate determines the ‘step size’ of the optimization algorithm used to train the MAF. Its value can affect the quality of emulation.
- hidden_layers: list / default = [50, 50]
- The number of layers and number of nodes in each hidden layer for each neural network. The default is two hidden layers with 50 nodes each and each network in the chain has the same hidden layer structure.
- activation_func: string / default = ‘tanh’
- The choice of activation function. It must be an activation function keyword recognisable by TensorFlow. The default is ‘tanh’, the hyperbolic tangent activation function.
- theta_max: numpy array
- The true upper limits of the priors used to generate the samples that we want the MAF to learn.
- theta_min: numpy array
- As above but the true lower limits of the priors.
- parameters: list of strings
- A list of the relevant parameters to train on. Only needed if theta is an anesthetic samples object. If not provided, all parameters will be used.
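For illustration, a MAF with non-default settings might be initialised as follows (the layer sizes and prior limits here are purely illustrative, and theta and weights are assumed to already exist as numpy arrays):

import numpy as np
from margarine.maf import MAF

bij = MAF(theta,
          weights=weights,
          number_networks=4,                 # length of the autoregressive chain
          learning_rate=5e-4,                # optimiser step size
          hidden_layers=[64, 64],            # two hidden layers of 64 nodes per network
          activation_func='tanh',
          theta_min=np.array([0.0, -1.0]),   # illustrative true lower prior limits
          theta_max=np.array([1.0, 1.0]))    # illustrative true upper prior limits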
Attributes:
A list of some key attributes accessible to the user.
- theta_max: numpy array
- The true upper limits of the priors used to generate the samples that we want the MAF to learn. If theta_max is not supplied as a kwarg, then this is an approximate estimate.
- theta_min: numpy array
- As above but for the true lower limits of the priors. If theta_min is not supplied as a kwarg, then this is an approximate estimate.
- loss_history: list
- This list contains the value of the loss function at each epoch during training.
Methods
- __call__(u): This function is used when calling the MAF class to transform samples from the unit hypercube to samples on the MAF.
- Generating the masked autoregressive flow.
- load(filename): This function can be used to load a saved MAF.
- log_like(params, logevidence[, prior_de]): This function should return the log-likelihood for a given set of parameters.
- log_prob(params): Function to calculate the log-probability for a given MAF and set of parameters.
- sample([length]): This function is used to generate samples on the MAF via the MAF __call__ function.
- save(filename): This function can be used to save an instance of a trained MAF as a pickled class so that it can be loaded and used in different scripts.
- train([epochs, early_stop, loss_type]): This function is called to train the MAF once it has been initialised.
- __call__(u)[source]
This function is used when calling the MAF class to transform samples from the unit hypercube to samples on the MAF.
Parameters:
- u: numpy array
- Samples on the uniform hypercube.
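For example, a short sketch of transforming unit-hypercube samples through a trained MAF (bij and theta are assumed from the examples above):

import numpy as np

ndims = theta.shape[1]                           # dimensionality of the training samples
u = np.random.uniform(0, 1, size=(1000, ndims))  # samples on the unit hypercube
x = bij(u)                                       # corresponding samples on the MAF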
- classmethod load(filename)[source]
This function can be used to load a saved MAF. For example

from margarine.maf import MAF

file = 'path/to/pickled/MAF.pkl'
bij = MAF.load(file)
Parameters:
- filename: string
- Path to the saved MAF.
- log_like(params, logevidence, prior_de=None)[source]
This function should return the log-likelihood for a given set of parameters.
It requires the logevidence from the original nested sampling run in order to do this and in the case that the prior is non-uniform a trained prior density estimator should be provided.
Parameters:
- params: numpy array
- The set of samples for which to calculate the log probability.
- logevidence: float
- Should be the log-evidence from the full nested sampling run with nuisance parameters.
- prior_de: margarine.maf.MAF / default=None
- If the prior is non-uniform then a trained prior density estimator should be provided. Otherwise the prior is assumed to be uniform and the prior probability is calculated analytically from the minimum and maximum values of the parameters.
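A hedged sketch of the non-uniform prior case, assuming prior_samples, posterior_samples, weights and the log-evidence logZ are available from the original nested sampling run:

from margarine.maf import MAF

prior_de = MAF(prior_samples)        # density estimator for the prior
prior_de.train()

posterior_de = MAF(posterior_samples, weights=weights)
posterior_de.train()

logL = posterior_de.log_like(posterior_samples, logZ, prior_de=prior_de)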
- log_prob(params)[source]
Function to calculate the log-probability for a given MAF and set of parameters.
While the density estimator has its own built in log probability function, a correction has to be applied for the transformation of variables that is used to improve accuracy when learning. The correction is implemented here.
Parameters:
- params: numpy array
- The set of samples for which to calculate the log probability.
- sample(length=1000)[source]
This function is used to generate samples on the MAF via the MAF __call__ function.
Kwargs:
- length: int / default=1000
- This should be an integer and is used to determine how many samples are generated when calling the MAF.
- save(filename)[source]
This function can be used to save an instance of a trained MAF as a pickled class so that it can be loaded and used in different scripts.
Parameters:
- filename: string
- Path in which to save the pickled MAF.
- train(epochs=100, early_stop=False, loss_type='sum')[source]
This function is called to train the MAF once it has been initialised. It calls the _training() function.
from margarine.maf import MAF

bij = MAF(theta, weights=weights)
bij.train()
Kwargs:
- epochs: int / default = 100
- The number of iterations to train the neural networks for.
- early_stop: boolean / default = False
- Determines whether or not to implement an early stopping algorithm or train for the set number of epochs. If set to True then the algorithm will stop training when test loss has not improved for 2% of the requested epochs. At this point margarine will roll back to the best model and return this to the user.
- loss_type: string / default = ‘sum’
- Determines whether to use the sum or mean of the weighted log probabilities to calculate the loss function.
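For example, to train for up to 500 epochs with early stopping and a mean-based loss (the values here are illustrative, and bij is assumed from the example above):

bij.train(epochs=500, early_stop=True, loss_type='mean')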
- class margarine.kde.KDE(theta, **kwargs)[source]
This class is used to generate a KDE given a weighted set of samples, generate samples from that KDE, transform samples on the hypercube into samples on the KDE and save and load the KDE model.
Parameters:
- theta: numpy array or anesthetic.samples
- The samples from the probability distribution that we require the KDE to learn. This can either be a numpy array or an anesthetic NestedSamples or MCMCSamples object.
kwargs:
- weights: numpy array / default=np.ones(len(theta))
- The weights associated with the samples above. If an anesthetic NestedSamples or MCMCSamples object is passed the code draws the weights from this.
- bw_method: str, scalar or callable
- The bandwidth for the KDE.
- theta_max: numpy array
- The true upper limits of the priors used to generate the samples that we want the KDE to learn.
- theta_min: numpy array
- As above but the true lower limits of the priors.
- parameters: list of strings
- A list of the relevant parameters to train on. Only needed if theta is an anesthetic samples object. If not provided, all parameters will be used.
Attributes:
A list of some key attributes accessible to the user.
- kde: Instance of scipy.stats.gaussian_kde
- Once the class has been initialised with a set of samples and their corresponding weights we can generate the KDE using the following code

from margarine.kde import KDE
import numpy as np

theta = np.loadtxt('path/to/samples.txt')
weights = np.loadtxt('path/to/weights.txt')

KDE_class = KDE(theta, weights=weights)
KDE_class.generate_kde()

This is analogous to training a Normalising Flow (Bijector class). Once the KDE is generated it can be accessed via KDE_class.kde. Initialisation of the class and generation of the KDE are kept separate to allow models to be saved and loaded effectively.
- theta_max: numpy array
- The true upper limits of the priors used to generate the samples that we want the KDE to learn. If theta_max is not supplied as a kwarg, then this is an approximate estimate.
- theta_min: numpy array
- As above but for the true lower limits of the priors. If theta_min is not supplied as a kwarg, then this is an approximate estimate.
Methods
- __call__(u): This function is used when calling the KDE class to transform samples from the unit hypercube to samples on the KDE.
- generate_kde(): Function normalises the input data into a standard normal parameter space and then generates a weighted KDE.
- load(filename): This function can be used to load a saved KDE.
- log_like(params, logevidence[, prior, ...]): This function should return the log-likelihood for a given set of parameters.
- log_prob(params): Function to calculate the log-probability for a given KDE and set of parameters.
- sample([length]): Function can be used to generate samples from the KDE.
- save(filename): Function can be used to save an initialised version of the KDE class and its associated generated KDE.
- __call__(u)[source]
This function is used when calling the KDE class to transform samples from the unit hypercube to samples on the KDE.
Parameters:
- u: numpy array
- Samples on the uniform hypercube.
- generate_kde()[source]
Function normalises the input data into a standard normal parameter space and then generates a weighted KDE.
- classmethod load(filename)[source]
This function can be used to load a saved KDE. For example

from margarine.kde import KDE

file = 'path/to/pickled/bijector.pkl'
KDE_class = KDE.load(file)
Parameters:
- filename: string
- Path to the saved KDE.
- log_like(params, logevidence, prior=None, prior_weights=None)[source]
This function should return the log-likelihood for a given set of parameters.
It requires the logevidence from the original nested sampling run in order to do this and in the case that the prior is non-uniform prior samples should be provided.
Parameters:
- params: numpy array
- The set of samples for which to calculate the log probability.
- logevidence: float
- Should be the log-evidence from the full nested sampling run with nuisance parameters.
- prior: numpy array/default=None
- An array of prior samples corresponding to the prior. The default assumption is that the prior is uniform, which is required if you want to combine likelihoods from different experiments/data sets. In this case samples and prior samples should be reweighted prior to any training.
- log_prob(params)[source]
Function to calculate the log-probability for a given KDE and set of parameters.
While the density estimator has its own built in log probability function, a correction has to be applied for the transformation of variables that is used to improve accuracy when learning. The correction is implemented here.
Parameters:
- params: numpy array
- The set of samples for which to calculate the log probability.
- sample(length=1000)[source]
Function can be used to generate samples from the KDE. It is much faster than the __call__ function but does not transform samples from the hypercube onto the KDE. It is however useful if we want to generate a large number of samples that can then be used to calculate the marginal statistics.
Kwargs:
- length: int / default=1000
- This should be an integer and is used to determine how many samples are generated from the KDE.
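For example, a short sketch assuming the KDE has already been generated as in the example above:

import numpy as np

samples = KDE_class.sample(5000)                  # fast draw of 5000 samples
log_prob = KDE_class.log_prob(np.array(samples))  # corrected log-probabilities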
- class margarine.clustered.clusterMAF(theta, **kwargs)[source]
This class is used to train, load and call a piecewise normalising flow built from a set of masked autoregressive flows. The class is essentially a wrapper around the MAF class with some additional clustering functionality. This class has all the same functionality as the MAF class and can be used in the same way.
Parameters:
- theta: numpy array or anesthetic.samples
- The samples from the probability distribution that we require the MAF to learn. This can either be a numpy array or an anesthetic NestedSamples or MCMCSamples object.
kwargs:
- weights: numpy array / default=np.ones(len(theta))
- The weights associated with the samples above. If an anesthetic NestedSamples or MCMCSamples object is passed the code draws the weights from this.
- number_networks: int / default = 6
- The bijector is built by chaining a series of autoregressive neural networks together and this parameter is used to determine how many networks there are in the chain.
- learning_rate: float / default = 1e-3
- The learning rate determines the ‘step size’ of the optimization algorithm used to train the MAF. Its value can affect the quality of emulation.
- hidden_layers: list / default = [50, 50]
- The number of layers and number of nodes in each hidden layer for each neural network. The default is two hidden layers with 50 nodes each and each network in the chain has the same hidden layer structure.
- activation_func: string / default = ‘tanh’
- The choice of activation function. It must be an activation function keyword recognisable by TensorFlow. The default is ‘tanh’, the hyperbolic tangent activation function.
- cluster_labels: list / default = None
- If clustering has been performed externally to margarine you can provide a list of labels for the samples theta. The labels should be integers from 0 to k corresponding to the cluster that each sample is in.
- cluster_number: int / default = None
- If clustering has been performed externally to margarine you need to provide the number of clusters, k, alongside the cluster labels.
- parameters: list of strings
- A list of the relevant parameters to train on. Only needed if theta is an anesthetic samples object. If not provided, all parameters will be used.
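As an illustration of supplying externally computed clusters, the following sketch uses scikit-learn's KMeans; the choice of three clusters and the file names are purely illustrative, and any labelling that assigns each sample an integer cluster label would work:

import numpy as np
from sklearn.cluster import KMeans
from margarine.clustered import clusterMAF

theta = np.loadtxt('path/to/samples.txt')      # illustrative sample file
weights = np.loadtxt('path/to/weights.txt')

kmeans = KMeans(n_clusters=3, random_state=0).fit(theta)

bij = clusterMAF(theta,
                 weights=weights,
                 cluster_labels=list(kmeans.labels_),
                 cluster_number=3)
bij.train()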
Methods
- __call__(u[, seed]): This function is used when calling the clusterMAF class to transform samples from the unit hypercube to samples on the clusterMAF.
- load(filename): This function can be used to load a saved clusterMAF.
- log_like(params, logevidence[, prior_de]): This function should return the log-likelihood for a given set of parameters.
- log_prob(params): Function to calculate the log-probability for a given clusterMAF and set of parameters.
- sample([length]): This function is used to generate samples on the clusterMAF via the clusterMAF __call__ function.
- save(filename): This function can be used to save an instance of a trained clusterMAF as a pickled class so that it can be loaded and used in different scripts.
- train([epochs, early_stop, loss_type]): This function is called to train the clusterMAF once it has been initialised.
- __call__(u, seed=1420)[source]
This function is used when calling the clusterMAF class to transform samples from the unit hypercube to samples on the clusterMAF.
Parameters:
- u: numpy array
- Samples on the uniform hypercube.
- seed: int / default=1420
- Set the seed for the cluster choice.
- classmethod load(filename)[source]
This function can be used to load a saved clusterMAF. For example

from margarine.clustered import clusterMAF

file = 'path/to/pickled/MAF.pkl'
bij = clusterMAF.load(file)
Parameters:
- filename: string
- Path to the saved clusterMAF.
- log_like(params, logevidence, prior_de=None)[source]
This function should return the log-likelihood for a given set of parameters.
It requires the logevidence from the original nested sampling run in order to do this and in the case that the prior is non-uniform a trained prior density estimator should be provided.
Parameters:
- params: numpy array
- The set of samples for which to calculate the log probability.
- logevidence: float
- Should be the log-evidence from the full nested sampling run with nuisance parameters.
- prior_de: margarine.maf.MAF / default=None
- If the prior is non-uniform then a trained prior density estimator should be provided. Otherwise the prior is assumed to be uniform and the prior probability is calculated analytically from the minimum and maximum values of the parameters.
- log_prob(params)[source]
Function to calculate the log-probability for a given clusterMAF and set of parameters.
While each density estimator has its own built in log probability function, a correction has to be applied for the transformation of variables that is used to improve accuracy when learning and we have to sum probabilities over the series of flows. The correction and the sum are implemented here.
Parameters:
- params: numpy array
- The set of samples for which to calculate the log probability.
- sample(length=1000)[source]
This function is used to generate samples on the clusterMAF via the clusterMAF __call__ function.
Kwargs:
- length: int / default=1000
- This should be an integer and is used to determine how many samples are generated when calling the clusterMAF.
- save(filename)[source]
This function can be used to save an instance of a trained clusterMAF as a pickled class so that it can be loaded and used in different scripts.
Parameters:
- filename: string
- Path in which to save the pickled clusterMAF.
- train(epochs=100, early_stop=False, loss_type='sum')[source]
This function is called to train the clusterMAF once it has been initialised. It calls the train() function for each flow.
from margarine.clustered import clusterMAF

bij = clusterMAF(theta, weights=weights)
bij.train()
Kwargs:
- epochs: int / default = 100
- The number of iterations to train the neural networks for.
- early_stop: boolean / default = False
- Determines whether or not to implement an early stopping algorithm or train for the set number of epochs. If set to True then the algorithm will stop training when test loss has not improved for 2% of the requested epochs. At this point margarine will roll back to the best model and return this to the user.
- loss_type: string / default = ‘sum’
- Determines whether to use the sum or mean of the weighted log probabilities to calculate the loss function.
- class margarine.marginal_stats.calculate(de, **kwargs)[source]
This class, once initialised with a trained MAF or KDE and samples, can be used to calculate marginal KL divergences and Bayesian dimensionalities.
Parameters:
- de: instance of MAF class or KDE class
- This should be a loaded and trained instance of a MAF, clusterMAF or KDE. Bijectors can be loaded like so
from margarine.maf import MAF
from margarine.kde import KDE
from margarine.clustered import clusterMAF

file = '/trained_maf.pkl'
maf = MAF.load(file)

file = '/trained_kde.pkl'
kde = KDE.load(file)

file = '/trained_clustered_maf.pkl'
clustered_maf = clusterMAF.load(file)
- samples: numpy array
- This should be the output of the bijector when called to generate a set of samples from the replicated probability distribution. e.g. after loading a trained MAF we would pass
import numpy as np

u = np.random.uniform(0, 1, size=(10000, 5))
samples = maf(u)
Kwargs:
- prior_de: instance of MAF class, clusterMAF class or KDE class
- This should be a loaded and trained instance of a MAF, clusterMAF or KDE for the prior. If not provided, a uniform prior will be used.
Methods
- integrate(loglikelihood, prior_pdf[, ...]): Importance sampling integration of a likelihood function.
- Calculate marginal Bayesian KL divergence and dimensionality with approximate errors.
- integrate(loglikelihood, prior_pdf, batch_size=1000, sample_size=10000, logzero=-1e+30)[source]
Importance sampling integration of a likelihood function
- args:
- loglikelihood: function
- A function that takes a numpy array of samples and returns the loglikelihood of each sample.
- prior_pdf: function
- A function that takes a numpy array of samples and returns the prior logpdf of each sample.
- batch_size: int
- The number of samples to draw at each iteration.
- sample_size: int
- The number of samples to draw in total.
- logzero: float
- The definition of zero for the loglikelihood function.
- returns:
- stats: dict
- Dictionary containing useful statistics
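As an illustration, a sketch of importance-sampling integration with a toy Gaussian log-likelihood and a uniform prior on the unit hypercube. The trained density estimator maf is assumed from the examples above, the toy functions below are purely illustrative, and the class is assumed, per the signature above, to be initialisable from the trained flow alone:

import numpy as np
from margarine.marginal_stats import calculate

def loglikelihood(x):
    # toy Gaussian log-likelihood centred on 0.5 in every dimension
    return -0.5 * np.sum((x - 0.5)**2 / 0.01, axis=-1)

def prior_pdf(x):
    # log-pdf of a uniform prior on the unit hypercube
    return np.zeros(len(x))

calc = calculate(maf)
stats = calc.integrate(loglikelihood, prior_pdf, batch_size=1000, sample_size=10000)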