Introduction and model setup¶
Welcome to SAMBA, our toddler computational package that allows you to explore the world of Bayesian model mixing! SAMBA (SAndbox for Mixing using Bayesian Analysis) currently supports three methods of model mixing developed in our paper (see this arXiv link). In the future, we will release a version of this package where you, the reader, can input your own functions to mix. At the time of this release, however, we've got our toy problem set up for you to play with.
Let's quickly define the toy problem from the paper linked above. We want to mix the expansions of the zero-dimensional $\phi^4$-theory partition function, below:
$$ F(g) = \int_{-\infty}^{\infty} dx~ e^{-\frac{x^{2}}{2} - g^{2} x^{4}} = \frac{e^{\frac{1}{32 g^{2}}}}{2 \sqrt{2}g} K_{\frac{1}{4}}\left(\frac{1}{32 g^{2}} \right). $$
The two expansions are limits taken at $g = 0$ and $g = \infty$:
$$ F_{s}^{N_s}(g) = \sum_{k=0}^{N_{s}} s_{k} g^{k}, $$
and
$$ F_{l}^{N_{l}}(g) = \frac{1}{\sqrt{g}} \sum_{k=0}^{N_{l}} l_{k} g^{-k}, $$
with coefficients given as:
$$ s_{2k} = \frac{\sqrt{2} \Gamma{(2k + 1/2)}}{k!} (-4)^{k},~~~~~s_{2k + 1} = 0 $$
and
$$ l_{k} = \frac{\Gamma{\left(\frac{k}{2} + \frac{1}{4}\right)}}{2k!} \left(-\frac{1}{2}\right)^{k}. $$
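None of the following is needed to run SAMBA, whose Models class implements these formulas, but as a quick standalone check of the definitions above you can evaluate $F(g)$, the coefficients, and the truncated expansions directly with numpy and scipy. The exponentially scaled Bessel function kve avoids overflow of $e^{1/(32 g^{2})}$ at small $g$. This cell is optional and self-contained:
#quick standalone check of the formulas above (SAMBA's Models class does this for you)
import numpy as np
from scipy import special

def F_true(g):
    #kve(v, x) = exp(x) * K_v(x), so the exp(1/(32 g^2)) prefactor never overflows
    z = 1.0 / (32.0 * g**2)
    return special.kve(0.25, z) / (2.0 * np.sqrt(2.0) * g)

def s_coeff(k):
    #small-g coefficients: the odd ones vanish
    if k % 2 == 1:
        return 0.0
    m = k // 2
    return np.sqrt(2.0) * special.gamma(2.0*m + 0.5) * (-4.0)**m / special.factorial(m)

def l_coeff(k):
    #large-g coefficients
    return special.gamma(k/2.0 + 0.25) * (-0.5)**k / (2.0 * special.factorial(k))

def F_small(g, Ns):
    #truncated weak-coupling expansion F_s^{N_s}(g)
    return sum(s_coeff(k) * g**k for k in range(Ns + 1))

def F_large(g, Nl):
    #truncated strong-coupling expansion F_l^{N_l}(g)
    return sum(l_coeff(k) * g**(-k) for k in range(Nl + 1)) / np.sqrt(g)

g_check = np.array([0.1, 0.3, 0.6])
print(F_true(g_check))        #true curve
print(F_small(g_check, 2))    #N_s = 2 expansion
print(F_large(g_check, 2))    #N_l = 2 expansion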
We begin by importing all of the Python packages we will need in this Jupyter notebook.
%load_ext autoreload
%autoreload 2
import numpy as np
import math
import statistics
import emcee
import corner
from cycler import cycler
from scipy import stats, special, integrate
import matplotlib.pyplot as plt
from matplotlib.ticker import AutoMinorLocator
%matplotlib inline
#matplotlib settings for LaTeX plots
import matplotlib
Now we import the classes from SAMBA that we'll need for the Linear Mixture Model (LMM from here on).
import sys
sys.path.append('../../')
from samba.models import Models, Uncertainties
from samba.mixing import LMM
from samba.priors import Priors
The first two classes, Models and Uncertainties, provide our toy model and its theory errors. LMM is the class that performs the linear mixture method, and Priors supplies the priors needed to sample the parameters of this model. This will become clear later on.
Now we plot the functions we want to mix to see what they look like. We first need to pick values for $N_{s}$ and $N_{l}$. Let's pick $N_{s}$ = 2 and $N_{l}$ = 2, as in the paper.
#set up the linspace for input variable g and N_s and N_l
g = np.linspace(1e-6,1.0,100)
ns = np.array([2])
nl = np.array([2])
#call the plot_models() function from Models()
m = LMM(ns, nl, error_model='informative')
plot = m.plot_models(g)
Now let's try to mix these two functions using the LMM() class. This method relies on having data in the gap between the two expansions, so we estimate the extent of the gap by eye and make a linspace over it with a few data points.
#make linspace for data
g_data = np.linspace(0.1, 0.6, 10)
#call the add_data() function to create the data
data, sigma = m.add_data(g, g_data, error=0.01)
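The add_data() function handles the data generation for us. Purely for illustration, a hand-rolled stand-in, under the assumption that the data are the true curve plus Gaussian noise (check add_data()'s docstring for whether error is a relative or absolute error bar), could look like this:
#hypothetical stand-in for add_data(), NOT SAMBA's implementation:
#evaluate the true curve at g_data and perturb it with Gaussian noise
rng = np.random.default_rng(42)

def toy_data(g_pts, error=0.01):
    truth = F_true(g_pts)          #from the check cell near the top
    sigma_d = error * truth        #assuming 'error' sets a relative error bar
    return truth + rng.normal(0.0, sigma_d), sigma_d

fake_data, fake_sigma = toy_data(g_data)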
Choosing a function for the weights and parameter estimation using MCMC¶
Now we call the mixing function, mixed_model(), and give it this data. We tell it which mixing function we wish to use, out of three options: logistic, cdf, and piecewise cosine. As discussed in the paper, the best option is the piecewise cosine. This mixing function requires emcee to sample three parameters, as we'll see later on in a plot we can generate from the results of the sampling.
The posterior the sampler draws from is written as $$ p(\theta|\mathbf{D}) = p(\theta)\prod_{i=1}^{n} \left\{ \alpha(g_{i}; \theta) ~\mathcal{N}(F^{N_s}_{s}(g_{i}), \sigma_{d_{i}}^{2} + \sigma_{N_s}^2) + (1 - \alpha(g_{i}; \theta))~ \mathcal{N}(F^{N_l}_{l}(g_{i}), \sigma_{d_{i}}^{2} + \sigma_{N_l}^2) \right\}, $$
where $\alpha(g;\theta)$ is this piecewise cosine mixing function (see Eq. (15) in the paper), the two individual models are written as Gaussian distributions, and their variances, $\sigma_{d_{i}}^{2} + \sigma_{N_s}^2$ and $\sigma_{d_{i}}^{2} + \sigma_{N_l}^2$, are combinations of the error on the data points ($\sigma_{d_{i}}^{2}$) and the theory error at each data point ($\sigma_{N}^2$). The prior $p(\theta)$ is given as three truncated Gaussians in the form
$$ p(\theta) = \mathcal{U}(\theta_{1} \in [0,b])~\mathcal{N}(\theta_{1}; \mu_{1}, 0.05) \times\mathcal{U}(\theta_{2} \in [\theta_{1},b])~\mathcal{N}(\theta_{2}; \mu_{2}, 0.05) \times\mathcal{U}(\theta_{3} \in [\theta_{2},b])~\mathcal{N}(\theta_{3}; \mu_{3}, 0.05). $$
The means $\mu$ and upper bounds $b$, as well as the Gaussian width (given there as 0.05, which is passed to the code as the standard deviation of each Gaussian), depend on the series expansions chosen and can therefore be altered as the need arises.
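To make the structure of this posterior concrete, here is a schematic log-posterior of the same form. This is not SAMBA's internal code; alpha, f_s, f_l, var_s, var_l, and log_prior are placeholders for the mixing function, the two expansions, their theory variances, and the prior that SAMBA supplies internally:
#schematic log-posterior with the same structure as the equation above -- NOT SAMBA's code
def log_posterior(theta, g_pts, data, sigma_d, alpha, f_s, f_l, var_s, var_l, log_prior):
    #log p(theta) + sum_i log[ alpha*N(d_i; F_s, var) + (1 - alpha)*N(d_i; F_l, var) ]
    lp = log_prior(theta)
    if not np.isfinite(lp):
        return -np.inf
    a = np.array([alpha(gi, theta) for gi in g_pts])
    like_s = stats.norm.pdf(data, loc=f_s(g_pts), scale=np.sqrt(sigma_d**2 + var_s(g_pts)))
    like_l = stats.norm.pdf(data, loc=f_l(g_pts), scale=np.sqrt(sigma_d**2 + var_l(g_pts)))
    return lp + np.sum(np.log(a * like_s + (1.0 - a) * like_l))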
NOTE: If you merely choose some data points and run the code, you may notice that, when all is said and done, the parameters do not land in very sensible spots in the gap. This could be because the Priors() function lpdf() needs some adjusting, by you! Your freedom in the toy model so far is to adjust the information in the Priors() class so that the sampler is given the best possible starting points. As mentioned above, the mean and width of each Gaussian prior can be changed to better match where you expect that parameter to be located, while the uniform prior keeps the parameters from crossing as they are sampled. They must not cross, or the sampler will not be happy and the mixing will not work properly. Hence, the best strategy for the piecewise cosine mixing function is to adjust the uniform priors so that the parameters remain in their own regions, while allowing them some wiggle room via the Gaussian width.
So far, the best way to change these parameters is to directly alter the class file priors.py yourself, but in the future, when more flexibility for users is added, lpdf() will become a function you can pass into the class directly without changing anything internal to the package.
Note that the values here will not be absolute: the emcee sampler is stochastic and will allow for some differences between runs, and the data generated in this notebook are randomized for each run. However, the expected values for the trace and onward in this notebook should be within a few percent of the results you obtain yourself using the prior values below:
Prior information for benchmark (set within priors.py file):
g1 = self.luniform(params[0], 0.01, 0.3) + stats.norm.logpdf(params[0], 0.1, 0.05)
g3 = self.luniform(params[2], params[0], 0.55) + stats.norm.logpdf(params[2], 0.4, 0.05)
g2 = self.luniform(params[1], params[2], 0.8) + stats.norm.logpdf(params[1], 0.6, 0.05)
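To get a feel for where these priors place the three parameters before running the sampler, you can plot the three Gaussian factors from the snippet above (the uniform factors only enforce the bounds and the ordering). This is just a quick diagnostic, not part of SAMBA:
#plot the Gaussian pieces of the prior snippet above (means 0.1, 0.4, 0.6; sd 0.05)
theta_grid = np.linspace(0.0, 1.0, 500)
for mu in (0.1, 0.4, 0.6):
    plt.plot(theta_grid, stats.norm.pdf(theta_grid, mu, 0.05), label=rf'$\mu = {mu}$')
plt.xlabel('g')
plt.ylabel('prior density')
plt.legend(frameon=False)
plt.show()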
Expected values of the trace for $N_{s}=N_{l}=2$, cosine mixing function, nsteps=3000:
trace = [[0.07418137 0.07418137 0.03490732 ... 0.12225827 0.14157856 0.16379292] [0.59066046 0.59066046 0.54589913 ... 0.51536977 0.50214119 0.48472922] [0.39022897 0.39022897 0.37559441 ... 0.41987635 0.4359556 0.42130717]]
#call mixed_model()
chain, trace = m.mixed_model(g_data, data, sigma, mixing_function='cosine', nsteps=3000)
Using 20 walkers with 3000 steps each, for a total of 60000 samples. Calculation finished! Duration = 1 min, 33 sec.
We have the samples in hand now. However, we would like to thin them to cut out the correlations between successive samples, which lets us capture the most important features of the chain. Let's do this using the autocorrelation length in the function stats_chain(). This function also returns the mean and median of the mixing function parameter distributions, along with the thinned array for any further work we wish to do with the sample results.
Expected values for thin_array, mean, median for $N_{s}=N_{l}=2$, cosine mixing function:
thin_array = [[0.13377046 0.15029928 0.2248071 ... 0.17242319 0.19149209 0.11299201] [0.31068 0.30495626 0.49641581 ... 0.51533557 0.63964489 0.57477458] [0.21107303 0.25329124 0.37645051 ... 0.36013452 0.36440693 0.42444696]]
mean = [0.11096199 0.5882248 0.38049597]
median = [0.10839164 0.59186747 0.38149871]
thin_array, mean, median = m.stats_chain(chain)
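For reference, thinning by the integrated autocorrelation time can also be done by hand with emcee's utilities. This is only a sketch of the idea, not necessarily what stats_chain() does internally, and it assumes a flattened chain of shape (n_samples, n_params):
#sketch of thinning by the integrated autocorrelation time -- check the shape of
#SAMBA's chain before using this on its output
def thin_by_autocorr(flat_chain):
    taus = [emcee.autocorr.integrated_time(flat_chain[:, i], quiet=True)
            for i in range(flat_chain.shape[1])]
    step = max(int(np.ceil(np.max(taus))), 1)
    return flat_chain[::step]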
Now that we know these values, we can take a look at the mixing function we used (before looking at the final PPD result of the mixed model). To do this properly, we employ the MAP (maximum a posteriori) values from the function MAP_values(), instead of the mean or the median, as those can be misleading when the distributions are not purely Gaussian. We then overlay the MAP values on the mixing curve itself, to show where the parameter values have landed in $g$.
Expected MAP values for $N_{s}=N_{l}=2$, cosine mixing function:
map_values = [0.10497989 0.59295989 0.38282007]
map_values = m.MAP_values(thin_array, g, g_data, data, sigma)
print(map_values)
[0.10164784 0.59502968 0.3833919 ]
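If you want an independent cross-check of these values, a quick per-parameter mode estimate from the thinned samples via a Gaussian kernel density (not SAMBA's MAP_values() method) looks like this:
#rough per-parameter mode estimate via a Gaussian KDE -- just a cross-check
def kde_modes(samples):
    #samples assumed to have shape (n_params, n_samples), as the printed thin_array suggests
    modes = []
    for row in samples:
        kde = stats.gaussian_kde(row)
        grid = np.linspace(row.min(), row.max(), 500)
        modes.append(grid[np.argmax(kde(grid))])
    return np.array(modes)

print(kde_modes(thin_array))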
Weights¶
#define a weight plot function for simplicity
def weight_plot(g, map_values):
    #set up figure
    fig = plt.figure(figsize=(8,6), dpi=600)
    ax = plt.axes()
    ax.tick_params(axis='x', labelsize=18)
    ax.tick_params(axis='y', labelsize=18)
    ax.locator_params(nbins=5)
    ax.xaxis.set_minor_locator(AutoMinorLocator())
    ax.yaxis.set_minor_locator(AutoMinorLocator())
    ax.set_xlim(0.0,1.0)
    ax.set_ylim(0.0,1.0)
    ax.set_yticks([0.1, 0.3, 0.5, 0.7, 0.9])

    #axis labels
    ax.set_xlabel('g', fontsize=22)
    ax.set_ylabel(r'$\hat{w}_{k}$', fontsize=22)

    #calculate the piecewise cosine mixing function at the MAP values
    cosine = np.zeros(len(g))
    for i in range(len(g)):
        cosine[i] = m.switchcos(map_values, g[i])

    #plot the weights
    ax.plot(g, cosine, 'r', linewidth=3, label=r'$\alpha(g;\theta)$')
    ax.plot(g, 1-cosine, 'b', linewidth=3, label=r'$1 - \alpha(g;\theta)$')

    #mark the MAP parameter values as vertical lines
    ax.axvline(x=map_values[0], color='darkorange', linestyle='dashed', label=r'$\theta_1$')
    ax.axvline(x=map_values[2], color='darkviolet', linestyle='dashdot', label=r'$\theta_2$')
    ax.axvline(x=map_values[1], color='deeppink', linestyle='dashed', label=r'$\theta_3$')

    ax.legend(fontsize=14, loc='upper right', frameon=False)
    plt.show()

    return None
weight_plot(g, map_values)
Calculating the posterior predictive distribution (PPD)¶
Now we want to calculate the mixed model, which means we need to obtain the posterior predictive distribution, or PPD, as a function of $g$ across the input space. To do this, we make another linspace in $g$ and call the ppd() function, using our thinned array as well.
The PPD equation that is solved in this code is given as
$$ p(\tilde y(g)|\mathbf{D}) = \frac{1}{M}\sum_{j=1}^{M} \left\{ \alpha(g; \theta_{j}) ~\mathcal{N}(F^{N_s}_{s}(g), \sigma_{d}^{2} + \sigma_{N_s}^2) + (1 - \alpha(g; \theta_{j}))~ \mathcal{N}(F^{N_l}_{l}(g), \sigma_{d}^{2} + \sigma_{N_l}^2) \right\}, $$
much like before with the sampled posterior, except now we average over the $M$ posterior samples $\theta_{j}$ and predict for a new observation $\tilde y(g)$.
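To connect this equation to code, here is a schematic of drawing PPD samples at a single value of $g$, reusing the placeholder pieces (alpha, f_s, f_l, var_s, var_l) from the log-posterior sketch above; it is not SAMBA's internal ppd() implementation:
#schematic Monte Carlo draws from the PPD at one point g -- NOT SAMBA's ppd() code
def ppd_draws(g_val, theta_samples, alpha, f_s, f_l, var_s, var_l, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    draws = []
    for th in theta_samples:
        a = alpha(g_val, th)
        #pick the small-g or large-g Gaussian with probability a or 1 - a
        if rng.random() < a:
            draws.append(rng.normal(f_s(g_val), np.sqrt(var_s(g_val))))
        else:
            draws.append(rng.normal(f_l(g_val), np.sqrt(var_l(g_val))))
    return np.array(draws)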
Expected median_results, intervals for $N_{s}=N_{l}=2$, cosine mixing function:
median_results = [2.50662827, 2.50643831, 2.50586856, 2.50491903, 2.50358972...1.57507493, 1.57205394, 1.56905142, 1.56606715, 1.56310093] intervals = [[2.50662827, 2.50662827], [2.50643831, 2.50643831], [2.50586856, 2.50586856],...[1.56905142, 1.56905142], [1.56606715, 1.56606715], [1.56310093, 1.56310093]]
#PPD linspace
g_ppd = np.linspace(1e-6, 1.0, 200)
#PPD calculation using ppd() and MAP parameter values
median_results, intervals = m.ppd(thin_array, map_values, g_data, g_ppd, data, 0.68)
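The notebook's plotting cell for this result is not reproduced here; if you would like to plot the returned arrays yourself, a minimal sketch (assuming intervals holds the lower and upper 68% bounds at each point in g_ppd, and reusing F_true() from the check cell near the top) is:
#by-hand plot of the PPD median and 68% band, with the true curve overlaid
intervals_arr = np.asarray(intervals)    #assumed shape (len(g_ppd), 2)

fig, ax = plt.subplots(figsize=(8, 6))
ax.fill_between(g_ppd, intervals_arr[:, 0], intervals_arr[:, 1],
                color='green', alpha=0.3, label='68% HPD interval')
ax.plot(g_ppd, median_results, color='green', linewidth=2, label='PPD median')
ax.plot(g_ppd, F_true(g_ppd), 'k--', linewidth=1.5, label='true model')
ax.set_xlabel('g', fontsize=16)
ax.set_ylabel('F(g)', fontsize=16)
ax.legend(fontsize=12, frameon=False)
plt.show()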
And that's the mixed model result! The green curve is our PPD result, generated using the median of the posterior at each point in $g$. The green band is the 68% credibility interval, computed with the HPD (highest posterior density) method in the function hpd_interval() of the LMM() class, which finds the smallest region containing 68% of the posterior density at each point.
It is quite evident, however, that the mixed model here does not match the true curve very well at all. In the next notebook (Bivariate_BMM) we will look at another method for mixing these two series expansions.
Written by: Alexandra Semposki (06 June 2022)