An introduction to SAMBA: Trivariate Bayesian Model Mixing w/GPs
Alexandra Semposki
Date: 07 June 2022
Introduction
From the last notebook, you saw the bivariate model mixing produce a well-mixed model with an overly conservative uncertainty band in the gap (for our toy model). We now want to try mixing in an interpolant in the form of a Gaussian Process (GP). This will enter $f^{\dagger}$ as a third individual model. Recall the function for $f^\dagger$:
$$ f^{\dagger} \sim \mathcal{N}\Bigl(Z_P^{-1}\sum_{k=1}^{K} \frac{1}{v_k}f_k,\; Z_P^{-1}\Bigr), $$
where
$$ Z_P \equiv \sum_{k=1}^{K}\frac{1}{v_k}. $$
$Z_P$ is the total precision (the sum of the inverse variances) of the $K$ models, $v_{k}$ is the individual variance of each model (which we previously called the theory error), and $f^{\dagger}$ is the mixed model. For this trivariate case, $K = 3$.
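To make the mixing rule concrete, here is a minimal numpy sketch of the precision-weighted combination for $K = 3$ hypothetical models at a single point in $g$ (the numbers are purely illustrative; this is not SAMBA's internal implementation):
import numpy as np
#hypothetical means and variances of K = 3 models at one point in g
f_k = np.array([2.45, 2.50, 2.48])    #e.g., small-g expansion, large-g expansion, GP
v_k = np.array([0.04, 0.10, 0.02])    #their variances (squared theory errors)
#precision-weighted mixing, following the equations above
Z_P = np.sum(1.0 / v_k)               #total precision
mixed_mean = np.sum(f_k / v_k) / Z_P  #Z_P^{-1} * sum_k f_k / v_k
mixed_var = 1.0 / Z_P                 #Z_P^{-1}
print(mixed_mean, mixed_var)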
We start by loading all of our necessary packages and SAMBA classes.
%load_ext autoreload
%autoreload 2
#import packages
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from matplotlib.ticker import AutoMinorLocator
%matplotlib inline
#matplotlib settings for LaTeX plots
import matplotlib
#import the SAMBA classes needed
import sys
sys.path.append('../../')
from samba.models import Models, Uncertainties
from samba.discrepancy import Bivariate
from samba.gaussprocess import GP
from samba.fprdat import FPR
As usual, we start by defining two series expansions to mix. We will begin with $N_{s} = N_{l} = 3$. (Note that we could mix more than two series expansions, but we're sticking with the bare minimum in this tutorial.)
#define g and series expansions
g = np.linspace(1e-6,1.0,100)
ns = 3
nl = 3
# evaluate the expansions, their uncertainties, and the bivariate mixed model, then save the results to a text file
obj = Models(ns,nl)
objvar = Uncertainties(error_model='informative')
low_val = obj.low_g(g)[0]
high_val = obj.high_g(g)[0]
low_std = objvar.variance_low(g, ns)
high_std = objvar.variance_high(g, nl)
model1 = Bivariate(ns, nl, error_model='informative')
mean1, intervals1 = model1.plot_mix(g, plot_fdagger=True)
std_dev = intervals1[:,1] - mean1
# obtain the weights and then store them
weights = model1.var_weights
weights_low = weights[0,:]
weights_high = weights[1,:]
# stack the results into a single array and write it to a text file
samba_values = np.array([low_val, high_val, low_std, high_std, mean1, intervals1[:,0], intervals1[:,1], std_dev, weights_low, weights_high])
np.savetxt('samba_results.txt', samba_values, delimiter=',')
Setting up a Gaussian Process model
Now we instantiate the GP class, which acts as a wrapper for the Python package scikit-learn. We immediately pick the kernel we wish to use for the GP here. The options are RBF, Matern, and Rational Quadratic. If we pick Matern, we also supply the value of $\nu$, as shown below.
#instantiate GP object for N_s = N_l = 3
obj1 = GP(g, ns, nl, kernel="Matern", nu=1.5, ci=68, error_model='informative')
Now we have a GP object that will use the Matern 3/2 kernel for its analysis. A GP needs a set of data to train on. In a typical application, the user would supply this data as the training points. In our toy-model case, however, we must generate the training points ourselves. SAMBA does this by offsetting the original linspace in $g$ by a small amount, creating a new linspace that does not exactly overlap any of the points first given to the GP class. Those original points are used in both the series expansion calculation and the GP prediction: if we want the series expansions and the GP to mix correctly in $f^{\dagger}$, all three models must be evaluated at the same points in $g$, or we will not be mixing the right points. Hence, for the toy model we create the training set after the prediction set, unlike in real situations.
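As a rough sketch of that offset idea (illustrative only; the actual offset value and training-point selection live inside SAMBA's training() routine):
#prediction grid: the points handed to the GP class and used for the expansions
g_pred = np.linspace(1e-6, 1.0, 100)
#hypothetical small shift so candidate training points never coincide exactly
#with the prediction points (SAMBA chooses its own offset internally)
g_train_candidates = g_pred + 0.01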
SAMBA takes care of generating this new training array for you, but it does ask you to choose a method for determining the fourth and final training point, which lies in the region of the large-$g$ expansion. There are many different ways we could assign that point; the three methods we tried are:
1. Fixing the point at $g = 0.6$ (independent of the location of the gap);
2. Choosing the point in the large-$g$ expansion array at the value of $g$ where $F_{s}(g)$ cuts off (i.e., where we choose to stop calculating the small-$g$ expansion, at the edge of its pertinent domain in $F(g)$);
3. Placing the training point at the location where the theory error on the current point in the large-$g$ expansion array is larger than 5% of the next value of $F_{l}(g)$ in the array (a sketch of this criterion is given just below this list).
SAMBA places the other three training points at fixed locations: two in the small-$g$ expansion array, and one at $g = 1$ in the large-$g$ expansion.
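As promised, here is a hedged sketch of how the criterion in method 3 could be implemented; the function and array names are hypothetical and not part of SAMBA's API:
#hypothetical illustration of method 3: walk the large-g expansion array and
#return the first index where the theory error exceeds 5% of the next value of F_l(g)
def pick_training_index(f_large, sigma_large, threshold=0.05):
    for i in range(len(f_large) - 1):
        if sigma_large[i] > threshold * abs(f_large[i + 1]):
            return i
    return len(f_large) - 1   #fall back to the last point if the criterion is never met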
In our tutorial, we pick method 2, which tends to yield the best results when comparing the true curve to the mixed model. However, none of the three methods is dramatically better than the others, which shows that the GP is not overly sensitive to the placement of the training points (great news for generalizing this!). We also pass error=True to the training() function, because we wish to use the theory errors in our GP.
Expected Gaussian parameters for method 2 with Matern 3/2 kernel and $N_{s}=N_{l}=3$ below:
length_scale = 2.08
variance = 2.27
#call the training() function from the GP() class
obj_tr1 = obj1.training(error=True, method=2, plot=True)
Gaussian process parameters: 2.3**2 * Matern(length_scale=2.14, nu=1.5)
The plot above shows the theory errors on each calculated training point, with the red and blue curves being the small-$g$ and large-$g$ expansions, as usual. The black points are the four training points the GP was just trained on. We will now use the validate() function in the GP class to predict at each of the original points in $g$.
Expected mean1, sig1, cov1 for method 2, Matern 3/2 kernel and $N_{s}=N_{l}=3$ below:
mean1 = [2.55678727 2.550882 2.54491822 2.53889552 2.53281353 ...1.56303294 1.56056141 1.55811837 1.55570382 1.55331773]
sig1 = [0.0193262 0.01764195 0.01599236 0.01437896 0.01280341...0.00835179 0.00663118 0.00485759 0.00305163 0.00135394]
cov1 = [[3.73502044e-04 3.40524975e-04 3.07502834e-04 ... 4.08375077e-07 2.44743834e-07 8.20266537e-08] [3.40524975e-04 3.11238518e-04 2.81740028e-04 ... 3.80396282e-07 2.27975826e-07 7.64068068e-08] ... [2.44743834e-07 2.27975826e-07 2.11062770e-07 ... 1.46638192e-05 9.31247488e-06 3.69959419e-06] [8.20266537e-08 7.64068068e-08 7.07383467e-08 ... 5.46592671e-06 3.69959419e-06 1.83315966e-06]]
#call the validate() function
mean1, sig1, cov1 = obj1.validate(plot=True)
We can now see the shaded green region and green curve in the plot above, which correspond to the GP variance band and GP mean curve, respectively.
Multivariate model mixing: PPD
Now comes the time to mix this model in with the two expansions and see what we get. To do this, we call plot_mix() from the Bivariate() class again, but this time we send it the GP mean and variance results.
Expected values of mixed_mean, mixed_intervals for Matern 3/2 kernel, method 2, with $N_{s}=N_{l}=3$ below:
mixed_mean = [2.50662827 2.50643831 2.50586856 2.50491903 2.50358975 ... 1.56463932 1.56177765 1.55892173 1.55608208 1.55336891]
mixed_intervals = [[2.50662827 2.50662827] [2.50643825 2.50643836] [2.50586769 2.50586943] ... [1.55670377 1.56113969] [1.55417787 1.55798629] [1.5521918 1.55454602]]
#call plot_mix() to mix in the GP
mixed_mean, mixed_intervals = obj1.plot_mix(g, plot_fdagger=True, plot_true=True, GP_mean=mean1, GP_var=np.square(sig1))
This result looks WAY better in the gap than the bivariate BMM did alone! Now let's directly compare to that case using our comparison function subplot_mix(), also found in the Bivariate() class.
#call subplot_mix() and plot no GP results next to GP results
obj1.subplot_mix(g, GP_mean=mean1, GP_var=np.square(sig1))
Diagnostics: Mahalanobis distance
The GP shows an obvious improvement in the gap in panel (b), as it follows the true curve quite well. The uncertainty band is also more believable: it encompasses most of the true curve and does not allow for any surprising behaviour (like the dip seen in panel (a), which might be plausible in a physical situation, but not in a toy case where we know exactly what to expect).
Let's check our results and see if we can truly trust them. One good way to do this is our diagnostic, the Mahalanobis distance. This is given by
$$ \mathrm{D}^{2}_{\mathrm{MD}} = (y - m)^{T} K^{-1} (y - m), $$
where $y$ is the vector of true solutions at each point in $g$ that we are comparing to (our true curve), $m$ is the GP mean prediction at each point in $g$, and $K^{-1}$ is the inverse of the GP covariance matrix returned by our validate() function above.
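Written out directly in numpy, this is just a quadratic form (a minimal sketch; the helper below is hypothetical, since SAMBA's md_squared() handles this calculation for you):
#Mahalanobis distance squared: (y - m)^T K^{-1} (y - m)
def mahalanobis_squared(y, m, K):
    resid = y - m
    return resid @ np.linalg.solve(K, resid)   #solve() avoids forming K^{-1} explicitly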
Let's try this calculation and see what we get for the Mahalanobis distance. We will use 3 points to calculate it: our length scale is quite long, so we can only include a few points before $K$ becomes a singular matrix. We will also take 1000 draws from our GP to build a reference distribution to compare our Mahalanobis distance against.
Expected values for md_g, md_mean, md_sig, and md_cov for pts=3, $N_{s}=N_{l}=3$, Matern 3/2 kernel, method 2 below:
md_g = [0.29648312 0.52763866 0.76381933]
md_mean = [2.16914526 1.90627322 1.69666482]
md_sig = [0.04170691 0.04040816 0.02373173]
md_cov = [[ 0.00173947 0.00111714 -0.00027924] [ 0.00111714 0.00163282 -0.00048348] [-0.00027924 -0.00048348 0.00056319]]
Expected values for md_gp, md_ref below:
md_gp = 0.2452667195670961
md_ref = [ 2.18957992 2.51578661 0.71175949 1.7594288 2.12702599 ... 2.92643679 1.28506737 0.50875747 2.12148018 4.50742567]
#calculate the Mahalanobis points
md_g, md_mean, md_sig, md_cov = obj1.MD_set(pts=3, plot=True)
#use the points to calculate the Mahalanobis distance for our GP
md_gp, md_ref = obj1.md_squared(md_g, md_mean, md_cov, n_curves=1000)
Now that we have all of this information, let's plot the histogram of the $\mathrm{D}^{2}_{\mathrm{MD}}$ values from our GP draws to double check that we are indeed getting the $\chi^{2}$ distribution shape we expect.
#check the docstring of md_plotter(), then call it to plot the histogram
help(obj1.md_plotter)
obj1.md_plotter(md_gp, md_ref, md_mean, md_cov, hist=True, box=False)
Help on method md_plotter in module samba.gaussprocess:

md_plotter(md_gp, md_ref, md_mean=None, md_cov=None, hist=True, box=False) method of samba.gaussprocess.GP instance
    A plotting function that allows the Mahalanobis distance to be plotted
    using either a histogram or a box and whisker plot, or both. Box and
    whisker plot code heavily drawn from J. Melendez' gsum code
    (https://github.com/buqeye/gsum).

    :Example:
        GP.md_plotter(md_gp=np.array([]), md_ref=np.array([]), hist=False, box=True)

    Parameters:
    -----------
    md_gp : float
        The MD^2 value for the GP curve.
    md_ref : numpy.ndarray
        The array of MD^2 values for the reference distribution.
    md_mean : numpy.ndarray
        The values of the GP mean at the md_g points. Only used for
        box and whisker option; default is None.
    md_cov : numpy.ndarray
        The values of the GP covariance matrix at the md_g points. Only
        used for box and whisker option; default is None.
    hist : bool
        Toggle for plotting a histogram. Default is True.
    box : bool
        Toggle for plotting a box plot. Default is False.

    Returns:
    --------
    None.
This looks good; the red dot at the low end is the squared MD result from our GP. It is close to 0, indicating that our result is not too far from the true curve, and it lies within the expectation from the $\chi^{2}$ reference distribution, so we have a good prediction for our mixed model.
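If you want to check the $\chi^{2}$ shape explicitly yourself, here is a minimal sketch using scipy.stats, under the standard assumption that the reference $\mathrm{D}^{2}_{\mathrm{MD}}$ values should follow a $\chi^{2}$ distribution with degrees of freedom equal to the number of MD points (3 here):
#overlay the chi^2 PDF (dof = number of MD points) on the reference MD^2 draws
dof = len(md_g)
x = np.linspace(0.0, np.max(md_ref), 200)
plt.hist(md_ref, bins=50, density=True, alpha=0.5, label=r'reference $\mathrm{D}^{2}_{\mathrm{MD}}$ draws')
plt.plot(x, stats.chi2.pdf(x, dof), 'k-', label=r'$\chi^{2}_{3}$ PDF')
plt.axvline(md_gp, color='red', label='GP result')
plt.legend()
plt.show()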
Weights
Now let's look at what the weights of each function look like (the normalized precisions of each model) across the input space, $g$. The actual weight values are stored in the class attribute var_weights, which is set by the fdagger() function.
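Explicitly, the weight of model $k$ at each point in $g$ is its precision normalized by the total precision $Z_P$ defined at the start of this notebook:
$$ \hat{w}_{k}(g) = \frac{1/v_{k}(g)}{\sum_{j=1}^{K} 1/v_{j}(g)} = \frac{1}{v_{k}(g)\,Z_{P}(g)}. $$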
#pull weights out of fdagger()
vargp1 = obj1.var_weights[0]
vargp2 = obj1.var_weights[1]
vargp3 = obj1.var_weights[2]
#set up figure
fig = plt.figure(figsize=(8,6), dpi=600)
ax = plt.axes()
ax.tick_params(axis='x', labelsize=18)
ax.tick_params(axis='y', labelsize=18)
ax.locator_params(nbins=5)
ax.xaxis.set_minor_locator(AutoMinorLocator())
ax.yaxis.set_minor_locator(AutoMinorLocator())
ax.set_xlim(0.0,1.0)
ax.set_ylim(0.0,1.0)
ax.set_yticks([0.1, 0.3, 0.5, 0.7, 0.9])
#labels and weight curves
ax.set_xlabel('g', fontsize=22)
ax.set_ylabel(r'$\hat{w}_{k}$', fontsize=22)
ax.plot(g, vargp1, 'r', linewidth=3, label=r'$v_{s}^{-1}$')
ax.plot(g, vargp2, 'b', linewidth=3, label=r'$v_{l}^{-1}$')
ax.plot(g, vargp3, 'g', linewidth=3, label=r'$v_{GP}^{-1}$')
ax.legend(fontsize=18, loc='upper right')
plt.show()
This is quite an interesting shape: the GP (green curve) takes over in the gap during the mixing process, dips in favour of the large-$g$ expansion (blue curve) at the edge of the gap, briefly becomes the dominant model again, and then mostly surrenders to the large-$g$ expansion. The jumps in the weights appear at values of $g$ close to the training points.
FPR Comparison
In case you wish to compare these results to those of Honda, who used the Fractional Power of Rational function (FPR) method (see Sec. 2.3 of the paper) to approximate the true model, you can pull some of his results from the FPR() class and overlay them on the mixed-model result we just obtained above.
#call the fpr_plot function from FPR() class
fpr_obj1 = FPR(g, ns, nl)
fpr_obj1.fpr_plot(mixed_mean, mixed_intervals, fpr_keys=['(3,3)^(1/6)'], ci=68)
In the inset plot, the purple dashed line represents the FPR result. You can change the value of $\alpha$ to any of the other available values (for $N_{s} = N_{l} = 3$, these are 1/2, 1/6, and 1/14) in the fprset() function in the FPR() class and see how the FPR result changes in comparison to both the true curve and our mixed-model result. Note that the FPR result has no uncertainty quantification, despite often being rather good at predicting the true curve; this makes the need for our mixed model clear.
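For example, to overlay the $\alpha = 1/2$ result instead (assuming the key follows the same '(N_s,N_l)^(alpha)' pattern used above; check the keys available in fprset() if this guess is off):
#hypothetical: same call as above but with the alpha = 1/2 key
fpr_obj1.fpr_plot(mixed_mean, mixed_intervals, fpr_keys=['(3,3)^(1/2)'], ci=68)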
Now let's repeat the process of mixing our models for $N_{s} = N_{l} = 2$ and $N_{s} = N_{l} = 5$, as in Fig. 7 in the paper.
Expected Gaussian parameters for $N_{s}=N_{l}=2$, Matern 3/2 kernel, method 2 below:
length_scale = 2.19
variance = 2.44
Expected values for mean2, sig2, cov2 below:
mean2 = [2.55722619 2.55126655 2.54525169 2.53918126 2.53305492...1.57512836 1.57209261 1.56908678 1.56611116 1.56316604]
sig2 = [0.01876248 0.01712592 0.01552338 0.01395631 0.01242632 ... 0.0099753 0.00865197 0.00746104 0.00652541 0.00602179]
cov2 = [[3.52030736e-04 3.20925503e-04 2.89789816e-04 ... 3.72260454e-07 2.40822250e-07 1.10005451e-07] [3.20925503e-04 2.93297249e-04 2.65481431e-04 ... 3.46610444e-07 2.24230144e-07 1.02428391e-07] ... [2.40822250e-07 2.24230144e-07 2.07502354e-07 ... 4.73465254e-05 4.25809124e-05 3.75668099e-05] [1.10005451e-07 1.02428391e-07 9.47893683e-08 ... 3.87738358e-05 3.75668099e-05 3.62619107e-05]]
Expected values for mixed_mean2, mixed_intervals2 below:
mixed_mean2 = [2.50662827 2.50643831 2.50586856 2.50491903 2.50358975 ... 1.5751143 1.57208418 1.56908052 1.56610477 1.56315758]
mixed_intervals2 = [[2.50662827 2.50662827] [2.50643825 2.50643836] [2.50586769 2.50586943]...[1.56230621 1.57585484] [1.56006899 1.57214055] [1.55753961 1.56877554]]
#new object for N_s = N_l = 2 and same steps as before
ns = 2
nl = 2
obj2 = GP(g, ns, nl, ci=68, kernel="Matern", nu=1.5, error_model='informative')
obj_tr2 = obj2.training(plot=False)
mean2, sig2, cov2 = obj2.validate(plot=True)
mixed_mean2, mixed_intervals2 = obj2.plot_mix(g, plot_fdagger=True, plot_true=True, GP_mean=mean2, GP_var=np.square(sig2))
Gaussian process parameters: 2.53**2 * Matern(length_scale=2.34, nu=1.5)
Expected Gaussian parameters for $N_{s}=N_{l}=5$, Matern 3/2 kernel, method 2 below:
length_scale = 0.889
variance = 1.43
Expected values for mean3, sig3, cov3 below:
mean3 = [2.51973149 2.51700401 2.51418648 2.51127741 2.50827525...1.55654875 1.55573191 1.55497964 1.55429276 1.55367207]
sig3 = [0.03393844 0.03096305 0.02803949 0.02517217 0.02236587... 0.0198764 0.01557175 0.01120256 0.00676925 0.00227229]
cov3 = [[ 1.15181746e-03 1.04912112e-03 9.45330624e-04 ... -1.83742307e-06 -1.09250665e-06 -3.60753668e-07] [ 1.04912112e-03 9.58710327e-04 8.66592363e-04 ... -1.71804751e-06 -1.02152758e-06 -3.37315859e-07] ... [-1.09250665e-06 -1.02152758e-06 -9.49045906e-07 ... 7.55800038e-05 4.58227379e-05 1.52962535e-05] [-3.60753668e-07 -3.37315859e-07 -3.13381883e-07 ... 2.51179533e-05 1.52962535e-05 5.16330684e-06]]
Expected values for mixed_mean3, mixed_intervals3 below:
mixed_mean3 = [2.50662827 2.50643839 2.5058699 2.50492583 2.5036112 ... 1.56617842 1.56330951 1.56045572 1.55761692 1.55479279]
mixed_intervals3 = [[2.50662827 2.50662827] [2.50643839 2.50643839] [2.5058699 2.50586991]...[1.56040869 1.56050275] [1.55757141 1.55766243] [1.55474875 1.55483682]]
#new object for N_s = N_l = 5 and same steps as before
ns = 5
nl = 5
obj3 = GP(g, ns, nl, ci=68, kernel="Matern", nu=1.5, error_model='informative')
obj_tr3 = obj3.training(plot=False)
mean3, sig3, cov3 = obj3.validate(plot=True)
mixed_mean3, mixed_intervals3 = obj3.plot_mix(g, plot_fdagger=True, plot_true=True, GP_mean=mean3, GP_var=np.square(sig3))
Gaussian process parameters: 1.42**2 * Matern(length_scale=0.865, nu=1.5)
These results also look pretty good! We can say that trivariate mixing with a GP is by far the best approach for our toy model, and we cannot wait to apply it to real nuclear physics applications! Stay tuned for the next release, where you will be able to input your own functions and test them in this sandbox!
Written by: Alexandra Semposki (07 June 2022)