An introduction to SAMBA: Trivariate Bayesian Model Mixing with GPs
Alexandra Semposki
Date: 07 June 2022
Introduction
From the last notebook, you saw the bivariate model mixing produce a well-mixed model with an overly conservative uncertainty band in the gap (for our toy model). We now want to try mixing in an interpolant in the form of a Gaussian Process (GP). This will enter \(f^{\dagger}\) as a third individual model. Recall the function for \(f^\dagger\):
\[
f^{\dagger} \sim \mathcal{N}\!\left(\frac{1}{Z_P}\sum_{k=1}^{K}\frac{f_{k}(g)}{v_{k}},\; Z_P^{-1}\right), \qquad Z_P = \sum_{k=1}^{K}\frac{1}{v_{k}},
\]
where \(Z_P\) is the total precision, or the inverse of the variance, summed over the \(K\) models, \(v_{k}\) is the individual variance of each model (which we previously denoted the theory error), \(f_{k}(g)\) is the mean of the \(k\)th model, and \(f^{\dagger}\) is the mixed model. Now \(K = 3\) for this trivariate case.
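To make the combination rule concrete, here is a standalone NumPy sketch (not SAMBA's own implementation) of the precision-weighted mean and variance for \(K\) models evaluated on a common grid; the means and variances arguments are placeholders for whatever model outputs you have.

import numpy as np

def mix_models(means, variances):
    # means, variances: arrays of shape (K, n_points) holding each model's
    # mean and variance (squared theory error) at every point in g
    precisions = 1.0 / np.asarray(variances)
    Z_P = precisions.sum(axis=0)                                   # total precision at each point
    mixed_mean = (precisions * np.asarray(means)).sum(axis=0) / Z_P
    mixed_var = 1.0 / Z_P                                          # variance of the mixed model
    return mixed_mean, mixed_var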
We start by loading all of our necessary packages and SAMBA classes.
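For reference, a minimal import cell might look like the one below; the import paths are an assumption here and may need adjusting to match how SAMBA is installed on your system.

# minimal imports for this notebook (import paths are an assumption;
# adjust them to match your SAMBA installation)
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import AutoMinorLocator

from samba import Models, Uncertainties, Bivariate, GP, FPR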
As usual, we start by defining two series expansions to mix. We will begin with \(N_{s} = N_{l} = 3\). (Note that we could mix more than two series expansions, but we're sticking with the bare minimum in this tutorial.)
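A minimal setup sketch for the truncation orders and the input grid follows; the exact linspace used for \(g\) is an assumption here, and any dense grid over the domain of interest will do.

# truncation orders of the small-g and large-g expansions
ns = 3
nl = 3

# evaluation points in g (assumed grid; adjust the endpoints and spacing as desired)
g = np.linspace(1e-6, 1.0, 200)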
# compute each model's mean and uncertainty, mix them, and save the results to a txt file for checking
obj = Models(ns,nl)
objvar = Uncertainties(error_model='informative')
low_val = obj.low_g(g)[0]
high_val = obj.high_g(g)[0]
low_std = objvar.variance_low(g, ns)
high_std = objvar.variance_high(g, nl)
model1 = Bivariate(ns, nl, error_model='informative')
mean1, intervals1 = model1.plot_mix(g, plot_fdagger=True)
std_dev = intervals1[:,1] - mean1
# obtain the weights and then store them
weights = model1.var_weights
weights_low = weights[0,:]
weights_high = weights[1,:]
# concatenate the arrays and save them to a text file
samba_values = np.array([low_val, high_val, low_std, high_std, mean1, intervals1[:,0], intervals1[:,1], std_dev, weights_low, weights_high])
np.savetxt('samba_results.txt', samba_values, delimiter=',')
Setting up a Gaussian Process model
Now we instantiate the GP class, which acts as a wrapper for the Python package scikit-learn. We immediately pick the kernel we wish to use for the GP here; the options are RBF, Matern, and Rational Quadratic. If we pick Matern, we also supply the value of \(\nu\) when asked. This is shown below.
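For \(N_{s} = N_{l} = 3\), a call mirroring the ones used later in this notebook for the other truncation orders might look like this (ci=68 requests 68% intervals, and nu=1.5 selects the Matern 3/2 kernel):

# instantiate the GP wrapper with the Matern 3/2 kernel (nu = 1.5)
obj1 = GP(g, ns, nl, ci=68, kernel="Matern", nu=1.5, error_model='informative')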
Now we have a GP object that will use the Matern 3/2 kernel for its analysis. A GP must be trained on a set of data. In a typical situation the user has data to give the GP, and those would be the training points; in our case, with a toy model, we must generate the training points ourselves. SAMBA does this by offsetting the original linspace in \(g\) by a small amount, so that a completely new linspace in \(g\) is created that does not overlap exactly with any of the points first given to the GP class. Those original points are used both in the series expansion calculation and in the GP prediction: if the series expansions and the GP are to mix correctly in \(f^\dagger\), all three models must be evaluated at the same points in \(g\), or we will not be mixing the right points. Hence, for the toy model we create the training set after the prediction set, unlike in real situations.
SAMBA takes care of generating this new training array for you, but it does ask you to choose a method for determining the fourth and final training point, located within the region of the large-\(g\) expansion, since there are several ways that point could be assigned. The three methods we tried are:

1. Fixing the point at \(g = 0.6\), independent of the location of the gap;

2. Choosing the point in the large-\(g\) expansion array at the value of \(g\) where \(F_{s}(g)\) cuts off (i.e., where we choose to stop calculating the small-\(g\) expansion, at the edge of the pertinent domain in \(F(g)\));

3. Putting the training point at the location where the theory error on the current point in the large-\(g\) expansion array is larger than 5% of the next value of \(F_{l}(g)\) in the array.
SAMBA places the other three training points at fixed locations: two in the small-\(g\) expansion array, and one at \(g = 1\) in the large-\(g\) expansion.
In our tutorial, we pick method 2, which tends to yield the best results when comparing the true curve to the mixed model. However, none of the three methods is dramatically better than the others, which shows that the GP is not overly sensitive to the placement of the training points (great news for generalizing this!). We also pass error=True to the training() function, because we wish to use the theory errors in our GP; a sketch of the call follows.
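The error=True argument is the one described above, and the plot keyword mirrors the training(plot=False) calls used later in this notebook; how method 2 is actually selected (a prompt or an extra argument) may differ between SAMBA versions, so it is not shown explicitly here.

# train the GP on the automatically generated training points, including theory errors;
# method 2 is assumed to be chosen when SAMBA asks about the final training point placement
obj_tr1 = obj1.training(error=True, plot=True)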
Expected Gaussian parameters for method 2 with Matern 3/2 kernel and \(N_{s}=N_{l}=3\) below:
length_scale = 2.08
variance = 2.27
The plot above shows the theory errors on each calculated training point, with the red and blue curves being the small-\(g\) and large-\(g\) expansions, as usual. The black points are the four training points just used to train the GP. We will now use the validate() function in the GP class to predict at each original point in \(g\).
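This mirrors the validate(plot=True) calls used for the other truncation orders later in the notebook:

# predict the GP mean, standard deviation, and covariance at the original points in g
mean1, sig1, cov1 = obj1.validate(plot=True)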
Expected mean1, sig1, cov1 for method 2, Matern 3/2 kernel and \(N_{s}=N_{l}=3\) below:
mean1 = [2.55678727 2.550882 2.54491822 2.53889552 2.53281353 ...1.56303294 1.56056141 1.55811837 1.55570382 1.55331773]
sig1 = [0.0193262 0.01764195 0.01599236 0.01437896 0.01280341...0.00835179 0.00663118 0.00485759 0.00305163 0.00135394]
cov1 = [[3.73502044e-04 3.40524975e-04 3.07502834e-04 ... 4.08375077e-07 2.44743834e-07 8.20266537e-08]
 [3.40524975e-04 3.11238518e-04 2.81740028e-04 ... 3.80396282e-07 2.27975826e-07 7.64068068e-08]
 ...
 [2.44743834e-07 2.27975826e-07 2.11062770e-07 ... 1.46638192e-05 9.31247488e-06 3.69959419e-06]
 [8.20266537e-08 7.64068068e-08 7.07383467e-08 ... 5.46592671e-06 3.69959419e-06 1.83315966e-06]]
We can now see the shaded green region and green curve in the plot above, which correspond to the GP variance band and GP mean curve, respectively.
Multivariate model mixing: PPD
Now comes the time to mix this model in with the two expansions and see what we get. To do this, we call plot_mix() from the Bivariate() class again, but this time we send it the GP mean and variance results.
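This mirrors the plot_mix() calls used for the other truncation orders later in the notebook; note that GP_var takes the GP variance, i.e. the square of sig1:

# mix the two expansions with the GP through f^dagger
mixed_mean, mixed_intervals = obj1.plot_mix(g, plot_fdagger=True, plot_true=True, GP_mean=mean1, GP_var=np.square(sig1))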
Expected values of mixed_mean, mixed_intervals for Matern 3/2 kernel, method 2, with \(N_{s}=N_{l}=3\) below:
mixed_mean = [2.50662827 2.50643831 2.50586856 2.50491903 2.50358975 ... 1.56463932 1.56177765 1.55892173 1.55608208 1.55336891]
mixed_intervals = [[2.50662827 2.50662827] [2.50643825 2.50643836] [2.50586769 2.50586943] ... [1.55670377 1.56113969] [1.55417787 1.55798629] [1.5521918 1.55454602]]
This result looks WAY better in the gap than the bivariate BMM did alone! Now let's directly compare to that case using our comparison function subplot_mix(), also found in the Bivariate() class.
Diagnostics: Mahalanobis distance
The GP shows an obvious improvement in the gap in panel (b), as it follows the true curve quite well. The uncertainty band is also more believable: it encompasses most of the true curve and does not leave room for a surprising result such as the dip seen in panel (a), which might be plausible in a physical situation but not in a toy case where we know exactly what to expect.
Let's check our results and see if we can truly trust them. One good way to do this is our diagnostic, the Mahalanobis distance. This is given by
\[
\mathrm{D}^{2}_{\mathrm{MD}} = (y - m)^{T} K^{-1} (y - m),
\]
where \(y\) is the vector of true solutions at each point in \(g\) that we are comparing to (our true curve), \(m\) is the GP solution at each point in \(g\), and \(K^{-1}\) is the inverse of the GP covariance matrix, returned by our validate() function above.
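To make the formula concrete, here is a standalone NumPy sketch of the squared Mahalanobis distance; y, m, and K below are placeholders for the true values, the GP mean, and the GP covariance at the chosen points (in SAMBA these are supplied by its diagnostics routines).

def mahalanobis_squared(y, m, K):
    # squared Mahalanobis distance (y - m)^T K^{-1} (y - m)
    resid = np.asarray(y) - np.asarray(m)
    # solve K x = resid rather than forming K^{-1} explicitly (better conditioned)
    return float(resid @ np.linalg.solve(K, resid))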
Let's try this calculation and see what we get for the Mahalanobis distance. We will use 3 points to calculate it, as our length scale is quite long and we can only fit a few points within it before \(K\) becomes a singular matrix. We will also take 1000 draws from our GP to build a reference distribution against which to compare our Mahalanobis distance.
Expected values for md_g, md_mean, md_sig, and md_cov for pts=3, \(N_{s}=N_{l}=3\), Matern 3/2 kernel, method 2 below:
md_g = [0.29648312 0.52763866 0.76381933]
md_mean = [2.16914526 1.90627322 1.69666482]
md_sig = [0.04170691 0.04040816 0.02373173]
md_cov = [[ 0.00173947 0.00111714 -0.00027924]
 [ 0.00111714 0.00163282 -0.00048348]
 [-0.00027924 -0.00048348 0.00056319]]
Expected values for md_gp, md_ref below:
md_gp = 0.2452667195670961
md_ref = [ 2.18957992 2.51578661 0.71175949 1.7594288 2.12702599 ... 2.92643679 1.28506737 0.50875747 2.12148018 4.50742567]
Now that we have all of this information, let's plot the histogram of the reference draws to double check that we are indeed getting the \(\chi^{2}\) distribution shape we expect from our GP draws.
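A minimal matplotlib sketch of this check is below; SAMBA provides its own plotting routine, so this is only illustrative. It uses the md_ref draws and md_gp value shown above and overlays the \(\chi^{2}\) distribution with 3 degrees of freedom (one per point used in the calculation).

from scipy import stats

fig, ax = plt.subplots(figsize=(8,6))
ax.hist(md_ref, bins=50, density=True, color='lightgray', label='GP reference draws')
x_vals = np.linspace(0.0, np.max(md_ref), 200)
ax.plot(x_vals, stats.chi2.pdf(x_vals, df=3), 'k--', label=r'$\chi^{2}$ (3 d.o.f.)')
ax.plot(md_gp, 0.0, 'ro', markersize=10, label='mixed model MD$^{2}$')
ax.set_xlabel(r'$\mathrm{MD}^{2}$', fontsize=18)
ax.legend(fontsize=14)
plt.show()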
This looks good; the red dot at the low end is the squared MD result from our GP. It is close to 0, indicating that our result is not far from the true curve, and it lies within the expectation from the \(\chi^{2}\) distribution, so we have a good prediction for our mixed model.
Weights
Now let's look at what the weights of each model (the normalized precisions) look like across the input space, \(g\). The weight values are stored by the fdagger() function in the class attribute var_weights.
#pull weights out of fdagger()
vargp1 = obj1.var_weights[0]
vargp2 = obj1.var_weights[1]
vargp3 = obj1.var_weights[2]
#set up figure
fig = plt.figure(figsize=(8,6), dpi=600)
ax = plt.axes()
ax.tick_params(axis='x', labelsize=18)
ax.tick_params(axis='y', labelsize=18)
ax.locator_params(nbins=5)
ax.xaxis.set_minor_locator(AutoMinorLocator())
ax.yaxis.set_minor_locator(AutoMinorLocator())
ax.set_xlim(0.0,1.0)
ax.set_ylim(0.0,1.0)
ax.set_yticks([0.1, 0.3, 0.5, 0.7, 0.9])
#labels and true model
ax.set_xlabel('g', fontsize=22)
ax.set_ylabel(r'$\hat{w}_{k}$', fontsize=22)
ax.plot(g, vargp1, 'r', linewidth=3, label=r'$v_{s}^{-1}$')
ax.plot(g, vargp2, 'b', linewidth=3, label=r'$v_{l}^{-1}$')
ax.plot(g, vargp3, 'g', linewidth=3, label=r'$v_{GP}^{-1}$')
ax.legend(fontsize=18, loc='upper right')
plt.show()
This is quite an interesting shape: the GP (green curve) takes over in the gap during the mixing process, dips in favour of the large-\(g\) expansion (blue curve) at the edge of the gap, then briefly returns as the dominant model before mostly surrendering to the large-\(g\) expansion. The jumps in the weights appear to occur at values of \(g\) close to the training points.
FPR Comparison
In case you wish to compare these results to those of Honda, who used the fractional power of rational function (FPR) method (see Sec. 2.3 of this paper) to approximate the true model, you can pull some of his results from the FPR() class and overlay them with the mixed-model result we just obtained above.
In the inset plot, the purple dashed line represents the FPR result. You can change the value of \(\alpha\) to any of the other available values (for \(N_{s}, N_{l} = 3\), these are 1/2, 1/6, and 1/14) in the fprset() function of the FPR() class and see how the FPR result changes in comparison to both the true curve and our mixed-model result. The FPR result has no uncertainty quantification, despite often being rather good at predicting the true curve, which makes the need for our mixed model clear.
Now let's repeat the process of mixing our models for \(N_{s} = N_{l} = 2\) and \(N_{s} = N_{l} = 5\), as in Fig. 7 in the paper.
Expected Gaussian parameters for \(N_{s}=N_{l}=2\), Matern 3/2 kernel, method 2 below:
length_scale = 2.19
variance = 2.44
Expected values for mean2, sig2, cov2 below:
mean2 = [2.55722619 2.55126655 2.54525169 2.53918126 2.53305492...1.57512836 1.57209261 1.56908678 1.56611116 1.56316604]
sig2 = [0.01876248 0.01712592 0.01552338 0.01395631 0.01242632 ... 0.0099753 0.00865197 0.00746104 0.00652541 0.00602179]
cov2 = [[3.52030736e-04 3.20925503e-04 2.89789816e-04 ... 3.72260454e-07 2.40822250e-07 1.10005451e-07]
 [3.20925503e-04 2.93297249e-04 2.65481431e-04 ... 3.46610444e-07 2.24230144e-07 1.02428391e-07]
 ...
 [2.40822250e-07 2.24230144e-07 2.07502354e-07 ... 4.73465254e-05 4.25809124e-05 3.75668099e-05]
 [1.10005451e-07 1.02428391e-07 9.47893683e-08 ... 3.87738358e-05 3.75668099e-05 3.62619107e-05]]
Expected values for mixed_mean2, mixed_intervals2 below:
mixed_mean2 = [2.50662827 2.50643831 2.50586856 2.50491903 2.50358975 ... 1.5751143 1.57208418 1.56908052 1.56610477 1.56315758]
mixed_intervals2 = [[2.50662827 2.50662827] [2.50643825 2.50643836] [2.50586769 2.50586943]...[1.56230621 1.57585484] [1.56006899 1.57214055] [1.55753961 1.56877554]]
#new object for N_s = N_l = 2 and same steps as before
ns = 2
nl = 2
obj2 = GP(g, ns, nl, ci=68, kernel="Matern", nu=1.5, error_model='informative')
obj_tr2 = obj2.training(plot=False)
mean2, sig2, cov2 = obj2.validate(plot=True)
mixed_mean2, mixed_intervals2 = obj2.plot_mix(g, plot_fdagger=True, plot_true=True, GP_mean=mean2, GP_var=np.square(sig2))
Expected Gaussian parameters for \(N_{s}=N_{l}=5\), Matern 3/2 kernel, method 2 below:
length_scale = 0.889
variance = 1.43
Expected values for mean3, sig3, cov3 below:
mean3 = [2.51973149 2.51700401 2.51418648 2.51127741 2.50827525...1.55654875 1.55573191 1.55497964 1.55429276 1.55367207]
sig3 = [0.03393844 0.03096305 0.02803949 0.02517217 0.02236587... 0.0198764 0.01557175 0.01120256 0.00676925 0.00227229]
cov3 = [[ 1.15181746e-03 1.04912112e-03 9.45330624e-04 ... -1.83742307e-06 -1.09250665e-06 -3.60753668e-07]
 [ 1.04912112e-03 9.58710327e-04 8.66592363e-04 ... -1.71804751e-06 -1.02152758e-06 -3.37315859e-07]
 ...
 [-1.09250665e-06 -1.02152758e-06 -9.49045906e-07 ... 7.55800038e-05 4.58227379e-05 1.52962535e-05]
 [-3.60753668e-07 -3.37315859e-07 -3.13381883e-07 ... 2.51179533e-05 1.52962535e-05 5.16330684e-06]]
Expected values for mixed_mean3, mixed_intervals3 below:
mixed_mean3 = [2.50662827 2.50643839 2.5058699 2.50492583 2.5036112 ... 1.56617842 1.56330951 1.56045572 1.55761692 1.55479279]
mixed_intervals3 = [[2.50662827 2.50662827] [2.50643839 2.50643839] [2.5058699 2.50586991]...[1.56040869 1.56050275] [1.55757141 1.55766243] [1.55474875 1.55483682]]
#new object for N_s = N_l = 5 and same steps as before
ns = 5
nl = 5
obj3 = GP(g, ns, nl, ci=68, kernel="Matern", nu=1.5, error_model='informative')
obj_tr3 = obj3.training(plot=False)
mean3, sig3, cov3 = obj3.validate(plot=True)
mixed_mean3, mixed_intervals3 = obj3.plot_mix(g, plot_fdagger=True, plot_true=True, GP_mean=mean3, GP_var=np.square(sig3))
These results also look pretty good! We can say that this method is by far the best approach we have tried for our toy model, and we cannot wait to apply it to real nuclear physics applications! Stay tuned for the next release, where you will be able to input your own functions and test them in this sandbox!
Written by: Alexandra Semposki (07 June 2022)