These instructions will allow you to run this project on your local machine.
First, clone the repository: git clone https://github.com/edwisdom/bnn-hmc
Then, inside a Python virtual environment, install the necessary packages with: pip install -r requirements.txt
Run the Bayesian neural net prior with:
python bnn_prior.py
Run the Bayesian neural net posterior (and Hamiltonian Monte Carlo) with:
python bnn_posterior.py
As Figure 1 shows, the prior samples depend heavily on the choice of activation function. The relu prior samples consist of essentially two linear pieces, whereas the tanh prior samples look like sigmoid curves, mirroring the shape of the underlying activation functions.
Even though these activation functions are simple and the network has just one hidden layer of 2 units, the prior samples span a wide range of values at almost every input (the exception being the relu activation at x=0, as expected). This suggests that our priors are flexible enough to fit a wide variety of training data. Another clear trend is that with more hidden layers, individual prior samples become more complex, with more local extrema.
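The prior experiments above can be sketched as follows. This is a minimal sketch, not this project's code: it assumes a fully connected network with scalar input and output and independent standard-normal priors on every weight and bias (the README does not state the prior), and the helpers `sample_prior_function` and `count_local_extrema`, as well as the deeper `[2, 2, 2]` architecture, are hypothetical. It draws functions from the prior for relu and tanh, then reports the pointwise spread across samples and the average number of local extrema per sample.

```python
import numpy as np

def sample_prior_function(x, hidden_sizes, activation, rng):
    """Draw one function from a BNN prior and evaluate it at the inputs x.

    Assumption: every weight and bias has an independent N(0, 1) prior.
    """
    h = x.reshape(-1, 1)                      # shape (n_points, 1)
    for width in hidden_sizes:
        W = rng.standard_normal((h.shape[1], width))
        b = rng.standard_normal(width)
        h = activation(h @ W + b)             # hidden activations
    w_out = rng.standard_normal(h.shape[1])
    b_out = rng.standard_normal()
    return h @ w_out + b_out                  # shape (n_points,)

def relu(z):
    return np.maximum(z, 0.0)

def count_local_extrema(f):
    """Count sign changes in the discrete slope of a sampled curve."""
    slope_sign = np.sign(np.diff(f))
    slope_sign = slope_sign[slope_sign != 0]  # ignore perfectly flat steps
    return int(np.sum(slope_sign[1:] != slope_sign[:-1]))

rng = np.random.default_rng(0)
x = np.linspace(-5.0, 5.0, 400)
for name, act in [("relu", relu), ("tanh", np.tanh)]:
    for hidden_sizes in ([2], [2, 2, 2]):     # hypothetical depth comparison
        fs = np.stack([sample_prior_function(x, hidden_sizes, act, rng)
                       for _ in range(100)])
        spread = fs.std(axis=0)               # pointwise spread across samples
        extrema = np.mean([count_local_extrema(f) for f in fs])
        print(f"{name} {hidden_sizes}: mean spread {spread.mean():.2f}, "
              f"mean local extrema {extrema:.2f}")
```

Plotting a handful of rows of `fs` against `x` gives pictures of the kind shown in Figure 1; the printed statistics are one rough way to quantify the spread and complexity trends described above.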
Figure 5: Potential energies over Hamiltonian Monte Carlo iterations for 3 different chains
As Figure 5 shows, each of the three Hamiltonian Monte Carlo chains rapidly converges to low potential energy values, which means that our samples come from high-probability regions of the posterior. HMC's gradient-guided trajectories let it make distant proposals that still land in these high-probability regions, and this is the primary reason it is preferred over random-walk Metropolis methods (HMC is itself an MCMC method): random-walk proposals are unlikely to explore a wide space while still remaining in high-probability regions.
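To make the link between "low potential energy" and "high posterior probability" concrete, here is the standard HMC setup written out, assuming a Gaussian likelihood with noise variance σ², independent standard-normal priors on the network parameters θ, and an identity mass matrix (the README does not state these choices). Here f_θ is the network's output, (θ*, p*) is the end of a leapfrog trajectory of L steps of size ε, and α is the acceptance probability.

```latex
\begin{aligned}
U(\theta) &= -\log p(\theta) - \sum_{n=1}^{N} \log p(y_n \mid x_n, \theta) \\
          &= \tfrac{1}{2}\lVert \theta \rVert^2
             + \frac{1}{2\sigma^2} \sum_{n=1}^{N} \bigl(y_n - f_\theta(x_n)\bigr)^2
             + \text{const}, \\
H(\theta, p) &= U(\theta) + \tfrac{1}{2} p^\top p, \\
\alpha &= \min\bigl(1, \exp\{H(\theta, p) - H(\theta^\ast, p^\ast)\}\bigr).
\end{aligned}
```

Because the posterior density is proportional to exp(-U(θ)), a chain whose potential energy settles at low values is spending its time where the posterior density is high.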
For an interactive demo of this principle, see here.
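A minimal NumPy sketch of HMC for a small BNN regression model follows. Everything model-specific is an assumption rather than this project's code: one hidden layer of 10 tanh units, standard-normal priors on all parameters, Gaussian observation noise with standard deviation 0.2, an identity mass matrix, and a hypothetical toy sine data set. The step size `epsilon` and number of leapfrog steps `L` correspond to the parameters named in the Figure 6 and Figure 7 captions, and the demo uses the same values (epsilon=0.001, L=25). The recorded `energies` trace is the kind of potential-energy curve plotted in Figure 5.

```python
import numpy as np

H = 10           # hidden units (assumption)
NOISE_STD = 0.2  # observation noise standard deviation (assumption)

def unpack(theta):
    """Split a flat parameter vector into (W1, b1, W2, b2)."""
    return theta[:H], theta[H:2 * H], theta[2 * H:3 * H], theta[3 * H]

def forward(theta, x):
    """One-hidden-layer tanh network mapping scalar inputs to scalar outputs."""
    W1, b1, W2, b2 = unpack(theta)
    A = np.tanh(np.outer(x, W1) + b1)         # hidden activations, shape (n, H)
    return A @ W2 + b2, A

def potential(theta, x, y):
    """U(theta) = -log prior - log likelihood, up to an additive constant."""
    f, _ = forward(theta, x)
    return 0.5 * theta @ theta + 0.5 * np.sum((f - y) ** 2) / NOISE_STD**2

def grad_potential(theta, x, y):
    """Analytic gradient of `potential`, computed by backpropagation."""
    _, _, W2, _ = unpack(theta)
    _, A = forward(theta, x)
    f = A @ W2 + theta[3 * H]
    r = (f - y) / NOISE_STD**2                # dU/df for each data point
    G = np.outer(r, W2) * (1.0 - A**2)        # dU/d(pre-activation), shape (n, H)
    grad = np.concatenate([G.T @ x, G.sum(axis=0), A.T @ r, [r.sum()]])
    return grad + theta                       # add the N(0, 1) prior term

def hmc_step(theta, x, y, epsilon, L, rng):
    """One HMC transition: leapfrog integration plus a Metropolis correction."""
    p0 = rng.standard_normal(theta.shape)     # fresh Gaussian momentum
    q, p = theta.copy(), p0.copy()
    p -= 0.5 * epsilon * grad_potential(q, x, y)       # initial half step
    for step in range(L):
        q += epsilon * p                               # full position step
        if step < L - 1:
            p -= epsilon * grad_potential(q, x, y)     # full momentum step
    p -= 0.5 * epsilon * grad_potential(q, x, y)       # final half step
    h_old = potential(theta, x, y) + 0.5 * p0 @ p0
    h_new = potential(q, x, y) + 0.5 * p @ p
    if np.log(rng.uniform()) < h_old - h_new:          # accept w.p. min(1, e^(h_old - h_new))
        return q, True
    return theta, False

def run_chain(x, y, n_iters, epsilon, L, seed):
    """Run one chain, recording every state and its potential energy."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(3 * H + 1)    # initialize from the prior
    samples, energies, n_accept = [], [], 0
    for _ in range(n_iters):
        theta, accepted = hmc_step(theta, x, y, epsilon, L, rng)
        n_accept += accepted
        samples.append(theta.copy())
        energies.append(potential(theta, x, y))
    return np.array(samples), np.array(energies), n_accept / n_iters

# Hypothetical toy data set, not the data used in this project.
data_rng = np.random.default_rng(42)
x_train = np.linspace(-2.0, 2.0, 20)
y_train = np.sin(x_train) + NOISE_STD * data_rng.standard_normal(20)

samples, energies, accept_rate = run_chain(
    x_train, y_train, n_iters=1000, epsilon=0.001, L=25, seed=0)
```

Leapfrog integration is used because it is time-reversible and volume-preserving, so the Metropolis correction on the change in total energy keeps the chain targeting the posterior. A small `epsilon` keeps the energy error, and therefore the rejection rate, low, at the cost of shorter moves per iteration; running several chains with different `seed` values gives traces comparable to the three chains in Figure 5.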
Figure 6: 10 samples from 3 different BNN posteriors sampled using HMC with epsilon=0.001 and L=25
As Figure 6 shows, every posterior sample fits the training data tightly, whereas predictions far from any training data vary much more.
Figure 7: 500 posterior function samples from a single BNN posterior sampled using HMC with epsilon=0.001 and L=25
Figure 7 shows that our posterior has much greater uncertainty at inputs far from its training data, where its estimates are largely dominated by the prior. This gives Bayesian neural networks an edge over traditional point-estimate neural networks: they provide a distribution over the parameters, which lets us quantify our uncertainty about predictions. These models also have the potential to be more interpretable and better at detecting adversarial perturbations.
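One way to turn a stack of posterior function samples, like the 500 in Figure 7, into a picture of predictive uncertainty is a pointwise mean plus a central credible band. This is a minimal sketch assuming the samples are stored as an array of shape (n_samples, n_points); the helper name `predictive_summary`, the 95% level, and the synthetic stand-in samples are illustrative assumptions, not this project's code or data.

```python
import numpy as np

def predictive_summary(f_samples, level=0.95):
    """Pointwise mean and central credible band of posterior function samples.

    f_samples: array of shape (n_samples, n_points), one row per sampled function.
    """
    f_samples = np.asarray(f_samples)
    tail = 100.0 * (1.0 - level) / 2.0        # percent mass in each tail
    mean = f_samples.mean(axis=0)
    lower, upper = np.percentile(f_samples, [tail, 100.0 - tail], axis=0)
    return mean, lower, upper

# Hypothetical stand-in for 500 posterior samples: noise that grows with |x|,
# mimicking wider uncertainty away from training data concentrated near x = 0.
rng = np.random.default_rng(0)
x = np.linspace(-5.0, 5.0, 101)
f_samples = np.sin(x) + (0.05 + 0.3 * np.abs(x)) * rng.standard_normal((500, x.size))
mean, lower, upper = predictive_summary(f_samples)
width = upper - lower   # band width: larger where the posterior is less certain
```

Percentile bands are used instead of mean plus or minus two standard deviations because they make no Gaussian assumption about the spread of the function values at each input.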
In the future, I would like to explore the following:
- Tuning hyperparameters epsilon and L more exhaustively and systematically
- Applying this model to real-world data and comparing it to neural networks that take similar time to train
- Implementing NUTS (the No-U-Turn Sampler), an extension of HMC that chooses the number of leapfrog steps L automatically and is one of the most widely used Monte Carlo sampling techniques for Bayesian models
A huge thanks to Prof. Michael Hughes, who supervised this work, and Daniel Dinjian and Julie Jiang for thinking through the technical nitty-gritty with me.