This section is for data scientists.
PolyChord is a novel take on John Skilling’s nested sampling, developed over the two to three years since its inception. It has its own onboard computing engine and is a unique algorithm in its own right; we’ve been calling it “next generation” nested sampling. The tool has moved on considerably since the original paper was written. For your interest, we include the original paper published by PolyChord CTO Dr Will Handley, CSO Prof. Michael Hobson and Academic Advisor Prof. Anthony Lasenby.
PolyChord: next-generation nested sampling
PolyChord is a novel nested sampling algorithm tailored for high-dimensional parameter spaces. This paper coincides with the release of PolyChord v1.3, and provides an extensive account of the algorithm…
Will Handley, Michael Hobson, Anthony Lasenby
This next paper sets out the theory behind using PolyChord to create a new kind of neural network, which we’ve been calling “Principled Machine Learning”. Here we treat the fabric of a neural network as a complex data landscape that PolyChord can explore in order to make computed choices about the weights and architecture of that network.
Bayesian sparse reconstruction: a brute-force approach to astronomical imaging and machine learning
We present a principled Bayesian framework for signal reconstruction, in which the signal is modelled by basis functions whose number (and form, if required) is determined by the data themselves.
Edward Higson, Will Handley, Michael Hobson, Anthony Lasenby
PolyNet is a neural network optimiser based on the cutting-edge maths tool PolyChord, invented by a leading Cambridge University Astrophysics team, Prof Mike Hobson, Prof Anthony Lasenby and Dr Will Handley in 2017.
1.1 Neural Networks
A neural network is a mathematical model mapping observations onto predictions. The workflow for neural network design and usage typically takes the form:
1. Collect a set of labelled training data.
2. Design/choose a network architecture.
3. Train the network by determining the ‘best’ set of neuron weights.
4. Evaluate the network’s quality, and if necessary return to a refined version of step 2.
PolyNet provides unique advantages for both Steps 3 and 4.
Step 3 is usually performed with a gradient-based optimiser, such as those provided by TensorFlow. Optimising a neural network may be treated as a high-dimensional model-fitting problem, which PolyNet approaches in a Bayesian fashion, using PolyChord as its engine. Step 4 is typically carried out using cross-validation techniques. PolyNet instead allows one to use Bayesian evidences to assess the quality of the network.
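As a rough illustration of the model-fitting view described above, the sketch below writes a tiny one-hidden-layer network as a function of a single flat parameter vector, together with a Gaussian log-likelihood and a map from the unit hypercube to a uniform prior. These are the two ingredients that nested samplers typically ask for. The network size, noise level, prior range and all function names are assumptions made for this example; they are not PolyNet’s actual interface.

```python
import math

N_HIDDEN = 3                     # assumed size of the single hidden layer
N_PARAMS = 3 * N_HIDDEN + 1      # input weights, hidden biases, output weights, output bias


def network(x, theta):
    """Evaluate a 1-input, 1-output tanh network for a flat parameter vector."""
    w1 = theta[0:N_HIDDEN]
    b1 = theta[N_HIDDEN:2 * N_HIDDEN]
    w2 = theta[2 * N_HIDDEN:3 * N_HIDDEN]
    b2 = theta[3 * N_HIDDEN]
    hidden = [math.tanh(w * x + b) for w, b in zip(w1, b1)]
    return sum(v * h for v, h in zip(w2, hidden)) + b2


def log_likelihood(theta, data, noise=0.1):
    """Gaussian log-likelihood of (x, y) pairs; `noise` is an assumed known scatter."""
    norm = math.log(noise * math.sqrt(2.0 * math.pi))
    logl = 0.0
    for x, y in data:
        residual = (y - network(x, theta)) / noise
        logl += -0.5 * residual * residual - norm
    return logl


def prior_transform(unit_cube, half_width=5.0):
    """Map points in [0, 1]^N to a uniform prior on [-half_width, half_width]^N."""
    return [-half_width + 2.0 * half_width * u for u in unit_cube]
```

A sampler then only ever sees `log_likelihood` and `prior_transform`; the fact that the parameters happen to be neuron weights is invisible to it, which is what makes training a model-fitting problem.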
The underlying engine, PolyChord, is a “hands off” Bayesian optimiser and evidence calculator, representing the cutting edge of nested sampling technology. PolyChord is widely used in astrophysics and cosmology, providing more accurate and reliable answers in comparison with all existing tools.
PolyChord is uniquely specialised for navigating high-dimensional multi-modal posteriors with complicated shapes and degeneracies, which typically cause industry-standard optimisers to founder. Moreover, PolyChord’s evidence calculation explicitly quantifies how good a model is relative to other architectures.
For academic papers detailing the original algorithm (PolyChord Lite) please see:
- PolyChord: nested sampling for cosmology – arXiv:1502.01856, MNRAS 450(1) L61-L65
- PolyChord: next-generation nested sampling – arXiv:1506.00171, MNRAS 453(4) 4384-4398
Applying PolyChord Lite to neural network training:
- Bayesian sparse reconstruction: a brute-force approach to astronomical imaging and machine learning – arXiv:1809.04598
PolyChord has been used in a host of analyses across astronomy and particle physics.
1.3 The PolyNet approach
Fundamentally, PolyNet is a Bayesian neural network sampler and evidence calculator. The advantages it presents over existing technology are two-fold: improved sampling techniques and evidence calculation.
1.3.1 Nested sampling for training Bayesian neural networks
Nested sampling has a unique ability to navigate a priori unknown curving degeneracies and multimodality in posterior distributions. These are precisely the kind of challenges found in neural network training.
PolyChord represents the state of the art in high-dimensional nested sampling, putting it head-and-shoulders above existing approaches in its ability to train neural networks. We expect PolyNet to outperform existing approaches in both speed and quality of training, though speed is not our main focus because we specialise in ‘offline training’: models are trained off-site once and for all, and the trained models are then ported to wherever they are deployed.
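The core nested sampling loop can be sketched in a few lines. The toy below estimates the evidence Z = ∫ L(θ) π(θ) dθ for a two-dimensional unit Gaussian likelihood under a uniform prior on [-5, 5]², for which Z is very close to 1/100. It replaces each discarded point by naive rejection sampling from the prior, which is only workable in low dimensions; the PolyChord papers describe slice sampling for this step instead. All names and settings here are illustrative assumptions.

```python
import math
import random


def log_likelihood(theta):
    """Normalised unit Gaussian centred on the origin."""
    return -0.5 * sum(t * t for t in theta) - 0.5 * len(theta) * math.log(2.0 * math.pi)


def sample_prior(rng, ndim, half_width=5.0):
    """Draw one point from the uniform prior on [-half_width, half_width]^ndim."""
    return [rng.uniform(-half_width, half_width) for _ in range(ndim)]


def logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def nested_sampling(nlive=100, niter=700, ndim=2, seed=0):
    """Return an estimate of log Z using the basic nested sampling recursion."""
    rng = random.Random(seed)
    live = [sample_prior(rng, ndim) for _ in range(nlive)]
    live_logl = [log_likelihood(p) for p in live]
    log_z = -math.inf
    log_x_prev = 0.0                       # log prior volume, starts at log(1)
    for i in range(1, niter + 1):
        worst = min(range(nlive), key=live_logl.__getitem__)
        threshold = live_logl[worst]
        log_x = -i / nlive                 # expected shrinkage: X_i = exp(-i / nlive)
        # Weight of the discarded point is the shell volume X_{i-1} - X_i.
        log_w = log_x_prev + math.log1p(-math.exp(log_x - log_x_prev))
        log_z = logaddexp(log_z, threshold + log_w)
        # Replace it with a prior draw above the likelihood threshold
        # (naive rejection sampling; this is the step PolyChord improves on).
        while True:
            candidate = sample_prior(rng, ndim)
            candidate_logl = log_likelihood(candidate)
            if candidate_logl > threshold:
                break
        live[worst] = candidate
        live_logl[worst] = candidate_logl
        log_x_prev = log_x
    # The remaining live points share the final prior volume equally.
    for logl in live_logl:
        log_z = logaddexp(log_z, logl + log_x_prev - math.log(nlive))
    return log_z
```

Calling `nested_sampling()` returns a noisy estimate of log Z that should scatter around log(1/100) ≈ -4.6; the scatter shrinks as `nlive` is increased. Note that the evidence comes out of the same loop that produces the weighted samples, with no separate calculation.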
1.3.2 Evidence calculation for assessing network architecture
PolyChord is capable of computing Bayesian evidences in high dimensions. This is essential for evaluating how well a given neural network architecture fits the data in the Bayesian sense. PolyChord computes the Bayesian evidence as a matter of course when fitting models (technically, model fits are a by-product of its evidence computation), so its outputs can be used to navigate network architectures.
Bayesian evidences may be used to construct a full likelihood loop that fits models and then adjusts the network architecture in favour of better-supported networks. The result is a two-step Markov Chain Monte Carlo (MCMC) algorithm: PolyChord is used to fit a given network, and then a step in ‘the network space’ is made, prioritising steps toward better networks. The final product is a weighted sum of Bayesian fits to the network, as well as greater insight into which network architectures are preferred.
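One way to picture the outer step of this loop is sketched below, under stated assumptions: each candidate architecture already has a log-evidence from an inner fit, architectures are weighted by their posterior probability, and a move in network space is accepted with a Metropolis rule on the evidence ratio. The function names and the symmetric-proposal assumption are hypothetical, and the log-evidence values in the usage lines are made-up placeholders rather than results.

```python
import math
import random


def model_probabilities(log_evidences, log_priors=None):
    """Posterior probability of each architecture from its log-evidence.

    With no `log_priors` given, all architectures are assumed equally
    likely a priori.
    """
    n = len(log_evidences)
    if log_priors is None:
        log_priors = [-math.log(n)] * n
    log_post = [lz + lp for lz, lp in zip(log_evidences, log_priors)]
    shift = max(log_post)                      # subtract the max for stability
    weights = [math.exp(v - shift) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]


def accept_architecture_step(log_z_current, log_z_proposed, rng):
    """Metropolis acceptance for a move in network space.

    Assumes a symmetric proposal and equal architecture priors, so the
    acceptance probability is min(1, Z_proposed / Z_current).
    """
    log_ratio = log_z_proposed - log_z_current
    return log_ratio >= 0.0 or rng.random() < math.exp(log_ratio)


def averaged_prediction(predictions, probabilities):
    """Weighted sum of per-architecture predictions."""
    return sum(p * w for p, w in zip(predictions, probabilities))


# Placeholder log-evidences for three hypothetical architectures.
probs = model_probabilities([-105.2, -101.7, -103.0])
best = max(range(len(probs)), key=probs.__getitem__)
```

Moves toward higher-evidence architectures are always accepted, while moves to worse ones are accepted only occasionally, so the chain spends time in each architecture roughly in proportion to its posterior probability.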
1.4 Unique Advantages in Addressing Neural Net Training
PolyChord’s unique point of difference is that it combines a better fitting algorithm with a genuine evidence calculation. Current attempts at evidence calculation in this field are based on brutal approximations, which has been the primary barrier to any of them becoming a standard tool. Our main competitor here is Google’s TensorFlow, which quickly evaluates gradients and performs variations on gradient descent such as stochastic gradient descent, momentum updates and adaptive learning rates. Such methods are optimised for speed of training rather than quality, and are beset by issues such as poor regularisation (an inability to determine the quality of fit) and false minima (converging to a locally good model but missing the globally best one). By looking at the whole space, PolyNet delivers greater efficiency and accuracy.
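The false-minimum problem can be seen even in one dimension. The sketch below runs plain gradient descent on an assumed toy loss with two minima, f(x) = x⁴ − 3x² + x; it stands in for a network’s loss surface and is not taken from any real model. Which minimum is reached depends entirely on the starting point.

```python
def loss(x):
    """Toy loss with a global minimum near x = -1.30 and a local one near x = 1.13."""
    return x ** 4 - 3.0 * x ** 2 + x


def gradient(x):
    """Analytic derivative of the toy loss."""
    return 4.0 * x ** 3 - 6.0 * x + 1.0


def gradient_descent(x0, learning_rate=0.01, steps=2000):
    """Plain fixed-step gradient descent from the starting point x0."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * gradient(x)
    return x


from_right = gradient_descent(2.0)    # settles in the shallower, local minimum
from_left = gradient_descent(-2.0)    # settles in the deeper, global minimum
```

Started from x = 2, the optimiser stops in the shallower basin and has no information that a better one exists; a method that explores the whole space does not depend on the starting point in this way.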