PolyFold – using PolyChord’s ability to fully explore complex data landscapes to better understand the way a protein folds

PolyChord originated in the Cambridge University astrophysics community. As part of our ongoing work on developing it, we are using it to drive a protein folding tool.

The way in which proteins fold and unfold is a data problem in a high-dimensional space.

We have recently started PolyFold, a protein folding initiative driven by the core technology PolyChord. Progress has been rapid and is ongoing.
Why is PolyChord so extraordinarily effective in this area? Let’s look at one of the current leaders in the field, AlphaFold (both versions 1 and 2), which has led the industry-standard CASP competition by some way over the last two years.

Essentially, AlphaFold has limitations. AlphaFold 1 uses neural networks, trained on patterns in large protein databases, to predict probability distributions of the backbone torsion angles and residue-residue distances for a given sequence. These predictions are combined with a steric-clash prevention potential to construct an energy potential. This combined potential is a function of the protein geometry and must be minimised to understand how a protein may fold. AlphaFold 1 optimises the potential using stochastic gradient descent, which has to be repeated many times to give the best chance of finding the global minimum (and so the native folded state). This is computationally expensive and gives very little understanding of mis-folded states of the protein, of how accurate the answer is, or of the overall shape of the energy landscape. PolyFold, on the other hand, offers a completely different, more direct approach.
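
The repeated-descent strategy described above can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the one-dimensional potential over a single torsion angle is a toy, the function names are hypothetical, and plain gradient descent with random restarts stands in for the optimiser. It is not AlphaFold's potential or code, nor PolyFold's method. It only shows why a single descent can stall in a local minimum and why many restarts are needed to have a good chance of reaching the global one.

```python
import math
import random


def potential(phi):
    """Toy 'energy' over one torsion angle phi (radians); purely illustrative."""
    return math.cos(3 * phi) + 0.5 * math.cos(phi - 1.0)


def gradient(phi):
    """Analytic derivative of the toy potential with respect to phi."""
    return -3 * math.sin(3 * phi) - 0.5 * math.sin(phi - 1.0)


def descend(phi, step=0.01, iters=2000):
    """Plain gradient descent; converges to the minimum of the basin it starts in."""
    for _ in range(iters):
        phi -= step * gradient(phi)
    return phi


def multi_start_descent(n_starts=20, seed=0):
    """Repeat the descent from random starts; return (energy, angle) pairs, lowest energy first."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_starts):
        end = descend(rng.uniform(-math.pi, math.pi))
        runs.append((potential(end), end))
    return sorted(runs)


if __name__ == "__main__":
    runs = multi_start_descent()
    best_energy, best_phi = runs[0]
    print(f"lowest energy found: {best_energy:.3f} at phi = {best_phi:.3f}")
    print("distinct minima reached:", sorted({round(e, 2) for e, _ in runs}))
```

Each run reports only the end point of one descent. The collected results say which minima were reached, but nothing about how wide each basin is or what the landscape looks like between them, which is the limitation the paragraph above describes.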

In AlphaFold 2, a different approach has been taken, and the results have not yet been fully published. From publicly available material, this new iteration of AlphaFold relies on learning from historical X-ray crystallography data, which has some limitations. PolyFold’s more thorough exploration provides more useful insights into:

  1. The sites of mis-foldings, useful in discovering more information on the development of diseases such as Alzheimer’s.
  2. The vibrational frequency profiles within the protein, which could be a vital aid in drug discovery.

As our work in protein folding continues, we will further develop our analysis of mis-foldings and vibrational frequency profiles, and will apply additional advanced mathematics to better understand the shapes of proteins.

We are now starting to engage actively with drug discovery companies, with companies interested in better understanding and exploring protein-protein interactions, and with organisations interested in protein folding as it relates to enzyme and catalyst development.

If you are advanced in these areas but could benefit from better tools that can handle fuller exploration of complex data landscapes, we would be glad to hear from you.

We will be updating this page as further progress is made.
