Summary: | <p>The number of published protein structures is vastly outnumbered by the number of known protein sequences, and this gap continues to widen as sequencing technologies advance. Experimental methods of protein structure determination are costly and slow; computational protein structure prediction could be a solution.</p> <p>A major problem with current methods of protein structure prediction is the large amount of computational time needed to generate an accurate structure for a single protein. In some cases, notably smaller proteins, structures can now be predicted reliably and accurately. However, for a large number of proteins, especially longer ones, the computational cost means that structure prediction remains infeasible with current methods.</p> <p>SAINT2 is a sequential, fragment-based <em>de novo</em> structure prediction method that has been demonstrated to be faster than other methods and comparably successful at predicting correct structures. The aim of this study is to increase the number of protein families for which SAINT2 can predict correct structures by utilising predicted contacts in a unique way.</p> <p>This thesis finds that, in some cases, evaluating the satisfied long-range predicted contacts in a partial model can give an indication of the final model quality. Discarding partial models that are predicted to lead to an incorrect final structure enriches the overall decoy pool for these proteins. However, it is still unclear how to identify the cases for which our new version of SAINT2 would give better predictions.</p> <p>We also find that, during extrusion, correct and incorrect models can satisfy and break predicted contacts differently. Correct models may form stable sub-structures early in extrusion, whereas incorrect models appear to break contacts throughout extrusion. It may be possible to identify incorrect predicted contacts based on their rank by satisfaction in models.
This may lead to an increase in model quality if contacts predicted to be incorrect are down-weighted in further SAINT2 runs.</p>
|