Designing more efficient compound libraries for crystallographic fragment screening

Bibliographic Details
Main Author: Carbery, A
Other Authors: Deane, C
Format: Thesis
Language: English
Published: 2023
Subjects:
Description
Summary: Fragment-based drug design aims to explore the binding sites of protein targets and provide information towards bespoke, specific, high-affinity drugs. The small size of fragments facilitates an efficient search of chemical space; only a few efficient interactions need to be made with the target for binding to occur, thus improving the hit rate and reducing the number of fragments that need to be screened in comparison to traditional screening of larger molecules. This approach aims to facilitate a much more efficient structure-based drug design pipeline, decreasing the cost and time taken per drug.

Because of the small size of fragments, in established fragment screening campaigns, even with a good hit rate, the binding site is often not fully characterised. This reduces the number of diverse lead compounds that can be progressed from the initial fragment hits, and makes it more expensive and difficult to develop a viable drug. We hypothesise that this occurs because fragments within screening libraries are traditionally selected for diverse coverage of a broad chemical space, without considering functional information such as coverage of potential protein-ligand interactions. This research aims to use retrospective analysis of fragment screening data to develop methods for selecting fragment libraries that will maximise the information obtained from a target’s binding site.

We first describe a survey of data availability and quality for XChem fragment screens. The curation procedure for fragment screening data presents many challenges, such as storage of raw crystallographic data and variability within the analysis pipelines. We found that the annotations applied to historic fragment screening data were unreliable due to the complexities of building and validating ligand models. Additionally, locating complete datasets presented major difficulties. To address these issues, we propose the storage of XChem data in a cloud-based object store, coupled with automated backup and curation of crystallographic electron density maps. This facilitates future automated ligand modelling and validation for the generation of large sets of homogeneous fragment screening data.

Next, we use a set of positive (bound structures) and negative (structures for soaked fragments that did not show clear evidence for binding) data points extracted from the XChem fragment screening data to explore whether the library used is able to make diverse interactions with new protein targets, as intended. Based on these results, we propose a method for selecting a set of functionally diverse fragments from a larger fragment library. This method calculates the interactions each fragment makes with historic protein targets and ranks the fragment library in order of the number of novel interactions that fragments are able to make. We show that the top-ranked fragments recover information more efficiently from unseen targets compared with traditionally selected fragments.

Finally, we consider what tools are required for the design of target-specific fragment libraries for novel targets. As we are unlikely to have experimental knowledge of a novel target’s binding site, we describe a new method for the prediction of ligand binding sites on a protein’s surface. This uses a combination of machine learning models and point cloud clustering. For the selection of target-specific fragments, we then explore possible strategies for pocket similarity. We assess the efficacy of a fragment-based binding site similarity tool, and examine the use of learnt representations of proteins for interaction prediction and residue similarity.

In this thesis we examine strategies for the development of more efficient fragment libraries. We use historic fragment screening data to select target-agnostic libraries, and apply recent machine learning methods to explore the potential for target-specific libraries.
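
The summary describes ranking a fragment library by the number of novel interactions each fragment makes with historic targets. The record does not give the algorithm, so the following is only a minimal sketch of one plausible reading: a greedy ranking in which each step picks the fragment that adds the most interactions not already covered by earlier picks. The function name `rank_by_novel_interactions`, the fragment identifiers, and the `(residue, interaction type)` encoding are all hypothetical and are not taken from the thesis.

```python
from typing import Dict, Hashable, List, Set, Tuple


def rank_by_novel_interactions(
    fragment_interactions: Dict[str, Set[Hashable]],
) -> List[Tuple[str, int]]:
    """Greedily rank fragments by how many new interactions each one adds.

    Assumption: each fragment maps to the set of interactions it was
    observed to make across historic targets. Returns (fragment, gain)
    pairs in ranked order, where gain is the number of interactions not
    covered by any higher-ranked fragment.
    """
    remaining = dict(fragment_interactions)
    covered: Set[Hashable] = set()
    ranking: List[Tuple[str, int]] = []

    while remaining:
        # Iterate in sorted order so ties are broken by fragment name.
        best = max(sorted(remaining), key=lambda f: len(remaining[f] - covered))
        gain = len(remaining[best] - covered)
        ranking.append((best, gain))
        covered |= remaining.pop(best)

    return ranking


# Hypothetical toy data: interactions encoded as (residue, interaction type).
example = {
    "frag_A": {("ASP25", "hbond_donor"), ("GLY27", "hbond_acceptor")},
    "frag_B": {("ASP25", "hbond_donor")},
    "frag_C": {
        ("PHE82", "pi_stacking"),
        ("ILE50", "hydrophobic"),
        ("ASP25", "hbond_donor"),
    },
}

for fragment, gain in rank_by_novel_interactions(example):
    print(fragment, gain)
```

In this toy case `frag_C` ranks first because it contributes three interactions, `frag_A` then adds one that `frag_C` lacks, and `frag_B` adds nothing new. Taking the top entries of such a ranking would give a functionally diverse subset under the stated assumptions; the thesis itself may define novelty and ranking differently.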