Exploiting whole-PDB analysis in novel bioinformatics applications

Bibliographic Details
Main Author:	Ramraj, V
Other Authors:	Esnouf, R
Format:	Thesis
Language:	English
Published:	2014
Subjects:	Membrane proteins; Mathematical genetics and bioinformatics (statistics); Enzymes; Bioinformatics (biochemistry); Chemistry & allied sciences; Applications and algorithms; Bioinformatics (technology); Program development and tools; Polymers Amino acid and peptide chemistry; Mass spectrometry; Computational biochemistry; Genetics (life sciences); Biomedical engineering; Structural genomics; Computationally-intensive statistics; Computing; Bioinformatics (life sciences); Computer science (mathematics); Theory and automated verification; Protein chemistry; Biology (medical sciences); Medical sciences; Crystallography; Scalable systems; Medical Sciences; Life Sciences; NMR spectroscopy; Software engineering; Genetics (medical sciences); Protein folding; Physical Sciences
Institution:	University of Oxford
Identifier:	oxford-uuid:6c59c813-2a4c-440c-940b-d334c02dd075

Description

The Protein Data Bank (PDB) is the definitive electronic repository for experimentally derived protein structures, most of which were determined by X-ray crystallography. Approximately 200 new structures are added to the PDB each week, and at the time of writing it contains approximately 97,000 structures. This is an expanding wealth of high-quality information, yet few bioinformatics tools appear to consider and analyse these data as an ensemble. This thesis explores the development of three fast, efficient algorithms and software implementations for studying protein structure across the entire PDB.

The first project is a crystal-form matching tool that takes a unit cell and retrieves the most closely related entries from the PDB in under one second. The unit-cell matches are combined with sequence alignments using a novel Family Clustering Algorithm so that the results can be displayed in a user-friendly way. The software tool, Nearest-cell, has been incorporated into the X-ray data collection pipeline at the Diamond Light Source and is also available as a public web service.

The bulk of the thesis is devoted to the study and prediction of protein disorder. An initial attempt to update and extend an existing predictor, RONN, exposed the limitations of that method, so a new predictor, MoreRONN, was developed; it incorporates a novel sequence-based clustering approach to the disorder data inferred from the PDB and DisProt. MoreRONN is now clearly the best-in-class disorder predictor and will soon be offered as a public web service.

The third project explores the development of a clustering algorithm for protein structural fragments that can operate at the scale of the whole PDB. Although protein structures have long been clustered into loose families, there has to date been no comprehensive analytical clustering of short (~6-residue) fragments. A novel fragment clustering tool was built and is now leading to a public database of fragment families and representative structural fragments, which should prove extremely helpful for both basic understanding and experimentation.

Together, these three projects exemplify how cutting-edge computational approaches applied to extensive protein structure libraries can provide user-friendly tools that address critical everyday issues for structural biologists.
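The unit-cell matching step of the first project can be illustrated with a deliberately naive sketch. The record does not describe how Nearest-cell scores or indexes unit cells, so everything below is an assumption: `Cell`, `cell_distance` and `nearest_cells` are hypothetical names, the score (relative edge-length differences plus angle differences scaled by 90°) is made up for illustration, and the lookup is a plain linear scan. A realistic tool would also need to reduce cells to a standard setting first, because one lattice can be described by many different unit cells.

```python
import heapq
from typing import NamedTuple


class Cell(NamedTuple):
    """Unit-cell parameters: edges a, b, c in angstroms; angles in degrees."""
    a: float
    b: float
    c: float
    alpha: float
    beta: float
    gamma: float


def cell_distance(x: Cell, y: Cell) -> float:
    """Hypothetical dissimilarity score (NOT the Nearest-cell metric).

    Sums relative edge-length differences and angle differences scaled by
    90 degrees. Assumes both cells are already in a comparable setting.
    """
    edges = sum(abs(p - q) / max(p, q) for p, q in zip(x[:3], y[:3]))
    angles = sum(abs(p - q) / 90.0 for p, q in zip(x[3:], y[3:]))
    return edges + angles


def nearest_cells(query: Cell, table: dict[str, Cell], k: int = 3) -> list[tuple[float, str]]:
    """Linear scan returning the k (score, entry) pairs with the lowest score."""
    scored = ((cell_distance(query, cell), entry) for entry, cell in table.items())
    return heapq.nsmallest(k, scored)


# Usage with a made-up three-entry table (illustrative values, not real PDB data).
table = {
    "entry_A": Cell(78.0, 78.0, 37.0, 90.0, 90.0, 90.0),
    "entry_B": Cell(79.1, 79.1, 38.2, 90.0, 90.0, 90.0),
    "entry_C": Cell(50.0, 60.0, 70.0, 90.0, 100.0, 90.0),
}
query = Cell(78.5, 78.5, 37.5, 90.0, 90.0, 90.0)
for score, entry in nearest_cells(query, table, k=2):
    print(f"{entry}: {score:.4f}")
```

Here `entry_A` ranks first and `entry_B` second. The record does not say how Nearest-cell reaches its sub-second response over roughly 97,000 entries, so no indexing scheme is shown; a linear scan like this costs O(n) per query.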
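For the third project, the record says only that short (~6-residue) fragments from the whole PDB were clustered. The sketch below swaps in a simple, well-known technique, greedy leader clustering, and assumes each fragment is described by its backbone (φ, ψ) dihedral angles; many fragment-clustering schemes instead compare Cartesian coordinates by RMSD after superposition. The names `angular_diff`, `fragment_distance` and `leader_cluster`, the distance (root-mean-square angular difference with 360° wrap-around) and the 30° threshold are all hypothetical choices, not the thesis's method.

```python
import math
from typing import Sequence

# A fragment is a sequence of (phi, psi) backbone dihedrals in degrees, one pair per residue.
Fragment = Sequence[tuple[float, float]]


def angular_diff(x: float, y: float) -> float:
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    d = abs(x - y) % 360.0
    return min(d, 360.0 - d)


def fragment_distance(f: Fragment, g: Fragment) -> float:
    """Root-mean-square difference over all phi and psi angles of two fragments."""
    if len(f) != len(g):
        raise ValueError("fragments must have the same length")
    total = sum(
        angular_diff(phi1, phi2) ** 2 + angular_diff(psi1, psi2) ** 2
        for (phi1, psi1), (phi2, psi2) in zip(f, g)
    )
    return math.sqrt(total / (2 * len(f)))


def leader_cluster(fragments: list[Fragment], threshold: float) -> list[list[int]]:
    """Greedy leader clustering (hypothetical stand-in for the thesis's method).

    Each fragment joins the first existing cluster whose leader lies within
    `threshold` degrees; otherwise it founds a new cluster. Clusters are lists
    of fragment indices, and the first index in each list is its leader.
    """
    clusters: list[list[int]] = []
    for i, frag in enumerate(fragments):
        for members in clusters:
            if fragment_distance(fragments[members[0]], frag) <= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Usage with idealised, made-up six-residue fragments.
helix = [(-60.0, -45.0)] * 6
strand = [(-120.0, 130.0)] * 6
helix_like = [(-65.0, -40.0)] * 6
print(leader_cluster([helix, strand, helix_like], threshold=30.0))  # [[0, 2], [1]]
```

Leader clustering makes a single pass and is order-dependent; its cost grows with the number of fragments times the number of clusters, so a whole-PDB tool would need a faster way to find candidate leaders than the inner loop used here.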


Similar Items