Machine Learning Approaches to Modeling the Physiochemical Properties of Small Peptides

Peptide and protein sequences are most commonly represented as strings: a series of letters selected from the twenty-character alphabet of abbreviations for the naturally occurring amino acids. Here, we experiment with representations of small peptide sequences that incorporate more physiochemical...

Bibliographic Details
Main Authors: Jensen, Kyle; Styczynski, Mark; Stephanopoulos, Gregory
Format: Article
Language: English
Published: 2005
Subjects: Machine learning; peptides; modeling; physio-chemical properties
Online Access: http://hdl.handle.net/1721.1/30388
author Jensen, Kyle
Styczynski, Mark
Stephanopoulos, Gregory
collection MIT
description Peptide and protein sequences are most commonly represented as strings: a series of letters selected from the twenty-character alphabet of abbreviations for the naturally occurring amino acids. Here, we experiment with representations of small peptide sequences that incorporate more physiochemical information. Specifically, we develop three different physiochemical representations for a set of roughly 700 HIV-I protease substrates. These representations are used as input to an array of six different machine learning models, which are used to predict whether or not a given peptide is likely to be an acceptable substrate for the protease. Our results show that, in general, higher-dimensional physiochemical representations tend to perform better than representations incorporating fewer dimensions selected on the basis of high information content. We contend that such representations are more biologically relevant than simple string-based representations and are likely to more accurately capture peptide characteristics that are functionally important.
first_indexed 2024-09-23T15:26:00Z
format Article
id mit-1721.1/30388
institution Massachusetts Institute of Technology
language English
last_indexed 2024-09-23T15:26:00Z
publishDate 2005
record_format dspace
spelling mit-1721.1/30388
2019-04-12T08:25:08Z
Singapore-MIT Alliance (SMA)
2005-12-16T14:52:55Z
2005-12-16T14:52:55Z
2006-01
Article
http://hdl.handle.net/1721.1/30388
en
Molecular Engineering of Biological and Chemical Systems (MEBCS)
331891 bytes
application/pdf
application/pdf
title Machine Learning Approaches to Modeling the Physiochemical Properties of Small Peptides
topic Machine learning
peptides
modeling
physio-chemical properties
url http://hdl.handle.net/1721.1/30388
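
The contrast drawn in the description, between a string-based representation and a physiochemical one, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the record does not say which properties the authors used, so the Kyte-Doolittle hydropathy scale stands in as one well-known physiochemical property, and the peptide and the function names one_hot and hydropathy_vector are hypothetical. It is not the authors' method.

```python
# Kyte-Doolittle hydropathy index for the twenty standard amino acids.
# ASSUMPTION: this scale is only a stand-in for "a physiochemical property";
# the record does not state which properties the paper's representations use.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# The twenty-character alphabet of one-letter amino acid abbreviations.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(peptide):
    """String-style encoding: twenty indicator values per residue."""
    vec = []
    for residue in peptide:
        vec.extend(1.0 if residue == letter else 0.0 for letter in ALPHABET)
    return vec


def hydropathy_vector(peptide):
    """Physiochemical-style encoding: one hydropathy value per residue."""
    return [HYDROPATHY[residue] for residue in peptide]


if __name__ == "__main__":
    peptide = "SQNYPIVQ"  # hypothetical example peptide, not taken from the paper's data
    print(len(one_hot(peptide)), "string-style features")
    print(hydropathy_vector(peptide))
```

Either vector could be passed to a classifier as a fixed-length input. In the one-hot form every pair of distinct residues is equally dissimilar, while the property-based form places chemically similar residues close together, which is the kind of information the description argues a string cannot carry.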