An Interpretable Machine-Learning Algorithm to Predict Disordered Protein Phase Separation Based on Biophysical Interactions

Protein phase separation is increasingly understood to be an important mechanism of biological organization and biomaterial formation. Intrinsically disordered protein regions (IDRs) are often significant drivers of protein phase separation. A number of protein phase-separation-prediction algorithms...


Bibliographic Details
Main Authors: Hao Cai, Robert M. Vernon, Julie D. Forman-Kay
Format: Article
Language: English
Published: MDPI AG 2022-08-01
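Series: Biomolecules
Subjects: biomolecular condensates; machine learning; predictor; physical interactions; intrinsically disordered proteins; phase separation
Online Access: https://www.mdpi.com/2218-273X/12/8/1131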
collection DOAJ
description Protein phase separation is increasingly understood to be an important mechanism of biological organization and biomaterial formation. Intrinsically disordered protein regions (IDRs) are often significant drivers of protein phase separation. A number of protein phase-separation-prediction algorithms are available, with many being specific for particular classes of proteins and others providing results that are not amenable to the interpretation of the contributing biophysical interactions. Here, we describe LLPhyScore, a new predictor of IDR-driven phase separation, based on a broad set of physical interactions or features. LLPhyScore uses sequence-based statistics from the RCSB PDB database of folded structures for these interactions, and is trained on a manually curated set of phase-separation-driving proteins with different negative training sets including the PDB and human proteome. Competitive training for a variety of physical chemical interactions shows the greatest contribution of solvent contacts, disorder, hydrogen bonds, pi–pi contacts, and kinked beta-structures to the score, with electrostatics, cation–pi contacts, and the absence of a helical secondary structure also contributing. LLPhyScore has strong phase-separation-prediction recall statistics and enables a breakdown of the contribution from each physical feature to a sequence’s phase-separation propensity, while recognizing the interdependence of many of these features. The tool should be a valuable resource for guiding experiments and providing hypotheses for protein function in normal and pathological states, as well as for understanding how specificity emerges in defining individual biomolecular condensates.
first_indexed 2024-03-09T10:00:11Z
format Article
id doaj.art-2a824a7e044a4604a225289120d2eca6
institution Directory of Open Access Journals
issn 2218-273X
language English
last_indexed 2024-03-09T10:00:11Z
publishDate 2022-08-01
publisher MDPI AG
record_format Article
series Biomolecules
spelling doaj.art-2a824a7e044a4604a225289120d2eca6 | 2023-12-01T23:29:18Z | eng | MDPI AG | Biomolecules | ISSN 2218-273X | 2022-08-01 | vol. 12, no. 8, art. 1131 | DOI 10.3390/biom12081131
Hao Cai, Robert M. Vernon, Julie D. Forman-Kay (all: Molecular Medicine Program, Hospital for Sick Children, Toronto, ON M5G 0A4, Canada)
topic biomolecular condensates
machine learning
predictor
physical interactions
intrinsically disordered proteins
phase separation