Atomic Motif Recognition in (Bio)Polymers: Benchmarks From the Protein Data Bank
Rationalizing the structure and structure–property relations for complex materials such as polymers or biomolecules relies heavily on the identification of local atomic motifs, e.g., hydrogen bonds and secondary structure patterns, that are seen as building blocks of more complex supramolecular and...
Main Authors: | Benjamin A. Helfrecht, Piero Gasparotto, Federico Giberti, Michele Ceriotti |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2019-04-01 |
Series: | Frontiers in Molecular Biosciences |
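Subjects: | atomistic and molecular simulation, machine learning, biomolecules, molecular motifs, hydrogen bonds, secondary structure |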
Online Access: | https://www.frontiersin.org/article/10.3389/fmolb.2019.00024/full |
_version_ | 1818700260448403456 |
---|---|
author | Benjamin A. Helfrecht Piero Gasparotto Federico Giberti Michele Ceriotti |
collection | DOAJ |
description | Rationalizing the structure and structure–property relations for complex materials such as polymers or biomolecules relies heavily on the identification of local atomic motifs, e.g., hydrogen bonds and secondary structure patterns, that are seen as building blocks of more complex supramolecular and mesoscopic structures. Over the past few decades, several automated procedures have been developed to identify these motifs in proteins given the atomic structure. Being based on a very precise understanding of the specific interactions, these heuristic criteria formulate the question in a way that implies the answer, by defining a list of motifs based on those that are known to be naturally occurring. This makes them less likely to identify unexpected phenomena, such as the occurrence of recurrent motifs in disordered segments of proteins, and less suitable to be applied to different polymers whose structure is not driven by hydrogen bonds, or even to polypeptides when appearing in unusual, non-biological conditions. Here we discuss how unsupervised machine learning schemes can be used to recognize patterns based exclusively on the frequency with which different motifs occur, taking high-resolution structures from the Protein Data Bank as benchmarks. We first discuss the application of a density-based motif recognition scheme in combination with traditional representations of protein structure (namely, interatomic distances and backbone dihedrals). Then, we proceed one step further toward an entirely unbiased scheme by using as input a structural representation based on the atomic density and by employing supervised classification to objectively assess the role played by the representation in determining the nature of atomic-scale patterns. |
first_indexed | 2024-12-17T15:02:07Z |
format | Article |
id | doaj.art-068268295d1549789cf8460d086c9b96 |
institution | Directory of Open Access Journals |
issn | 2296-889X |
language | English |
last_indexed | 2024-12-17T15:02:07Z |
publishDate | 2019-04-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Molecular Biosciences |
title | Atomic Motif Recognition in (Bio)Polymers: Benchmarks From the Protein Data Bank |
topic | atomistic and molecular simulation machine learning biomolecules molecular motifs hydrogen bonds secondary structure |
url | https://www.frontiersin.org/article/10.3389/fmolb.2019.00024/full |