Atomic Motif Recognition in (Bio)Polymers: Benchmarks From the Protein Data Bank

Rationalizing the structure and structure–property relations for complex materials such as polymers or biomolecules relies heavily on the identification of local atomic motifs, e.g., hydrogen bonds and secondary structure patterns, that are seen as building blocks of more complex supramolecular and...

Bibliographic Details
Main Authors: Benjamin A. Helfrecht, Piero Gasparotto, Federico Giberti, Michele Ceriotti
Format: Article
Language: English
Published: Frontiers Media S.A. 2019-04-01
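Series: Frontiers in Molecular Biosciences
Subjects: atomistic and molecular simulation; machine learning; biomolecules; molecular motifs; hydrogen bonds; secondary structure
Online Access: https://www.frontiersin.org/article/10.3389/fmolb.2019.00024/full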
_version_ 1818700260448403456
author Benjamin A. Helfrecht
Piero Gasparotto
Federico Giberti
Michele Ceriotti
author_facet Benjamin A. Helfrecht
Piero Gasparotto
Federico Giberti
Michele Ceriotti
author_sort Benjamin A. Helfrecht
collection DOAJ
description Rationalizing the structure and structure–property relations for complex materials such as polymers or biomolecules relies heavily on the identification of local atomic motifs, e.g., hydrogen bonds and secondary structure patterns, that are seen as building blocks of more complex supramolecular and mesoscopic structures. Over the past few decades, several automated procedures have been developed to identify these motifs in proteins given the atomic structure. Being based on a very precise understanding of the specific interactions, these heuristic criteria formulate the question in a way that implies the answer, by defining a list of motifs based on those that are known to be naturally occurring. This makes them less likely to identify unexpected phenomena, such as the occurrence of recurrent motifs in disordered segments of proteins, and less suitable to be applied to different polymers whose structure is not driven by hydrogen bonds, or even to polypeptides when appearing in unusual, non-biological conditions. Here we discuss how unsupervised machine learning schemes can be used to recognize patterns based exclusively on the frequency with which different motifs occur, taking high-resolution structures from the Protein Data Bank as benchmarks. We first discuss the application of a density-based motif recognition scheme in combination with traditional representations of protein structure (namely, interatomic distances and backbone dihedrals). Then, we proceed one step further toward an entirely unbiased scheme by using as input a structural representation based on the atomic density and by employing supervised classification to objectively assess the role played by the representation in determining the nature of atomic-scale patterns.
first_indexed 2024-12-17T15:02:07Z
format Article
id doaj.art-068268295d1549789cf8460d086c9b96
institution Directory of Open Access Journals
issn 2296-889X
language English
last_indexed 2024-12-17T15:02:07Z
publishDate 2019-04-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Molecular Biosciences
spelling doaj.art-068268295d1549789cf8460d086c9b96
2022-12-21T21:43:51Z
eng
Frontiers Media S.A.
Frontiers in Molecular Biosciences
2296-889X
2019-04-01
6
10.3389/fmolb.2019.00024
440764
Atomic Motif Recognition in (Bio)Polymers: Benchmarks From the Protein Data Bank
Benjamin A. Helfrecht
Piero Gasparotto
Federico Giberti
Michele Ceriotti
Rationalizing the structure and structure–property relations for complex materials such as polymers or biomolecules relies heavily on the identification of local atomic motifs, e.g., hydrogen bonds and secondary structure patterns, that are seen as building blocks of more complex supramolecular and mesoscopic structures. Over the past few decades, several automated procedures have been developed to identify these motifs in proteins given the atomic structure. Being based on a very precise understanding of the specific interactions, these heuristic criteria formulate the question in a way that implies the answer, by defining a list of motifs based on those that are known to be naturally occurring. This makes them less likely to identify unexpected phenomena, such as the occurrence of recurrent motifs in disordered segments of proteins, and less suitable to be applied to different polymers whose structure is not driven by hydrogen bonds, or even to polypeptides when appearing in unusual, non-biological conditions. Here we discuss how unsupervised machine learning schemes can be used to recognize patterns based exclusively on the frequency with which different motifs occur, taking high-resolution structures from the Protein Data Bank as benchmarks. We first discuss the application of a density-based motif recognition scheme in combination with traditional representations of protein structure (namely, interatomic distances and backbone dihedrals). Then, we proceed one step further toward an entirely unbiased scheme by using as input a structural representation based on the atomic density and by employing supervised classification to objectively assess the role played by the representation in determining the nature of atomic-scale patterns.
https://www.frontiersin.org/article/10.3389/fmolb.2019.00024/full
atomistic and molecular simulation
machine learning
biomolecules
molecular motifs
hydrogen bonds
secondary structure
spellingShingle Benjamin A. Helfrecht
Piero Gasparotto
Federico Giberti
Michele Ceriotti
Atomic Motif Recognition in (Bio)Polymers: Benchmarks From the Protein Data Bank
Frontiers in Molecular Biosciences
atomistic and molecular simulation
machine learning
biomolecules
molecular motifs
hydrogen bonds
secondary structure
title Atomic Motif Recognition in (Bio)Polymers: Benchmarks From the Protein Data Bank
title_full Atomic Motif Recognition in (Bio)Polymers: Benchmarks From the Protein Data Bank
title_fullStr Atomic Motif Recognition in (Bio)Polymers: Benchmarks From the Protein Data Bank
title_full_unstemmed Atomic Motif Recognition in (Bio)Polymers: Benchmarks From the Protein Data Bank
title_short Atomic Motif Recognition in (Bio)Polymers: Benchmarks From the Protein Data Bank
title_sort atomic motif recognition in bio polymers benchmarks from the protein data bank
topic atomistic and molecular simulation
machine learning
biomolecules
molecular motifs
hydrogen bonds
secondary structure
url https://www.frontiersin.org/article/10.3389/fmolb.2019.00024/full
work_keys_str_mv AT benjaminahelfrecht atomicmotifrecognitioninbiopolymersbenchmarksfromtheproteindatabank
AT pierogasparotto atomicmotifrecognitioninbiopolymersbenchmarksfromtheproteindatabank
AT federicogiberti atomicmotifrecognitioninbiopolymersbenchmarksfromtheproteindatabank
AT micheleceriotti atomicmotifrecognitioninbiopolymersbenchmarksfromtheproteindatabank