A database of calculated solution parameters for the AlphaFold predicted protein structures

Abstract Recent spectacular advances by AI programs in 3D structure predictions from protein sequences have revolutionized the field in terms of accuracy and speed. The resulting “folding frenzy” has already produced predicted protein structure databases for the entire human and other organisms’ pro...


Bibliographic Details
Main Authors: Emre Brookes, Mattia Rocco
Format: Article
Language: English
Published: Nature Portfolio 2022-05-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-022-10607-z
author Emre Brookes
Mattia Rocco
author_facet Emre Brookes
Mattia Rocco
author_sort Emre Brookes
collection DOAJ
description Abstract Recent spectacular advances by AI programs in 3D structure predictions from protein sequences have revolutionized the field in terms of accuracy and speed. The resulting “folding frenzy” has already produced predicted protein structure databases for the entire human and other organisms’ proteomes. However, rapidly ascertaining a predicted structure’s reliability based on measured properties in solution should be considered. Shape-sensitive hydrodynamic parameters such as the diffusion and sedimentation coefficients ($D_{t(20,w)}^{0}$, $s_{(20,w)}^{0}$) and the intrinsic viscosity ([η]) can provide a rapid assessment of the overall structure likeliness, and SAXS would yield the structure-related pair-wise distance distribution function p(r) vs. r. Using the extensively validated UltraScan SOlution MOdeler (US-SOMO) suite, a database was implemented calculating from AlphaFold structures the corresponding $D_{t(20,w)}^{0}$, $s_{(20,w)}^{0}$, [η], p(r) vs. r, and other parameters. Circular dichroism spectra were computed using the SESCA program. Some of AlphaFold’s drawbacks were mitigated, such as generating whenever possible a protein’s mature form. Others, like the AlphaFold direct applicability to single-chain structures only, the absence of prosthetic groups, or flexibility issues, are discussed. Overall, this implementation of the US-SOMO-AF database should already aid in rapidly evaluating the consistency in solution of a relevant portion of AlphaFold predicted protein structures.
first_indexed 2024-12-12T16:05:34Z
format Article
id doaj.art-ebf850edfcd14f9eb989054c9cc9d90d
institution Directory of Open Access Journals
issn 2045-2322
language English
last_indexed 2024-12-12T16:05:34Z
publishDate 2022-05-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj.art-ebf850edfcd14f9eb989054c9cc9d90d 2022-12-22T00:19:20Z eng Nature Portfolio Scientific Reports 2045-2322 2022-05-01 121113 10.1038/s41598-022-10607-z A database of calculated solution parameters for the AlphaFold predicted protein structures Emre Brookes 0 Mattia Rocco 1 Department of Chemistry and Biochemistry, The University of Montana Proteomica e Spettrometria di Massa, IRCCS Ospedale Policlinico San Martino Abstract Recent spectacular advances by AI programs in 3D structure predictions from protein sequences have revolutionized the field in terms of accuracy and speed. The resulting “folding frenzy” has already produced predicted protein structure databases for the entire human and other organisms’ proteomes. However, rapidly ascertaining a predicted structure’s reliability based on measured properties in solution should be considered. Shape-sensitive hydrodynamic parameters such as the diffusion and sedimentation coefficients ($D_{t(20,w)}^{0}$, $s_{(20,w)}^{0}$) and the intrinsic viscosity ([η]) can provide a rapid assessment of the overall structure likeliness, and SAXS would yield the structure-related pair-wise distance distribution function p(r) vs. r. Using the extensively validated UltraScan SOlution MOdeler (US-SOMO) suite, a database was implemented calculating from AlphaFold structures the corresponding $D_{t(20,w)}^{0}$, $s_{(20,w)}^{0}$, [η], p(r) vs. r, and other parameters. Circular dichroism spectra were computed using the SESCA program. Some of AlphaFold’s drawbacks were mitigated, such as generating whenever possible a protein’s mature form. Others, like the AlphaFold direct applicability to single-chain structures only, the absence of prosthetic groups, or flexibility issues, are discussed. Overall, this implementation of the US-SOMO-AF database should already aid in rapidly evaluating the consistency in solution of a relevant portion of AlphaFold predicted protein structures. https://doi.org/10.1038/s41598-022-10607-z
spellingShingle Emre Brookes
Mattia Rocco
A database of calculated solution parameters for the AlphaFold predicted protein structures
Scientific Reports
title A database of calculated solution parameters for the AlphaFold predicted protein structures
title_full A database of calculated solution parameters for the AlphaFold predicted protein structures
title_fullStr A database of calculated solution parameters for the AlphaFold predicted protein structures
title_full_unstemmed A database of calculated solution parameters for the AlphaFold predicted protein structures
title_short A database of calculated solution parameters for the AlphaFold predicted protein structures
title_sort database of calculated solution parameters for the alphafold predicted protein structures
url https://doi.org/10.1038/s41598-022-10607-z