Reaching alignment-profile-based accuracy in predicting protein secondary and tertiary structural properties without alignment
Main Authors: | Jaspreet Singh, Kuldip Paliwal, Thomas Litfin, Jaswinder Singh, Yaoqi Zhou |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2022-05-01 |
Series: | Scientific Reports |
Online Access: | https://doi.org/10.1038/s41598-022-11684-w |
_version_ | 1811253194068590592 |
---|---|
author | Jaspreet Singh, Kuldip Paliwal, Thomas Litfin, Jaswinder Singh, Yaoqi Zhou |
collection | DOAJ |
description | Protein language models have emerged as an alternative to multiple sequence alignment for enriching sequence information and improving downstream prediction tasks such as biophysical, structural, and functional properties. Here we show that a method called SPOT-1D-LM, which combines traditional one-hot encoding with the embeddings from two different language models (ProtTrans and ESM-1b) as input, yields a leap in accuracy over single-sequence-based techniques in predicting protein 1D secondary and tertiary structural properties, including backbone torsion angles, solvent accessibility and contact numbers, for all six test sets (TEST2018, TEST2020, Neff1-2020, CASP12-FM, CASP13-FM and CASP14-FM). More significantly, it performs comparably to profile-based methods for proteins with homologous sequences. For example, the three-state secondary structure (SS3) prediction accuracies for TEST2018 and TEST2020 proteins are 86.7% and 79.8% for SPOT-1D-LM, compared to 74.3% and 73.4% for the single-sequence-based method SPOT-1D-Single and 86.2% and 80.5% for the profile-based method SPOT-1D, respectively. For proteins without homologous sequences (Neff1-2020), SPOT-1D-LM reaches an SS3 accuracy of 80.41%, which is 3.8% and 8.3% higher than SPOT-1D-Single and SPOT-1D, respectively. Given its speed, SPOT-1D-LM is expected to be useful for genome-wide analysis. Moreover, high-accuracy prediction of both secondary and tertiary structural properties, such as backbone angles and solvent accessibility, without sequence alignment suggests that highly accurate protein structure prediction may be achievable without homologous sequences, the remaining obstacle in the post-AlphaFold2 era. |
first_indexed | 2024-04-12T16:46:17Z |
format | Article |
id | doaj.art-a1de07e4ee8b45c1b1d1fb4f88b98bf4 |
institution | Directory of Open Access Journals |
issn | 2045-2322 |
language | English |
last_indexed | 2024-04-12T16:46:17Z |
publishDate | 2022-05-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Scientific Reports |
spelling | doaj.art-a1de07e4ee8b45c1b1d1fb4f88b98bf4 · 2022-12-22T03:24:33Z · eng · Nature Portfolio · Scientific Reports · 2045-2322 · 2022-05-01 · vol. 12, no. 1, pp. 1-9 · 10.1038/s41598-022-11684-w · Reaching alignment-profile-based accuracy in predicting protein secondary and tertiary structural properties without alignment · Jaspreet Singh, Kuldip Paliwal, Thomas Litfin, Jaswinder Singh (Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University); Yaoqi Zhou (Institute for Glycomics, Griffith University) |
title | Reaching alignment-profile-based accuracy in predicting protein secondary and tertiary structural properties without alignment |
url | https://doi.org/10.1038/s41598-022-11684-w |
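The input scheme summarized in the abstract (a one-hot encoding concatenated, residue by residue, with embeddings from ProtTrans and ESM-1b) and the SS3 accuracy metric it reports can be sketched as follows. This is a minimal illustration under stated assumptions, not the SPOT-1D-LM code: the embeddings are placeholder vectors rather than real model outputs, the widths 1024 (ProtTrans) and 1280 (ESM-1b) are assumed from the commonly used ProtT5-XL and ESM-1b models, and the helper names `one_hot`, `build_features` and `ss3_accuracy` are hypothetical.

```python
from typing import List, Sequence

# The 20 standard amino acids; any other residue (e.g. 'X') gets an all-zero one-hot row.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

# Assumed per-residue embedding widths (ProtT5-XL: 1024, ESM-1b: 1280).
PROTTRANS_DIM = 1024
ESM1B_DIM = 1280


def one_hot(sequence: str) -> List[List[float]]:
    """Encode each residue as a 20-dimensional one-hot vector."""
    rows = []
    for aa in sequence.upper():
        row = [0.0] * len(AMINO_ACIDS)
        if aa in AA_INDEX:
            row[AA_INDEX[aa]] = 1.0
        rows.append(row)
    return rows


def build_features(
    sequence: str,
    prottrans_emb: Sequence[Sequence[float]],
    esm1b_emb: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Concatenate one-hot, ProtTrans and ESM-1b vectors residue by residue."""
    if not (len(sequence) == len(prottrans_emb) == len(esm1b_emb)):
        raise ValueError("embeddings must have one row per residue")
    return [
        oh + list(pt) + list(esm)
        for oh, pt, esm in zip(one_hot(sequence), prottrans_emb, esm1b_emb)
    ]


def ss3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose 3-state label (H, E or C) is predicted correctly."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("label strings must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


if __name__ == "__main__":
    seq = "MKTAYIAK"
    # Placeholder embeddings; real ones would come from the language models.
    pt = [[0.1] * PROTTRANS_DIM for _ in seq]
    esm = [[0.2] * ESM1B_DIM for _ in seq]
    feats = build_features(seq, pt, esm)
    print(len(feats), len(feats[0]))  # 8 2324
    print(ss3_accuracy("CHHHHEEC", "CHHHHECC"))  # 0.875
```

Under these assumptions each residue is described by 20 + 1024 + 1280 = 2324 input features, and SS3 accuracy is simply the fraction of residues whose three-state label matches the observed one; a percentage such as 86.7% corresponds to a value of 0.867 from a function like `ss3_accuracy` pooled over a test set.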