FiTMuSiC: leveraging structural and (co)evolutionary data for protein fitness prediction

Abstract Systematically predicting the effects of mutations on protein fitness is essential for the understanding of genetic diseases. Indeed, predictions complement experimental efforts in analyzing how variants lead to dysfunctional proteins that in turn can cause diseases. Here we present our new...


Bibliographic Details
Main Authors: Matsvei Tsishyn, Gabriel Cia, Pauline Hermans, Jean Kwasigroch, Marianne Rooman, Fabrizio Pucci
Format: Article
Language: English
Published: BMC 2024-04-01
Series: Human Genomics
Subjects: Protein variants interpretation, Fitness, CAGI6, Pathogenicity
Online Access: https://doi.org/10.1186/s40246-024-00605-9
author Matsvei Tsishyn
Gabriel Cia
Pauline Hermans
Jean Kwasigroch
Marianne Rooman
Fabrizio Pucci
author_sort Matsvei Tsishyn
collection DOAJ
description Abstract Systematically predicting the effects of mutations on protein fitness is essential for the understanding of genetic diseases. Indeed, predictions complement experimental efforts in analyzing how variants lead to dysfunctional proteins that in turn can cause diseases. Here we present our new fitness predictor, FiTMuSiC, which leverages structural, evolutionary and coevolutionary information. We show that FiTMuSiC predicts fitness with high accuracy despite the simplicity of its underlying model: it was among the top predictors on the hydroxymethylbilane synthase (HMBS) target of the sixth round of the Critical Assessment of Genome Interpretation challenge (CAGI6) and performs as well as much more complex deep learning models such as AlphaMissense. To further demonstrate FiTMuSiC’s robustness, we compared its predictions with in vitro activity data on HMBS, variant fitness data on human glucokinase (GCK), and variant deleteriousness data on HMBS and GCK. These analyses further confirm FiTMuSiC’s qualities and accuracy, which compare favorably with those of other predictors. Additionally, FiTMuSiC returns two scores that separately describe the functional and structural effects of the variant, thus providing mechanistic insight into why the variant leads to fitness loss or gain. We also provide an easy-to-use webserver at https://babylone.ulb.ac.be/FiTMuSiC, which is freely available for academic use and requires no bioinformatics expertise, making the tool accessible to the entire scientific community.
first_indexed 2024-04-24T07:13:58Z
format Article
id doaj.art-4857b80b30d148b5b93c8d5dcbd7d1b8
institution Directory of Open Access Journals
issn 1479-7364
language English
last_indexed 2024-04-24T07:13:58Z
publishDate 2024-04-01
publisher BMC
record_format Article
series Human Genomics
spelling doaj.art-4857b80b30d148b5b93c8d5dcbd7d1b8
2024-04-21T11:24:49Z
eng
BMC
Human Genomics
1479-7364
2024-04-01
18 1 1 10
10.1186/s40246-024-00605-9
FiTMuSiC: leveraging structural and (co)evolutionary data for protein fitness prediction
Matsvei Tsishyn (Computational Biology and Bioinformatics, Université Libre de Bruxelles)
Gabriel Cia (Computational Biology and Bioinformatics, Université Libre de Bruxelles)
Pauline Hermans (Computational Biology and Bioinformatics, Université Libre de Bruxelles)
Jean Kwasigroch (Computational Biology and Bioinformatics, Université Libre de Bruxelles)
Marianne Rooman (Computational Biology and Bioinformatics, Université Libre de Bruxelles)
Fabrizio Pucci (Computational Biology and Bioinformatics, Université Libre de Bruxelles)
https://doi.org/10.1186/s40246-024-00605-9
Protein variants interpretation
Fitness
CAGI6
Pathogenicity
title FiTMuSiC: leveraging structural and (co)evolutionary data for protein fitness prediction
topic Protein variants interpretation
Fitness
CAGI6
Pathogenicity
url https://doi.org/10.1186/s40246-024-00605-9