Faster and more accurate pathogenic combination predictions with VarCoPP2.0



Bibliographic Details
Main Authors: Nassim Versbraegen, Barbara Gravel, Charlotte Nachtegael, Alexandre Renaux, Emma Verkinderen, Ann Nowé, Tom Lenaerts, Sofia Papadimitriou
Format: Article
Language: English
Published: BMC 2023-05-01
Series: BMC Bioinformatics
Subjects: Oligogenic diseases, Variant combinations, Pathogenicity predictor, Balanced random forest
Online Access: https://doi.org/10.1186/s12859-023-05291-3
author Nassim Versbraegen
Barbara Gravel
Charlotte Nachtegael
Alexandre Renaux
Emma Verkinderen
Ann Nowé
Tom Lenaerts
Sofia Papadimitriou
author_sort Nassim Versbraegen
collection DOAJ
description Abstract
Background: The prediction of potentially pathogenic variant combinations in patients remains a key task in medical genetics for understanding and detecting oligogenic/multilocus diseases. Models tailored to such cases can help narrow the gap of missing diagnoses and can help researchers deal with the high complexity of the derived data. The predictor VarCoPP (Variant Combinations Pathogenicity Predictor), published in 2019, identified potentially pathogenic variant combinations in gene pairs (bilocus variant combinations) and was the first important step in this direction. Despite its usefulness and applicability, several issues still limited its performance, such as its False Positive (FP) rate, the quality of its training set and its complex architecture.
Results: We present VarCoPP2.0, the successor of VarCoPP: a simplified, faster and more accurate predictive model for identifying potentially pathogenic bilocus variant combinations. Results from cross-validation and from independent data sets show that VarCoPP2.0 has improved in both sensitivity (95% in cross-validation and 98% during testing) and specificity (5% FP rate). At the same time, its running time shows a significant 150-fold decrease owing to the selection of a simpler Balanced Random Forest model. Its positive training set now consists of variant combinations that are more confidently linked with evidence of pathogenicity, based on the confidence scores present in OLIDA, the Oligogenic Diseases Database ( https://olida.ibsquare.be ). The improved performance is also attributed to a more careful selection of up-to-date features, identified via an original wrapper method. We show that combining different variant and gene pair features is important for predictions, highlighting the usefulness of integrating biological information at different levels.
Conclusions: Through its improved performance and faster execution time, VarCoPP2.0 enables a more accurate analysis of larger data sets linked to oligogenic diseases. Users can access the ORVAL platform ( https://orval.ibsquare.be ) to apply VarCoPP2.0 to their data.
first_indexed 2024-04-09T13:58:58Z
format Article
id doaj.art-7976771cac774d558b85ecec69028ce7
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-04-09T13:58:58Z
publishDate 2023-05-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-7976771cac774d558b85ecec69028ce7; 2023-05-07T11:25:45Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2023-05-01; volume 24, issue 1, pages 1-19; 10.1186/s12859-023-05291-3
Faster and more accurate pathogenic combination predictions with VarCoPP2.0
Nassim Versbraegen (0): Machine Learning Group, Université Libre de Bruxelles
Barbara Gravel (1): Machine Learning Group, Université Libre de Bruxelles
Charlotte Nachtegael (2): Machine Learning Group, Université Libre de Bruxelles
Alexandre Renaux (3): Machine Learning Group, Université Libre de Bruxelles
Emma Verkinderen (4): Machine Learning Group, Université Libre de Bruxelles
Ann Nowé (5): Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel
Tom Lenaerts (6): Machine Learning Group, Université Libre de Bruxelles
Sofia Papadimitriou (7): Machine Learning Group, Université Libre de Bruxelles
https://doi.org/10.1186/s12859-023-05291-3
Oligogenic diseases; Variant combinations; Pathogenicity predictor; Balanced random forest
title Faster and more accurate pathogenic combination predictions with VarCoPP2.0
title_sort faster and more accurate pathogenic combination predictions with varcopp2 0
topic Oligogenic diseases
Variant combinations
Pathogenicity predictor
Balanced random forest
url https://doi.org/10.1186/s12859-023-05291-3