Faster and more accurate pathogenic combination predictions with VarCoPP2.0
Abstract Background The prediction of potentially pathogenic variant combinations in patients remains a key task in the field of medical genetics for the understanding and detection of oligogenic/multilocus diseases. Models tailored towards such cases can help shorten the gap of missing diagnoses an...
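Main Authors: | Nassim Versbraegen, Barbara Gravel, Charlotte Nachtegael, Alexandre Renaux, Emma Verkinderen, Ann Nowé, Tom Lenaerts, Sofia Papadimitriou |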
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2023-05-01 |
Series: | BMC Bioinformatics |
Subjects: | Oligogenic diseases, Variant combinations, Pathogenicity predictor, Balanced random forest |
Online Access: | https://doi.org/10.1186/s12859-023-05291-3 |
_version_ | 1797831890627985408 |
---|---|
author | Nassim Versbraegen; Barbara Gravel; Charlotte Nachtegael; Alexandre Renaux; Emma Verkinderen; Ann Nowé; Tom Lenaerts; Sofia Papadimitriou |
author_sort | Nassim Versbraegen |
collection | DOAJ |
description | Abstract Background The prediction of potentially pathogenic variant combinations in patients remains a key task in the field of medical genetics for the understanding and detection of oligogenic/multilocus diseases. Models tailored towards such cases can help shorten the gap of missing diagnoses and can aid researchers in dealing with the high complexity of the derived data. The predictor VarCoPP (Variant Combinations Pathogenicity Predictor) that was published in 2019 and identified potentially pathogenic variant combinations in gene pairs (bilocus variant combinations), was the first important step in this direction. Despite its usefulness and applicability, several issues still remained that hindered a better performance, such as its False Positive (FP) rate, the quality of its training set and its complex architecture. Results We present VarCoPP2.0: the successor of VarCoPP that is a simplified, faster and more accurate predictive model identifying potentially pathogenic bilocus variant combinations. Results from cross-validation and on independent data sets reveal that VarCoPP2.0 has improved in terms of both sensitivity (95% in cross-validation and 98% during testing) and specificity (5% FP rate). At the same time, its running time shows a significant 150-fold decrease due to the selection of a simpler Balanced Random Forest model. Its positive training set now consists of variant combinations that are more confidently linked with evidence of pathogenicity, based on the confidence scores present in OLIDA, the Oligogenic Diseases Database ( https://olida.ibsquare.be ). The improvement of its performance is also attributed to a more careful selection of up-to-date features identified via an original wrapper method. We show that the combination of different variant and gene pair features together is important for predictions, highlighting the usefulness of integrating biological information at different levels. 
Conclusions Through its improved performance and faster execution time, VarCoPP2.0 enables a more accurate analysis of larger data sets linked to oligogenic diseases. Users can access the ORVAL platform ( https://orval.ibsquare.be ) to apply VarCoPP2.0 on their data. |
first_indexed | 2024-04-09T13:58:58Z |
format | Article |
id | doaj.art-7976771cac774d558b85ecec69028ce7 |
institution | Directory Open Access Journal |
issn | 1471-2105 |
language | English |
last_indexed | 2024-04-09T13:58:58Z |
publishDate | 2023-05-01 |
publisher | BMC |
record_format | Article |
series | BMC Bioinformatics |
spelling | doaj.art-7976771cac774d558b85ecec69028ce7; 2023-05-07T11:25:45Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2023-05-01; 24(1):1-19; 10.1186/s12859-023-05291-3; Faster and more accurate pathogenic combination predictions with VarCoPP2.0; Nassim Versbraegen (Machine Learning Group, Université Libre de Bruxelles); Barbara Gravel (Machine Learning Group, Université Libre de Bruxelles); Charlotte Nachtegael (Machine Learning Group, Université Libre de Bruxelles); Alexandre Renaux (Machine Learning Group, Université Libre de Bruxelles); Emma Verkinderen (Machine Learning Group, Université Libre de Bruxelles); Ann Nowé (Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel); Tom Lenaerts (Machine Learning Group, Université Libre de Bruxelles); Sofia Papadimitriou (Machine Learning Group, Université Libre de Bruxelles); https://doi.org/10.1186/s12859-023-05291-3; Oligogenic diseases; Variant combinations; Pathogenicity predictor; Balanced random forest |
title | Faster and more accurate pathogenic combination predictions with VarCoPP2.0 |
topic | Oligogenic diseases; Variant combinations; Pathogenicity predictor; Balanced random forest |
url | https://doi.org/10.1186/s12859-023-05291-3 |