Machine learning-based prediction of rheumatoid arthritis with development of ACPA autoantibodies in the presence of non-HLA genes polymorphisms.


Bibliographic Details
Main Authors: Grzegorz Dudek, Sebastian Sakowski, Olga Brzezińska, Joanna Sarnik, Tomasz Budlewski, Grzegorz Dragan, Marta Poplawska, Tomasz Poplawski, Michał Bijak, Joanna Makowska
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2024-01-01
Series: PLoS ONE
Online Access:https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0300717&type=printable
collection DOAJ
description Machine learning (ML) algorithms can handle complex genomic data and identify predictive patterns that may not be apparent through traditional statistical methods. They have become popular tools for medical applications, including the prediction, diagnosis, and treatment of complex diseases such as rheumatoid arthritis (RA). RA is an autoimmune disease in which genetic factors play a major role. Among the most important genetic factors that predispose to the disease and serve as genetic markers are HLA-DRB alleles and single nucleotide polymorphisms (SNPs) in non-HLA genes. Another marker of RA is the presence of anti-citrullinated peptide antibodies (ACPA), which correlates with RA severity. We use SNP data from the non-HLA genes PTPN22, STAT4, TRAF1, CD40, and PADI4 to predict the occurrence of ACPA-positive RA in the Polish population. This work is a comprehensive comparative analysis of ML classifiers, including logistic regression, k-nearest neighbors, naïve Bayes, decision tree, boosted trees, multilayer perceptron, and support vector machines. The top-performing models achieved closely matched accuracy, each with its own strengths. Among these, we recommend the decision tree as the first choice, given its strong performance and interpretability. The sensitivity and specificity of the ML models are about 70%, which we consider satisfactory. In addition, we introduce a novel feature importance estimation method that is transparently interpretable and globally optimal: it exhaustively explores all possible combinations of polymorphisms, allowing us to pinpoint those with the highest predictive power.
Taken together, these findings suggest that non-HLA SNPs make it possible to identify individuals more prone to developing RA and, in turn, to implement more precise preventive approaches.
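The idea of exhaustively scoring every combination of polymorphisms can be illustrated with a minimal Python sketch. It is not the authors' feature importance method, whose details the abstract does not give. It assumes one SNP per gene with genotypes coded as 0/1/2 minor-allele counts, swaps in a simple majority-vote lookup classifier (hypothetical, standing in for the decision tree and other models), and ranks every non-empty SNP subset by balanced accuracy, the mean of sensitivity and specificity, on held-out data. All function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations, product

# Hypothetical labels for the five genes named in the abstract; one SNP per gene is assumed.
SNPS = ["PTPN22", "STAT4", "TRAF1", "CD40", "PADI4"]


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity); label 1 = ACPA-positive RA, 0 = control."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec


def fit_lookup(X, y, cols):
    """Map each observed genotype pattern on `cols` to its majority label (ties -> 0)."""
    votes = defaultdict(Counter)
    for row, label in zip(X, y):
        votes[tuple(row[c] for c in cols)][label] += 1
    return {key: 1 if c[1] > c[0] else 0 for key, c in votes.items()}


def predict_lookup(table, X, cols, default):
    """Predict with the lookup table; unseen genotype patterns fall back to `default`."""
    return [table.get(tuple(row[c] for c in cols), default) for row in X]


def exhaustive_search(X_train, y_train, X_test, y_test):
    """Score every non-empty SNP subset on the test set; return results best-first."""
    n_features = len(X_train[0])
    # Majority class of the training set, used for patterns never seen in training.
    default = 1 if 2 * sum(y_train) > len(y_train) else 0
    results = []
    for k in range(1, n_features + 1):
        for cols in combinations(range(n_features), k):
            table = fit_lookup(X_train, y_train, cols)
            y_pred = predict_lookup(table, X_test, cols, default)
            sens, spec = sensitivity_specificity(y_test, y_pred)
            results.append(((sens + spec) / 2, cols, sens, spec))
    # Stable sort: among equal scores, smaller subsets (enumerated first) stay ahead.
    results.sort(key=lambda r: r[0], reverse=True)
    return results


def toy_data():
    """Hypothetical toy data: every 0/1/2 genotype combination, label driven by SNP 0."""
    rows = [list(r) for r in product(range(3), repeat=len(SNPS))]
    labels = [1 if r[0] >= 1 else 0 for r in rows]
    # Alternate rows between the training and test sets.
    return rows[0::2], labels[0::2], rows[1::2], labels[1::2]


if __name__ == "__main__":
    X_tr, y_tr, X_te, y_te = toy_data()
    for score, cols, sens, spec in exhaustive_search(X_tr, y_tr, X_te, y_te)[:3]:
        names = ", ".join(SNPS[c] for c in cols)
        print(f"{names}: balanced acc={score:.2f} sens={sens:.2f} spec={spec:.2f}")
```

With five SNPs there are only 2^5 - 1 = 31 non-empty subsets, so exhaustive enumeration is cheap and the best subset is guaranteed to be found for the chosen score. The count doubles with every added SNP, so this kind of globally optimal search stays practical only for small marker panels.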
institution Directory of Open Access Journals
issn 1932-6203
volume 19
issue 3
article_number e0300717
doi 10.1371/journal.pone.0300717