Identification of biomarkers by machine learning classifiers to assist diagnose rheumatoid arthritis-associated interstitial lung disease

Abstract Background This study aimed to search for blood biomarkers among the profiles of patients with RA-ILD by using machine learning classifiers and probe correlations between the markers and the characteristics of RA-ILD. Methods A total of 153 RA patients were enrolled, including 75 RA-ILD and...

Bibliographic Details
Main Authors: Yan Qin, Yanlin Wang, Fanxing Meng, Min Feng, Xiangcong Zhao, Chong Gao, Jing Luo
Format: Article
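Language: English
Published: BMC 2022-05-01
Series: Arthritis Research & Therapy
Subjects: Interstitial lung disease; Rheumatoid arthritis; Krebs von den Lungen-6; D-dimer; Tumor markers; Machine learning algorithm
Online Access: https://doi.org/10.1186/s13075-022-02800-2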
collection DOAJ
description Abstract
Background: This study aimed to search for blood biomarkers among the profiles of patients with RA-ILD by using machine learning classifiers and to probe correlations between the markers and the characteristics of RA-ILD.
Methods: A total of 153 RA patients were enrolled, including 75 with RA-ILD and 78 with RA-non-ILD. Routine laboratory data, the levels of tumor markers and autoantibodies, and clinical manifestations were recorded. Univariate analysis, least absolute shrinkage and selection operator (LASSO), random forest (RF), and partial least squares (PLS) were performed, and receiver operating characteristic (ROC) curves were plotted.
Results: Univariate analysis showed that, compared to RA-non-ILD, patients with RA-ILD were older (p < 0.001), had higher white blood cell (p = 0.003) and neutrophil counts (p = 0.017), had higher erythrocyte sedimentation rate (p = 0.003) and C-reactive protein (p = 0.003), had higher levels of KL-6 (p < 0.001), D-dimer (p < 0.001), fibrinogen (p < 0.001), fibrinogen degradation products (p < 0.001), lactate dehydrogenase (p < 0.001), hydroxybutyrate dehydrogenase (p < 0.001), carbohydrate antigen (CA) 19–9 (p < 0.001), carcinoembryonic antigen (p = 0.001), and CA242 (p < 0.001), but a significantly lower albumin level (p = 0.003). The areas under the curves (AUCs) of the LASSO, RF, and PLS models attained 0.95 in terms of differentiating patients with RA-ILD from those without. When data from the univariate analysis and the top 10 indicators of the three machine learning models were combined, the most discriminatory markers were age, KL-6, D-dimer, and CA19-9, with AUCs of 0.814 [95% confidence interval (CI) 0.731–0.880], 0.749 (95% CI 0.660–0.824), 0.749 (95% CI 0.660–0.824), and 0.727 (95% CI 0.637–0.805), respectively. When all four markers were combined, the AUC reached 0.928 (95% CI 0.865–0.968). Notably, neither the KL-6 nor the CA19-9 level correlated with disease activity in the RA-ILD group.
Conclusions: The levels of KL-6, D-dimer, and tumor markers greatly aided RA-ILD identification. Machine learning algorithms combined with traditional biostatistical analysis can diagnose patients with RA-ILD and identify biomarkers potentially associated with the disease.
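The per-marker AUCs and 95% confidence intervals reported in the abstract can be illustrated with a minimal sketch. It assumes the AUC is estimated as the Mann-Whitney probability that a randomly chosen RA-ILD patient has a higher marker value than a randomly chosen RA-non-ILD patient, and it uses a percentile bootstrap for the interval. The record does not state which CI method the authors used, so the bootstrap is a stand-in, and the function names (`auc`, `bootstrap_ci`) and all marker values below are hypothetical, not data from the study.

```python
import random


def auc(pos, neg):
    """ROC AUC as the Mann-Whitney statistic.

    Fraction of (positive, negative) pairs in which the positive case has
    the higher marker value; tied pairs count as half a win.
    """
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(pos, neg, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method)."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping group sizes fixed.
        p = [rng.choice(pos) for _ in pos]
        n = [rng.choice(neg) for _ in neg]
        stats.append(auc(p, n))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Hypothetical marker values for illustration only (NOT study data).
ild = [620, 540, 810, 300, 450, 700, 390, 980]
non_ild = [210, 330, 280, 450, 190, 260, 520, 240]

point = auc(ild, non_ild)
lo, hi = bootstrap_ci(ild, non_ild)
print(f"AUC = {point:.3f}, 95% bootstrap CI {lo:.3f}-{hi:.3f}")
```

An AUC of 0.5 corresponds to a marker that does not separate the groups and 1.0 to perfect separation, which is the scale on which the abstract's values such as 0.814 for age and 0.928 for the four-marker combination should be read.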
first_indexed 2024-12-12T08:23:57Z
format Article
id doaj.art-604090d284f6459baf7abd1d633280d6
institution Directory Open Access Journal
issn 1478-6362
language English
last_indexed 2024-12-12T08:23:57Z
publishDate 2022-05-01
publisher BMC
record_format Article
series Arthritis Research & Therapy
spelling doaj.art-604090d284f6459baf7abd1d633280d6 2022-12-22T00:31:18Z eng BMC Arthritis Research & Therapy 1478-6362 2022-05-01 24(1):1–12 10.1186/s13075-022-02800-2
Author affiliations:
Yan Qin: Department of Rheumatology, Second Hospital of Shanxi Medical University
Yanlin Wang: Department of Rheumatology, Second Hospital of Shanxi Medical University
Fanxing Meng: The Shanxi Medical University
Min Feng: Department of Rheumatology, Second Hospital of Shanxi Medical University
Xiangcong Zhao: Department of Rheumatology, Second Hospital of Shanxi Medical University
Chong Gao: Department of Pathology, Brigham and Women’s Hospital, Harvard Medical School
Jing Luo: Department of Rheumatology, Second Hospital of Shanxi Medical University
title Identification of biomarkers by machine learning classifiers to assist diagnose rheumatoid arthritis-associated interstitial lung disease
topic Interstitial lung disease
Rheumatoid arthritis
Krebs von den Lungen-6
D-dimer
Tumor markers
Machine learning algorithm
url https://doi.org/10.1186/s13075-022-02800-2