Machine learning outperformed logistic regression classification even with limit sample size: A model to predict pediatric HIV mortality and clinical progression to AIDS.
Logistic regression (LR) is the most common prediction model in medicine. In recent years, supervised machine learning (ML) methods have gained popularity. However, there are many concerns about ML utility for small sample sizes. In this study, we aim to compare the performance of 7 algorithms in th...
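Main Authors: | Sara Domínguez-Rodríguez, Miquel Serna-Pascual, Andrea Oletto, Shaun Barnabas, Peter Zuidewind, Els Dobbels, Siva Danaviah, Osee Behuhuma, Maria Grazia Lain, Paula Vaz, Sheila Fernández-Luis, Tacilta Nhampossa, Elisa Lopez-Varela, Kennedy Otwombe, Afaaf Liberty, Avy Violari, Almoustapha Issiaka Maiga, Paolo Rossi, Carlo Giaquinto, Louise Kuhn, Pablo Rojo, Alfredo Tagarro, EPIICAL Consortium |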
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2022-01-01 |
Series: | PLoS ONE |
Online Access: | https://doi.org/10.1371/journal.pone.0276116 |
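
The description of this record reports a random 70%/30% split of n = 100 and hyperparameter tuning by 10-fold cross-validation repeated 5 times. The sketch below illustrates only that resampling scheme, using the Python standard library. The function names, the fixed seed, and the absence of stratification are assumptions made for illustration; the record does not say which software the authors used or whether the split was stratified.

```python
import random

def train_validation_split(n, train_fraction=0.7, seed=0):
    """Randomly split the indices 0..n-1 into training and validation sets."""
    rng = random.Random(seed)          # fixed seed is an assumption, for repeatability
    indices = list(range(n))
    rng.shuffle(indices)
    cut = round(n * train_fraction)
    return indices[:cut], indices[cut:]

def repeated_kfold(indices, k=10, repeats=5, seed=0):
    """Yield (train, test) index lists for k-fold CV repeated `repeats` times."""
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)          # a fresh partition for every repeat
        folds = [shuffled[i::k] for i in range(k)]
        for held_out in range(k):
            test = folds[held_out]
            train = [idx for j, fold in enumerate(folds)
                     if j != held_out for idx in fold]
            yield train, test

# n = 100 comes from the record; everything else here is illustrative.
train_idx, valid_idx = train_validation_split(100)
splits = list(repeated_kfold(train_idx))
print(len(train_idx), len(valid_idx), len(splits))
```

With 70 training records, each of the 50 resamples holds out 7 records and fits on 63, which gives a sense of how little data each candidate hyperparameter setting sees.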
_version_ | 1811337510966525952 |
---|---|
author | Sara Domínguez-Rodríguez; Miquel Serna-Pascual; Andrea Oletto; Shaun Barnabas; Peter Zuidewind; Els Dobbels; Siva Danaviah; Osee Behuhuma; Maria Grazia Lain; Paula Vaz; Sheila Fernández-Luis; Tacilta Nhampossa; Elisa Lopez-Varela; Kennedy Otwombe; Afaaf Liberty; Avy Violari; Almoustapha Issiaka Maiga; Paolo Rossi; Carlo Giaquinto; Louise Kuhn; Pablo Rojo; Alfredo Tagarro; EPIICAL Consortium |
author_sort | Sara Domínguez-Rodríguez |
collection | DOAJ |
description | Logistic regression (LR) is the most common prediction model in medicine. In recent years, supervised machine learning (ML) methods have gained popularity. However, concerns remain about the utility of ML with small sample sizes. In this study, we aim to compare the performance of 7 algorithms in the prediction of 1-year mortality and clinical progression to AIDS in a small cohort of infants living with HIV from South Africa and Mozambique. The data set (n = 100) was randomly split into a 70% training set and a 30% validation set. Seven algorithms (LR, Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Naïve Bayes (NB), Artificial Neural Network (ANN), and Elastic Net) were compared. The same predictors were used across all models, including sociodemographic, virologic, immunologic, and maternal status features. For each model, hyperparameters were tuned using 10-fold cross-validation repeated 5 times. A confusion matrix was built to assess accuracy, sensitivity, and specificity. RF ranked as the best algorithm in terms of accuracy (82.8%), sensitivity (78%), and AUC (0.73). Regarding specificity and sensitivity, RF showed better performance than the other algorithms in the external validation and the highest AUC. LR showed lower performance than RF, SVM, or KNN. The outcome of children living with perinatally acquired HIV can be predicted with considerable accuracy using ML algorithms. Better models would help less specialized staff in resource-limited countries refer children promptly when the risk of clinical progression is high. |
first_indexed | 2024-04-13T17:56:01Z |
format | Article |
id | doaj.art-385a786b264542a8aba8b054515dc0fb |
institution | Directory of Open Access Journals |
issn | 1932-6203 |
language | English |
last_indexed | 2024-04-13T17:56:01Z |
publishDate | 2022-01-01 |
publisher | Public Library of Science (PLoS) |
record_format | Article |
series | PLoS ONE |
spelling | doaj.art-385a786b264542a8aba8b054515dc0fb; 2022-12-22T02:36:30Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2022-01-01; vol. 17, no. 10, e0276116; 10.1371/journal.pone.0276116; Machine learning outperformed logistic regression classification even with limit sample size: A model to predict pediatric HIV mortality and clinical progression to AIDS.; https://doi.org/10.1371/journal.pone.0276116 |
title | Machine learning outperformed logistic regression classification even with limit sample size: A model to predict pediatric HIV mortality and clinical progression to AIDS. |
url | https://doi.org/10.1371/journal.pone.0276116 |
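
The description field states that accuracy, sensitivity, and specificity were read from a confusion matrix. As a minimal sketch of how those three figures follow from the four cell counts, the snippet below uses made-up labels; they are not data from the study, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false negatives, true negatives, false positives."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, tn, fp

def summarize(tp, fn, tn, fp):
    """Derive accuracy, sensitivity, and specificity from the four counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        # sensitivity: share of true events that the model flags
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # specificity: share of non-events that the model clears
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }

# Toy labels for illustration only (1 = event, 0 = no event).
y_true = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1, 0, 0]

counts = confusion_counts(y_true, y_pred)
metrics = summarize(*counts)
print(counts, metrics)
```

On a 30-record validation set such as the one described, a single reclassified record moves accuracy by more than three percentage points, so these figures are best read alongside their uncertainty.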