Machine learning outperformed logistic regression classification even with limit sample size: A model to predict pediatric HIV mortality and clinical progression to AIDS.


Bibliographic Details
Main Authors: Sara Domínguez-Rodríguez, Miquel Serna-Pascual, Andrea Oletto, Shaun Barnabas, Peter Zuidewind, Els Dobbels, Siva Danaviah, Osee Behuhuma, Maria Grazia Lain, Paula Vaz, Sheila Fernández-Luis, Tacilta Nhampossa, Elisa Lopez-Varela, Kennedy Otwombe, Afaaf Liberty, Avy Violari, Almoustapha Issiaka Maiga, Paolo Rossi, Carlo Giaquinto, Louise Kuhn, Pablo Rojo, Alfredo Tagarro, EPIICAL Consortium
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2022-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0276116
collection DOAJ
description Logistic regression (LR) is the most common prediction model in medicine. In recent years, supervised machine learning (ML) methods have gained popularity. However, there are many concerns about the utility of ML for small sample sizes. In this study, we aim to compare the performance of 7 algorithms in the prediction of 1-year mortality and clinical progression to AIDS in a small cohort of infants living with HIV from South Africa and Mozambique. The data set (n = 100) was randomly split into a 70% training set and a 30% validation set. Seven algorithms (LR, Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Naïve Bayes (NB), Artificial Neural Network (ANN), and Elastic Net) were compared. The same predictor variables were used across all models, including sociodemographic, virologic, immunologic, and maternal status features. For each model, hyperparameters were tuned using 10-fold cross-validation repeated 5 times, and the best-performing values were selected. A confusion matrix was built to assess accuracy, sensitivity, and specificity. RF ranked as the best algorithm in terms of accuracy (82.8%), sensitivity (78%), and AUC (0.73). In the external validation, RF showed better specificity and sensitivity than the other algorithms, as well as the highest AUC. LR showed lower performance than RF, SVM, or KNN. The outcome of children living with perinatally acquired HIV can be predicted with considerable accuracy using ML algorithms. Better models would help less specialized staff in resource-limited countries make prompt referrals when there is a high risk of clinical progression.
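The abstract reports accuracy, sensitivity, and specificity derived from a confusion matrix. As a minimal sketch of how those three figures follow from the four cells of a binary confusion matrix, the Python below may help. The function names, the `positive` label convention, and the toy labels are assumptions made for illustration; they are not the authors' code or the study's data.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count the four cells of a binary confusion matrix.

    Returns (tp, fp, tn, fn). `positive` is the label treated as the
    event of interest (hypothetical convention: 1 = death or progression).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, sensitivity, and specificity from predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    return {
        # Share of all cases classified correctly.
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        # Share of true events that the model flagged (true positive rate).
        "sensitivity": _ratio(tp, tp + fn),
        # Share of non-events that the model cleared (true negative rate).
        "specificity": _ratio(tn, tn + fp),
    }


if __name__ == "__main__":
    # Toy labels only, NOT data from the study.
    truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
    print(binary_metrics(truth, predicted))
```

With a validation set of roughly 30 infants, each misclassified case moves these percentages by several points, which is worth keeping in mind when comparing algorithms on such figures.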
first_indexed 2024-04-13T17:56:01Z
format Article
id doaj.art-385a786b264542a8aba8b054515dc0fb
institution Directory Open Access Journal
issn 1932-6203
language English
last_indexed 2024-04-13T17:56:01Z
publishDate 2022-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE