Benchmarking Machine Learning Models to Assist in the Prognosis of Tuberculosis

Bibliographic Details
Main Authors: Maicon Herverton Lino Ferreira da Silva Barros, Geovanne Oliveira Alves, Lubnnia Morais Florêncio Souza, Elisson da Silva Rocha, João Fausto Lorenzato de Oliveira, Theo Lynn, Vanderson Sampaio, Patricia Takako Endo
Format: Article
Language: English
Published: MDPI AG 2021-04-01
Series: Informatics
Online Access: https://www.mdpi.com/2227-9709/8/2/27
collection DOAJ
description Tuberculosis (TB) is an airborne infectious disease caused by organisms in the Mycobacterium tuberculosis (Mtb) complex. In many low- and middle-income countries, TB remains a major cause of morbidity and mortality. Once a patient has been diagnosed with TB, it is critical that healthcare workers make the most appropriate treatment decision given the individual conditions of the patient and the likely course of the disease based on medical experience. Depending on the prognosis, delayed or inappropriate treatment can lead to poor outcomes, including the exacerbation of clinical symptoms, poor quality of life, and increased risk of death. This work benchmarks machine learning models to aid TB prognosis using a Brazilian health database of confirmed cases and deaths related to TB in the State of Amazonas. The goal is to predict the probability of death by TB, thus aiding the prognosis of TB and the associated treatment decision-making process. In its original form, the data set comprised 36,228 records and 130 fields but suffered from missing, incomplete, or incorrect data. Following data cleaning and preprocessing, a revised data set was generated comprising 24,015 records and 38 fields, including 22,876 reported cured TB patients and 1,139 deaths by TB. To explore how the data imbalance impacts model performance, two controlled experiments were designed using (1) imbalanced and (2) balanced data sets. For predicting TB mortality, the best result is achieved by the Gradient Boosting (GB) model using the balanced data set; for predicting the cure class, the best model is the ensemble composed of the Random Forest (RF), GB, and Multi-Layer Perceptron (MLP) models.
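The class counts quoted in the description can be turned into a small worked example of what "imbalanced" versus "balanced" means here. The sketch below uses only the two counts from the abstract. The abstract does not say how the balanced data set was built, so the random undersampling shown here is an assumption made for illustration, and the names `class_summary` and `undersample` are hypothetical helpers rather than anything from the study.

```python
import random

# Class counts reported in the abstract for the cleaned data set.
N_CURED = 22_876
N_DEATHS = 1_139


def class_summary(n_majority, n_minority):
    """Return total size, minority share, and majority:minority ratio."""
    total = n_majority + n_minority
    return total, n_minority / total, n_majority / n_minority


def undersample(majority, minority, seed=0):
    """Randomly drop majority records until both classes are the same size.

    Assumption: random undersampling stands in for whatever balancing
    procedure the study actually used.
    """
    rng = random.Random(seed)
    kept = rng.sample(majority, k=len(minority))
    return kept, list(minority)


total, death_share, ratio = class_summary(N_CURED, N_DEATHS)
print(f"records: {total}, deaths: {death_share:.2%}, cured:death = {ratio:.1f}:1")

# Stand-in record IDs only; no patient data is reproduced here.
cured_ids = list(range(N_CURED))
death_ids = list(range(N_CURED, N_CURED + N_DEATHS))
balanced_cured, balanced_deaths = undersample(cured_ids, death_ids)
print(len(balanced_cured), len(balanced_deaths))
```

With roughly twenty cured records for every death, a model can score high accuracy by always predicting "cured", which is one plausible reason for running the balanced experiment alongside the imbalanced one.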
institution Directory of Open Access Journals (DOAJ)
issn 2227-9709
citation Informatics, vol. 8, no. 2, article 27 (2021-04-01)
doi 10.3390/informatics8020027
affiliation Maicon Herverton Lino Ferreira da Silva Barros, Geovanne Oliveira Alves, Lubnnia Morais Florêncio Souza, Elisson da Silva Rocha, João Fausto Lorenzato de Oliveira, Patricia Takako Endo: Programa de Pós-Graduação em Engenharia de Computação (PPGEC), Universidade de Pernambuco, Recife 50720-001, Pernambuco, Brazil
affiliation Theo Lynn: Business School, Dublin City University, Dublin 9, Dublin, Ireland
affiliation Vanderson Sampaio: Fundação de Medicina Tropical Doutor Heitor Vieira Dourado, Manaus 69040-000, Amazonas, Brazil
topic tuberculosis
neglected tropical disease
prognosis
machine learning
ensemble model
imbalanced data sets