Feature importance: Opening a soil-transmitted helminth machine learning model via SHAP

In landscape epidemiology, machine learning (ML) is a good alternative for modeling epidemiological risk scenarios. This study aims to break with the "black box" paradigm that underlies the application of machine learning techniques by using SHAP...

Bibliographic Details
Main Authors: Carlos Matias Scavuzzo, Juan Manuel Scavuzzo, Micaela Natalia Campero, Melaku Anegagrie, Aranzazu Amor Aramendia, Agustín Benito, Victoria Periago
Format: Article
Language: English
Published: KeAi Communications Co., Ltd. 2022-03-01
Series: Infectious Disease Modelling
Subjects: Shap, Shapley, Machine learning, Remote sensing, Hookworm, Ethiopia
Online Access: http://www.sciencedirect.com/science/article/pii/S2468042722000045
_version_ 1797203436401328128
author Carlos Matias Scavuzzo
Juan Manuel Scavuzzo
Micaela Natalia Campero
Melaku Anegagrie
Aranzazu Amor Aramendia
Agustín Benito
Victoria Periago
author_sort Carlos Matias Scavuzzo
collection DOAJ
description In landscape epidemiology, machine learning (ML) is a good alternative for modeling epidemiological risk scenarios. This study aims to break with the "black box" paradigm that underlies the application of machine learning techniques by using SHAP to determine the contribution of each variable in ML models applied to geospatial health. The case study is the prevalence of hookworms, intestinal parasites, in Ethiopia, where they are widely distributed; the country bears the third-highest burden of hookworm in Sub-Saharan Africa. XGBoost, a very popular ML model, was used to fit and analyze the data. The Python SHAP library was used to assess the importance of each variable for the trained model's predictions. The contribution of these variables to a particular prediction was described using different types of plots. The results show that the ML models are superior to the classical statistical models: they not only produce similar results but also, through the SHAP package, explain the influence of and interactions between the variables in the generated models. This analysis provides information that helps to understand the epidemiological problem presented and offers a tool for similar studies.
first_indexed 2024-04-24T08:19:18Z
format Article
id doaj.art-74a9e4c6389c4a92b40f592540558a8e
institution Directory of Open Access Journals
issn 2468-0427
language English
last_indexed 2024-04-24T08:19:18Z
publishDate 2022-03-01
publisher KeAi Communications Co., Ltd.
record_format Article
series Infectious Disease Modelling
spelling doaj.art-74a9e4c6389c4a92b40f592540558a8e
2024-04-17T02:00:48Z
eng
KeAi Communications Co., Ltd.
Infectious Disease Modelling
2468-0427
2022-03-01
Volume 7, Issue 1, pp. 262-276
Feature importance: Opening a soil-transmitted helminth machine learning model via SHAP
Carlos Matias Scavuzzo: Instituto de Altos Estudios Espaciales Mario Gulich, Universidad Nacional de Córdoba-Comisión Nacional de Actividades Espaciales, Argentina; Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina; Corresponding author. Instituto de Altos Estudios Espaciales Mario Gulich, Universidad Nacional de Córdoba-Comisión Nacional de Actividades Espaciales, Spain.
Juan Manuel Scavuzzo: Instituto de Altos Estudios Espaciales Mario Gulich, Universidad Nacional de Córdoba-Comisión Nacional de Actividades Espaciales, Argentina
Micaela Natalia Campero: Instituto de Altos Estudios Espaciales Mario Gulich, Universidad Nacional de Córdoba-Comisión Nacional de Actividades Espaciales, Argentina; Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
Melaku Anegagrie: Fundación Mundo Sano, Madrid, Spain; National Centre for Tropical Medicine, Institute of Health Carlos III, Madrid, Spain
Aranzazu Amor Aramendia: Fundación Mundo Sano, Madrid, Spain; National Centre for Tropical Medicine, Institute of Health Carlos III, Madrid, Spain
Agustín Benito: National Centre for Tropical Medicine, Institute of Health Carlos III, Madrid, Spain
Victoria Periago: Fundación Mundo Sano, Buenos Aires, Argentina; Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
http://www.sciencedirect.com/science/article/pii/S2468042722000045
Shap
Shapley
Machine learning
Remote sensing
Hookworm
Ethiopia
title Feature importance: Opening a soil-transmitted helminth machine learning model via SHAP
title_sort feature importance opening a soil transmitted helminth machine learning model via shap
topic Shap
Shapley
Machine learning
Remote sensing
Hookworm
Ethiopia
url http://www.sciencedirect.com/science/article/pii/S2468042722000045
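
The abstract in this record describes using SHAP to determine the contribution of each variable to a model's predictions. As a rough illustration of the Shapley attribution idea that SHAP builds on, the sketch below computes exact Shapley values for a small toy function using only the Python standard library. It is not the authors' code and does not use XGBoost or the SHAP library; the model `toy_model`, the feature names (`temperature`, `rainfall`, `elevation`), the input values, and the all-zero baseline are hypothetical choices made only for this example. The SHAP library itself relies on faster, model-specific algorithms rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline input.

    A feature that is absent from a coalition is set to its baseline value.
    The cost grows exponentially with the number of features, so this is
    only practical for very small toy problems.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size (excluding feature i)
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_model(features):
    """Hypothetical risk score with an interaction between the first two inputs."""
    temperature, rainfall, elevation = features
    return 2.0 * temperature + 1.0 * rainfall + 0.5 * temperature * rainfall - 1.0 * elevation


# Hypothetical input and baseline, chosen only for illustration
x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)

for name, value in zip(["temperature", "rainfall", "elevation"], phi):
    print(f"{name}: {value:+.3f}")
```

In this toy case the per-feature contributions add up to the difference between the prediction at `x` and the prediction at the baseline, and the interaction term is split equally between the two features involved. That additive property is what allows a single prediction to be broken down variable by variable.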