Bias in machine learning applications to address non-communicable diseases at a population-level: a scoping review

Background: Machine learning (ML) is increasingly used in population and public health to support epidemiological studies, surveillance, and evaluation. Our objective was to conduct a scoping review to identify studies that use ML in population health, with a focus on its use in non-communicable dise...

Bibliographic Details
Main Authors: Birdi, Sharon, Rabet, Roxana, Durant, Steve, Patel, Atushi, Vosoughi, Tina, Shergill, Mahek, Costanian, Christy, Ziegler, Carolyn P., Ali, Shehzad, Buckeridge, David, Ghassemi, Marzyeh, Gibson, Jennifer, John-Baptiste, Ava, Macklin, Jillian, McCradden, Melissa, McKenzie, Kwame
Other Authors: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format: Article
Language: English
Published: BioMed Central 2025
Online Access: https://hdl.handle.net/1721.1/157937
author Birdi, Sharon
Rabet, Roxana
Durant, Steve
Patel, Atushi
Vosoughi, Tina
Shergill, Mahek
Costanian, Christy
Ziegler, Carolyn P.
Ali, Shehzad
Buckeridge, David
Ghassemi, Marzyeh
Gibson, Jennifer
John-Baptiste, Ava
Macklin, Jillian
McCradden, Melissa
McKenzie, Kwame
author2 Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
author_sort Birdi, Sharon
collection MIT
description Background: Machine learning (ML) is increasingly used in population and public health to support epidemiological studies, surveillance, and evaluation. Our objective was to conduct a scoping review to identify studies that use ML in population health, with a focus on its use in non-communicable diseases (NCDs). We also examine potential algorithmic biases in model design, training, and implementation, as well as efforts to mitigate these biases. Methods: We searched the peer-reviewed, indexed literature using Medline, Embase, Cochrane Central Register of Controlled Trials and Cochrane Database of Systematic Reviews, CINAHL, Scopus, ACM Digital Library, Inspec, Web of Science’s Science Citation Index, Social Sciences Citation Index, and the Emerging Sources Citation Index, up to March 2022. Results: The search identified 27 310 studies and 65 were included. Study aims were separated into algorithm comparison (n = 13, 20%) or disease modelling for population-health-related outputs (n = 52, 80%). We extracted data on NCD type, data sources, technical approach, possible algorithmic bias, and jurisdiction. Type 2 diabetes was the most studied NCD. The most common use of ML was for risk modeling. Mitigating bias was not extensively addressed, with most methods focused on mitigating sex-related bias. Conclusion: This review examines current applications of ML in NCDs, highlighting potential biases and strategies for mitigation. Future research should focus on communicable diseases and the transferability of ML models in low- and middle-income settings. Our findings can guide the development of guidelines for the equitable use of ML to improve population health outcomes.
first_indexed 2025-02-19T04:20:22Z
format Article
id mit-1721.1/157937
institution Massachusetts Institute of Technology
language English
last_indexed 2025-02-19T04:20:22Z
publishDate 2025
publisher BioMed Central
record_format dspace
spelling mit-1721.1/157937 2025-02-14T16:01:29Z Bias in machine learning applications to address non-communicable diseases at a population-level: a scoping review Birdi, Sharon Rabet, Roxana Durant, Steve Patel, Atushi Vosoughi, Tina Shergill, Mahek Costanian, Christy Ziegler, Carolyn P. Ali, Shehzad Buckeridge, David Ghassemi, Marzyeh Gibson, Jennifer John-Baptiste, Ava Macklin, Jillian McCradden, Melissa McKenzie, Kwame Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Massachusetts Institute of Technology. Institute for Medical Engineering & Science Background: Machine learning (ML) is increasingly used in population and public health to support epidemiological studies, surveillance, and evaluation. Our objective was to conduct a scoping review to identify studies that use ML in population health, with a focus on its use in non-communicable diseases (NCDs). We also examine potential algorithmic biases in model design, training, and implementation, as well as efforts to mitigate these biases. Methods: We searched the peer-reviewed, indexed literature using Medline, Embase, Cochrane Central Register of Controlled Trials and Cochrane Database of Systematic Reviews, CINAHL, Scopus, ACM Digital Library, Inspec, Web of Science’s Science Citation Index, Social Sciences Citation Index, and the Emerging Sources Citation Index, up to March 2022. Results: The search identified 27 310 studies and 65 were included. Study aims were separated into algorithm comparison (n = 13, 20%) or disease modelling for population-health-related outputs (n = 52, 80%). We extracted data on NCD type, data sources, technical approach, possible algorithmic bias, and jurisdiction. Type 2 diabetes was the most studied NCD. The most common use of ML was for risk modeling. Mitigating bias was not extensively addressed, with most methods focused on mitigating sex-related bias. Conclusion: This review examines current applications of ML in NCDs, highlighting potential biases and strategies for mitigation. Future research should focus on communicable diseases and the transferability of ML models in low- and middle-income settings. Our findings can guide the development of guidelines for the equitable use of ML to improve population health outcomes. 2025-01-02T22:36:23Z 2025-01-02T22:36:23Z 2024-12-28 2024-12-29T04:18:11Z Article http://purl.org/eprint/type/JournalArticle https://hdl.handle.net/1721.1/157937 Birdi, S., Rabet, R., Durant, S. et al. Bias in machine learning applications to address non-communicable diseases at a population-level: a scoping review. BMC Public Health 24, 3599 (2024). PUBLISHER_CC en https://doi.org/10.1186/s12889-024-21081-9 BMC Public Health Creative Commons Attribution https://creativecommons.org/licenses/by/4.0/ The Author(s) application/pdf BioMed Central BioMed Central
title Bias in machine learning applications to address non-communicable diseases at a population-level: a scoping review
url https://hdl.handle.net/1721.1/157937