Bias in machine learning applications to address non-communicable diseases at a population-level: a scoping review
Background: Machine learning (ML) is increasingly used in population and public health to support epidemiological studies, surveillance, and evaluation. Our objective was to conduct a scoping review to identify studies that use ML in population health, with a focus on its use in non-communicable dise...
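Main Authors: | Birdi, Sharon; Rabet, Roxana; Durant, Steve; Patel, Atushi; Vosoughi, Tina; Shergill, Mahek; Costanian, Christy; Ziegler, Carolyn P.; Ali, Shehzad; Buckeridge, David; Ghassemi, Marzyeh; Gibson, Jennifer; John-Baptiste, Ava; Macklin, Jillian; McCradden, Melissa; McKenzie, Kwame |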
---|---|
Other Authors: | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science |
Format: | Article |
Language: | English |
Published: | BioMed Central 2025 |
Online Access: | https://hdl.handle.net/1721.1/157937 |
author | Birdi, Sharon; Rabet, Roxana; Durant, Steve; Patel, Atushi; Vosoughi, Tina; Shergill, Mahek; Costanian, Christy; Ziegler, Carolyn P.; Ali, Shehzad; Buckeridge, David; Ghassemi, Marzyeh; Gibson, Jennifer; John-Baptiste, Ava; Macklin, Jillian; McCradden, Melissa; McKenzie, Kwame |
author2 | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science |
author_sort | Birdi, Sharon |
collection | MIT |
description | Background: Machine learning (ML) is increasingly used in population and public health to support epidemiological studies, surveillance, and evaluation. Our objective was to conduct a scoping review to identify studies that use ML in population health, with a focus on its use in non-communicable diseases (NCDs). We also examine potential algorithmic biases in model design, training, and implementation, as well as efforts to mitigate these biases. Methods: We searched the peer-reviewed, indexed literature using Medline, Embase, Cochrane Central Register of Controlled Trials and Cochrane Database of Systematic Reviews, CINAHL, Scopus, ACM Digital Library, Inspec, Web of Science’s Science Citation Index, Social Sciences Citation Index, and the Emerging Sources Citation Index, up to March 2022. Results: The search identified 27 310 studies and 65 were included. Study aims were separated into algorithm comparison (n = 13, 20%) or disease modelling for population-health-related outputs (n = 52, 80%). We extracted data on NCD type, data sources, technical approach, possible algorithmic bias, and jurisdiction. Type 2 diabetes was the most studied NCD. The most common use of ML was for risk modeling. Mitigating bias was not extensively addressed, with most methods focused on mitigating sex-related bias. Conclusion: This review examines current applications of ML in NCDs, highlighting potential biases and strategies for mitigation. Future research should focus on communicable diseases and the transferability of ML models in low- and middle-income settings. Our findings can guide the development of guidelines for the equitable use of ML to improve population health outcomes. |
first_indexed | 2025-02-19T04:20:22Z |
format | Article |
id | mit-1721.1/157937 |
institution | Massachusetts Institute of Technology |
language | English |
last_indexed | 2025-02-19T04:20:22Z |
publishDate | 2025 |
publisher | BioMed Central |
record_format | dspace |
spelling | mit-1721.1/157937; 2025-02-14T16:01:29Z; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science; Massachusetts Institute of Technology. Institute for Medical Engineering & Science; 2025-01-02T22:36:23Z; 2025-01-02T22:36:23Z; 2024-12-28; 2024-12-29T04:18:11Z; Article; http://purl.org/eprint/type/JournalArticle; https://hdl.handle.net/1721.1/157937; Birdi, S., Rabet, R., Durant, S. et al. Bias in machine learning applications to address non-communicable diseases at a population-level: a scoping review. BMC Public Health 24, 3599 (2024); PUBLISHER_CC; en; https://doi.org/10.1186/s12889-024-21081-9; BMC Public Health; Creative Commons Attribution; https://creativecommons.org/licenses/by/4.0/; The Author(s); application/pdf; BioMed Central |
title | Bias in machine learning applications to address non-communicable diseases at a population-level: a scoping review |
url | https://hdl.handle.net/1721.1/157937 |