Evaluating Bias in Machine Learning-Enabled Radiology Image Classification
As machine learning grows more prevalent in the medical field, it is important that fairness be a central criterion in the evaluation of algorithms and models. Building upon previous work, we study a set of machine learning models used to detect spinal fractures, comparing their performance across age, sex, and geographic groups. This serves not only as an audit of this particular set of models but also contributes to the development of a meaningful standard for fairness in Machine Learning for Healthcare.

We analyze the 10 highest-performing models from a 2022 competition hosted by the Radiological Society of North America, in which teams competed to design and train machine learning models to accurately detect and locate cervical spine fractures, a severe injury with a high mortality rate.

We split the data into subgroups by sex, age, and continent, then compare the subgroups across seven performance metrics. We find the models to be fair overall, with similar performance across the given metrics. Additionally, we perform an intersectional analysis, comparing the same metrics across subgroups defined by intersections of the above attributes, and again find fair overall performance.

Taken holistically, the results suggest the models are fair under a variety of comparative metrics. However, future work is needed to determine whether the models we studied would in fact be fair for a more representative population.
Main Author: | Atia, Dina |
---|---|
Other Authors: | Ghassemi, Marzyeh |
Format: | Thesis |
Published: | Massachusetts Institute of Technology, 2023 |
Online Access: | https://hdl.handle.net/1721.1/151662 |
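The abstract describes the evaluation approach: split the test data into subgroups (sex, age, continent), compute performance metrics within each subgroup, and repeat for intersections of those attributes. The thesis's seven metrics and its actual data are not reproduced in this record; the following is a minimal sketch of the general pattern, using hypothetical toy records and two illustrative metrics (accuracy and recall).

```python
# Hedged sketch of subgroup and intersectional metric comparison.
# All records, attribute names, and values below are hypothetical;
# the thesis uses seven metrics on real competition data.
from collections import defaultdict

# Hypothetical per-patient records: (true_label, predicted_label, sex, age_group)
records = [
    (1, 1, "F", "18-44"), (0, 0, "F", "18-44"),
    (1, 0, "M", "18-44"), (0, 0, "M", "45+"),
    (1, 1, "F", "45+"),   (1, 1, "M", "45+"),
    (0, 1, "F", "45+"),   (0, 0, "M", "18-44"),
]

def metrics(group):
    """Accuracy and recall (fracture sensitivity) for one subgroup."""
    correct = sum(y == p for y, p, *_ in group)
    positives = [(y, p) for y, p, *_ in group if y == 1]
    recall = sum(p for _, p in positives) / len(positives) if positives else None
    return correct / len(group), recall

def by_key(records, key):
    """Split records by a grouping key, then score each subgroup."""
    groups = defaultdict(list)
    for r in records:
        groups[key(r)].append(r)
    return {k: metrics(g) for k, g in groups.items()}

# Single-attribute split (sex), then intersectional split (sex x age).
print(by_key(records, lambda r: r[2]))
print(by_key(records, lambda r: (r[2], r[3])))
```

A fairness audit in this style then compares the per-subgroup numbers: large gaps between groups (or between intersections that single-attribute splits would hide) indicate potential bias.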
author | Atia, Dina |
author2 | Ghassemi, Marzyeh |
collection | MIT |
description | As machine learning grows more prevalent in the medical field, it is important that fairness be a central criterion in the evaluation of algorithms and models. Building upon previous work, we study a set of machine learning models used to detect spinal fractures, comparing their performance across age, sex, and geographic groups. This serves not only as an audit of this particular set of models but also contributes to the development of a meaningful standard for fairness in Machine Learning for Healthcare.
We analyze the 10 highest-performing models from a 2022 competition hosted by the Radiological Society of North America, in which teams competed to design and train machine learning models to accurately detect and locate cervical spine fractures, a severe injury with a high mortality rate.
We split the data into subgroups by sex, age, and continent, then compare the subgroups across seven performance metrics. We find the models to be fair overall, with similar performance across the given metrics. Additionally, we perform an intersectional analysis, comparing the same metrics across subgroups defined by intersections of the above attributes, and again find fair overall performance.
Taken holistically, the results suggest the models are fair under a variety of comparative metrics. However, future work is needed to determine whether the models we studied would in fact be fair for a more representative population. |
format | Thesis |
id | mit-1721.1/151662 |
institution | Massachusetts Institute of Technology |
publishDate | 2023 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science |
rights | In Copyright - Educational Use Permitted; copyright retained by author(s); https://rightsstatements.org/page/InC-EDU/1.0/ |
file_format | application/pdf |
title | Evaluating Bias in Machine Learning-Enabled Radiology Image Classification |
url | https://hdl.handle.net/1721.1/151662 |