Evaluating Bias in Machine Learning-Enabled Radiology Image Classification

As machine learning grows more prevalent in the medical field, it is important to ensure that fairness is considered as a central criterion in the evaluation of algorithms and models. Building upon previous work, we study a set of machine learning models used to detect spinal fractures, comparing their performance across various age, sex, and geographic groups. This serves not only as an audit of this particular set of models but also contributes to the development of a meaningful standard for fairness in the space of Machine Learning for Healthcare. We analyze the 10 highest-performing models from a competition hosted by the Radiological Society of North America in 2022, in which teams competed to design and train machine learning models to detect and locate, with high accuracy, cervical spine fractures, a severe injury with high mortality rates. We split the data into subgroups across the categories of sex, age, and continent, then compare these subgroups across seven performance metrics. We find the models to be fair overall, with similar performance across the given metrics. Additionally, we perform an intersectional analysis, comparing the same metrics with the data split on intersections of the above attributes, and again find fair overall performance. Taking a holistic look at the results, the models appear fair under a variety of comparative metrics. However, future work is needed to determine whether the models we studied would in fact be fair for a more representative population.
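The audit procedure the abstract describes — splitting model predictions by attributes such as sex and age, then comparing a performance metric across single groups and their intersections — can be sketched roughly as follows. This is a minimal illustration only, not the thesis code: the column names, the toy data, and the use of accuracy as the single metric are all assumptions made for the sketch.

```python
# Minimal sketch of a subgroup fairness audit: compute a per-group
# performance metric and compare groups and their intersections.
# Column names and the accuracy metric are illustrative assumptions,
# not the actual data schema or metrics used in the thesis.
import pandas as pd

def subgroup_accuracy(df: pd.DataFrame, group_cols: list) -> pd.Series:
    """Accuracy per subgroup defined by one or more attribute columns."""
    correct = (df["prediction"] == df["label"]).rename("accuracy")
    return correct.groupby([df[c] for c in group_cols]).mean()

# Toy predictions with demographic attributes attached.
df = pd.DataFrame({
    "label":      [1, 0, 1, 1, 0, 1, 0, 0],
    "prediction": [1, 0, 0, 1, 0, 1, 1, 0],
    "sex":        ["F", "F", "M", "M", "F", "M", "F", "M"],
    "age_group":  ["<50", "<50", "50+", "50+", "50+", "<50", "<50", "50+"],
})

per_sex = subgroup_accuracy(df, ["sex"])                 # single attribute
intersect = subgroup_accuracy(df, ["sex", "age_group"])  # intersectional split
print(per_sex)
print(intersect)
```

A full audit along these lines would repeat the comparison for each metric of interest (the thesis uses seven) and each attribute, then examine the gaps between subgroup values for evidence of disparate performance.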

Bibliographic Details
Main Author: Atia, Dina
Other Authors: Ghassemi, Marzyeh
Format: Thesis
Published: Massachusetts Institute of Technology 2023
Online Access: https://hdl.handle.net/1721.1/151662
Department: Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science
Rights: In Copyright - Educational Use Permitted; copyright retained by author(s) (https://rightsstatements.org/page/InC-EDU/1.0/)