Evaluating Bias in Machine Learning-Enabled Radiology Image Classification

As machine learning grows more prevalent in the medical field, it is important to ensure that fairness is considered as a central criterion in the evaluation of algorithms and models. Building upon previous work, we study a set of machine learning models used to detect spinal fractures, comparing their performance across various age, sex, and geographic groups. This serves not only as an audit of this particular set of models but also contributes to the development of a meaningful standard for fairness in the space of Machine Learning for Healthcare. We analyze the 10 highest-performing models from a competition hosted by the Radiological Society of North America in 2022, in which teams competed to design and train machine learning models to detect and locate, with high accuracy, cervical spine fractures, a severe injury with high mortality rates. We split the data into subgroups across the categories of sex, age, and continent, then compare these subgroups across seven performance metrics. We find the models to be fair overall, with similar performance across the given metrics. Additionally, we perform an intersectional analysis, comparing the same metrics with the data split on intersections of the above attributes, and again find fair overall performance. Taking a holistic look at the results, the models appear fair under a variety of comparative metrics. However, future work is needed to determine whether the models we studied would in fact be fair for a more representative population.
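The audit procedure the abstract describes — splitting model predictions by attributes such as sex and age, then comparing a performance metric across single groups and their intersections — can be sketched roughly as follows. This is a minimal illustration only, not the thesis code: the column names, the toy data, and the use of accuracy as the single metric are all assumptions made for the sketch.

```python
# Minimal sketch of a subgroup fairness audit: compute a per-group
# performance metric and compare groups and their intersections.
# Column names and the accuracy metric are illustrative assumptions,
# not the actual data schema or metrics used in the thesis.
import pandas as pd

def subgroup_accuracy(df: pd.DataFrame, group_cols: list) -> pd.Series:
    """Accuracy per subgroup defined by one or more attribute columns."""
    correct = (df["prediction"] == df["label"]).rename("accuracy")
    return correct.groupby([df[c] for c in group_cols]).mean()

# Toy predictions with demographic attributes attached.
df = pd.DataFrame({
    "label":      [1, 0, 1, 1, 0, 1, 0, 0],
    "prediction": [1, 0, 0, 1, 0, 1, 1, 0],
    "sex":        ["F", "F", "M", "M", "F", "M", "F", "M"],
    "age_group":  ["<50", "<50", "50+", "50+", "50+", "<50", "<50", "50+"],
})

per_sex = subgroup_accuracy(df, ["sex"])                 # single attribute
intersect = subgroup_accuracy(df, ["sex", "age_group"])  # intersectional split
print(per_sex)
print(intersect)
```

A full audit along these lines would repeat the comparison for each metric of interest (the thesis uses seven) and each attribute, then examine the gaps between subgroup values for evidence of disparate performance.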

Bibliographic Details
Main Author: Atia, Dina
Other Authors: Ghassemi, Marzyeh
Format: Thesis
Published: Massachusetts Institute of Technology 2023
Online Access: https://hdl.handle.net/1721.1/151662
Department: Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science
Rights: In Copyright - Educational Use Permitted; copyright retained by author(s) (https://rightsstatements.org/page/InC-EDU/1.0/)