Performance of intensive care unit severity scoring systems across different ethnicities in the USA: a retrospective observational study

Bibliographic Details
Main Authors: Sarkar, Rahuldeb; Martin, Christopher; Mattie, Heather; Gichoya, Judy Wawira; Stone, David J; Celi, Leo Anthony G.
Other Authors: Massachusetts Institute of Technology. Institute for Medical Engineering & Science
Format: Article
Published: Elsevier BV 2021
Online Access: https://hdl.handle.net/1721.1/130358
collection MIT
description Background Despite wide use of severity scoring systems for case-mix determination and benchmarking in the intensive care unit (ICU), the possibility of scoring bias across ethnicities has not been examined. Guidelines on the use of illness severity scores to inform triage decisions for allocation of scarce resources, such as mechanical ventilation, during the current COVID-19 pandemic warrant examination for possible bias in these models. We investigated the performance of the severity scoring systems Acute Physiology and Chronic Health Evaluation IVa (APACHE IVa), Oxford Acute Severity of Illness Score (OASIS), and Sequential Organ Failure Assessment (SOFA) across four ethnicities in two large ICU databases to identify possible ethnicity-based bias.

Methods Data from the electronic ICU Collaborative Research Database (eICU-CRD) and the Medical Information Mart for Intensive Care III (MIMIC-III) database, built from patient episodes in the USA from 2014–15 and 2001–12, respectively, were analysed for score performance in Asian, Black, Hispanic, and White people after appropriate exclusions. Hospital mortality was the outcome of interest. Discrimination and calibration were determined for all three scoring systems in all four groups, using area under receiver operating characteristic (AUROC) curve for different ethnicities to assess discrimination, and standardised mortality ratio (SMR) or proxy measures to assess calibration.

Findings We analysed 166 751 participants (122 919 eICU-CRD and 43 832 MIMIC-III). Although measurements of discrimination were significantly different among the groups (AUROC ranging from 0·86 to 0·89 [p=0·016] with APACHE IVa and from 0·75 to 0·77 [p=0·85] with OASIS), they did not display any discernible systematic patterns of bias. However, measurements of calibration indicated persistent, and in some cases statistically significant, patterns of difference between Hispanic people (SMR 0·73 with APACHE IVa and 0·64 with OASIS) and Black people (0·67 and 0·68) versus Asian people (0·77 and 0·95) and White people (0·76 and 0·81). Although calibrations were imperfect for all groups, the scores consistently showed a pattern of overpredicting mortality for Black people and Hispanic people. Similar results were seen using SOFA scores across the two databases.

Interpretation The systematic differences in calibration across ethnicities suggest that illness severity scores reflect statistical bias in their predictions of mortality.
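The two measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names, the record layout, and the toy records are hypothetical and invented for illustration only. It assumes the SMR is computed as observed deaths divided by expected deaths, with expected deaths taken as the sum of predicted mortality probabilities, and it computes AUROC by direct pairwise comparison. An SMR below 1 means more deaths were predicted than observed, which corresponds to the overprediction described in the abstract.

```python
from collections import defaultdict


def smr(outcomes, predicted):
    """Standardised mortality ratio: observed deaths / expected deaths.

    Expected deaths are taken as the sum of predicted probabilities
    (an assumption of this sketch).
    """
    expected = sum(predicted)
    if expected == 0:
        raise ValueError("expected deaths must be greater than zero")
    return sum(outcomes) / expected


def auroc(outcomes, scores):
    """AUROC as the probability that a randomly chosen death received a
    higher score than a randomly chosen survivor (ties count as half)."""
    deaths = [s for y, s in zip(outcomes, scores) if y == 1]
    survivors = [s for y, s in zip(outcomes, scores) if y == 0]
    if not deaths or not survivors:
        raise ValueError("need at least one death and one survivor")
    wins = sum((d > s) + 0.5 * (d == s) for d in deaths for s in survivors)
    return wins / (len(deaths) * len(survivors))


def by_group(records):
    """Compute SMR and AUROC separately for each group label."""
    grouped = defaultdict(lambda: ([], []))
    for group, died, prob in records:
        grouped[group][0].append(died)
        grouped[group][1].append(prob)
    return {
        g: {"smr": smr(y, p), "auroc": auroc(y, p)}
        for g, (y, p) in grouped.items()
    }


# Hypothetical toy records: (group label, died in hospital, predicted risk).
# These values are made up and are NOT data from the study.
records = [
    ("A", 1, 0.8), ("A", 0, 0.4), ("A", 0, 0.2), ("A", 1, 0.6),
    ("B", 1, 0.5), ("B", 0, 0.7), ("B", 0, 0.9), ("B", 0, 0.3),
]
results = by_group(records)
for group, metrics in sorted(results.items()):
    print(group, metrics)
```

The pairwise AUROC is quadratic in the number of patients, so it suits small illustrations only; a rank-based implementation from a statistics library would be the usual choice for datasets of the size reported above.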
first_indexed 2024-09-23T13:51:11Z
id mit-1721.1/130358
institution Massachusetts Institute of Technology
last_indexed 2024-09-23T13:51:11Z
record_format dspace
spelling mit-1721.1/130358 2022-09-28T16:37:59Z
Harvard--MIT Program in Health Sciences and Technology. Laboratory for Computational Physiology
2021-04-05T14:25:56Z 2021-04 Article http://purl.org/eprint/type/JournalArticle 2589-7500
Sarkar, Rahuldeb et al. "Performance of intensive care unit severity scoring systems across different ethnicities in the USA: a retrospective observational study." Lancet Digital Health 3, 4 (April 2021): e241-e249
© 2021 The Author(s)
https://doi.org/10.1016/S2589-7500(21)00022-4
Creative Commons Attribution-NonCommercial-NoDerivs License http://creativecommons.org/licenses/by-nc-nd/4.0/
application/pdf