Considerations in the reliability and fairness audits of predictive models for advance care planning

Multiple reporting guidelines for artificial intelligence (AI) models in healthcare recommend that models be audited for reliability and fairness; however, there is a gap in operational guidance for performing such audits in practice. Following guideline recommendations, we conducted a reliability audit of two models based on model performance and calibration, as well as a fairness audit based on summary statistics, subgroup performance, and subgroup calibration. We assessed the Epic End-of-Life (EOL) Index model and an internally developed Stanford Hospital Medicine (HM) Advance Care Planning (ACP) model in three practice settings (Primary Care, Inpatient Oncology, and Hospital Medicine), using clinicians' answers to the surprise question ("Would you be surprised if [patient X] passed away in [Y years]?") as a surrogate outcome. For performance, the models had a positive predictive value (PPV) at or above 0.76 in all settings. In Hospital Medicine and Inpatient Oncology, the Stanford HM ACP model had higher sensitivity (0.69 and 0.89, respectively) than the EOL model (0.20 and 0.27) and better calibration (O/E 1.5 and 1.7) than the EOL model (O/E 2.5 and 3.0). The Epic EOL model flagged fewer patients (11% and 21%, respectively) than the Stanford HM ACP model (38% and 75%). There were no differences in performance or calibration by sex. Both models had lower sensitivity in Hispanic/Latino male patients with Race listed as "Other." Ten clinicians were surveyed after a presentation summarizing the audit: 10/10 reported that summary statistics, overall performance, and subgroup performance would affect their decision to use the model to guide care; 9/10 said the same for overall and subgroup calibration. The most commonly identified barriers to routinely conducting such reliability and fairness audits were poor demographic data quality and lack of data access. The audit required 115 person-hours across 8–10 months. Our recommendations for performing reliability and fairness audits include verifying data validity, analyzing model performance on intersectional subgroups, and collecting the clinician-patient linkages needed for label generation by clinicians. Those responsible for AI models should require such audits before model deployment and should mediate between model auditors and impacted stakeholders.
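The reliability and fairness metrics named in the abstract (PPV, sensitivity, observed/expected calibration, and the fraction of patients flagged) are standard quantities, so the subgroup analyses can be illustrated with a short sketch. The snippet below is illustrative only: the function and column names, the 0.5 flagging threshold, and the grouping variables are assumptions for exposition, not the authors' code or the audited models' deployed cutoffs.

    import numpy as np

    def audit_metrics(y_true, y_prob, threshold=0.5):
        """Reliability metrics for one group of patients.

        y_true: 1 if the clinician's surprise-question answer labels the
                patient as high risk, else 0.
        y_prob: model-predicted probabilities for the same patients.
        threshold: assumed flagging cutoff (placeholder, not the deployed value).
        """
        y_true = np.asarray(y_true, dtype=float)
        y_prob = np.asarray(y_prob, dtype=float)
        flagged = y_prob >= threshold

        tp = np.sum(flagged & (y_true == 1))   # flagged and truly high risk
        fp = np.sum(flagged & (y_true == 0))   # flagged but not high risk
        fn = np.sum(~flagged & (y_true == 1))  # missed high-risk patients

        ppv = tp / (tp + fp) if (tp + fp) > 0 else float("nan")
        sensitivity = tp / (tp + fn) if (tp + fn) > 0 else float("nan")
        # O/E calibration: observed positives over expected positives (the sum
        # of predicted risks); 1.0 is ideal, values above 1 mean the model
        # under-predicts risk on average.
        oe = y_true.sum() / y_prob.sum() if y_prob.sum() > 0 else float("nan")
        return {"n": y_true.size, "flag_rate": float(flagged.mean()),
                "PPV": ppv, "sensitivity": sensitivity, "O/E": oe}

    # Fairness audit: recompute the same metrics per intersectional subgroup
    # (hypothetical column names), e.g. with pandas:
    #   for keys, g in df.groupby(["sex", "race_ethnicity"]):
    #       print(keys, audit_metrics(g["label"], g["score"]))

Under the usual O/E convention (observed events divided by the sum of predicted risks), ratios above 1 indicate under-prediction, which is consistent with the abstract treating O/E values of 1.5 and 1.7 as better calibrated than 2.5 and 3.0.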


Bibliographic Details
Main Authors: Jonathan Lu, Amelia Sattler, Samantha Wang, Ali Raza Khaki, Alison Callahan, Scott Fleming, Rebecca Fong, Benjamin Ehlert, Ron C. Li, Lisa Shieh, Kavitha Ramchandran, Michael F. Gensheimer, Sarah Chobot, Stephen Pfohl, Siyun Li, Kenny Shum, Nitin Parikh, Priya Desai, Briththa Seevaratnam, Melanie Hanson, Margaret Smith, Yizhe Xu, Arjun Gokhale, Steven Lin, Michael A. Pfeffer, Winifred Teuteberg, Nigam H. Shah
Format: Article
Language: English
Published: Frontiers Media S.A., 2022-09-01
Series: Frontiers in Digital Health (ISSN 2673-253X)
Subjects: model reporting guideline; electronic health record; artificial intelligence; advance care planning; fairness; audit
DOI: 10.3389/fdgth.2022.943768
Online Access: https://www.frontiersin.org/articles/10.3389/fdgth.2022.943768/full
Record: doaj.art-415e3b31e32b4bb2a73dd32950a37998 (Directory of Open Access Journals)

Author Affiliations
All affiliations are in Palo Alto, United States; authors with multiple affiliations are listed once.
Jonathan Lu: Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine
Amelia Sattler: Stanford Healthcare AI Applied Research Team, Division of Primary Care and Population Health, Department of Medicine, Stanford University School of Medicine
Samantha Wang: Division of Hospital Medicine, Department of Medicine, Stanford University School of Medicine
Ali Raza Khaki: Division of Oncology, Department of Medicine, Stanford University School of Medicine
Alison Callahan: Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine
Scott Fleming: Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine
Rebecca Fong: Serious Illness Care Program, Department of Medicine, Stanford University School of Medicine
Benjamin Ehlert: Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine
Ron C. Li: Division of Hospital Medicine, Department of Medicine, Stanford University School of Medicine
Lisa Shieh: Division of Hospital Medicine, Department of Medicine, Stanford University School of Medicine
Kavitha Ramchandran: Division of Oncology, Department of Medicine, Stanford University School of Medicine
Michael F. Gensheimer: Department of Radiation Oncology, Stanford University School of Medicine
Sarah Chobot: Inpatient Palliative Care, Stanford Health Care
Stephen Pfohl: Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine
Siyun Li: Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine
Kenny Shum: Technology / Digital Solutions, Stanford Health Care and Stanford University School of Medicine
Nitin Parikh: Technology / Digital Solutions, Stanford Health Care and Stanford University School of Medicine
Priya Desai: Technology / Digital Solutions, Stanford Health Care and Stanford University School of Medicine
Briththa Seevaratnam: Serious Illness Care Program, Department of Medicine, Stanford University School of Medicine
Melanie Hanson: Serious Illness Care Program, Department of Medicine, Stanford University School of Medicine
Margaret Smith: Stanford Healthcare AI Applied Research Team, Division of Primary Care and Population Health, Department of Medicine, Stanford University School of Medicine
Yizhe Xu: Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine
Arjun Gokhale: Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine
Steven Lin: Stanford Healthcare AI Applied Research Team, Division of Primary Care and Population Health, Department of Medicine, Stanford University School of Medicine
Michael A. Pfeffer: Division of Hospital Medicine, Department of Medicine, Stanford University School of Medicine; Technology / Digital Solutions, Stanford Health Care and Stanford University School of Medicine
Winifred Teuteberg: Serious Illness Care Program, Department of Medicine, Stanford University School of Medicine
Nigam H. Shah: Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine; Technology / Digital Solutions, Stanford Health Care and Stanford University School of Medicine; Clinical Excellence Research Center, Stanford University School of Medicine