Evaluating large language models as agents in the clinic
Recent developments in large language models (LLMs) have unlocked opportunities for healthcare, from information synthesis to clinical decision support. These LLMs are not just capable of modeling language, but can also act as intelligent “agents” that interact with stakeholders in open-ended conversations and even influence clinical decision-making. Rather than relying on benchmarks that measure a model’s ability to process clinical data or answer standardized test questions, LLM agents can be modeled in high-fidelity simulations of clinical settings and should be assessed for their impact on clinical workflows. These evaluation frameworks, which we refer to as “Artificial Intelligence Structured Clinical Examinations” (“AI-SCE”), can draw from comparable technologies where machines operate with varying degrees of self-governance, such as self-driving cars, in dynamic environments with multiple stakeholders. Developing these robust, real-world clinical evaluations will be crucial towards deploying LLM agents in medical settings.
Main Authors: | Nikita Mehandru, Brenda Y. Miao, Eduardo Rodriguez Almaraz, Madhumita Sushil, Atul J. Butte, Ahmed Alaa |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2024-04-01 |
Series: | npj Digital Medicine |
Online Access: | https://doi.org/10.1038/s41746-024-01083-y |
_version_ | 1797219557194072064
author | Nikita Mehandru Brenda Y. Miao Eduardo Rodriguez Almaraz Madhumita Sushil Atul J. Butte Ahmed Alaa |
author_facet | Nikita Mehandru Brenda Y. Miao Eduardo Rodriguez Almaraz Madhumita Sushil Atul J. Butte Ahmed Alaa |
author_sort | Nikita Mehandru |
collection | DOAJ |
description | Recent developments in large language models (LLMs) have unlocked opportunities for healthcare, from information synthesis to clinical decision support. These LLMs are not just capable of modeling language, but can also act as intelligent “agents” that interact with stakeholders in open-ended conversations and even influence clinical decision-making. Rather than relying on benchmarks that measure a model’s ability to process clinical data or answer standardized test questions, LLM agents can be modeled in high-fidelity simulations of clinical settings and should be assessed for their impact on clinical workflows. These evaluation frameworks, which we refer to as “Artificial Intelligence Structured Clinical Examinations” (“AI-SCE”), can draw from comparable technologies where machines operate with varying degrees of self-governance, such as self-driving cars, in dynamic environments with multiple stakeholders. Developing these robust, real-world clinical evaluations will be crucial towards deploying LLM agents in medical settings. |
first_indexed | 2024-04-24T12:35:32Z |
format | Article |
id | doaj.art-d5c48901ff664ec585eb34e9723b1362 |
institution | Directory Open Access Journal |
issn | 2398-6352 |
language | English |
last_indexed | 2024-04-24T12:35:32Z |
publishDate | 2024-04-01 |
publisher | Nature Portfolio |
record_format | Article |
series | npj Digital Medicine |
spelling | doaj.art-d5c48901ff664ec585eb34e9723b1362; 2024-04-07T11:31:52Z; eng; Nature Portfolio; npj Digital Medicine; 2398-6352; 2024-04-01; Vol 7, Iss 1, Pp 1-3; 10.1038/s41746-024-01083-y; Evaluating large language models as agents in the clinic; Nikita Mehandru (University of California, Berkeley); Brenda Y. Miao (Bakar Computational Health Sciences Institute, University of California San Francisco); Eduardo Rodriguez Almaraz (Neurosurgery Department Division of Neuro-Oncology, University of California San Francisco); Madhumita Sushil (Bakar Computational Health Sciences Institute, University of California San Francisco); Atul J. Butte (Bakar Computational Health Sciences Institute, University of California San Francisco); Ahmed Alaa (University of California, Berkeley); Recent developments in large language models (LLMs) have unlocked opportunities for healthcare, from information synthesis to clinical decision support. These LLMs are not just capable of modeling language, but can also act as intelligent “agents” that interact with stakeholders in open-ended conversations and even influence clinical decision-making. Rather than relying on benchmarks that measure a model’s ability to process clinical data or answer standardized test questions, LLM agents can be modeled in high-fidelity simulations of clinical settings and should be assessed for their impact on clinical workflows. These evaluation frameworks, which we refer to as “Artificial Intelligence Structured Clinical Examinations” (“AI-SCE”), can draw from comparable technologies where machines operate with varying degrees of self-governance, such as self-driving cars, in dynamic environments with multiple stakeholders. Developing these robust, real-world clinical evaluations will be crucial towards deploying LLM agents in medical settings.; https://doi.org/10.1038/s41746-024-01083-y
spellingShingle | Nikita Mehandru Brenda Y. Miao Eduardo Rodriguez Almaraz Madhumita Sushil Atul J. Butte Ahmed Alaa Evaluating large language models as agents in the clinic npj Digital Medicine |
title | Evaluating large language models as agents in the clinic |
title_full | Evaluating large language models as agents in the clinic |
title_fullStr | Evaluating large language models as agents in the clinic |
title_full_unstemmed | Evaluating large language models as agents in the clinic |
title_short | Evaluating large language models as agents in the clinic |
title_sort | evaluating large language models as agents in the clinic |
url | https://doi.org/10.1038/s41746-024-01083-y |
work_keys_str_mv | AT nikitamehandru evaluatinglargelanguagemodelsasagentsintheclinic AT brendaymiao evaluatinglargelanguagemodelsasagentsintheclinic AT eduardorodriguezalmaraz evaluatinglargelanguagemodelsasagentsintheclinic AT madhumitasushil evaluatinglargelanguagemodelsasagentsintheclinic AT atuljbutte evaluatinglargelanguagemodelsasagentsintheclinic AT ahmedalaa evaluatinglargelanguagemodelsasagentsintheclinic |