A study of generative large language model for medical research and healthcare

Abstract There is enormous enthusiasm, as well as concern, about applying large language models (LLMs) to healthcare. Yet current assumptions are based on general-purpose LLMs such as ChatGPT, which were not developed for medical use. This study develops a generative clinical LLM, GatorTronGPT, using 277 billi...

Full description

Bibliographic Details
Main Authors: Cheng Peng, Xi Yang, Aokun Chen, Kaleb E. Smith, Nima PourNejatian, Anthony B. Costa, Cheryl Martin, Mona G. Flores, Ying Zhang, Tanja Magoc, Gloria Lipori, Duane A. Mitchell, Naykky S. Ospina, Mustafa M. Ahmed, William R. Hogan, Elizabeth A. Shenkman, Yi Guo, Jiang Bian, Yonghui Wu
Format: Article
Language:English
Published: Nature Portfolio 2023-11-01
Series:npj Digital Medicine
Online Access:https://doi.org/10.1038/s41746-023-00958-w
collection DOAJ
description Abstract There is enormous enthusiasm, as well as concern, about applying large language models (LLMs) to healthcare. Yet current assumptions are based on general-purpose LLMs such as ChatGPT, which were not developed for medical use. This study develops a generative clinical LLM, GatorTronGPT, using 277 billion words of text including (1) 82 billion words of clinical text from 126 clinical departments and approximately 2 million patients at the University of Florida Health and (2) 195 billion words of diverse general English text. We train GatorTronGPT using a GPT-3 architecture with up to 20 billion parameters and evaluate its utility for biomedical natural language processing (NLP) and healthcare text generation. GatorTronGPT improves biomedical NLP. We apply GatorTronGPT to generate 20 billion words of synthetic text. NLP models trained using the synthetic text generated by GatorTronGPT outperform models trained using real-world clinical text. A physicians’ Turing test using a 1 (worst) to 9 (best) scale shows no significant differences in linguistic readability (p = 0.22; 6.57 for GatorTronGPT vs 6.93 for human) or clinical relevance (p = 0.91; 7.0 for GatorTronGPT vs 6.97 for human), and physicians cannot differentiate between the two (p < 0.001). This study provides insights into the opportunities and challenges of LLMs for medical research and healthcare.
first_indexed 2024-03-10T17:00:24Z
format Article
id doaj.art-ef2ddec4ebcf44af961dc0a1bd072263
institution Directory Open Access Journal
issn 2398-6352
language English
last_indexed 2024-03-10T17:00:24Z
publishDate 2023-11-01
publisher Nature Portfolio
record_format Article
series npj Digital Medicine
spelling doaj.art-ef2ddec4ebcf44af961dc0a1bd072263. npj Digital Medicine (Nature Portfolio, ISSN 2398-6352), published 2023-11-01. A study of generative large language model for medical research and healthcare.
Author affiliations: Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida (Cheng Peng, Xi Yang, Aokun Chen, William R. Hogan, Elizabeth A. Shenkman, Yi Guo, Jiang Bian, Yonghui Wu); NVIDIA (Kaleb E. Smith, Nima PourNejatian, Anthony B. Costa, Cheryl Martin, Mona G. Flores); Research Computing, University of Florida (Ying Zhang); Integrated Data Repository Research Services, University of Florida (Tanja Magoc, Gloria Lipori); Lillian S. Wells Department of Neurosurgery, Clinical and Translational Science Institute, University of Florida (Duane A. Mitchell); Division of Endocrinology, Department of Medicine, College of Medicine, University of Florida (Naykky S. Ospina); Division of Cardiovascular Medicine, Department of Medicine, College of Medicine, University of Florida (Mustafa M. Ahmed). https://doi.org/10.1038/s41746-023-00958-w
title A study of generative large language model for medical research and healthcare
url https://doi.org/10.1038/s41746-023-00958-w