Learning to Ask Like a Physician

Bibliographic Details
Main Author:	Lehman, Eric
Other Authors:	Szolovits, Peter
Format:	Thesis
Published:	Massachusetts Institute of Technology 2022
Online Access:	https://hdl.handle.net/1721.1/144613

author	Lehman, Eric
author2	Szolovits, Peter
collection	MIT
description	Existing question answering (QA) datasets derived from electronic health records (EHR) are artificially generated and, as a result, fail to capture realistic physician information needs. We present Discharge Summary Clinical Questions (DiSCQ), a newly curated question dataset composed of 2,000+ questions paired with the snippets of text (triggers) that prompted each question. The questions are generated by medical experts from 100+ MIMIC-III discharge summaries. We analyze this dataset to characterize the types of information sought by medical experts. We also train baseline models for trigger detection and question generation (QG), paired with unsupervised answer retrieval over EHRs. Our baseline model is able to generate high-quality questions in over 62% of cases when prompted with human-selected triggers. We will release this dataset (and all code to reproduce baseline model results) to facilitate further research into realistic clinical QA and QG.
first_indexed	2024-09-23T10:57:10Z
format	Thesis
id	mit-1721.1/144613
institution	Massachusetts Institute of Technology
last_indexed	2024-09-23T10:57:10Z
publishDate	2022
publisher	Massachusetts Institute of Technology
record_format	dspace
spelling	mit-1721.1/144613 2022-08-30T03:45:31Z Learning to Ask Like a Physician Lehman, Eric Szolovits, Peter Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Existing question answering (QA) datasets derived from electronic health records (EHR) are artificially generated and, as a result, fail to capture realistic physician information needs. We present Discharge Summary Clinical Questions (DiSCQ), a newly curated question dataset composed of 2,000+ questions paired with the snippets of text (triggers) that prompted each question. The questions are generated by medical experts from 100+ MIMIC-III discharge summaries. We analyze this dataset to characterize the types of information sought by medical experts. We also train baseline models for trigger detection and question generation (QG), paired with unsupervised answer retrieval over EHRs. Our baseline model is able to generate high quality questions in over 62% of cases when prompted with human selected triggers. We will release this dataset (and all code to reproduce baseline model results) to facilitate further research into realistic clinical QA and QG. S.M. 2022-08-29T15:59:38Z 2022-08-29T15:59:38Z 2022-05 2022-06-21T19:25:42.893Z Thesis https://hdl.handle.net/1721.1/144613 0000-0001-9919-2257 In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology
title	Learning to Ask Like a Physician
url	https://hdl.handle.net/1721.1/144613

Similar Items