A dataset of simulated patient-physician medical interviews with a focus on respiratory cases

Bibliographic Details
Main Authors: Faiha Fareez, Tishya Parikh, Christopher Wavell, Saba Shahab, Meghan Chevalier, Scott Good, Isabella De Blasi, Rafik Rhouma, Christopher McMahon, Jean-Paul Lam, Thomas Lo, Christopher W. Smith
Format: Article
Language: English
Published: Nature Portfolio 2022-06-01
Series: Scientific Data
Online Access: https://doi.org/10.1038/s41597-022-01423-1
Collection: DOAJ
Abstract: Artificial Intelligence (AI) is playing a major role in medical education, diagnosis, and outbreak detection through Natural Language Processing (NLP), machine learning models, and deep learning tools. However, training AI for these applications requires well-documented, accurate medical conversations. The presented dataset consists of simulated medical conversations in the format of Objective Structured Clinical Examinations (OSCE), focusing on respiratory cases, provided as audio recordings with corresponding text transcripts. The cases were simulated, recorded, transcribed, and manually corrected, with the aim of providing a comprehensive set of medical conversation data to the academic and industry communities. Potential applications include detecting speech-to-text errors in speech recognition, training NLP models to extract symptoms, detecting diseases, and educational uses such as training an avatar to act as a standardized patient that converses with healthcare professional students during clinical examinations. Because data of this calibre is difficult to access and costly to develop, the dataset has a wide range of potential applications.
First indexed: 2024-03-13T09:04:43Z
Institution: Directory Open Access Journal
ISSN: 2052-4463
Affiliations: Western University; Goodlabs Studio