Moving Biosurveillance Beyond Coded Data Using AI for Symptom Detection From Physician Notes: Retrospective Cohort Study

Background: Real-time surveillance of emerging infectious diseases necessitates a dynamically evolving, computable case definition, which frequently incorporates symptom-related criteria. For symptom detection, both population health monitoring platforms and research initiative...


Bibliographic Details
Main Authors: Andrew J McMurry, Amy R Zipursky, Alon Geva, Karen L Olson, James R Jones, Vladimir Ignatov, Timothy A Miller, Kenneth D Mandl
Format: Article
Language: English
Published: JMIR Publications 2024-04-01
Series: Journal of Medical Internet Research
Online Access: https://www.jmir.org/2024/1/e53367
collection DOAJ
description Background: Real-time surveillance of emerging infectious diseases necessitates a dynamically evolving, computable case definition, which frequently incorporates symptom-related criteria. For symptom detection, both population health monitoring platforms and research initiatives primarily depend on structured data extracted from electronic health records.
Objective: This study sought to validate and test an artificial intelligence (AI)–based natural language processing (NLP) pipeline for detecting COVID-19 symptoms from physician notes in pediatric patients. We specifically study patients presenting to the emergency department (ED) who can be sentinel cases in an outbreak.
Methods: Subjects in this retrospective cohort study are patients who are 21 years of age and younger, who presented to a pediatric ED at a large academic children’s hospital between March 1, 2020, and May 31, 2022. The ED notes for all patients were processed with an NLP pipeline tuned to detect the mention of 11 COVID-19 symptoms based on Centers for Disease Control and Prevention (CDC) criteria. For a gold standard, 3 subject matter experts labeled 226 ED notes and had strong agreement (F1-score=0.986; positive predictive value [PPV]=0.972; and sensitivity=1.0). F1-score, PPV, and sensitivity were used to compare the performance of both NLP and the International Classification of Diseases, 10th Revision (ICD-10) coding to the gold standard chart review. As a formative use case, variations in symptom patterns were measured across SARS-CoV-2 variant eras.
Results: There were 85,678 ED encounters during the study period, including 4% (n=3420) with patients with COVID-19. NLP was more accurate at identifying encounters with patients that had any of the COVID-19 symptoms (F1-score=0.796) than ICD-10 codes (F1-score=0.451). NLP accuracy was higher for positive symptoms (sensitivity=0.930) than ICD-10 (sensitivity=0.300). However, ICD-10 accuracy was higher for negative symptoms (specificity=0.994) than NLP (specificity=0.917). Congestion or runny nose showed the highest accuracy difference (NLP: F1-score=0.828 and ICD-10: F1-score=0.042). For encounters with patients with COVID-19, prevalence estimates of each NLP symptom differed across variant eras. Patients with COVID-19 were more likely to have each NLP symptom detected than patients without this disease. Effect sizes (odds ratios) varied across pandemic eras.
Conclusions: This study establishes the value of AI-based NLP as a highly effective tool for real-time COVID-19 symptom detection in pediatric patients, outperforming traditional ICD-10 methods. It also reveals the evolving nature of symptom prevalence across different virus variants, underscoring the need for dynamic, technology-driven approaches in infectious disease surveillance.
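The metrics named in the description (F1-score, PPV, sensitivity, specificity) are standard confusion-matrix quantities. The following is a minimal sketch of how they relate, not the authors' evaluation code. The function names and the confusion-matrix counts are hypothetical and chosen only for illustration; the only values taken from the record are the reported annotator agreement figures (PPV=0.972, sensitivity=1.0), which reproduce the reported F1-score of 0.986.

```python
def f1_from_ppv_sensitivity(ppv: float, sensitivity: float) -> float:
    """F1 is the harmonic mean of PPV (precision) and sensitivity (recall)."""
    if ppv + sensitivity == 0:
        return 0.0
    return 2 * ppv * sensitivity / (ppv + sensitivity)


def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute PPV, sensitivity, specificity, and F1 from raw counts.

    A denominator of zero yields 0.0 for that metric.
    """
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "ppv": ppv,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1_from_ppv_sensitivity(ppv, sensitivity),
    }


# Reported annotator agreement from the description: PPV=0.972, sensitivity=1.0.
agreement_f1 = round(f1_from_ppv_sensitivity(0.972, 1.0), 3)
print(agreement_f1)  # 0.986

# Hypothetical counts (NOT from the study), for illustration only.
example = confusion_metrics(tp=90, fp=10, fn=10, tn=890)
print(example)
```

Because F1 ignores true negatives, a method can have high specificity and still a low F1-score when it misses most positive mentions, which is the pattern the description reports for ICD-10 coding.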
first_indexed 2024-04-24T13:21:07Z
id doaj.art-14718324a39a4111a4b2adf14e91cf2c
institution Directory Open Access Journal
issn 1438-8871
last_indexed 2024-04-24T13:21:07Z
spelling doaj.art-14718324a39a4111a4b2adf14e91cf2c
2024-04-04T14:00:36Z
eng
JMIR Publications
Journal of Medical Internet Research
1438-8871
2024-04-01
26
e53367
10.2196/53367
Moving Biosurveillance Beyond Coded Data Using AI for Symptom Detection From Physician Notes: Retrospective Cohort Study
Andrew J McMurry https://orcid.org/0000-0001-5604-0704
Amy R Zipursky https://orcid.org/0000-0002-3003-2818
Alon Geva https://orcid.org/0000-0002-8574-0133
Karen L Olson https://orcid.org/0000-0002-5124-6129
James R Jones https://orcid.org/0009-0001-2940-3634
Vladimir Ignatov https://orcid.org/0009-0009-5743-1825
Timothy A Miller https://orcid.org/0000-0003-4513-403X
Kenneth D Mandl https://orcid.org/0000-0002-9781-0477
https://www.jmir.org/2024/1/e53367