Natural language processing of clinical notes enables early inborn error of immunity risk ascertainment

Bibliographic Details
Main Authors: Kirk Roberts, PhD, Aaron T. Chin, MD, Klaus Loewy, MS, Lisa Pompeii, PhD, Harold Shin, MS, Nicholas L. Rider, DO
Format: Article
Language: English
Published: Elsevier 2024-05-01
Series: Journal of Allergy and Clinical Immunology: Global
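Subjects: Natural language processing; machine learning; text mining; inborn errors of immunity; primary immunodeficiency; diagnosis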
Online Access: http://www.sciencedirect.com/science/article/pii/S2772829324000201
collection DOAJ
description Background: There are now approximately 450 discrete inborn errors of immunity (IEI) described; however, diagnostic rates remain suboptimal. Use of structured health record data has proven useful for patient detection but may be augmented by natural language processing (NLP). Here we present a machine learning model that can distinguish patients from controls significantly in advance of ultimate diagnosis date. Objective: We sought to create an NLP machine learning algorithm that could identify IEI patients early during the disease course and shorten the diagnostic odyssey. Methods: Our approach involved extracting a large corpus of IEI patient clinical-note text from a major referral center’s electronic health record (EHR) system and a matched control corpus for comparison. We built text classifiers with simple machine learning methods and trained them on progressively longer time epochs before date of diagnosis. Results: The top performing NLP algorithm effectively distinguished cases from controls robustly 36 months before ultimate clinical diagnosis (area under precision recall curve > 0.95). Corpus analysis demonstrated that statistically enriched, IEI-relevant terms were evident 24+ months before diagnosis, validating that clinical notes can provide a signal for early prediction of IEI. Conclusion: Mining EHR notes with NLP holds promise for improving early IEI patient detection.
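The abstract reports performance as area under the precision-recall curve. As a rough illustration of what that metric measures, the sketch below computes average precision, a common step-wise estimate of that area, in plain Python. The function name, the 0/1 label encoding, and the toy scores are assumptions made for this example; they are not taken from the article and do not reproduce the authors' evaluation code.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise estimate of the area under the precision-recall curve.

    labels: 1 for a case, 0 for a control (hypothetical toy encoding).
    scores: classifier scores, where higher means "more likely a case".
    Tied scores keep their input order; no special tie handling is done.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("at least one positive label is required")

    # Rank examples from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    true_pos = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            # Precision at this rank, weighted by the recall gained (1 / total_pos).
            ap += (true_pos / rank) / total_pos
    return ap


# Invented toy data: three cases and three controls.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
toy_ap = average_precision(toy_labels, toy_scores)
```

A ranking that places every case above every control yields a value of 1.0, so a reported value above 0.95 indicates that cases were ranked ahead of controls almost without exception.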
first_indexed 2024-03-07T19:08:08Z
format Article
id doaj.art-29e51555a77543379d8757b0f61c0338
institution Directory Open Access Journal
issn 2772-8293
language English
last_indexed 2024-03-07T19:08:08Z
publishDate 2024-05-01
publisher Elsevier
record_format Article
series Journal of Allergy and Clinical Immunology: Global
spelling doaj.art-29e51555a77543379d8757b0f61c0338
2024-03-01T05:07:45Z
eng
Elsevier
Journal of Allergy and Clinical Immunology: Global
2772-8293
2024-05-01
Volume 3, Issue 2, Article 100224
Natural language processing of clinical notes enables early inborn error of immunity risk ascertainment
Kirk Roberts, PhD (McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Tex)
Aaron T. Chin, MD (Division of Immunology, Allergy, and Rheumatology, University of California, Los Angeles, Calif)
Klaus Loewy, MS (Texas Children’s Hospital, Houston, Tex)
Lisa Pompeii, PhD (Department of Patient Services, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio)
Harold Shin, MS (College of Osteopathic Medicine, Liberty University, Lynchburg, Va)
Nicholas L. Rider, DO (Division of Health System & Implementation Science, Virginia Tech Carilion School of Medicine, Roanoke, Va; Section of Allergy and Immunology, Carilion Clinic, Roanoke, Va)
Corresponding author: Nicholas L. Rider, DO, Virginia Tech Carilion School of Medicine, 1 Riverside Circle, 249 Roanoke, VA 24016.
title Natural language processing of clinical notes enables early inborn error of immunity risk ascertainment
topic Natural language processing
machine learning
text mining
inborn errors of immunity
primary immunodeficiency
diagnosis
url http://www.sciencedirect.com/science/article/pii/S2772829324000201