Natural language processing of clinical notes enables early inborn error of immunity risk ascertainment

Bibliographic Details
Main Authors: Kirk Roberts, PhD, Aaron T. Chin, MD, Klaus Loewy, MS, Lisa Pompeii, PhD, Harold Shin, MS, Nicholas L. Rider, DO
Format: Article
Language: English
Published: Elsevier 2024-05-01
Series: Journal of Allergy and Clinical Immunology: Global
Online Access: http://www.sciencedirect.com/science/article/pii/S2772829324000201
Description
Summary:
Background: Approximately 450 discrete inborn errors of immunity (IEI) have now been described; however, diagnostic rates remain suboptimal. Structured health record data have proven useful for patient detection, and natural language processing (NLP) may augment them. Here we present a machine learning model that can distinguish patients from controls well in advance of the eventual diagnosis date.
Objective: We sought to create an NLP machine learning algorithm that could identify IEI patients early in the disease course and shorten the diagnostic odyssey.
Methods: We extracted a large corpus of clinical-note text for IEI patients from a major referral center’s electronic health record (EHR) system, together with a matched control corpus for comparison. We built text classifiers with simple machine learning methods and trained them on progressively longer time epochs before the date of diagnosis.
Results: The top-performing NLP algorithm robustly distinguished cases from controls 36 months before the eventual clinical diagnosis (area under the precision-recall curve > 0.95). Corpus analysis showed that statistically enriched, IEI-relevant terms were evident 24 or more months before diagnosis, validating that clinical notes can provide a signal for early prediction of IEI.
Conclusion: Mining EHR notes with NLP holds promise for improving early detection of IEI patients.
ISSN: 2772-8293
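The abstract does not name the classifiers or define the time epochs exactly. The following is a minimal Python sketch under stated assumptions, not the authors' pipeline. It assumes that training on an epoch of N months means discarding every note written within N months of a patient's index date (the diagnosis date for cases, a matched date for controls), and it substitutes multinomial naive Bayes as one example of a "simple machine learning method". All names (`Note`, `censor_notes`, `train_nb`, `p_case`, `average_precision`) and the 30.44-day month are hypothetical. Average precision is used as a standard estimate of the area under the precision-recall curve.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Note:
    # Hypothetical record type: one clinical note for one patient.
    patient_id: str
    note_date: date
    text: str


def tokenize(text):
    # Lowercase alphabetic tokens only; real clinical text needs far more care.
    return re.findall(r"[a-z]+", text.lower())


def censor_notes(notes, index_dates, months_before):
    """Pool the tokens of each patient's notes written at least `months_before`
    months before that patient's index date (assumed epoch definition).
    Patients with no note before the cutoff are simply absent from the result."""
    docs = {}
    for n in notes:
        cutoff = index_dates[n.patient_id] - timedelta(days=round(30.44 * months_before))
        if n.note_date <= cutoff:
            docs.setdefault(n.patient_id, []).extend(tokenize(n.text))
    return docs


def train_nb(docs, labels, alpha=1.0):
    """Multinomial naive Bayes with Laplace smoothing; labels: 1 = case, 0 = control.
    Assumes both classes are present in `docs`."""
    counts = {0: Counter(), 1: Counter()}
    for pid, tokens in docs.items():
        counts[labels[pid]].update(tokens)
    vocab = set(counts[0]) | set(counts[1])
    n_docs = Counter(labels[pid] for pid in docs)
    log_prior = {c: math.log(n_docs[c] / len(docs)) for c in (0, 1)}
    denom = {c: sum(counts[c].values()) + alpha * len(vocab) for c in (0, 1)}
    log_lik = {
        c: {w: math.log((counts[c][w] + alpha) / denom[c]) for w in vocab} for c in (0, 1)
    }
    return log_prior, log_lik


def p_case(model, tokens):
    """Posterior probability that a token list comes from a case."""
    log_prior, log_lik = model
    # Out-of-vocabulary tokens contribute 0 to both classes, i.e. they are ignored.
    score = {c: log_prior[c] + sum(log_lik[c].get(t, 0.0) for t in tokens) for c in (0, 1)}
    # Two-class softmax, shifted by the max score for numerical stability.
    m = max(score.values())
    e = {c: math.exp(score[c] - m) for c in (0, 1)}
    return e[1] / (e[0] + e[1])


def average_precision(y_true, scores):
    """Step-wise estimate of the area under the precision-recall curve.
    Ties in `scores` keep their input order (Python's sort is stable)."""
    ranked = sorted(zip(scores, y_true), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k  # precision at each rank where a case is recovered
    n_pos = sum(y_true)
    return total / n_pos if n_pos else 0.0
```

Repeating `censor_notes` with increasing values of `months_before` (for example 6, 12, 24 and 36), fitting on a training split of patients and scoring a held-out split with `p_case` and `average_precision`, would trace how discrimination changes with lead time before diagnosis.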
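The abstract reports "statistically enriched, IEI-relevant terms" in notes written 24 or more months before diagnosis but does not state which statistic was used. One common way to rank terms that are over-represented in one corpus relative to another is a smoothed log-odds ratio divided by its approximate standard error, giving a z-score per term. The sketch below uses that technique as a stand-in, not as the authors' method; the function name `log_odds_z` and the default prior `alpha=0.01` are hypothetical choices.

```python
import math
from collections import Counter


def log_odds_z(case_tokens, control_tokens, alpha=0.01):
    """Smoothed log-odds ratio of each term in cases vs. controls, divided by
    its approximate standard error. `alpha` is a uniform pseudo-count added to
    every term (hypothetical default); positive z means enriched in cases."""
    y1, y0 = Counter(case_tokens), Counter(control_tokens)
    vocab = set(y1) | set(y0)
    a0 = alpha * len(vocab)  # total prior mass across the vocabulary
    n1, n0 = sum(y1.values()), sum(y0.values())
    z = {}
    for w in vocab:
        # Log-odds of w (vs. all other terms) in cases minus the same in controls.
        delta = (math.log((y1[w] + alpha) / (n1 + a0 - y1[w] - alpha))
                 - math.log((y0[w] + alpha) / (n0 + a0 - y0[w] - alpha)))
        # Approximate variance of the log-odds difference.
        var = 1.0 / (y1[w] + alpha) + 1.0 / (y0[w] + alpha)
        z[w] = delta / math.sqrt(var)
    return z
```

Applied to pooled tokens from notes censored to 24 or more months before each index date, `sorted(z, key=z.get, reverse=True)[:20]` would list the most case-enriched terms for manual review of clinical relevance.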