Large-scale identification of undiagnosed hepatic steatosis using natural language processing

Bibliographic Details
Main Authors: Carolin V. Schneider, Tang Li, David Zhang, Anya I. Mezina, Puru Rattan, Helen Huang, Kate Townsend Creasy, Eleonora Scorletti, Inuk Zandvakili, Marijana Vujkovic, Leonida Hehl, Jacob Fiksel, Joseph Park, Kirk Wangensteen, Marjorie Risman, Kyong-Mi Chang, Marina Serper, Rotonya M. Carr, Kai Markus Schneider, Jinbo Chen, Daniel J. Rader
Format: Article
Language: English
Published: Elsevier 2023-08-01
Series: EClinicalMedicine
Subjects: Liver disease, NAFLD, Biopsy, EHR, Natural language processing
Online Access:http://www.sciencedirect.com/science/article/pii/S2589537023003267
_version_ 1797738928170598400
author Carolin V. Schneider
Tang Li
David Zhang
Anya I. Mezina
Puru Rattan
Helen Huang
Kate Townsend Creasy
Eleonora Scorletti
Inuk Zandvakili
Marijana Vujkovic
Leonida Hehl
Jacob Fiksel
Joseph Park
Kirk Wangensteen
Marjorie Risman
Kyong-Mi Chang
Marina Serper
Rotonya M. Carr
Kai Markus Schneider
Jinbo Chen
Daniel J. Rader
collection DOAJ
description Summary: Background: Nonalcoholic fatty liver disease (NAFLD) is a major cause of liver-related morbidity in people with and without diabetes, but it is underdiagnosed, posing challenges for research and clinical management. Here, we determine whether natural language processing (NLP) of data in the electronic health record (EHR) could identify undiagnosed patients with hepatic steatosis based on pathology and radiology reports. Methods: A rule-based NLP algorithm was built using a Linguamatics literature text mining tool to search 2.15 million pathology reports and 2.7 million imaging reports in the Penn Medicine EHR from November 2014 through December 2020 for evidence of hepatic steatosis. For quality control, two independent physicians manually reviewed randomly chosen biopsy and imaging reports (n = 353, PPV 99.7%). Findings: After exclusion of individuals with other causes of hepatic steatosis, 3007 patients with biopsy-proven NAFLD and 42,083 patients with imaging-proven NAFLD were identified. Interestingly, elevated ALT was not a sensitive predictor of the presence of steatosis, and only half of the biopsied patients with steatosis ever received an ICD diagnosis code for the presence of NAFLD/NASH. There was a robust association between the PNPLA3 and TM6SF2 risk alleles and steatosis identified by NLP. We identified 234 disorders that were significantly over- or underrepresented in all subjects with steatosis and identified changes in serum markers (e.g., GGT) associated with the presence of steatosis. Interpretation: This study demonstrates clear feasibility of NLP-based approaches to identify patients whose steatosis was indicated in imaging and pathology reports within a large healthcare system and uncovers undercoding of NAFLD in the general population. Identification of patients at risk could link them to improved care and outcomes. Funding: The study was funded by US and German funding sources that provided financial support only and had no influence or control over the research process.
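The Methods sentence above describes a rule-based search of report text followed by a manual check of precision. A minimal sketch of that general idea is given here, under stated assumptions: the keyword list, the negation cues, and the function names are hypothetical illustrations, not the rules of the Linguamatics tool used in the study, which this record does not describe. The count of 352 confirmed reports is likewise assumed, chosen only because 352 of 353 rounds to the reported PPV of 99.7%.

```python
import re

# Hypothetical steatosis terms; the study's actual rule set is not given in this record.
STEATOSIS = re.compile(
    r"\b(hepatic steatosis|steatohepatitis|fatty liver|fatty infiltration of the liver)\b",
    re.IGNORECASE,
)
# Hypothetical negation cues, checked only in the text preceding a match.
NEGATION = re.compile(r"\b(no|not|without|negative for)\b", re.IGNORECASE)


def mentions_steatosis(report: str) -> bool:
    """Return True if some sentence mentions steatosis with no negation cue before it."""
    for sentence in re.split(r"[.;\n]+", report):
        match = STEATOSIS.search(sentence)
        if match and not NEGATION.search(sentence[: match.start()]):
            return True
    return False


def positive_predictive_value(confirmed: int, flagged: int) -> float:
    """Share of flagged reports that manual review confirmed as true positives."""
    return confirmed / flagged


if __name__ == "__main__":
    print(mentions_steatosis("Liver: diffuse hepatic steatosis. No focal lesion."))
    print(mentions_steatosis("No evidence of hepatic steatosis."))
    # Assumed count: 352 confirmed out of the 353 reviewed reports.
    print(round(100 * positive_predictive_value(352, 353), 1))
```

A production rule set would need far richer handling of negation, uncertainty, and report sections than this prefix check; the sketch only shows the shape of the approach.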
first_indexed 2024-03-12T13:50:56Z
format Article
id doaj.art-54d545ab8c4f481f8d80186030f7ad3b
institution Directory Open Access Journal
issn 2589-5370
language English
last_indexed 2024-03-12T13:50:56Z
publishDate 2023-08-01
publisher Elsevier
record_format Article
series EClinicalMedicine
spelling doaj.art-54d545ab8c4f481f8d80186030f7ad3b 2023-08-23T04:34:08Z eng Elsevier EClinicalMedicine 2589-5370 2023-08-01 62 102149
Large-scale identification of undiagnosed hepatic steatosis using natural language processing
Carolin V. Schneider (0): Division of Translational Medicine and Human Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; Department of Medicine III, RWTH Aachen University, Aachen, Germany; Corresponding author. RWTH Aachen University, Pauwelsstr. 30, Aachen 52074, Germany.
Tang Li (1): Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
David Zhang (2): Division of Translational Medicine and Human Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
Anya I. Mezina (3): Division of Gastroenterology and Hepatology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
Puru Rattan (4): Division of Gastroenterology and Hepatology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
Helen Huang (5): Division of Translational Medicine and Human Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
Kate Townsend Creasy (6): Division of Translational Medicine and Human Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
Eleonora Scorletti (7): Division of Translational Medicine and Human Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
Inuk Zandvakili (8): Division of Translational Medicine and Human Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; Division of Gastroenterology and Hepatology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; Division of Digestive Diseases, Department of Internal Medicine, College of Medicine, University of Cincinnati, Cincinnati, OH 45267, USA
Marijana Vujkovic (9): Division of Translational Medicine and Human Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; Corporal Michael J. Crescenz VA Medical Center, Philadelphia, PA 19104, USA
Leonida Hehl (10): Department of Medicine III, RWTH Aachen University, Aachen, Germany
Jacob Fiksel (11): Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
Joseph Park (12): Division of Translational Medicine and Human Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
Kirk Wangensteen (13): Department of Medicine, Division of Gastroenterology and Hepatology, Mayo Clinic, Rochester, MN 55902, USA
Marjorie Risman (14): Division of Translational Medicine and Human Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
Kyong-Mi Chang (15): Division of Gastroenterology and Hepatology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; Corporal Michael J. Crescenz VA Medical Center, Philadelphia, PA 19104, USA
Marina Serper (16): Division of Gastroenterology and Hepatology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; Corporal Michael J. Crescenz VA Medical Center, Philadelphia, PA 19104, USA
Rotonya M. Carr (17): Department of Medicine, Division of Gastroenterology, University of Washington, Seattle, WA 98195, USA
Kai Markus Schneider (18): Department of Medicine III, RWTH Aachen University, Aachen, Germany
Jinbo Chen (19): Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
Daniel J. Rader (20): Division of Translational Medicine and Human Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
http://www.sciencedirect.com/science/article/pii/S2589537023003267
Liver disease; NAFLD; Biopsy; EHR; Natural language processing
title Large-scale identification of undiagnosed hepatic steatosis using natural language processing
topic Liver disease
NAFLD
Biopsy
EHR
Natural language processing
url http://www.sciencedirect.com/science/article/pii/S2589537023003267