Identifying Patients Who Meet Criteria for Genetic Testing of Hereditary Cancers Based on Structured and Unstructured Family Health History Data in the Electronic Health Record: Natural Language Processing Approach

Background: Family health history has been recognized as an essential factor for cancer risk assessment and is an integral part of many cancer screening guidelines, including genetic testing for personalized clinical management strategies. However, manually identifying eligible...


Bibliographic Details
Main Authors: Jianlin Shi, Keaton L Morgan, Richard L Bradshaw, Se-Hee Jung, Wendy Kohlmann, Kimberly A Kaphingst, Kensaku Kawamoto, Guilherme Del Fiol
Format: Article
Language: English
Published: JMIR Publications 2022-08-01
Series: JMIR Medical Informatics
Online Access: https://medinform.jmir.org/2022/8/e37842
author Jianlin Shi
Keaton L Morgan
Richard L Bradshaw
Se-Hee Jung
Wendy Kohlmann
Kimberly A Kaphingst
Kensaku Kawamoto
Guilherme Del Fiol
collection DOAJ
description Background: Family health history has been recognized as an essential factor for cancer risk assessment and is an integral part of many cancer screening guidelines, including genetic testing for personalized clinical management strategies. However, manually identifying eligible candidates for genetic testing is labor intensive.
Objective: The aim of this study was to develop a natural language processing (NLP) pipeline and assess its contribution to identifying patients who meet genetic testing criteria for hereditary cancers based on family health history data in the electronic health record (EHR). We compared an algorithm that uses structured data alone with an algorithm that uses structured data augmented with NLP.
Methods: Algorithms were developed based on the National Comprehensive Cancer Network (NCCN) guidelines for genetic testing for hereditary breast, ovarian, pancreatic, and colorectal cancers. The NLP-augmented algorithm uses both structured family health history data and the associated unstructured free-text comments. The algorithms were compared with a reference standard of 100 patients with a family health history in the EHR.
Results: In identifying the reference standard patients who met the NCCN criteria, the NLP-augmented algorithm, compared with the structured data algorithm, yielded a significantly higher recall of 0.95 (95% CI 0.9-0.99) versus 0.29 (95% CI 0.19-0.40) and a precision of 0.99 (95% CI 0.96-1.00) versus 0.81 (95% CI 0.65-0.95). On the whole data set, the NLP-augmented algorithm extracted 33.6% more entities, resulting in 53.8% more patients meeting the NCCN criteria.
Conclusions: Compared with the structured data algorithm, the NLP-augmented algorithm based on both structured and unstructured family health history data in the EHR increased the number of patients identified as meeting the NCCN criteria for genetic testing for hereditary breast or ovarian and colorectal cancers.
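The recall and precision figures in the Results compare each algorithm's flagged patients against the reference standard. As a minimal sketch of how such metrics are computed in general (not the authors' code), the Python below uses hypothetical patient IDs and a hypothetical helper named `precision_recall`; the toy sets are invented for illustration and do not reproduce the study's data or its confidence intervals.

```python
def precision_recall(flagged, reference):
    """Return (precision, recall) for flagged patient IDs measured
    against the reference-standard IDs that truly meet the criteria."""
    flagged, reference = set(flagged), set(reference)
    true_positives = len(flagged & reference)
    # Precision: share of flagged patients who truly meet the criteria.
    precision = true_positives / len(flagged) if flagged else 0.0
    # Recall: share of truly eligible patients who were flagged.
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical toy data (NOT from the study).
reference = {"p1", "p2", "p3", "p4"}      # reference standard: meets criteria
structured_only = {"p1", "p9"}            # flagged from structured fields alone
nlp_augmented = {"p1", "p2", "p3", "p9"}  # flagged after adding free-text comments

print(precision_recall(structured_only, reference))  # (0.5, 0.25)
print(precision_recall(nlp_augmented, reference))    # (0.75, 0.75)
```

In this toy case the augmented set recovers two eligible patients that the structured fields missed, which raises recall; the study reports the same direction of effect on its own reference standard.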
first_indexed 2024-03-12T12:49:30Z
format Article
id doaj.art-5ef1875d4cde4cf98c231945763af9e9
institution Directory Open Access Journal
issn 2291-9694
language English
last_indexed 2024-03-12T12:49:30Z
publishDate 2022-08-01
publisher JMIR Publications
record_format Article
series JMIR Medical Informatics
spelling doaj.art-5ef1875d4cde4cf98c231945763af9e9 | 2023-08-28T22:52:14Z | eng | JMIR Publications | JMIR Medical Informatics | 2291-9694 | 2022-08-01 | 10 | 8 | e37842 | 10.2196/37842
Jianlin Shi https://orcid.org/0000-0003-2950-8038
Keaton L Morgan https://orcid.org/0000-0001-9140-4454
Richard L Bradshaw https://orcid.org/0000-0001-7363-0327
Se-Hee Jung https://orcid.org/0000-0001-8149-0993
Wendy Kohlmann https://orcid.org/0000-0002-9134-9640
Kimberly A Kaphingst https://orcid.org/0000-0003-2668-9080
Kensaku Kawamoto https://orcid.org/0000-0003-4282-9338
Guilherme Del Fiol https://orcid.org/0000-0001-9954-6799
title Identifying Patients Who Meet Criteria for Genetic Testing of Hereditary Cancers Based on Structured and Unstructured Family Health History Data in the Electronic Health Record: Natural Language Processing Approach
url https://medinform.jmir.org/2022/8/e37842