Identifying Patients Who Meet Criteria for Genetic Testing of Hereditary Cancers Based on Structured and Unstructured Family Health History Data in the Electronic Health Record: Natural Language Processing Approach
Background: Family health history has been recognized as an essential factor for cancer risk assessment and is an integral part of many cancer screening guidelines, including genetic testing for personalized clinical management strategies. However, manually identifying eligible...
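Main Authors: | Jianlin Shi, Keaton L Morgan, Richard L Bradshaw, Se-Hee Jung, Wendy Kohlmann, Kimberly A Kaphingst, Kensaku Kawamoto, Guilherme Del Fiol |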
---|---|
Format: | Article |
Language: | English |
Published: | JMIR Publications 2022-08-01 |
Series: | JMIR Medical Informatics |
Online Access: | https://medinform.jmir.org/2022/8/e37842 |
_version_ | 1827858064973234176 |
---|---|
author | Jianlin Shi Keaton L Morgan Richard L Bradshaw Se-Hee Jung Wendy Kohlmann Kimberly A Kaphingst Kensaku Kawamoto Guilherme Del Fiol |
author_sort | Jianlin Shi |
collection | DOAJ |
description |
Background: Family health history has been recognized as an essential factor for cancer risk assessment and is an integral part of many cancer screening guidelines, including genetic testing for personalized clinical management strategies. However, manually identifying eligible candidates for genetic testing is labor intensive.
Objective: The aim of this study was to develop a natural language processing (NLP) pipeline and assess its contribution to identifying patients who meet genetic testing criteria for hereditary cancers based on family health history data in the electronic health record (EHR). We compared an algorithm that uses structured data alone with structured data augmented using NLP.
Methods: Algorithms were developed based on the National Comprehensive Cancer Network (NCCN) guidelines for genetic testing for hereditary breast, ovarian, pancreatic, and colorectal cancers. The NLP-augmented algorithm uses both structured family health history data and the associated unstructured free-text comments. The algorithms were compared with a reference standard of 100 patients with a family health history in the EHR.
Results: Regarding identifying the reference standard patients meeting the NCCN criteria, the NLP-augmented algorithm compared with the structured data algorithm yielded a significantly higher recall of 0.95 (95% CI 0.9-0.99) versus 0.29 (95% CI 0.19-0.40) and a precision of 0.99 (95% CI 0.96-1.00) versus 0.81 (95% CI 0.65-0.95). On the whole data set, the NLP-augmented algorithm extracted 33.6% more entities, resulting in 53.8% more patients meeting the NCCN criteria.
Conclusions: Compared with the structured data algorithm, the NLP-augmented algorithm based on both structured and unstructured family health history data in the EHR increased the number of patients identified as meeting the NCCN criteria for genetic testing for hereditary breast or ovarian and colorectal cancers. |
first_indexed | 2024-03-12T12:49:30Z |
format | Article |
id | doaj.art-5ef1875d4cde4cf98c231945763af9e9 |
institution | Directory Open Access Journal |
issn | 2291-9694 |
language | English |
last_indexed | 2024-03-12T12:49:30Z |
publishDate | 2022-08-01 |
publisher | JMIR Publications |
record_format | Article |
series | JMIR Medical Informatics |
spelling | doaj.art-5ef1875d4cde4cf98c231945763af9e9; 2023-08-28T22:52:14Z; eng; JMIR Publications; JMIR Medical Informatics; 2291-9694; 2022-08-01; 10; 8; e37842; 10.2196/37842; Identifying Patients Who Meet Criteria for Genetic Testing of Hereditary Cancers Based on Structured and Unstructured Family Health History Data in the Electronic Health Record: Natural Language Processing Approach; Jianlin Shi (https://orcid.org/0000-0003-2950-8038); Keaton L Morgan (https://orcid.org/0000-0001-9140-4454); Richard L Bradshaw (https://orcid.org/0000-0001-7363-0327); Se-Hee Jung (https://orcid.org/0000-0001-8149-0993); Wendy Kohlmann (https://orcid.org/0000-0002-9134-9640); Kimberly A Kaphingst (https://orcid.org/0000-0003-2668-9080); Kensaku Kawamoto (https://orcid.org/0000-0003-4282-9338); Guilherme Del Fiol (https://orcid.org/0000-0001-9954-6799); https://medinform.jmir.org/2022/8/e37842 |
title | Identifying Patients Who Meet Criteria for Genetic Testing of Hereditary Cancers Based on Structured and Unstructured Family Health History Data in the Electronic Health Record: Natural Language Processing Approach |
url | https://medinform.jmir.org/2022/8/e37842 |