Systematic review of natural language processing for recurrent cancer detection from electronic medical records

This systematic review explored natural language processing (NLP), focusing on the text representation techniques and algorithms previously used to identify recurrent cancer diagnoses from electronic medical records (EMR), and assessed their detection performance. Relevant studies...


Bibliographic Details
Main Authors: Ekapob Sangariyavanich, Wanchana Ponthongmak, Amarit Tansawet, Nawanan Theera-Ampornpunt, Pawin Numthavaj, Gareth J. McKay, John Attia, Ammarin Thakkinstian
Format: Article
Language: English
Published: Elsevier 2023-01-01
Series: Informatics in Medicine Unlocked
Subjects: Recurrence; Cancer; Natural language processing; Clinical notes; Deep learning
Online Access: http://www.sciencedirect.com/science/article/pii/S2352914823001727
author Ekapob Sangariyavanich
Wanchana Ponthongmak
Amarit Tansawet
Nawanan Theera-Ampornpunt
Pawin Numthavaj
Gareth J. McKay
John Attia
Ammarin Thakkinstian
collection DOAJ
description This systematic review explored natural language processing (NLP), focusing on the text representation techniques and algorithms previously used to identify recurrent cancer diagnoses from electronic medical records (EMR), and assessed their detection performance. Relevant studies were identified from the PubMed, Scopus, ACM Digital Library, and IEEE databases from inception to August 18, 2022. Data, including text representation methods, model algorithms and performance, and type of clinical notes, were extracted from individual studies by two independent reviewers. Study risk of bias was assessed using the Prediction model Risk Of Bias ASsessment Tool (PROBAST). Of the 412 studies identified, 17 were eligible for inclusion, 15 of which reported models that were not externally validated. Three text representations were used: statistical, context-free, and contextual representations (bidirectional encoder representations from transformers (BERT) and its variants), in 12, 6, and 3 studies, respectively. The corresponding median F1 scores (the harmonic mean of precision and recall) for these representations were 0.43, 0.87, and 0.72, respectively. The algorithms applied included rule-based, machine learning, and deep learning approaches, with median F1 scores of 0.71, 0.43, and 0.76, respectively. In conclusion, this systematic review suggests that deep learning models that use PubMedBERT as a text representation perform best. These findings are clinically informative for selecting appropriate approaches to detect recurrent cancer from electronic medical records.
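The F1 score named in the description is the harmonic mean of precision and recall, and the review summarizes it per group of studies by the median. A minimal Python sketch of that arithmetic follows. The function name f1_score and the (precision, recall) pairs are hypothetical, invented only for illustration; they are not data from the review or from any included study.

```python
from statistics import median


def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical (precision, recall) pairs, NOT taken from the reviewed studies.
studies = [(0.80, 0.60), (0.90, 0.90), (0.50, 0.50)]

# One F1 score per study, then the median across studies.
f1_values = [f1_score(p, r) for p, r in studies]
median_f1 = median(f1_values)

print([round(v, 3) for v in f1_values], round(median_f1, 3))
```

The median is less sensitive than the mean to a single study with an unusually high or low score, which may be one reason to prefer it when the number of studies per group is small.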
first_indexed 2024-03-12T02:06:15Z
format Article
id doaj.art-7fe23bea98254d5fb8e14a7bae66dba4
institution Directory Open Access Journal
issn 2352-9148
language English
last_indexed 2024-03-12T02:06:15Z
publishDate 2023-01-01
publisher Elsevier
record_format Article
series Informatics in Medicine Unlocked
spelling doaj.art-7fe23bea98254d5fb8e14a7bae66dba4
2023-09-07T04:44:16Z
eng
Elsevier
Informatics in Medicine Unlocked, ISSN 2352-9148, 2023-01-01, vol. 41, article 101326
Author affiliations:
Ekapob Sangariyavanich: Department of Clinical Epidemiology and Biostatistics, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bangkok, Thailand; National Cancer Institute, Department of Medical Services, Ministry of Public Health, Bangkok, Thailand
Wanchana Ponthongmak: Department of Clinical Epidemiology and Biostatistics, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bangkok, Thailand; Corresponding author. Department of Clinical Epidemiology and Biostatistics, Faculty of Medicine Ramathibodi Hospital, Sukho Place Building 4th floor, Sukhothai Rd., Dusit District, Bangkok, 10300, Thailand.
Amarit Tansawet: Department of Clinical Epidemiology and Biostatistics, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bangkok, Thailand; Department of Surgery, Faculty of Medicine Vajira Hospital, Navamindradhiraj University, Bangkok, Thailand; Corresponding author. Department of Clinical Epidemiology and Biostatistics, Faculty of Medicine Ramathibodi Hospital, Sukho Place Building 4th floor, Sukhothai Rd., Dusit District, Bangkok, 10300, Thailand.
Nawanan Theera-Ampornpunt: Department of Clinical Epidemiology and Biostatistics, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bangkok, Thailand
Pawin Numthavaj: Department of Clinical Epidemiology and Biostatistics, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bangkok, Thailand
Gareth J. McKay: Centre for Public Health, Queen's University Belfast, Belfast, United Kingdom
John Attia: Centre for Clinical Epidemiology and Biostatistics, School of Medicine and Public Health, University of Newcastle, Newcastle, NSW, Australia
Ammarin Thakkinstian: Department of Clinical Epidemiology and Biostatistics, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bangkok, Thailand
title Systematic review of natural language processing for recurrent cancer detection from electronic medical records
topic Recurrence
Cancer
Natural language processing
Clinical notes
Deep learning
url http://www.sciencedirect.com/science/article/pii/S2352914823001727