Systematic review of natural language processing for recurrent cancer detection from electronic medical records
This systematic review was conducted to explore natural language processing (NLP), focusing on the text representation techniques and algorithms previously used to identify recurrent cancer diagnoses from electronic medical records (EMR), and to assess their detection performance. Relevant studies...
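Main Authors: | Ekapob Sangariyavanich, Wanchana Ponthongmak, Amarit Tansawet, Nawanan Theera-Ampornpunt, Pawin Numthavaj, Gareth J. McKay, John Attia, Ammarin Thakkinstian |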
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2023-01-01 |
Series: | Informatics in Medicine Unlocked |
Subjects: | Recurrence, Cancer, Natural language processing, Clinical notes, Deep learning |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2352914823001727 |
_version_ | 1827823007476744192 |
---|---|
author | Ekapob Sangariyavanich; Wanchana Ponthongmak; Amarit Tansawet; Nawanan Theera-Ampornpunt; Pawin Numthavaj; Gareth J. McKay; John Attia; Ammarin Thakkinstian |
collection | DOAJ |
description | This systematic review was conducted to explore natural language processing (NLP), focusing on the text representation techniques and algorithms previously used to identify recurrent cancer diagnoses from electronic medical records (EMR), and to assess their detection performance. Relevant studies were identified from the PubMed, Scopus, ACM Digital Library, and IEEE databases from inception to August 18, 2022. Data, including text representation methods, model algorithms and performance, and type of clinical notes, were extracted from individual studies by two independent reviewers. Study risk of bias was assessed using the prediction model risk of bias assessment tool. Of the 412 studies identified, 17 were eligible for inclusion, 15 of which reported models that were not externally validated. Three types of text representation were used: statistical, context-free, and contextual (bidirectional encoder representations from transformers (BERT) and its variants), in 12, 6, and 3 studies, respectively. The corresponding median F1 scores (the harmonic mean of precision and recall) for these representations were 0.43, 0.87, and 0.72, respectively. The algorithms applied included rule-based, machine learning, and deep learning approaches, with median F1 scores of 0.71, 0.43, and 0.76, respectively. In conclusion, this systematic review suggests that deep learning models that use PubMedBERT as a text representation perform best. These findings are clinically informative for selecting appropriate approaches to detect recurrent cancer from electronic medical records. |
first_indexed | 2024-03-12T02:06:15Z |
id | doaj.art-7fe23bea98254d5fb8e14a7bae66dba4 |
institution | Directory of Open Access Journals |
issn | 2352-9148 |
last_indexed | 2024-03-12T02:06:15Z |
spelling | doaj.art-7fe23bea98254d5fb8e14a7bae66dba4; 2023-09-07T04:44:16Z; eng; Elsevier; Informatics in Medicine Unlocked; ISSN 2352-9148; 2023-01-01; vol. 41; article 101326. Authors and affiliations: Ekapob Sangariyavanich (Department of Clinical Epidemiology and Biostatistics, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bangkok, Thailand; National Cancer Institute, Department of Medical Services, Ministry of Public Health, Bangkok, Thailand); Wanchana Ponthongmak (Department of Clinical Epidemiology and Biostatistics, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bangkok, Thailand; corresponding author: Department of Clinical Epidemiology and Biostatistics, Faculty of Medicine Ramathibodi Hospital, Sukho Place Building 4th floor, Sukhothai Rd., Dusit District, Bangkok, 10300, Thailand); Amarit Tansawet (Department of Clinical Epidemiology and Biostatistics, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bangkok, Thailand; Department of Surgery, Faculty of Medicine Vajira Hospital, Navamindradhiraj University, Bangkok, Thailand; corresponding author: Department of Clinical Epidemiology and Biostatistics, Faculty of Medicine Ramathibodi Hospital, Sukho Place Building 4th floor, Sukhothai Rd., Dusit District, Bangkok, 10300, Thailand); Nawanan Theera-Ampornpunt (Department of Clinical Epidemiology and Biostatistics, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bangkok, Thailand); Pawin Numthavaj (Department of Clinical Epidemiology and Biostatistics, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bangkok, Thailand); Gareth J. McKay (Centre for Public Health, Queen's University Belfast, Belfast, United Kingdom); John Attia (Centre for Clinical Epidemiology and Biostatistics, School of Medicine and Public Health, University of Newcastle, Newcastle, NSW, Australia); Ammarin Thakkinstian (Department of Clinical Epidemiology and Biostatistics, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bangkok, Thailand) |
topic | Recurrence; Cancer; Natural language processing; Clinical notes; Deep learning |
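
The abstract summarises each approach by its median F1 score, where F1 is the harmonic mean of precision and recall. The following is a minimal sketch of that calculation only. The function name `f1_score` and the precision/recall pairs are hypothetical and chosen for illustration; they are not values reported by the review or by any included study.

```python
from statistics import median


def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical (precision, recall) pairs for three models.
# These numbers are illustrative assumptions, NOT data from the review.
example_models = [(0.80, 0.60), (0.90, 0.90), (0.50, 0.50)]

# One F1 score per model, then the median across models,
# mirroring how the abstract summarises a group of studies.
f1_values = [f1_score(p, r) for p, r in example_models]
median_f1 = median(f1_values)

print(f"F1 per model: {[round(v, 3) for v in f1_values]}")
print(f"Median F1: {median_f1:.3f}")
```

A median is less sensitive than a mean to a single very high or very low study, which may be one reason it suits a small and heterogeneous set of studies such as the one described here.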