Systematic review of natural language processing for recurrent cancer detection from electronic medical records

Bibliographic Details
Main Authors: Ekapob Sangariyavanich, Wanchana Ponthongmak, Amarit Tansawet, Nawanan Theera-Ampornpunt, Pawin Numthavaj, Gareth J. McKay, John Attia, Ammarin Thakkinstian
Format: Article
Language: English
Published: Elsevier 2023-01-01
Series: Informatics in Medicine Unlocked
Online Access: http://www.sciencedirect.com/science/article/pii/S2352914823001727
Description
Summary: This systematic review explored natural language processing (NLP) approaches for identifying recurrent cancer diagnoses from electronic medical records (EMR), focusing on the text representation techniques and algorithms used in previous studies, and assessed their detection performance. Relevant studies were identified from the PubMed, Scopus, ACM Digital Library, and IEEE databases from inception to August 18, 2022. Data, including text representation methods, model algorithms and performance, and type of clinical notes, were extracted from individual studies by two independent reviewers. Study risk of bias was assessed using the prediction model risk of bias assessment tool. Of the 412 studies identified, 17 were eligible for inclusion, with 15 representing models that were not externally validated. Three text representations were used: statistical, context-free, and contextual representations (bidirectional encoder representations from transformers (BERT) and its variants), in 12, 6, and 3 studies, respectively. The corresponding median F1 scores (the harmonic mean of precision and recall) for these representations were 0.43, 0.87, and 0.72, respectively. The algorithms applied included rule-based, machine learning, and deep learning approaches, with median F1 scores of 0.71, 0.43, and 0.76, respectively. In conclusion, this systematic review suggests that deep learning models that use PubMedBERT as a text representation perform best. These findings are clinically informative for selecting appropriate approaches to detect recurrent cancer from electronic medical records.
ISSN:2352-9148
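
The summary reports performance as median F1 scores, where F1 is the harmonic mean of precision and recall. As a minimal sketch of that arithmetic, the Python snippet below computes F1 from confusion-matrix counts and takes the median across several studies. The function name `f1_score` and the counts in `example_counts` are hypothetical and purely illustrative; they are not data from the review, and the review's own pooling procedure may differ.

```python
from statistics import median


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1, the harmonic mean of precision and recall."""
    if tp == 0:
        # With no true positives, precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical (tp, fp, fn) counts for three imaginary studies.
# Illustrative values only, not results from the review.
example_counts = [(40, 10, 10), (30, 30, 10), (45, 5, 15)]

f1_values = [f1_score(tp, fp, fn) for tp, fp, fn in example_counts]
median_f1 = median(f1_values)

print([round(v, 2) for v in f1_values], round(median_f1, 2))
```

A median is less sensitive than a mean to a single study with an unusually high or low F1, which is one plausible reason to summarize a small, heterogeneous set of studies this way.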