Automatic Classification of Cancer Pathology Reports: A Systematic Review

Pathology reports primarily consist of unstructured free text and thus the clinical information contained in the reports is not trivial to access or query. Multiple natural language processing (NLP) techniques have been proposed to automate the coding of pathology reports via text classification. In...

Bibliographic Details
Main Authors: Thiago Santos, Amara Tariq, Judy Wawira Gichoya, Hari Trivedi, Imon Banerjee
Format: Article
Language: English
Published: Elsevier 2022-01-01
Series: Journal of Pathology Informatics
Online Access: http://www.sciencedirect.com/science/article/pii/S2153353922000037
author Thiago Santos
Amara Tariq
Judy Wawira Gichoya
Hari Trivedi
Imon Banerjee
collection DOAJ
description Pathology reports primarily consist of unstructured free text, so the clinical information they contain is not trivial to access or query. Multiple natural language processing (NLP) techniques have been proposed to automate the coding of pathology reports via text classification. In this systematic review, we follow the guidelines proposed by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA; Page et al., 2020: BMJ) to identify NLP systems for classifying pathology reports published between 2010 and 2021. Based on our search criteria, a total of 3445 records were retrieved, and 25 articles met the final review criteria. We benchmarked the systems based on methodology, complexity of the prediction task, and core types of NLP models: i) rule-based and intelligent systems, ii) statistical machine learning, and iii) deep learning. While certain tasks are well addressed by these models, many others have limitations and remain open challenges, such as the extraction of many cancer characteristics (size, shape, type of cancer, and others) from pathology reports. We investigated the final set of 25 papers and addressed their potential as well as their limitations. We hope that this systematic review helps researchers prioritize the development of innovative approaches to tackle the current limitations and advance cancer research.
first_indexed 2024-04-11T05:00:23Z
format Article
id doaj.art-ee60c341a22c4d3db53a40e233d05b15
institution Directory Open Access Journal
issn 2153-3539
language English
last_indexed 2024-04-11T05:00:23Z
publishDate 2022-01-01
publisher Elsevier
record_format Article
series Journal of Pathology Informatics
spelling doaj.art-ee60c341a22c4d3db53a40e233d05b15
2022-12-26T04:07:55Z
eng
Elsevier
Journal of Pathology Informatics
2153-3539
2022-01-01
13
100003
Automatic Classification of Cancer Pathology Reports: A Systematic Review
Thiago Santos (0): Department of Computer Science, Emory University, Atlanta, GA, USA; Department of Biomedical Informatics, Emory School of Medicine, Atlanta, GA, USA; Corresponding author.
Amara Tariq (1): Department of Radiology, Mayo Clinic, Phoenix, AZ, USA
Judy Wawira Gichoya (2): Department of Biomedical Informatics, Emory School of Medicine, Atlanta, GA, USA; Department of Radiology, Emory School of Medicine, Atlanta, GA, USA
Hari Trivedi (3): Department of Biomedical Informatics, Emory School of Medicine, Atlanta, GA, USA; Department of Radiology, Emory School of Medicine, Atlanta, GA, USA
Imon Banerjee (4): Department of Radiology, Mayo Clinic, Phoenix, AZ, USA; Department of Computer Engineering, Arizona State University, AZ, USA
http://www.sciencedirect.com/science/article/pii/S2153353922000037
title Automatic Classification of Cancer Pathology Reports: A Systematic Review
url http://www.sciencedirect.com/science/article/pii/S2153353922000037