Automated Generation of Synoptic Reports from Narrative Pathology Reports in University Malaya Medical Centre Using Natural Language Processing

Pathology reports represent a primary source of information for cancer registries. University Malaya Medical Centre (UMMC) is a tertiary hospital responsible for training pathologists; thus narrative reporting becomes important. However, the unstructured free-text reports made the information extrac...

Bibliographic Details
Main Authors: Wee-Ming Tan, Kean-Hooi Teoh, Mogana Darshini Ganggayah, Nur Aishah Taib, Hana Salwani Zaini, Sarinder Kaur Dhillon
Format: Article
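Language: English
Published: MDPI AG 2022-04-01
Series: Diagnostics
Subjects: pathology reporting; synoptic reporting; information extraction; text mining; natural language processing; rule based
Online Access: https://www.mdpi.com/2075-4418/12/4/879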
collection DOAJ
description Pathology reports represent a primary source of information for cancer registries. University Malaya Medical Centre (UMMC) is a tertiary hospital responsible for training pathologists; thus narrative reporting becomes important. However, the unstructured free-text reports made the information extraction process tedious for clinical audits and data analysis-related research. This study aims to develop an automated natural language processing (NLP) algorithm to summarize the existing narrative breast pathology report from UMMC to a narrower structured synoptic pathology report with a checklist-style report template to ease the creation of pathology reports. The development of the rule-based NLP algorithm was based on the R programming language by using 593 pathology specimens from 174 patients provided by the Department of Pathology, UMMC. The pathologist provides specific keywords for data elements to define the semantic rules of the NLP. The system was evaluated by calculating the precision, recall, and F1-score. The proposed NLP algorithm achieved a micro-F1 score of 99.50% and a macro-F1 score of 98.97% on 178 specimens with 25 data elements. This achievement correlated to clinicians’ needs, which could improve communication between pathologists and clinicians. The study presented here is significant, as structured data is easily minable and could generate important insights.
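The evaluation described above (keyword-based rules scored by precision, recall, and micro- and macro-averaged F1) can be illustrated with a minimal sketch. This is not the authors' R implementation: the two data elements, the regular expressions, and the function names below are hypothetical, chosen only to show how micro-F1 (pooled counts) and macro-F1 (mean of per-element scores) can differ.

```python
import re
from collections import Counter

# Hypothetical keyword rules (data element -> regex). Illustrative only;
# these are NOT the keywords or data elements used in the study.
RULES = {
    "laterality": re.compile(r"\b(left|right)\b", re.IGNORECASE),
    "grade": re.compile(r"\bgrade\s*([1-3])\b", re.IGNORECASE),
}


def extract(report):
    """Return {element: value} for every rule that matches the narrative text."""
    found = {}
    for element, pattern in RULES.items():
        match = pattern.search(report)
        if match:
            found[element] = match.group(1).lower()
    return found


def f1(tp, fp, fn):
    """F1 from raw counts; returns 0.0 when precision or recall is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(predictions, gold):
    """Return (micro_f1, macro_f1) over all data elements in RULES."""
    counts = {element: Counter() for element in RULES}
    for pred, truth in zip(predictions, gold):
        for element in RULES:
            p, t = pred.get(element), truth.get(element)
            if p is not None and p == t:
                counts[element]["tp"] += 1
            else:
                if p is not None:   # extracted a value that is wrong or spurious
                    counts[element]["fp"] += 1
                if t is not None:   # an expected value was missed or mismatched
                    counts[element]["fn"] += 1
    # Macro: average the per-element F1 scores.
    per_element = [f1(c["tp"], c["fp"], c["fn"]) for c in counts.values()]
    macro = sum(per_element) / len(per_element)
    # Micro: pool the counts across elements, then compute one F1.
    total = sum(counts.values(), Counter())
    micro = f1(total["tp"], total["fp"], total["fn"])
    return micro, macro
```

Under these assumptions, `evaluate([extract(r) for r in reports], gold)` takes a list of narrative report strings and a parallel list of manually annotated `{element: value}` dictionaries. Macro-F1 weights every data element equally, so a rarely reported element with poor extraction lowers it more than it lowers micro-F1.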
first_indexed 2024-03-09T10:57:17Z
id doaj.art-6cda3e9684284de49c9004c1ff6d2d91
institution Directory Open Access Journal
issn 2075-4418
last_indexed 2024-03-09T10:57:17Z
spelling doaj.art-6cda3e9684284de49c9004c1ff6d2d91; 2023-12-01T01:32:20Z; eng; MDPI AG; Diagnostics; ISSN 2075-4418; 2022-04-01; vol. 12, no. 4, art. 879; DOI 10.3390/diagnostics12040879
affiliations (in the order listed in the record)
Wee-Ming Tan (0): Data Science & Bioinformatics Laboratory, Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur 50603, Malaysia
Kean-Hooi Teoh (1): Laboratory Department, Sunway Medical Centre, Bandar Sunway 47500, Malaysia
Mogana Darshini Ganggayah (2): Data Science & Bioinformatics Laboratory, Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur 50603, Malaysia
Nur Aishah Taib (3): Department of Surgery, Faculty of Medicine, University of Malaya, Kuala Lumpur 50603, Malaysia
Hana Salwani Zaini (4): Department of Information Technology, University Malaya Medical Centre, Kuala Lumpur 50603, Malaysia
Sarinder Kaur Dhillon (5): Data Science & Bioinformatics Laboratory, Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur 50603, Malaysia
topic pathology reporting
synoptic reporting
information extraction
text mining
natural language processing
rule based