Extracting cancer concepts from clinical notes using natural language processing: a systematic review

Bibliographic Details
Main Authors: Maryam Gholipour, Reza Khajouei, Parastoo Amiri, Sadrieh Hajesmaeel Gohari, Leila Ahmadian
Format: Article
Language: English
Published: BMC, 2023-10-01
Series: BMC Bioinformatics
Online Access: https://doi.org/10.1186/s12859-023-05480-0

Abstract

Background: Extracting information from free text using natural language processing (NLP) can save time and reduce the burden of manually extracting large quantities of data from the highly complex clinical notes of cancer patients. This study aimed to systematically review studies that used NLP methods to automatically identify cancer concepts in clinical notes.

Methods: PubMed, Scopus, Web of Science, and Embase were searched for English-language papers up to June 29, 2021, using combinations of terms related to “Cancer”, “NLP”, “Coding”, and “Registries”. Two reviewers independently assessed the eligibility of papers for inclusion in the review.

Results: Most of the reported concept-extraction software programs were developed by the researchers themselves (n = 7). Rule-based algorithms were the most frequently used algorithms for developing these programs. Most articles evaluated the algorithms using accuracy (n = 14) and sensitivity (n = 12). Systematized Nomenclature of Medicine-Clinical Terms (SNOMED-CT) and the Unified Medical Language System (UMLS) were the most commonly used terminologies for identifying concepts. The most frequently studied cancers were breast cancer (n = 4, 19%) and lung cancer (n = 4, 19%).

Conclusion: The use of NLP for extracting cancer concepts and symptoms has increased in recent years. Rule-based algorithms are popular among developers. Given these algorithms' high accuracy and sensitivity in identifying and extracting cancer concepts, we suggest that future studies use them to extract the concepts of other diseases as well.

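The review reports that rule-based algorithms were the most common approach and that accuracy and sensitivity were the most common evaluation criteria. As an illustration only, the following is a minimal sketch of what a dictionary-plus-rules concept extractor and those two metrics can look like. Everything in it is a hypothetical assumption, not taken from the review or any reviewed system: the toy LEXICON, the placeholder concept labels (which are not real SNOMED-CT or UMLS codes), the NEGATION_CUES list, the 20-character NEGATION_WINDOW, the example notes, and the choice to score each (note, concept) pair as one decision.

```python
import re

# Hypothetical toy lexicon: surface term -> concept label.
# The labels are placeholders, not real SNOMED-CT or UMLS identifiers.
LEXICON = {
    "breast cancer": "CONCEPT_BREAST_CANCER",
    "breast carcinoma": "CONCEPT_BREAST_CANCER",
    "lung cancer": "CONCEPT_LUNG_CANCER",
    "nsclc": "CONCEPT_LUNG_CANCER",
}

# Hypothetical negation cues, checked in a fixed window before each match
# (a heavily simplified version of the NegEx idea; plain substring checks
# can misfire, e.g. "no " also occurs at the end of "piano ").
NEGATION_CUES = ("no ", "denies ", "negative for ", "without ")
NEGATION_WINDOW = 20  # characters to look back; an arbitrary choice


def extract_concepts(note: str) -> set[str]:
    """Return the concept labels whose terms appear un-negated in the note."""
    text = note.lower()
    found: set[str] = set()
    for term, concept in LEXICON.items():
        # Whole-word match of the lexicon term.
        for match in re.finditer(r"\b" + re.escape(term) + r"\b", text):
            window = text[max(0, match.start() - NEGATION_WINDOW):match.start()]
            if any(cue in window for cue in NEGATION_CUES):
                continue  # rule: skip mentions preceded by a negation cue
            found.add(concept)
    return found


def evaluate(predicted: list[set[str]], gold: list[set[str]],
             concepts: set[str]) -> tuple[float, float]:
    """Accuracy and sensitivity over every (note, concept) decision."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must cover the same notes")
    tp = fp = fn = tn = 0
    for pred, true in zip(predicted, gold):
        for concept in concepts:
            p, t = concept in pred, concept in true
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # also called recall
    return accuracy, sensitivity


# Made-up example notes with hand-assigned gold labels (illustration only).
NOTES = [
    "History of breast carcinoma, s/p lumpectomy.",
    "CT chest: no lung cancer identified.",
    "New diagnosis of NSCLC, stage II.",
    "Adenocarcinoma of the left lung.",  # not covered by the toy lexicon
]
GOLD = [{"CONCEPT_BREAST_CANCER"}, set(), {"CONCEPT_LUNG_CANCER"},
        {"CONCEPT_LUNG_CANCER"}]

predicted = [extract_concepts(note) for note in NOTES]
acc, sens = evaluate(predicted, GOLD, set(LEXICON.values()))
print(f"accuracy={acc:.3f} sensitivity={sens:.3f}")  # accuracy=0.875 sensitivity=0.667
```

In this toy run the negated mention in the second note is correctly skipped, while the fourth note is missed because its wording is not in the lexicon, which lowers sensitivity more than accuracy. Lexicon coverage limiting recall is one plausible motivation for drawing terms from a large terminology such as SNOMED-CT or UMLS rather than a hand-written list.
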
Source: Directory of Open Access Journals (DOAJ)
ISSN: 1471-2105
Affiliations:
Maryam Gholipour: Student Research Committee, Kerman University of Medical Sciences
Reza Khajouei: Department of Health Information Sciences, Faculty of Management and Medical Information Sciences, Kerman University of Medical Sciences
Parastoo Amiri: Student Research Committee, Kerman University of Medical Sciences
Sadrieh Hajesmaeel Gohari: Medical Informatics Research Center, Institute for Futures Studies in Health, Kerman University of Medical Sciences
Leila Ahmadian: Department of Health Information Sciences, Faculty of Management and Medical Information Sciences, Kerman University of Medical Sciences
Keywords: Neoplasms; Natural language processing; NLP; Machine learning; Terminology; Information system