Using a classification model for determining the value of liver radiological reports of patients with colorectal cancer

Background: Medical imaging is critical in clinical practice, and high-value radiological reports can positively assist clinicians. However, there is a lack of methods for determining the value of reports. Objective: The purpose of this study was to establish an ensemble learning classification model usi...


Bibliographic Details
Main Authors: Wenjuan Liu, Xi Zhang, Han Lv, Jia Li, Yawen Liu, Zhenghan Yang, Xutao Weng, Yucong Lin, Hong Song, Zhenchang Wang
Format: Article
Language: English
Published: Frontiers Media S.A. 2022-11-01
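Series: Frontiers in Oncology
Subjects: natural language processing, colorectal cancer, liver lesion, medical imaging report, classification model
Online Access: https://www.frontiersin.org/articles/10.3389/fonc.2022.913806/full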
collection DOAJ
description
Background: Medical imaging is critical in clinical practice, and high-value radiological reports can positively assist clinicians. However, there is a lack of methods for determining the value of reports.
Objective: The purpose of this study was to establish an ensemble learning classification model using natural language processing (NLP) applied to the Chinese free text of radiological reports to determine their value for liver lesion detection in patients with colorectal cancer (CRC).
Methods: Radiological reports of upper abdominal computed tomography (CT) and magnetic resonance imaging (MRI) were divided into five categories according to the results of liver lesion detection in patients with CRC. NLP methods, including word segmentation, stop word removal, and n-gram language model construction, were applied to each dataset. Then, a bag-of-words model was built, high-frequency words were selected as features, and an ensemble learning classification model was constructed. Several machine learning methods were applied, including logistic regression (LR), random forest (RF), and so on. We compared the accuracy of a priori selection of pertinent word strings with that of our machine learning methods.
Results: The dataset of 2790 patients included CT without contrast (10.2%), CT with/without contrast (73.3%), MRI without contrast (1.8%), and MRI with/without contrast (14.6%). The ensemble learning classification model determined the value of reports effectively, reaching 95.91% in the CT with/without contrast dataset using XGBoost. Logistic regression, random forest, and support vector machine also achieved good classification accuracy, reaching 95.89%, 95.04%, and 95.00%, respectively. The results of XGBoost were visualized using a confusion matrix. The numbers of errors in categories I, II, and V were very small. ELI5 was used to select important words for each category. Words such as “no abnormality”, “suggest”, “fatty liver”, and “transfer” showed a relatively large degree of positive correlation with classification accuracy. The accuracy of the string pattern search method was lower than that of machine learning.
Conclusions: The learning classification model based on NLP was an effective tool for determining the value of radiological reports focused on liver lesions. The study made it possible to analyze the value of medical imaging examinations on a large scale.
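The feature-extraction steps named in the Methods (stop word removal, n-grams, a bag-of-words model, and selection of high-frequency terms) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy reports, the English tokens standing in for segmented Chinese text, the stop-word list, the category labels, and the `top_k` cutoff are all hypothetical, and the classifiers used in the study (LR, RF, SVM, XGBoost) are not reproduced here.

```python
from collections import Counter

# Hypothetical toy data: pre-segmented report tokens with made-up labels.
# English tokens stand in for the segmented Chinese free text of the study.
REPORTS = [
    (["liver", "no", "abnormality", "the", "spleen", "normal"], "I"),
    (["liver", "no", "abnormality", "the", "kidney", "normal"], "I"),
    (["fatty", "liver", "suggest", "the", "follow", "up"], "II"),
]
STOP_WORDS = {"the"}  # assumed stop-word list, for illustration only


def ngrams(tokens, n_max=2):
    """Return all 1..n_max-grams of a token list, joined by spaces."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def preprocess(tokens):
    """Remove stop words, then expand to unigrams and bigrams."""
    kept = [t for t in tokens if t not in STOP_WORDS]
    return ngrams(kept)


def build_vocabulary(docs, top_k):
    """Keep the top_k most frequent terms across all documents as features."""
    counts = Counter(term for doc in docs for term in doc)
    return [term for term, _ in counts.most_common(top_k)]


def vectorize(doc, vocabulary):
    """Bag-of-words count vector over the selected vocabulary."""
    counts = Counter(doc)
    return [counts[term] for term in vocabulary]


docs = [preprocess(tokens) for tokens, _ in REPORTS]
vocab = build_vocabulary(docs, top_k=5)
X = [vectorize(doc, vocab) for doc in docs]  # feature matrix
y = [label for _, label in REPORTS]          # category labels
```

In a fuller pipeline, `X` and `y` would be passed to a classifier of one's choice; which library and hyperparameters to use is left open here because the record does not specify them.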
first_indexed 2024-04-11T06:55:39Z
id doaj.art-4ef5b71420aa438c813aabedec4e7e47
institution Directory Open Access Journal
issn 2234-943X
last_indexed 2024-04-11T06:55:39Z
spelling doaj.art-4ef5b71420aa438c813aabedec4e7e47 | 2022-12-22T04:39:02Z | eng | Frontiers Media S.A. | Frontiers in Oncology | 2234-943X | 2022-11-01 | 12 | 10.3389/fonc.2022.913806 | 913806
Authors (with record index): Wenjuan Liu (0), Xi Zhang (1), Han Lv (2), Jia Li (3), Yawen Liu (4), Zhenghan Yang (5), Xutao Weng (6), Yucong Lin (7), Hong Song (8), Zhenchang Wang (9)
Affiliations (in record order):
0 Department of Radiology, Beijing Friendship Hospital, Capital Medical University, Beijing, China
1 School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
2 Department of Radiology, Beijing Friendship Hospital, Capital Medical University, Beijing, China
3 Department of Radiology, Beijing Friendship Hospital, Capital Medical University, Beijing, China
4 School of Biological Science and Medical Engineering, Beihang University, Beijing, China
5 Department of Radiology, Beijing Friendship Hospital, Capital Medical University, Beijing, China
6 School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
7 School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
8 School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
9 Department of Radiology, Beijing Friendship Hospital, Capital Medical University, Beijing, China
topic natural language processing
colorectal cancer
liver lesion
medical imaging report
classification model