A Clinical Event Extraction Method Based on a High-confidence Pseudo-label Data Selection Algorithm

Bibliographic Details
Main Authors: Yuanyuan LUO, Chunming YANG, Bo LI, Hui ZHANG, Xujian ZHAO
Format: Article
Language: English
Published: Editorial Office of Journal of Taiyuan University of Technology 2024-01-01
Series: Taiyuan Ligong Daxue xuebao
Subjects:
Online Access: https://tyutjournal.tyut.edu.cn/englishpaper/show-2260.html
collection DOAJ
description Purposes: Event extraction is a prerequisite for building high-quality event knowledge graphs. In clinical event extraction, event elements depend on one another. Existing methods fail to identify event elements accurately and combine them into events, and the amount of annotated clinical event data is limited. These problems make the event extraction task highly challenging. Methods: Clinical event extraction is modelled as an entity recognition task, and BERT-MCRF, a Chinese medical event extraction method incorporating multiple features, is proposed. Bidirectional Encoder Representations from Transformers (BERT) is used to build the embedding and feature extraction parts of the model, and multiple word sliding-window features are added to the Conditional Random Fields (CRF) layer. BERT-MCRF then serves as the base model for semi-supervised experiments, in which a proposed high-confidence pseudo-labeled data selection algorithm is used as the condition for filtering the pseudo-labeled data. The 300 higher-quality samples obtained are merged with the original data to form a corpus of 1,700 samples, on which the model is retrained. Findings: The overall F1 score of the BERT-MCRF model on the three attribute entities reaches 80.21%, which is 15.11% higher than that of the classical Bi-directional Long Short-Term Memory-Conditional Random Fields (BiLSTM-CRF) model. After semi-supervised retraining, the final F1 score reaches 81.56%, which is 1.35% higher than that of the original BERT-MCRF.
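The filtering step named in the description can be illustrated with a minimal sketch. The record does not state the actual selection criterion, so everything below is an assumption made for illustration: a sentence is scored by its least certain token (the minimum, over tokens, of the highest label probability), kept only if that score reaches a threshold, and the output is capped at 300 sentences to mirror the figure reported above. The names `PseudoLabeled`, `sentence_confidence`, `select_high_confidence`, the threshold value, and the `B-ENT`/`I-ENT` labels are hypothetical and do not come from the article.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class PseudoLabeled:
    tokens: List[str]
    labels: List[str]
    confidence: float


def sentence_confidence(token_probs: Sequence[Sequence[float]]) -> float:
    """Assumed score: the weakest token decides the sentence confidence."""
    return min(max(probs) for probs in token_probs)


def select_high_confidence(
    candidates: Sequence[Tuple[Sequence[str], Sequence[Sequence[float]]]],
    label_names: Sequence[str],
    threshold: float = 0.9,   # hypothetical cut-off, not from the article
    limit: int = 300,         # mirrors the 300 samples in the description
) -> List[PseudoLabeled]:
    """Keep model-labeled sentences whose confidence reaches the threshold."""
    selected = []
    for tokens, token_probs in candidates:
        if not token_probs:
            continue  # nothing to score for an empty sentence
        conf = sentence_confidence(token_probs)
        if conf < threshold:
            continue
        # The pseudo-label of each token is its most probable label.
        labels = [
            label_names[max(range(len(probs)), key=probs.__getitem__)]
            for probs in token_probs
        ]
        selected.append(PseudoLabeled(list(tokens), labels, conf))
    # Most confident sentences first, then apply the cap.
    selected.sort(key=lambda s: s.confidence, reverse=True)
    return selected[:limit]


# Toy input: per-token probabilities as a trained tagger might emit them.
label_names = ["O", "B-ENT", "I-ENT"]
candidates = [
    (["tok1", "tok2"], [[0.05, 0.90, 0.05], [0.02, 0.03, 0.95]]),
    (["tok3", "tok4"], [[0.50, 0.30, 0.20], [0.90, 0.05, 0.05]]),
]
picked = select_high_confidence(candidates, label_names, threshold=0.85)
```

In this toy run only the first sentence is kept, because the second contains a token whose best label has probability 0.50. Scoring by the weakest token is a deliberately strict choice; averaging over tokens would be a looser alternative. The selected sentences would then be merged with the labeled data before retraining.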
id doaj.art-acac5f6844a64e119e791c99346402a1
institution Directory Open Access Journal
issn 1007-9432
spelling doaj.art-acac5f6844a64e119e791c99346402a1
2024-04-15T09:17:22Z
eng
Volume 55, Issue 1, Pages 204-213
DOI: 10.16355/j.tyut.1007-9432.2023BD011
Article number: 1007-9432(2024)01-0204-10
Author affiliations:
Yuanyuan LUO: School of Computer and Software, Chengdu Neusoft Institute of Information, Chengdu 611844, China
Chunming YANG: School of Computer Science and Technology, Southwest University of Science and Technology, Mianyang 621000, China
Bo LI: School of Computer Science and Technology, Southwest University of Science and Technology, Mianyang 621000, China
Hui ZHANG: School of Mathematics and Physics, Southwest University of Science and Technology, Mianyang 621000, China
Xujian ZHAO: School of Computer Science and Technology, Southwest University of Science and Technology, Mianyang 621000, China
topic clinical medical event extraction
entity recognition
multi-features
semi-supervised learning
high-confidence pseudo-label selection algorithm