A Clinical Event Extraction Method Based on a High-confidence Pseudo-label Data Selection Algorithm
Purposes Event extraction is a prerequisite for building high-quality event knowledge graphs. The dependency of event elements exists in the process of clinical event extraction. Existing methods fail to accurately identify event elements and combine them into events, and the amount of available cli...
Main Authors: | Yuanyuan LUO, Chunming YANG, Bo LI, Hui ZHANG, Xujian ZHAO |
---|---|
Format: | Article |
Language: | English |
Published: | Editorial Office of Journal of Taiyuan University of Technology, 2024-01-01 |
Series: | Taiyuan Ligong Daxue xuebao |
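Subjects: | clinical medical event extraction; entity recognition; multi-features; semi-supervised learning; high-confidence pseudo-label selection algorithm |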
Online Access: | https://tyutjournal.tyut.edu.cn/englishpaper/show-2260.html |
_version_ | 1797208308223836160 |
---|---|
author | Yuanyuan LUO; Chunming YANG; Bo LI; Hui ZHANG; Xujian ZHAO |
author_sort | Yuanyuan LUO |
collection | DOAJ |
description | Purposes: Event extraction is a prerequisite for building high-quality event knowledge graphs. Clinical event extraction must account for dependencies among event elements. Existing methods fail to accurately identify event elements and combine them into events, and the amount of annotated clinical event data is limited. These problems make the event extraction task challenging. Methods: In this research, clinical event extraction is modelled as an entity recognition task, and a Chinese medical event extraction method incorporating multiple features, BERT-MCRF, is proposed. In this method, Bidirectional Encoder Representations from Transformers (BERT) is used to construct the embedding and feature extraction parts of the model, and multiple word sliding-window features are added in the Conditional Random Fields (CRF) layer. BERT-MCRF then serves as the base model for semi-supervised experiments, for which a high-confidence pseudo-labeled data selection algorithm is proposed. The algorithm is used as the condition for filtering the pseudo-labeled data; 300 higher-quality items are obtained and merged with the original data. Finally, a corpus of 1,700 items is constructed and the model is retrained. Findings: The overall F1 value of the BERT-MCRF model on the three attribute entities reaches 80.21%, which is 15.11% higher than that of the classical Bidirectional Long Short-Term Memory-Conditional Random Fields (BiLSTM-CRF) model; after retraining with the semi-supervised approach, the final F1 value reaches 81.56%, which is 1.35% higher than that of the original BERT-MCRF. |
first_indexed | 2024-04-24T09:36:44Z |
format | Article |
id | doaj.art-acac5f6844a64e119e791c99346402a1 |
institution | Directory of Open Access Journals |
issn | 1007-9432 |
language | English |
last_indexed | 2024-04-24T09:36:44Z |
publishDate | 2024-01-01 |
publisher | Editorial Office of Journal of Taiyuan University of Technology |
record_format | Article |
series | Taiyuan Ligong Daxue xuebao |
spelling | doaj.art-acac5f6844a64e119e791c99346402a1; 2024-04-15T09:17:22Z; eng; Editorial Office of Journal of Taiyuan University of Technology; Taiyuan Ligong Daxue xuebao; 1007-9432; 2024-01-01; vol. 55, no. 1, pp. 204-213; 10.16355/j.tyut.1007-9432.2023BD011; 1007-9432(2024)01-0204-10; A Clinical Event Extraction Method Based on a High-confidence Pseudo-label Data Selection Algorithm; Yuanyuan LUO (School of Computer and Software, Chengdu Neusoft Institute of Information, Chengdu 611844, China); Chunming YANG (School of Computer Science and Technology, Southwest University of Science and Technology, Mianyang 621000, China); Bo LI (School of Computer Science and Technology, Southwest University of Science and Technology, Mianyang 621000, China); Hui ZHANG (School of Mathematics and Physics, Southwest University of Science and Technology, Mianyang 621000, China); Xujian ZHAO (School of Computer Science and Technology, Southwest University of Science and Technology, Mianyang 621000, China); https://tyutjournal.tyut.edu.cn/englishpaper/show-2260.html; clinical medical event extraction; entity recognition; multi-features; semi-supervised learning; high-confidence pseudo-label selection algorithm |
title | A Clinical Event Extraction Method Based on a High-confidence Pseudo-label Data Selection Algorithm |
topic | clinical medical event extraction; entity recognition; multi-features; semi-supervised learning; high-confidence pseudo-label selection algorithm |
url | https://tyutjournal.tyut.edu.cn/englishpaper/show-2260.html |
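
The description field mentions filtering pseudo-labeled data with a high-confidence selection algorithm, but this record does not state the actual criterion. The following is only a minimal Python sketch of the general idea, under stated assumptions: each unlabeled sentence carries the model's predicted tag and probability per token, a sentence is scored by its least confident token, and sentences above a threshold are kept up to a fixed limit. The names `PseudoLabeled`, `sentence_confidence`, `select_high_confidence`, the tag names, and the threshold value are all hypothetical and not taken from the article; only the default limit of 300 echoes the number reported in the description.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class PseudoLabeled:
    """One unlabeled sentence plus the model's predictions (hypothetical structure)."""
    tokens: Sequence[str]
    labels: Sequence[str]   # predicted tag for each token
    probs: Sequence[float]  # model probability of each predicted tag


def sentence_confidence(sample: PseudoLabeled) -> float:
    """Assumed scoring rule: the least confident token sets the sentence score."""
    return min(sample.probs) if sample.probs else 0.0


def select_high_confidence(
    samples: Sequence[PseudoLabeled],
    threshold: float = 0.9,  # assumed cut-off, not from the article
    limit: int = 300,        # the description reports 300 selected items
) -> List[PseudoLabeled]:
    """Keep sentences scoring at or above the threshold, most confident first."""
    scored = [(sentence_confidence(s), s) for s in samples]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [sample for _, sample in kept[:limit]]


# Tiny illustrative input; the tags "B-X" and "O" are placeholders.
demo = [
    PseudoLabeled(("tumor", "size"), ("B-X", "O"), (0.98, 0.95)),
    PseudoLabeled(("left", "lobe"), ("B-X", "O"), (0.99, 0.60)),
    PseudoLabeled(("primary", "site"), ("B-X", "O"), (0.97, 0.96)),
]
selected = select_high_confidence(demo, threshold=0.9)

# In a self-training loop, the selected items would be merged with the
# manually labeled data before the model is retrained.
labeled: List[PseudoLabeled] = []
training_set = labeled + selected
```

Scoring by the minimum token probability is a conservative choice: one uncertain tag is enough to reject a sentence. Averaging the token probabilities would be a more permissive alternative.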