A study on pharmaceutical text relationship extraction based on heterogeneous graph neural networks

Effective information extraction from pharmaceutical texts is of great significance for clinical research. Ancient Chinese medicine texts have terse sentences and complex semantic relationships, and textual relationships may exist between heterogeneous entities. The current mainstream rela...


Bibliographic Details
Main Authors: Shuilong Zou, Zhaoyang Liu, Kaiqi Wang, Jun Cao, Shixiong Liu, Wangping Xiong, Shaoyi Li
Format: Article
Language: English
Published: AIMS Press 2024-01-01
Series: Mathematical Biosciences and Engineering
Subjects: medical text, relation extraction, bert, heterogeneous graph neural networks
Online Access: https://www.aimspress.com/article/doi/10.3934/mbe.2024064?viewType=HTML
collection DOAJ
description Effective information extraction from pharmaceutical texts is of great significance for clinical research. Ancient Chinese medicine texts have terse sentences and complex semantic relationships, and textual relationships may exist between heterogeneous entities. Current mainstream relationship extraction models do not take into account the associations between entities and relationships during extraction, so the semantic information they capture is insufficient to form an effective structured representation. In this paper, we propose a heterogeneous graph neural network relationship extraction model adapted to traditional Chinese medicine (TCM) text. First, the given sentence and the predefined relationships are embedded with fine-tuned BERT (Bidirectional Encoder Representations from Transformers) word embeddings and used as the model input. Second, a heterogeneous graph network is constructed that associates word, phrase, and relationship nodes to obtain the hidden layer representation. Then, in the decoding stage, a two-stage subject-object entity identification method is adopted: the identifier uses a binary classifier to locate the start and end positions of the TCM entities, identifies all subject and object entities in the sentence, and finally forms the TCM entity relationship group. Experiments on the TCM relationship extraction dataset show that the heterogeneous graph neural network embedded with BERT achieves a precision of 86.99% and an F1 value of 87.40%, improvements of 8.83% and 10.21% over the relationship extraction models CNN, Bert-CNN, and Graph LSTM.
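The start/end decoding step mentioned in the description can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `decode_spans`, the 0.5 threshold, the nearest-end pairing rule, and the toy probabilities are all assumptions made for illustration. It only shows how two per-token binary classifiers (one for start positions, one for end positions) can be turned into entity spans.

```python
from typing import List, Tuple


def decode_spans(
    start_probs: List[float],
    end_probs: List[float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[Tuple[int, int]]:
    """Pair every predicted start with the nearest predicted end at or after it."""
    starts = [i for i, p in enumerate(start_probs) if p >= threshold]
    ends = [i for i, p in enumerate(end_probs) if p >= threshold]
    spans = []
    for s in starts:
        # Candidate ends are those that do not precede the start position.
        following = [e for e in ends if e >= s]
        if following:
            spans.append((s, following[0]))
    return spans


# Toy example with made-up classifier outputs for six placeholder tokens.
tokens = ["w0", "w1", "w2", "w3", "w4", "w5"]
start_probs = [0.9, 0.1, 0.2, 0.8, 0.1, 0.3]
end_probs = [0.1, 0.7, 0.2, 0.1, 0.9, 0.2]

for s, e in decode_spans(start_probs, end_probs):
    print(tokens[s : e + 1])
```

In a two-stage scheme of the kind the description outlines, a routine like this would presumably be run once to find subject spans and again, conditioned on a subject and relation, to find object spans; that conditioning is omitted here.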
first_indexed 2024-03-08T06:22:22Z
format Article
id doaj.art-7a1b084005f5462b876df916494f3156
institution Directory Open Access Journal
issn 1551-0018
language English
last_indexed 2024-03-08T06:22:22Z
publishDate 2024-01-01
publisher AIMS Press
record_format Article
series Mathematical Biosciences and Engineering
spelling doaj.art-7a1b084005f5462b876df916494f3156 2024-02-04T01:33:52Z eng
AIMS Press
Mathematical Biosciences and Engineering 1551-0018
2024-01-01, volume 21, issue 1, pages 1489-1507
10.3934/mbe.2024064
A study on pharmaceutical text relationship extraction based on heterogeneous graph neural networks
Shuilong Zou (1)
Zhaoyang Liu (2)
Kaiqi Wang (2)
Jun Cao (2)
Shixiong Liu (1)
Wangping Xiong (2)
Shaoyi Li (1)
1. Nanchang Institute of Science & Technology, Nanchang 330004, China
2. School of Computer, Jiangxi University of Chinese Medicine, Nanchang 330004, China
https://www.aimspress.com/article/doi/10.3934/mbe.2024064?viewType=HTML
medical text
relation extraction
bert
heterogeneous graph neural networks
topic medical text
relation extraction
bert
heterogeneous graph neural networks