Distributed representation and one-hot representation fusion with gated network for clinical semantic textual similarity

Abstract Background Semantic textual similarity (STS) is a fundamental natural language processing (NLP) task that is widely used in NLP applications such as question answering (QA) and information retrieval (IR). It is typically treated as a regression problem, and almost all STS systems either use dis...

Bibliographic Details
Main Authors: Ying Xiong, Shuai Chen, Haoming Qin, He Cao, Yedan Shen, Xiaolong Wang, Qingcai Chen, Jun Yan, Buzhou Tang
Format: Article
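Language: English
Published: BMC 2020-04-01
Series: BMC Medical Informatics and Decision Making
Subjects: Clinical semantic textual similarity; Gated network; Distributed representation; One-hot representation
Online Access: http://link.springer.com/article/10.1186/s12911-020-1045-z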
_version_ 1819278131154911232
author Ying Xiong
Shuai Chen
Haoming Qin
He Cao
Yedan Shen
Xiaolong Wang
Qingcai Chen
Jun Yan
Buzhou Tang
collection DOAJ
description Abstract Background Semantic textual similarity (STS) is a fundamental natural language processing (NLP) task that is widely used in NLP applications such as question answering (QA) and information retrieval (IR). It is typically treated as a regression problem, and almost all STS systems use either distributed representation or one-hot representation to model sentence pairs. Methods In this paper, we propose a novel framework based on a gated network to fuse the distributed representation and one-hot representation of sentence pairs. Several state-of-the-art distributed representation methods, including Convolutional Neural Network (CNN), Bi-directional Long Short-Term Memory network (Bi-LSTM) and Bidirectional Encoder Representations from Transformers (BERT), were used in our framework, and a system based on this framework was developed for the clinical STS shared task organized by BioCreative/OHNLP in 2018. Results Compared with systems using only distributed representation or only one-hot representation, our method achieved a much higher Pearson correlation. Among all distributed representations, BERT performed best. The highest Pearson correlation of our system was 0.8541, which is 0.0213 higher than the best official result of the BioCreative/OHNLP clinical STS shared task in 2018 (0.8328). Conclusions Distributed representation and one-hot representation are complementary and can be fused by a gated network.
first_indexed 2024-12-24T00:07:08Z
format Article
id doaj.art-c24d0ba95516468eb01d1dd0c3d6351d
institution Directory Open Access Journal
issn 1472-6947
language English
last_indexed 2024-12-24T00:07:08Z
publishDate 2020-04-01
publisher BMC
record_format Article
series BMC Medical Informatics and Decision Making
spelling doaj.art-c24d0ba95516468eb01d1dd0c3d6351d
2022-12-21T17:24:58Z
eng
BMC
BMC Medical Informatics and Decision Making
1472-6947
2020-04-01
20S117
10.1186/s12911-020-1045-z
Distributed representation and one-hot representation fusion with gated network for clinical semantic textual similarity
Ying Xiong (Department of Computer Science, Harbin Institute of Technology)
Shuai Chen (Department of Computer Science, Harbin Institute of Technology)
Haoming Qin (Department of Computer Science, Harbin Institute of Technology)
He Cao (Department of Computer Science, Harbin Institute of Technology)
Yedan Shen (Department of Computer Science, Harbin Institute of Technology)
Xiaolong Wang (Department of Computer Science, Harbin Institute of Technology)
Qingcai Chen (Department of Computer Science, Harbin Institute of Technology)
Jun Yan (Yidu Cloud (Beijing) Technology Co., Ltd)
Buzhou Tang (Department of Computer Science, Harbin Institute of Technology)
http://link.springer.com/article/10.1186/s12911-020-1045-z
Clinical semantic textual similarity
Gated network
Distributed representation
One-hot representation
title Distributed representation and one-hot representation fusion with gated network for clinical semantic textual similarity
topic Clinical semantic textual similarity
Gated network
Distributed representation
One-hot representation
url http://link.springer.com/article/10.1186/s12911-020-1045-z