Joint Fine-Grained Components Continuously Enhance Chinese Word Embeddings


Bibliographic Details
Main Authors: Chengyang Zhuang, Yuanjie Zheng, Wenhui Huang, Weikuan Jia
Format: Article
Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
Subjects: Chinese word embedding; stroke; sub-character; character; language; n-grams
Online Access: https://ieeexplore.ieee.org/document/8918121/
_version_ 1819132935814512640
author Chengyang Zhuang
Yuanjie Zheng
Wenhui Huang
Weikuan Jia
author_facet Chengyang Zhuang
Yuanjie Zheng
Wenhui Huang
Weikuan Jia
author_sort Chengyang Zhuang
collection DOAJ
description The most common approach to word embedding is to learn word vector representations from the context information of large-scale text. However, Chinese words usually consist of characters, subcharacters, and strokes, and each part carries rich semantic information. The quality of Chinese word vectors is related to the accuracy of prediction. Therefore, to obtain high-quality Chinese word embeddings, we propose a continuously enhanced word embedding model. The model starts from fine-grained strokes and adjacent-stroke information and enhances subcharacter embeddings by incorporating the relationship vector representation between strokes. Similarly, we combine the subcharacter relationship vector and the character relationship vector to learn Chinese word embeddings on top of the enhanced subcharacter embeddings. We construct the underlying stroke n-grams and adjacent stroke n-grams and extract the relationship vectors that strengthen the ties between these components, which can be used to learn Chinese word embeddings and improve accuracy. Finally, we evaluate our model on word similarity calculation and word reasoning tasks.
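The description above builds on stroke n-grams as the finest-grained component. As a minimal illustrative sketch (not the authors' implementation): the helper below enumerates contiguous stroke n-grams from a character's stroke sequence, assuming the common five-way stroke coding used by earlier stroke-based models (1 horizontal, 2 vertical, 3 left-falling, 4 right-falling, 5 hook/turning); the function name and the n-gram range 3-5 are illustrative assumptions.

```python
def stroke_ngrams(strokes, n_min=3, n_max=5):
    """Collect all contiguous stroke n-grams of length n_min..n_max.

    `strokes` is a sequence of stroke IDs under an assumed five-way
    coding: 1 horizontal, 2 vertical, 3 left-falling, 4 right-falling,
    5 hook/turning.
    """
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(strokes) - n + 1):
            grams.append(tuple(strokes[i:i + n]))
    return grams

# Example: the character "木" (wood) decomposes into the stroke sequence
# horizontal, vertical, left-falling, right-falling -> [1, 2, 3, 4].
print(stroke_ngrams([1, 2, 3, 4]))
# → [(1, 2, 3), (2, 3, 4), (1, 2, 3, 4)]
```

Each n-gram would then be assigned its own vector, and a subcharacter (or character) representation enhanced by combining the vectors of the n-grams it contains.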
first_indexed 2024-12-22T09:39:18Z
format Article
id doaj.art-3dfae04e38a449ce95369359a5798dc2
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-22T09:39:18Z
publishDate 2019-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-3dfae04e38a449ce95369359a5798dc2 (2022-12-21T18:30:43Z)
language: eng
publisher: IEEE
series: IEEE Access (ISSN 2169-3536)
published: 2019-01-01, vol. 7, pp. 174699-174708
doi: 10.1109/ACCESS.2019.2956822 (IEEE document 8918121)
title: Joint Fine-Grained Components Continuously Enhance Chinese Word Embeddings
authors (all: School of Information Science and Engineering, Shandong Normal University, Jinan, China):
- Chengyang Zhuang (ORCID 0000-0001-9714-9124)
- Yuanjie Zheng (ORCID 0000-0002-5786-2491)
- Wenhui Huang (ORCID 0000-0002-5435-8775)
- Weikuan Jia (ORCID 0000-0001-6242-3269)
url: https://ieeexplore.ieee.org/document/8918121/
keywords: Chinese word embedding; stroke; sub-character; character; language; n-grams
title Joint Fine-Grained Components Continuously Enhance Chinese Word Embeddings
topic Chinese word embedding
stroke
sub-character
character
language
n-grams
url https://ieeexplore.ieee.org/document/8918121/
work_keys_str_mv AT chengyangzhuang jointfinegrainedcomponentscontinuouslyenhancechinesewordembeddings
AT yuanjiezheng jointfinegrainedcomponentscontinuouslyenhancechinesewordembeddings
AT wenhuihuang jointfinegrainedcomponentscontinuouslyenhancechinesewordembeddings
AT weikuanjia jointfinegrainedcomponentscontinuouslyenhancechinesewordembeddings