Joint Fine-Grained Components Continuously Enhance Chinese Word Embeddings
The most common method of word embedding is to learn word vector representations from context information of large-scale text. However, Chinese words usually consist of characters, subcharacters, and strokes, and each part contains rich semantic information. The quality of Chinese word vectors is re...
Main Authors: | Chengyang Zhuang, Yuanjie Zheng, Wenhui Huang, Weikuan Jia |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2019-01-01 |
Series: | IEEE Access |
Subjects: | Chinese word embedding; stroke; sub-character; character; language; n-grams |
Online Access: | https://ieeexplore.ieee.org/document/8918121/ |
_version_ | 1819132935814512640 |
---|---|
author | Chengyang Zhuang; Yuanjie Zheng; Wenhui Huang; Weikuan Jia
author_facet | Chengyang Zhuang; Yuanjie Zheng; Wenhui Huang; Weikuan Jia
author_sort | Chengyang Zhuang |
collection | DOAJ |
description | The most common method of word embedding is to learn word vector representations from the context information of large-scale text. However, Chinese words usually consist of characters, subcharacters, and strokes, and each part contains rich semantic information. The quality of Chinese word vectors is related to the accuracy of prediction. Therefore, to obtain high-quality Chinese word embeddings, we propose a continuously enhanced word embedding model. The model starts with fine-grained strokes and adjacent stroke information and enhances subcharacter embedding by combining the relationship vector representation between strokes. Similarly, we combine the subcharacter relationship vector and the character relationship vector to learn Chinese word embedding based on the enhanced subcharacter embedding. We construct the underlying stroke n-grams and adjacent stroke n-grams and extract the relationship vector used to enhance the relationship between the components, which can be used to learn Chinese word embedding and improve the accuracy. Finally, we evaluate our model on word similarity calculation and word reasoning tasks. |
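The abstract's first step, building stroke n-grams for a word, can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the stroke-ID sequences in `STROKES` and the n-gram range `n_min..n_max` are invented for the example, and the paper's actual stroke inventory and model details are not given in this record.

```python
# Hypothetical sketch of stroke n-gram extraction (the paper's actual
# stroke inventory and n-gram range are assumptions here).
STROKES = {
    "大": [1, 3, 4],  # illustrative stroke IDs, e.g. heng, pie, na
    "人": [3, 4],     # pie, na
}

def stroke_ngrams(word, n_min=3, n_max=5):
    """Concatenate the stroke sequences of a word's characters and
    return all contiguous stroke n-grams of length n_min..n_max."""
    seq = [s for ch in word for s in STROKES.get(ch, [])]
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(seq) - n + 1):
            grams.append(tuple(seq[i:i + n]))
    return grams
```

In a full model, each such n-gram would receive its own embedding vector, which is then combined with subcharacter- and character-level vectors as the abstract describes.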
first_indexed | 2024-12-22T09:39:18Z |
format | Article |
id | doaj.art-3dfae04e38a449ce95369359a5798dc2 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-12-22T09:39:18Z |
publishDate | 2019-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-3dfae04e38a449ce95369359a5798dc2 (2022-12-21T18:30:43Z) | eng | IEEE | IEEE Access | 2169-3536 | 2019-01-01 | vol. 7, pp. 174699-174708 | DOI: 10.1109/ACCESS.2019.2956822 | article no. 8918121 | Joint Fine-Grained Components Continuously Enhance Chinese Word Embeddings | Chengyang Zhuang (https://orcid.org/0000-0001-9714-9124); Yuanjie Zheng (https://orcid.org/0000-0002-5786-2491); Wenhui Huang (https://orcid.org/0000-0002-5435-8775); Weikuan Jia (https://orcid.org/0000-0001-6242-3269) | all authors: School of Information Science and Engineering, Shandong Normal University, Jinan, China | abstract duplicated from the description field above | https://ieeexplore.ieee.org/document/8918121/ | Chinese word embedding; stroke; sub-character; character; language; n-grams |
spellingShingle | Chengyang Zhuang; Yuanjie Zheng; Wenhui Huang; Weikuan Jia; Joint Fine-Grained Components Continuously Enhance Chinese Word Embeddings; IEEE Access; Chinese word embedding; stroke; sub-character; character; language; n-grams
title | Joint Fine-Grained Components Continuously Enhance Chinese Word Embeddings |
title_full | Joint Fine-Grained Components Continuously Enhance Chinese Word Embeddings |
title_fullStr | Joint Fine-Grained Components Continuously Enhance Chinese Word Embeddings |
title_full_unstemmed | Joint Fine-Grained Components Continuously Enhance Chinese Word Embeddings |
title_short | Joint Fine-Grained Components Continuously Enhance Chinese Word Embeddings |
title_sort | joint fine grained components continuously enhance chinese word embeddings |
topic | Chinese word embedding; stroke; sub-character; character; language; n-grams
url | https://ieeexplore.ieee.org/document/8918121/ |
work_keys_str_mv | AT chengyangzhuang jointfinegrainedcomponentscontinuouslyenhancechinesewordembeddings AT yuanjiezheng jointfinegrainedcomponentscontinuouslyenhancechinesewordembeddings AT wenhuihuang jointfinegrainedcomponentscontinuouslyenhancechinesewordembeddings AT weikuanjia jointfinegrainedcomponentscontinuouslyenhancechinesewordembeddings |