Improve word embedding using both writing and pronunciation.
Main Authors: | Wenhao Zhu, Xin Jin, Jianyue Ni, Baogang Wei, Zhiguo Lu |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2018-01-01 |
Series: | PLoS ONE |
Online Access: | http://europepmc.org/articles/PMC6287836?pdf=render |
author | Wenhao Zhu, Xin Jin, Jianyue Ni, Baogang Wei, Zhiguo Lu |
---|---|
collection | DOAJ |
description | Text representation can map text into a vector space for subsequent use in numerical calculations and processing tasks. Word embedding is an important component of text representation. Most existing word embedding models focus on writing and utilize context, weight, dependency, morphology, etc., to optimize the training. However, from the linguistic point of view, spoken language is a more direct expression of semantics; writing has meaning only as a recording of spoken language. Therefore, this paper proposes the concept of a pronunciation-enhanced word embedding model (PWE) that integrates speech information into training to fully apply the roles of both speech and writing to meaning. This paper uses the Chinese language, English language and Spanish language as examples and presents several models that integrate word pronunciation characteristics into word embedding. Word similarity and text classification experiments show that the PWE outperforms the baseline model that does not include speech information. Language is a storehouse of sound-images; therefore, the PWE can be applied to most languages. |
first_indexed | 2024-12-12T23:16:04Z |
format | Article |
id | doaj.art-7c9039b8ca17477487e6796341ab711e |
institution | Directory Open Access Journal |
issn | 1932-6203 |
language | English |
last_indexed | 2024-12-12T23:16:04Z |
publishDate | 2018-01-01 |
publisher | Public Library of Science (PLoS) |
record_format | Article |
series | PLoS ONE |
spelling | doaj.art-7c9039b8ca17477487e6796341ab711e; 2022-12-22T00:08:27Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2018-01-01; 13; 12; e0208785; 10.1371/journal.pone.0208785 |
title | Improve word embedding using both writing and pronunciation. |
url | http://europepmc.org/articles/PMC6287836?pdf=render |
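The record does not say how PWE combines the two signals, so what follows is only a minimal sketch of the general idea under an assumed simple linear blend. A word's vector is a weighted mix of a written-form vector and the average of vectors for its pronunciation units (ARPAbet-style phones here). All vectors and lexicon entries are hypothetical toy values, as are the names `WRITING`, `PHONE`, `PRONUNCIATION`, `combined_vector` and `alpha`. None of them is the authors' model or data. Cosine similarity is used because it is a common word-similarity measure.

```python
import math
from typing import Dict, List

Vector = List[float]

# Hypothetical toy written-form vectors (a real model would learn these).
WRITING: Dict[str, Vector] = {
    "night": [1.0, 0.0, 0.0],
    "knight": [0.0, 1.0, 0.0],
    "day": [0.0, 0.0, 1.0],
}

# Hypothetical pronunciation lexicon: ARPAbet-style phones, stress marks omitted.
PRONUNCIATION: Dict[str, List[str]] = {
    "night": ["N", "AY", "T"],
    "knight": ["N", "AY", "T"],  # homophone of "night"
    "day": ["D", "EY"],
}

# Hypothetical toy phone vectors.
PHONE: Dict[str, Vector] = {
    "N": [0.6, 0.6, 0.1],
    "AY": [0.5, 0.5, 0.2],
    "T": [0.4, 0.4, 0.3],
    "D": [0.1, 0.0, 0.9],
    "EY": [0.0, 0.1, 0.8],
}


def mean(vectors: List[Vector]) -> Vector:
    """Component-wise average of equally sized vectors."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]


def combined_vector(word: str, alpha: float) -> Vector:
    """Blend the written-form vector with the mean of the word's phone vectors.

    alpha = 0.0 uses writing only; alpha = 1.0 uses pronunciation only.
    """
    written = WRITING[word]
    spoken = mean([PHONE[p] for p in PRONUNCIATION[word]])
    return [(1.0 - alpha) * w + alpha * s for w, s in zip(written, spoken)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity(w1: str, w2: str, alpha: float) -> float:
    """Cosine similarity of two words under the blended representation."""
    return cosine(combined_vector(w1, alpha), combined_vector(w2, alpha))


if __name__ == "__main__":
    for alpha in (0.0, 0.5):
        print(alpha,
              round(similarity("night", "knight", alpha), 3),
              round(similarity("night", "day", alpha), 3))
```

With `alpha = 0.0` the homophones "night" and "knight" have orthogonal toy vectors, so their similarity is 0. Raising `alpha` lets their shared pronunciation pull them together, which is the intuition the abstract appeals to. Real systems would learn all of these vectors during training instead of fixing them by hand.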