Improve word embedding using both writing and pronunciation.

Text representation can map text into a vector space for subsequent use in numerical calculations and processing tasks. Word embedding is an important component of text representation. Most existing word embedding models focus on writing and utilize context, weight, dependency, morphology, etc., to...


Bibliographic Details
Main Authors: Wenhao Zhu, Xin Jin, Jianyue Ni, Baogang Wei, Zhiguo Lu
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2018-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC6287836?pdf=render
_version_ 1818278352349298688
author Wenhao Zhu
Xin Jin
Jianyue Ni
Baogang Wei
Zhiguo Lu
collection DOAJ
description Text representation can map text into a vector space for subsequent use in numerical calculations and processing tasks. Word embedding is an important component of text representation. Most existing word embedding models focus on writing and utilize context, weight, dependency, morphology, etc., to optimize the training. However, from the linguistic point of view, spoken language is a more direct expression of semantics; writing has meaning only as a recording of spoken language. Therefore, this paper proposes the concept of a pronunciation-enhanced word embedding model (PWE) that integrates speech information into training to fully apply the roles of both speech and writing to meaning. This paper uses the Chinese language, English language and Spanish language as examples and presents several models that integrate word pronunciation characteristics into word embedding. Word similarity and text classification experiments show that the PWE outperforms the baseline model that does not include speech information. Language is a storehouse of sound-images; therefore, the PWE can be applied to most languages.
first_indexed 2024-12-12T23:16:04Z
format Article
id doaj.art-7c9039b8ca17477487e6796341ab711e
institution Directory Open Access Journal
issn 1932-6203
language English
last_indexed 2024-12-12T23:16:04Z
publishDate 2018-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
spelling doaj.art-7c9039b8ca17477487e6796341ab711e | 2022-12-22T00:08:27Z | eng | Public Library of Science (PLoS) | PLoS ONE | 1932-6203 | 2018-01-01 | 13 | 12 | e0208785 | 10.1371/journal.pone.0208785 | Improve word embedding using both writing and pronunciation. | Wenhao Zhu, Xin Jin, Jianyue Ni, Baogang Wei, Zhiguo Lu | http://europepmc.org/articles/PMC6287836?pdf=render
title Improve word embedding using both writing and pronunciation.
url http://europepmc.org/articles/PMC6287836?pdf=render