Improve word embedding using both writing and pronunciation.

Text representation can map text into a vector space for subsequent use in numerical calculations and processing tasks. Word embedding is an important component of text representation. Most existing word embedding models focus on writing and utilize context, weight, dependency, morphology, etc., to...

Bibliographic Details
Main Authors:	Wenhao Zhu, Xin Jin, Jianyue Ni, Baogang Wei, Zhiguo Lu
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2018-01-01
Series:	PLoS ONE
Online Access:	http://europepmc.org/articles/PMC6287836?pdf=render

author	Wenhao Zhu Xin Jin Jianyue Ni Baogang Wei Zhiguo Lu
collection	DOAJ
description	Text representation can map text into a vector space for subsequent use in numerical calculations and processing tasks. Word embedding is an important component of text representation. Most existing word embedding models focus on writing and utilize context, weight, dependency, morphology, etc., to optimize the training. However, from the linguistic point of view, spoken language is a more direct expression of semantics; writing has meaning only as a recording of spoken language. Therefore, this paper proposes the concept of a pronunciation-enhanced word embedding model (PWE) that integrates speech information into training to fully apply the roles of both speech and writing to meaning. This paper uses the Chinese language, English language and Spanish language as examples and presents several models that integrate word pronunciation characteristics into word embedding. Word similarity and text classification experiments show that the PWE outperforms the baseline model that does not include speech information. Language is a storehouse of sound-images; therefore, the PWE can be applied to most languages.
first_indexed	2024-12-12T23:16:04Z
format	Article
id	doaj.art-7c9039b8ca17477487e6796341ab711e
institution	Directory of Open Access Journals
issn	1932-6203
language	English
last_indexed	2024-12-12T23:16:04Z
publishDate	2018-01-01
publisher	Public Library of Science (PLoS)
record_format	Article
series	PLoS ONE
spelling	doaj.art-7c9039b8ca17477487e6796341ab711e; 2022-12-22T00:08:27Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2018-01-01; 13; 12; e0208785; 10.1371/journal.pone.0208785; Improve word embedding using both writing and pronunciation.; Wenhao Zhu, Xin Jin, Jianyue Ni, Baogang Wei, Zhiguo Lu; http://europepmc.org/articles/PMC6287836?pdf=render
title	Improve word embedding using both writing and pronunciation.
url	http://europepmc.org/articles/PMC6287836?pdf=render

Similar Items