The representation of some phrases in Arabic word semantic vector spaces

We demonstrate several ways to use morphological word analogies to examine the representation of complex words in semantic vector spaces. We present a set of morphological relations, each of which can be used to generate many word analogies. 1. We show that the difference-vectors for pairs which hav...

Bibliographic Details
Main Authors: Taylor Stephen, Brychcín Tomáš
Format: Article
Language: English
Published: De Gruyter 2018-12-01
Series: Open Computer Science
Subjects: phrase semantic vectors, word analogies, word embeddings, arabic
Online Access: http://www.degruyter.com/view/j/comp.2018.8.issue-1/comp-2018-0017/comp-2018-0017.xml?format=INT
author Taylor Stephen
Brychcín Tomáš
collection DOAJ
description We demonstrate several ways to use morphological word analogies to examine the representation of complex words in semantic vector spaces. We present a set of morphological relations, each of which can be used to generate many word analogies. 1. We show that the difference-vectors for pairs that stand in the same relation to each other are similarly aligned. 2. We suggest that addition of difference-vectors is a useful phrase-building operator. 3. We propose that pairs in the same relation may have similar relative frequencies. 4. We suggest that homographs, which necessarily share a single semantic vector, can sometimes be separated into different vectors for different senses, using frequency estimates and alignment constraints obtained from word analogies. 5. We observe that some of our analogies seem to be parallel and might be combined. We use Arabic words as a case study, because Arabic orthography includes verb conjugations, object pronouns, definite articles, possessive pronouns, and some prepositions in single word-forms. Therefore, a number of short phrases, built up of easily perceived constituents, are already present in stock semantic spaces for Arabic available on the web. Similar phrases in English would require including bigrams or trigrams as lemmas in the word embedding, although English derivational morphology allows for other relationships in standard semantic spaces which Arabic does not, for example negation. We make our corpus of morphological relations available to other researchers.
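The first two claims in the description (aligned difference-vectors, and addition of a difference-vector as a phrase-building operator) can be illustrated with a minimal sketch. This is not the authors' code or data: the vectors are synthetic, the shared offset `definite` is constructed by hand so that the effect is visible, and the transliterated word-forms (`kitab`, `alkitab`, and so on, standing for a noun and the same noun with the attached definite article) are only illustrative labels. The helper names `cosine`, `difference`, and `nearest` are hypothetical.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Synthetic toy space (assumption): each definite form is its base form
# plus one shared offset plus a little noise. Real embeddings are only
# approximately like this, which is what the analogies are used to probe.
rng = np.random.default_rng(0)
dim = 50
definite = rng.normal(size=dim)  # hand-made "definite article" offset

space = {}
for base in ("kitab", "bayt", "qalam"):
    space[base] = rng.normal(size=dim)
    space["al" + base] = space[base] + definite + 0.1 * rng.normal(size=dim)

def difference(space, a, b):
    """Difference-vector for the pair (a, b)."""
    return space[b] - space[a]

# Claim 1: difference-vectors of pairs in the same relation are similarly aligned.
d1 = difference(space, "kitab", "alkitab")
d2 = difference(space, "bayt", "albayt")
alignment = cosine(d1, d2)

def nearest(space, vec, exclude=()):
    """Word in the space whose vector has the highest cosine similarity to vec."""
    candidates = [w for w in space if w not in exclude]
    return max(candidates, key=lambda w: cosine(space[w], vec))

# Claim 2: adding a difference-vector to a new base word approximates the
# vector of the corresponding complex word-form.
predicted = nearest(space, space["qalam"] + d1,
                    exclude=("qalam", "kitab", "alkitab"))

print(f"alignment of difference-vectors: {alignment:.3f}")
print(f"qalam + (alkitab - kitab) is nearest to: {predicted}")
```

With a pretrained Arabic embedding loaded into a dictionary of the same shape, the same two measurements could be applied to real word-forms; how well they hold there is the empirical question the description addresses.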
first_indexed 2024-12-10T06:35:20Z
format Article
id doaj.art-818d5a0d53cb4c64ac77ac8a7aba6c43
institution Directory Open Access Journal
issn 2299-1093
language English
last_indexed 2024-12-10T06:35:20Z
publishDate 2018-12-01
publisher De Gruyter
record_format Article
series Open Computer Science
spelling doaj.art-818d5a0d53cb4c64ac77ac8a7aba6c43 | 2022-12-22T01:58:56Z | eng | De Gruyter | Open Computer Science | 2299-1093 | 2018-12-01 | vol. 8, no. 1, pp. 182-193 | 10.1515/comp-2018-0017 | comp-2018-0017 | The representation of some phrases in Arabic word semantic vector spaces | Taylor Stephen; Brychcín Tomáš | Department of Computer Science and Engineering, Faculty of Applied Sciences, University of West Bohemia, Pilsen, Czech Republic (both authors) | http://www.degruyter.com/view/j/comp.2018.8.issue-1/comp-2018-0017/comp-2018-0017.xml?format=INT | phrase semantic vectors; word analogies; word embeddings; arabic
title The representation of some phrases in Arabic word semantic vector spaces
topic phrase semantic vectors
word analogies
word embeddings
arabic