The representation of some phrases in Arabic word semantic vector spaces
We demonstrate several ways to use morphological word analogies to examine the representation of complex words in semantic vector spaces. We present a set of morphological relations, each of which can be used to generate many word analogies. 1. We show that the difference-vectors for pairs which hav...
Main Authors: | Taylor Stephen, Brychcín Tomáš |
---|---|
Format: | Article |
Language: | English |
Published: | De Gruyter 2018-12-01 |
Series: | Open Computer Science |
Subjects: | phrase semantic vectors, word analogies, word embeddings, Arabic |
Online Access: | http://www.degruyter.com/view/j/comp.2018.8.issue-1/comp-2018-0017/comp-2018-0017.xml?format=INT |
_version_ | 1818034197986541568 |
---|---|
author | Taylor Stephen; Brychcín Tomáš |
author_facet | Taylor Stephen; Brychcín Tomáš |
author_sort | Taylor Stephen |
collection | DOAJ |
description | We demonstrate several ways to use morphological word analogies to examine the representation of complex words in semantic vector spaces. We present a set of morphological relations, each of which can be used to generate many word analogies. 1. We show that the difference-vectors for pairs which have the same relation to each other are similarly aligned. 2. We suggest that addition of difference-vectors is a useful phrase-building operator. 3. We propose that pairs in the same relation may have similar relative frequencies. 4. We suggest that homographs, which necessarily have the same semantic vectors, can sometimes be separated into different vectors for different senses, using frequency estimates and alignment constraints obtained from word analogies. 5. We observe that some of our analogies seem to be parallel, and might be combined. We use Arabic words as a case study, because Arabic orthography includes verb conjugations, object pronouns, definite articles, possessive pronouns, and some prepositions in single word-forms. Therefore, a number of short phrases, built up of easily perceived constituents, are already present in stock semantic spaces for Arabic available on the web. Similar phrases in English would require including bigrams or trigrams as lemmas in the word embedding, although English derivational morphology allows for other relationships in standard semantic spaces which Arabic does not, for example negation. We make our corpus of morphological relations available to other researchers. |
first_indexed | 2024-12-10T06:35:20Z |
format | Article |
id | doaj.art-818d5a0d53cb4c64ac77ac8a7aba6c43 |
institution | Directory Open Access Journal |
issn | 2299-1093 |
language | English |
last_indexed | 2024-12-10T06:35:20Z |
publishDate | 2018-12-01 |
publisher | De Gruyter |
record_format | Article |
series | Open Computer Science |
spelling | doaj.art-818d5a0d53cb4c64ac77ac8a7aba6c43; 2022-12-22T01:58:56Z; eng; De Gruyter; Open Computer Science; 2299-1093; 2018-12-01; 8; 1; 182; 193; 10.1515/comp-2018-0017; comp-2018-0017; The representation of some phrases in Arabic word semantic vector spaces; Taylor Stephen (Department of Computer Science and Engineering, Faculty of Applied Sciences, University of West Bohemia, Pilsen, Czech Republic); Brychcín Tomáš (Department of Computer Science and Engineering, Faculty of Applied Sciences, University of West Bohemia, Pilsen, Czech Republic); http://www.degruyter.com/view/j/comp.2018.8.issue-1/comp-2018-0017/comp-2018-0017.xml?format=INT; phrase semantic vectors; word analogies; word embeddings; arabic |
title | The representation of some phrases in Arabic word semantic vector spaces |
title_sort | representation of some phrases in arabic word semantic vector spaces |
topic | phrase semantic vectors; word analogies; word embeddings; Arabic |
url | http://www.degruyter.com/view/j/comp.2018.8.issue-1/comp-2018-0017/comp-2018-0017.xml?format=INT |
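
The first two claims in the description (difference-vectors for pairs in the same relation are similarly aligned, and adding a difference-vector acts as a phrase-building operator) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the three-dimensional vectors are invented toy values, the transliterated word-forms (`kitab`, `alkitab`, `bayt`, `albayt`, `qalam`) stand in for real Arabic entries, and the helper names are hypothetical. It is not the authors' code or data.

```python
import math

def sub(u, v):
    """Element-wise difference u - v."""
    return [a - b for a, b in zip(u, v)]

def add(u, v):
    """Element-wise sum u + v."""
    return [a + b for a, b in zip(u, v)]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy embedding (invented values, not from any real model).
# Each "al-" form attaches the definite article to the bare noun.
EMB = {
    "kitab":   [1.0, 0.0, 0.2],  # "book"
    "alkitab": [1.0, 1.0, 0.2],  # "the book"
    "bayt":    [0.0, 0.1, 1.0],  # "house"
    "albayt":  [0.1, 1.1, 1.0],  # "the house"
    "qalam":   [0.9, 0.0, 0.3],  # "pen" (distractor)
}

def difference_alignment(pair_a, pair_b, emb):
    """Cosine between the difference-vectors of two pairs in one relation."""
    diff_a = sub(emb[pair_a[1]], emb[pair_a[0]])
    diff_b = sub(emb[pair_b[1]], emb[pair_b[0]])
    return cosine(diff_a, diff_b)

def solve_analogy(a, b, c, emb):
    """Answer a : b :: c : ? by adding the difference-vector (b - a) to c."""
    target = add(emb[c], sub(emb[b], emb[a]))
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))

alignment = difference_alignment(("kitab", "alkitab"), ("bayt", "albayt"), EMB)
answer = solve_analogy("kitab", "alkitab", "bayt", EMB)
print(f"alignment of difference-vectors: {alignment:.3f}")
print(f"kitab : alkitab :: bayt : {answer}")
```

With a real pre-trained Arabic embedding, the dictionary would be replaced by that model's vectors; the toy values here are chosen only so that the two "add the article" difference-vectors point in nearly the same direction.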