The Relation Dimension in the Identification and Classification of Lexically Restricted Word Co-Occurrences in Text Corpora

Bibliographic Details
Main Authors: Alexander Shvets, Leo Wanner
Format: Article
Language: English
Published: MDPI AG, 2022-10-01
Series: Mathematics
Online Access:https://www.mdpi.com/2227-7390/10/20/3831

Abstract

The speech of native speakers is full of idiosyncrasies. Especially prominent are lexically restricted binary word co-occurrences of the type "high esteem", "strong tea", "run [an] experiment", "war break(s) out", etc. In lexicography, such co-occurrences are referred to as collocations. Due to their semi-decompositional nature, collocations are highly relevant to a large number of natural language processing applications as well as to second language learning. A substantial body of work exists on the automatic recognition of collocations in textual material and, increasingly, on their semantic classification, even if the latter is not yet mainstream research. Classification with respect to the lexical function (LF) taxonomy, the most detailed semantically oriented taxonomy of collocations available to date, has proved especially useful to human speakers and machines alike. The most recent approaches in the field are based on multilingual neural graph transformer models that use explicit syntactic dependencies. Our goal is to explore whether extending such a model with a semantic relation extraction network improves its classification performance, or whether the model already learns the corresponding semantic relations from the dependencies and the sentential contexts, so that an additional relation extraction network would not improve the overall performance. The experiments show that the semantic relation extraction layer indeed improves the overall performance of a graph transformer. However, this improvement is not substantial, so we can conclude that graph transformers already learn, to a certain extent, the semantics of the dependencies between the collocation elements.

ISSN: 2227-7390
Volume: 10
Issue: 20
Article number: 3831
DOI: 10.3390/math10203831
Affiliation (Alexander Shvets): NLP Group, Pompeu Fabra University, 08018 Barcelona, Spain
Affiliation (Leo Wanner): Catalan Institute for Research and Advanced Studies (ICREA) and NLP Group, Pompeu Fabra University, 08018 Barcelona, Spain
Keywords: idiosyncratic word co-occurrences; collocations; lexical functions; multilingual; graph transformers; multitask learning