Using parallel text for the extraction of German multiword expressions

Bibliographic Details
Main Author: Fabienne Fritzinger
Format: Article
Language: English
Published: Université Jean Moulin - Lyon 3, 2010-04-01
Series: Lexis: Journal in English Lexicology
Online Access: http://journals.openedition.org/lexis/564
Description: A procedure for the identification of semantically opaque (i.e. idiomatic) German multiwords is presented. We focus on verb + PP combinations that are lexicographically relevant (extracted via dependency parsing [Schiehlen 2003]) of the kind ins Leben rufen – “to initiate”, lit.: “to call into life”. Starting from [Villada Moirón and Tiedemann 2006], the method exploits the fact that opaque combinations are translated as a whole, whereas compositional uses would show regular, individual translations of the words involved. The translations into other languages are obtained by applying GIZA++ [Och and Ney 2003] word alignment to the EUROPARL corpus [Koehn 2005]. Numerous experiments are performed to further optimise the original method: several parameters are analysed individually as well as in combination with each other. This leads to the following results: depending on the actual parameter settings, values between 0.800 and 0.936 (in terms of uninterpolated average precision) are reached amongst the highest scoring 200 multiword candidates, as opposed to a baseline of 0.584, using the 200 most frequent multiwords in decreasing order of their occurrence frequency.
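
The evaluation figures in the description are reported as uninterpolated average precision over a ranked list of 200 candidates. The record does not give the formula, so the sketch below assumes the common definition: the mean of the precision values taken at each rank where a true multiword expression occurs. The function name and the toy labels are hypothetical and are not taken from the article.

```python
from typing import Sequence


def uninterpolated_average_precision(labels: Sequence[bool]) -> float:
    """Mean of precision@k over the ranks k that hold a true positive.

    `labels` is a ranked candidate list (best-scoring first); each entry
    is True if the candidate was judged a genuine multiword expression.
    This is an assumed, textbook-style definition, not the article's code.
    """
    hits = 0
    precisions = []
    for rank, is_true_positive in enumerate(labels, start=1):
        if is_true_positive:
            hits += 1
            # Precision of the list cut off at this rank.
            precisions.append(hits / rank)
    # A list without any true positive gets a score of 0.0 by convention.
    return sum(precisions) / len(precisions) if precisions else 0.0


# Toy ranking (hypothetical): hits at ranks 1, 3 and 4.
toy_ranking = [True, False, True, True]
toy_score = uninterpolated_average_precision(toy_ranking)
```

Under this definition a ranking that places all true positives first scores 1.0, and each false positive ranked above a true positive lowers the score. That makes the measure sensitive to the order of the list and not only to how many true positives it contains.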
Record ID: doaj.art-cc59fa6fb15043dcb8292df29d013292
Source: Directory of Open Access Journals (DOAJ)
ISSN: 1951-6215
DOI: 10.4000/lexis.564
Subjects: multiword expressions; multilingual corpus; dependency parsing; statistical word alignment