Using parallel text for the extraction of German multiword expressions
A procedure for the identification of semantically opaque (i.e. idiomatic) German multiwords is presented. We focus on verb + PP combinations that are lexicographically relevant (extracted via dependency parsing [Schiehlen 2003]) of the kind ins Leben rufen – “to initiate”, lit.: “to call into life”...
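According to the full description in this record, the procedure relies on opaque combinations being translated as a whole, while compositional uses receive regular, word-by-word translations. The sketch below shows one way to turn word alignments into a ranking score. It assumes a translational-entropy measure of the kind used in the cited starting point [Villada Moirón and Tiedemann 2006]. The article's own scoring function and parameter settings are not given in this record, so the averaging step, the function names and the toy alignments are all hypothetical. In practice the alignments would come from GIZA++ run on EUROPARL. Here they are invented.

```python
import math
from collections import Counter


def translational_entropy(aligned_targets):
    """Entropy (in bits) of the target words that one source word was aligned
    to across the occurrences of a candidate; None marks an unaligned token."""
    counts = Counter(aligned_targets)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def idiomaticity_score(alignments_per_word):
    """Hypothetical aggregation: mean entropy over the candidate's words.
    Higher means less regular translation, i.e. more likely opaque."""
    entropies = [translational_entropy(t) for t in alignments_per_word.values()]
    return sum(entropies) / len(entropies)


# Toy alignments, invented for illustration (not real EUROPARL/GIZA++ output).
candidates = {
    "ins Leben rufen": {
        "Leben": ["launch", None, "up", "establish"],
        "rufen": ["launch", "set", "establish", "create"],
    },
    "in den Wald gehen": {
        "Wald": ["forest", "forest", "forest", "woods"],
        "gehen": ["go", "go", "go", "walk"],
    },
}

# Rank candidates from most to least idiomatic-looking.
ranking = sorted(candidates, key=lambda c: idiomaticity_score(candidates[c]), reverse=True)
print(ranking)  # ['ins Leben rufen', 'in den Wald gehen']
```

Scattered translations of the parts (including unaligned tokens) push the opaque candidate to the top. The consistent "forest"/"go" translations keep the compositional one low.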
Main Author: | Fabienne Fritzinger |
---|---|
Format: | Article |
Language: | English |
Published: | Université Jean Moulin - Lyon 3, 2010-04-01 |
Series: | Lexis: Journal in English Lexicology |
Subjects: | multiword expressions; multilingual corpus; dependency parsing; statistical word alignment |
Online Access: | http://journals.openedition.org/lexis/564 |
author | Fabienne Fritzinger |
---|---|
collection | DOAJ |
description | A procedure for the identification of semantically opaque (i.e. idiomatic) German multiwords is presented. We focus on lexicographically relevant verb + PP combinations (extracted via dependency parsing [Schiehlen 2003]) such as ins Leben rufen ("to initiate", lit. "to call into life"). Starting from [Villada Moirón and Tiedemann 2006], the method exploits the fact that opaque combinations are translated as a whole, whereas compositional uses show regular, individual translations of the words involved. The translations into other languages are obtained by applying GIZA++ [Och and Ney 2003] word alignment to the EUROPARL corpus [Koehn 2005]. Numerous experiments are performed to further optimise the original method: several parameters are analysed individually as well as in combination with each other. Depending on the parameter settings, uninterpolated average precision values between 0.800 and 0.936 are reached for the 200 highest-scoring multiword candidates, compared with a baseline of 0.584 obtained by ranking the 200 most frequent multiwords in decreasing order of occurrence frequency. |
first_indexed | 2024-12-14T00:30:47Z |
format | Article |
id | doaj.art-cc59fa6fb15043dcb8292df29d013292 |
institution | Directory Open Access Journal |
issn | 1951-6215 |
language | English |
last_indexed | 2024-12-14T00:30:47Z |
publishDate | 2010-04-01 |
publisher | Université Jean Moulin - Lyon 3 |
record_format | Article |
series | Lexis: Journal in English Lexicology |
volume | 4 |
doi | 10.4000/lexis.564 |
title | Using parallel text for the extraction of German multiword expressions |
topic | multiword expressions multilingual corpus dependency parsing statistical word alignment |
url | http://journals.openedition.org/lexis/564 |
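The evaluation in this record reports uninterpolated average precision (UAP) over the 200 highest-ranked candidates, measured against a frequency-ordered baseline. The sketch below assumes the common definition for n-best lists: the mean of the precision values at each rank that holds a true multiword. The record does not state the article's exact normalisation, and the function name and toy judgements are hypothetical.

```python
def uninterpolated_average_precision(is_true_mwe, n=200):
    """UAP over the n highest-ranked candidates.

    is_true_mwe: booleans in rank order (index 0 = top-ranked candidate),
    True where the candidate was judged to be an opaque multiword.
    Returns the mean of precision@k over every rank k holding a true
    multiword, or 0.0 if none occurs in the top n.
    """
    precisions = []
    hits = 0
    for k, relevant in enumerate(is_true_mwe[:n], start=1):
        if relevant:
            hits += 1
            precisions.append(hits / k)  # precision at this rank
    return sum(precisions) / len(precisions) if precisions else 0.0


# Toy judgements (invented): the same four candidates ranked two ways.
by_score = [True, False, True, True]       # ranked by an alignment-based score
by_frequency = [False, True, False, True]  # ranked by occurrence frequency
print(uninterpolated_average_precision(by_score))
print(uninterpolated_average_precision(by_frequency))
```

UAP rewards rankings that put true multiwords near the top. The frequency baseline is scored with the same function after sorting candidates by corpus frequency.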