Multiword expressions : a study on representation of Japanese MWEs in wordnet and other lexical databases

Bibliographic Details
Main Author: Lee, Hui Shan
Other Authors: Francis Bond
Format: Final Year Project (FYP)
Language: English
Published: 2017
Subjects: DRNTU::Humanities
Online Access: http://hdl.handle.net/10356/69702
Institution: Nanyang Technological University
School: School of Humanities and Social Sciences
Degree: Bachelor of Arts
Physical description: 55 p.

Description

Multiword expressions (MWEs) make up a significant portion of the lexicon and have the distinctive characteristics of non-compositionality, non-substitutability and non-modifiability. They are widely recognized as a very problematic part of natural language processing (NLP), because current linguistic databases often have insufficient coverage of MWEs. This paper attempts to fill a research gap by examining how difficult it is to retrieve and process Japanese MWEs. The research presents an overview of 360 entries obtained from the corpus through automatic retrieval (AR) and manual retrieval (MR). These entries are then compared across seven databases (the goo dictionary, the imiwa? dictionary, JDMWE, WWWJDIC, NINJAL, wordnet, and the N-gram count corpus) to test whether they are MWEs. The results suggest that some of the main problems in determining whether a phrase is an MWE are the coverage of the databases used, differences in how phrases are represented in dictionaries, complications caused by the different writing systems used in Japanese, and the need for human judgement.