Interpreting noun compounds



Bibliographic Details
Main Author: Dobó, A
Other Authors: Pulman, S
Format: Thesis
Language: English
Published: 2010
Subjects:
Description
Summary: Noun compounds, which are sequences of nouns functioning as a single noun, are abundant in both written and spoken English, and their interpretation is crucial for many natural language processing tasks, such as machine translation and information retrieval. There is therefore significant ongoing interest in their interpretation. Although the task is easy for humans, it is rather challenging for computers.

The interpretation of a noun compound can be given as a list of suitable paraphrases ranked according to their aptness, where the paraphrases can be verbs and prepositions. The aim of this dissertation is to develop methods that automatically interpret two-noun compounds by paraphrases, using large corpora. A general paraphrasing method is proposed that searches for paraphrases in static corpora and uses Web search engine queries to validate the results. Furthermore, a method for SemEval-2 Task #9 is developed from the validation part of the general paraphrasing method.

The results of the general paraphrasing method were evaluated by human judges: each judge gave the first three paraphrases returned for each noun compound a score between 1 and 5, based on their aptness for the noun compound. The paraphrases ranked first, second and third by the proposed method received average scores of 3.1842, 2.7687 and 2.5583, respectively. Further, when the returned paraphrase distribution for each noun compound was compared with the judges’ distributions, the method achieved an average Spearman’s rank correlation coefficient of 0.3108, an average Pearson’s correlation coefficient of 0.2738 and an average Kullback-Leibler divergence of 0.1589.

The method for SemEval-2 Task #9 was evaluated on the test data set with the scorer provided for the task, which calculates the similarity of the returned paraphrase distribution for each noun compound with a gold standard. It achieved an average Spearman’s rank correlation coefficient of 0.3387, an average Pearson’s correlation coefficient of 0.3196 and an average Kullback-Leibler divergence of 4.1520.