Paraphrase-Sense-Tagged Sentences

Many natural language processing tasks require discriminating the particular meaning of a word in context, but building corpora for developing sense-aware models can be a challenge. We present a large resource of example usages for words having a particular meaning, called Paraphrase-Sense-Tagged Se...


Bibliographic Details
Main Authors: Cocos, Anne; Callison-Burch, Chris
Format: Article
Language: English
Published: The MIT Press 2019-11-01
Series: Transactions of the Association for Computational Linguistics
Online Access: https://www.mitpressjournals.org/doi/abs/10.1162/tacl_a_00295
author Cocos, Anne
Callison-Burch, Chris
collection DOAJ
description Many natural language processing tasks require discriminating the particular meaning of a word in context, but building corpora for developing sense-aware models can be a challenge. We present a large resource of example usages for words having a particular meaning, called Paraphrase-Sense-Tagged Sentences (PSTS). Built on the premise that a word’s paraphrases instantiate its fine-grained meanings (i.e., bug has different meanings corresponding to its paraphrases fly and microbe) the resource contains up to 10,000 sentences for each of 3 million target-paraphrase pairs where the target word takes on the meaning of the paraphrase. We describe an automatic method based on bilingual pivoting used to enumerate sentences for PSTS, and present two models for ranking PSTS sentences based on their quality. Finally, we demonstrate the utility of PSTS by using it to build a dataset for the task of hypernym prediction in context. Training a model on this automatically generated dataset produces accuracy that is competitive with a model trained on smaller datasets crafted with some manual effort.
first_indexed 2024-12-18T19:46:12Z
format Article
id doaj.art-35fe5b0819ac4ef68cad89ddd12368bf
institution Directory Open Access Journal
issn 2307-387X
language English
last_indexed 2024-12-18T19:46:12Z
publishDate 2019-11-01
publisher The MIT Press
record_format Article
series Transactions of the Association for Computational Linguistics
spelling doaj.art-35fe5b0819ac4ef68cad89ddd12368bf | 2022-12-21T20:55:19Z | eng | The MIT Press | Transactions of the Association for Computational Linguistics | 2307-387X | 2019-11-01 | 7 | 714 | 728 | 10.1162/tacl_a_00295 | Paraphrase-Sense-Tagged Sentences | Cocos, Anne | Callison-Burch, Chris | Many natural language processing tasks require discriminating the particular meaning of a word in context, but building corpora for developing sense-aware models can be a challenge. We present a large resource of example usages for words having a particular meaning, called Paraphrase-Sense-Tagged Sentences (PSTS). Built on the premise that a word’s paraphrases instantiate its fine-grained meanings (i.e., bug has different meanings corresponding to its paraphrases fly and microbe) the resource contains up to 10,000 sentences for each of 3 million target-paraphrase pairs where the target word takes on the meaning of the paraphrase. We describe an automatic method based on bilingual pivoting used to enumerate sentences for PSTS, and present two models for ranking PSTS sentences based on their quality. Finally, we demonstrate the utility of PSTS by using it to build a dataset for the task of hypernym prediction in context. Training a model on this automatically generated dataset produces accuracy that is competitive with a model trained on smaller datasets crafted with some manual effort. | https://www.mitpressjournals.org/doi/abs/10.1162/tacl_a_00295
title Paraphrase-Sense-Tagged Sentences
url https://www.mitpressjournals.org/doi/abs/10.1162/tacl_a_00295