UCM/MIT Indications, Referring Expressions, and Coreference Corpus (UMIREC corpus) v1.1

The corpus comprises 62 files in "Story Workbench" annotation format: 30 folktales in English from a variety of sources, and 32 Wall Street Journal articles selected to coincide with articles found in the Penn Treebank. The files are annotated with the location of referring expressions, co...


Bibliographic Details
Main Authors:	Finlayson, Mark Alan; Hervas, Raquel
Other Authors:	Patrick Winston
Published:	2010
Online Access:	http://hdl.handle.net/1721.1/57507

_version_	1826216052454850560
author	Finlayson, Mark Alan; Hervas, Raquel
author2	Patrick Winston
author_facet	Patrick Winston; Finlayson, Mark Alan; Hervas, Raquel
author_sort	Finlayson, Mark Alan
collection	MIT
description	The corpus comprises 62 files in "Story Workbench" annotation format: 30 folktales in English from a variety of sources, and 32 Wall Street Journal articles selected to coincide with articles found in the Penn Treebank. The files are annotated with the location of referring expressions, coreference relations between the referring expressions, and so-called "indication structures", which split referring expressions into constituents (nuclei and modifiers) and mark each constituent as either 'distinctive' or 'descriptive', indicating whether or not the constituent contains information required for uniquely identifying the referent. The files distributed in this corpus archive are the gold-standard files, which were constructed by merging annotations done by two trained annotators. The contents of this corpus, the annotation procedure, and the indication structures are described in more detail in a paper titled "The Prevalence of Descriptive Referring Expressions in News and Narrative" published in the proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, held in July 2010 in Uppsala, Sweden (ACL-2010). A near-final version of the paper is included in the doc/ directory of the compressed corpus archive file. This is version 1.1 of the UMIREC corpus, in which the coreference annotations have been fixed relative to version 1.0. UMIREC v1.0 suffered from a bug in the export script that corrupted the coreference data.
first_indexed	2024-09-23T16:41:33Z
id	mit-1721.1/57507
institution	Massachusetts Institute of Technology
last_indexed	2024-09-23T16:41:33Z
publishDate	2010
record_format	dspace
spelling	mit-1721.1/57507; 2019-04-08T07:42:59Z; Patrick Winston; Genesis; 2010-08-19T18:15:22Z; 2010-05-12; http://hdl.handle.net/1721.1/57507; Finlayson, M.A. & Hervás, R. (2010) UCM/MIT Indications, Referring Expressions, and Co-Reference Corpus v1.1 (UMIREC corpus). MIT CSAIL Work Product; http://hdl.handle.net/1721.1/54765; http://hdl.handle.net/1721.1/54766; Creative Commons Attribution 3.0 Unported, http://creativecommons.org/licenses/by/3.0/; 877 kB; application/octet-stream
spellingShingle	Finlayson, Mark Alan Hervas, Raquel UCM/MIT Indications, Referring Expressions, and Coreference Corpus (UMIREC corpus) v1.1
title	UCM/MIT Indications, Referring Expressions, and Coreference Corpus (UMIREC corpus) v1.1
title_sort	ucm mit indications referring expressions and coreference corpus umirec corpus v1 1
url	http://hdl.handle.net/1721.1/57507

Similar Items