Discourse models for collaboratively edited corpora

Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008.

Bibliographic Details
Main Author: Chen, Erdong, S.M. Massachusetts Institute of Technology
Other Authors: Regina Barzilay.
Format: Thesis
Language: eng
Published: Massachusetts Institute of Technology 2009
Subjects: Electrical Engineering and Computer Science.
Online Access: http://hdl.handle.net/1721.1/44374
author Chen, Erdong, S.M. Massachusetts Institute of Technology
author2 Regina Barzilay.
author_facet Regina Barzilay.
Chen, Erdong, S.M. Massachusetts Institute of Technology
author_sort Chen, Erdong, S.M. Massachusetts Institute of Technology
collection MIT
description Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008.
first_indexed 2024-09-23T16:07:55Z
format Thesis
id mit-1721.1/44374
institution Massachusetts Institute of Technology
language eng
last_indexed 2024-09-23T16:07:55Z
publishDate 2009
publisher Massachusetts Institute of Technology
record_format dspace
spelling mit-1721.1/44374
2019-04-11T14:26:04Z
Discourse models for collaboratively edited corpora
Chen, Erdong, S.M. Massachusetts Institute of Technology
Regina Barzilay.
Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.
Electrical Engineering and Computer Science.
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008.
Includes bibliographical references (p. 77-81).
This thesis focuses on computational discourse models for collaboratively edited corpora. Because these corpora grow exponentially and vary significantly in style and content, models based on professionally edited texts cannot process the new data effectively. To succeed, a method must preserve local coherence as well as global consistency. We explore two corpus-based methods for processing collaboratively edited corpora, which effectively model and optimize the consistency of user-generated text. The first method addresses the task of inserting new information into existing texts. In particular, we wish to determine the best location in a text for a given piece of new information. We present an online ranking model which exploits the hierarchical structure of the text, both representationally in its features and algorithmically in its learning procedure. When tested on a corpus of Wikipedia articles, our hierarchically informed model predicts the correct insertion paragraph more accurately than baseline methods. The second method concerns inducing a common structure across multiple articles in similar domains to aid cross-document collaborative editing. A graphical model is designed to induce section topics and to learn topic clusters. Preliminary experiments show that the proposed method performs comparably to baseline methods.
by Erdong Chen.
S.M.
2009-01-30T16:38:50Z
2008
Thesis
http://hdl.handle.net/1721.1/44374
276947510
eng
M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission.
http://dspace.mit.edu/handle/1721.1/7582
81 p.
application/pdf
Massachusetts Institute of Technology
spellingShingle Electrical Engineering and Computer Science.
Chen, Erdong, S.M. Massachusetts Institute of Technology
Discourse models for collaboratively edited corpora
title Discourse models for collaboratively edited corpora
title_sort discourse models for collaboratively edited corpora
topic Electrical Engineering and Computer Science.
url http://hdl.handle.net/1721.1/44374
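
The abstract in this record describes an online ranking model that picks the insertion paragraph for a new sentence using the document's hierarchy. The record gives no implementation details, so the following is only a minimal sketch under stated assumptions: the word-overlap features, the perceptron-style update, and all names (overlap, features, predict, train) are hypothetical illustrations and are not taken from the thesis.

```python
# Illustrative sketch only: toy features and a plain perceptron update,
# not the model described in the thesis.

def overlap(sentence, text):
    """Toy feature: fraction of sentence words that also occur in text."""
    words = sentence.lower().split()
    vocab = set(text.lower().split())
    return sum(word in vocab for word in words) / max(len(words), 1)

def features(sentence, doc, s, p):
    """Features for inserting `sentence` into paragraph p of section s.

    `doc` is a list of sections; each section is a list of paragraph strings.
    One feature looks at the whole section, one at the paragraph itself,
    so the score reflects both levels of the hierarchy.
    """
    return {
        "section_overlap": overlap(sentence, " ".join(doc[s])),
        "paragraph_overlap": overlap(sentence, doc[s][p]),
    }

def score(weights, feats):
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())

def predict(weights, sentence, doc):
    """Return the (section, paragraph) index pair with the highest score."""
    candidates = [(s, p) for s in range(len(doc)) for p in range(len(doc[s]))]
    return max(candidates, key=lambda c: score(weights, features(sentence, doc, *c)))

def train(examples, epochs=5):
    """Online training: update weights only when the prediction is wrong."""
    weights = {}
    for _ in range(epochs):
        for sentence, doc, gold in examples:
            guess = predict(weights, sentence, doc)
            if guess != gold:
                f_gold = features(sentence, doc, *gold)
                f_guess = features(sentence, doc, *guess)
                for name in f_gold:
                    weights[name] = weights.get(name, 0.0) + f_gold[name] - f_guess[name]
    return weights

# Made-up toy document: two sections with two paragraphs each.
doc = [
    ["cats purr and sleep", "cats chase mice"],
    ["rockets burn fuel", "rockets reach orbit"],
]
weights = train([("rockets reach a stable orbit", doc, (1, 1))])
print(predict(weights, "fuel burn rate of rockets", doc))
```

Scoring each paragraph with both section-level and paragraph-level features is one simple way to let the hierarchy inform the ranking; a model that also uses the hierarchy in its learning procedure, as the abstract says, would need a more structured update than the flat one shown here.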