Automated creation of Wikipedia articles

Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009.

Bibliographic Details
Main Author: Sauper, Christina (Christina Joan)
Other Authors: Regina Barzilay.
Format: Thesis
Language: eng
Published: Massachusetts Institute of Technology 2009
Subjects: Electrical Engineering and Computer Science.
Online Access: http://hdl.handle.net/1721.1/47824
Department: Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.
Degree: S.M.
Notes: Includes bibliographical references (leaves 81-84).
Physical Description: 84 leaves
File Format: application/pdf
Date Available: 2009-10-01T15:47:27Z
Identifier: 429487065
Rights: M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582
Abstract: This thesis describes an automatic approach for producing Wikipedia articles. The wealth of information present on the Internet is currently untapped for many topics of secondary concern. Creating articles requires a great deal of time spent collecting information and editing. This thesis presents a solution. The proposed algorithm creates a new article by querying the Internet, selecting relevant excerpts from the search results, and synthesizing the best excerpts into a coherent document. This work builds on previous work in document summarization, web question answering, and Integer Linear Programming. At the core of our approach is a method for using existing human-authored Wikipedia articles to learn a content selection mechanism. Articles in the same category often present similar types of information; we can leverage this to create content templates for new articles. Once a template has been created, we use classification and clustering techniques to select a single best excerpt for each section. Finally, we use Integer Linear Programming techniques to eliminate any redundancy over the complete article. We evaluate our system for both individual sections and complete articles, using both human and automatic evaluation methods. The results indicate that articles created by our system are close to human-authored Wikipedia entries in quality of content selection. We show that both human and automatic evaluation metrics are in agreement; therefore, automatic methods are a reasonable evaluation tool for this task. We also empirically demonstrate that explicit modeling of content structure is essential for improving the quality of an automatically-produced article.
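The final step named in the abstract, choosing one excerpt per section while eliminating redundancy across the whole article, can be illustrated with a small sketch. This is not the thesis's formulation, which this record does not give: the exhaustive search below stands in for an Integer Linear Programming solver, and the word-overlap (Jaccard) redundancy measure, the `max_overlap` threshold, the function names, and the sample data are all assumptions made for illustration.

```python
from itertools import combinations, product


def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two excerpts (assumed redundancy measure)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def select_excerpts(candidates, max_overlap=0.5):
    """Pick one excerpt per section, maximizing total score.

    candidates maps a section name to a list of (score, text) pairs.
    Any combination containing two excerpts whose overlap exceeds
    max_overlap is rejected as redundant. Exhaustive search is used
    here in place of an ILP solver, so this only suits tiny inputs.
    Returns a dict of section -> text, or None if nothing is feasible.
    """
    sections = list(candidates)
    best_total, best_choice = float("-inf"), None
    for choice in product(*(candidates[s] for s in sections)):
        texts = [text for _, text in choice]
        if any(jaccard(a, b) > max_overlap for a, b in combinations(texts, 2)):
            continue  # redundant pair, violates the constraint
        total = sum(score for score, _ in choice)
        if total > best_total:
            best_total, best_choice = total, choice
    if best_choice is None:
        return None
    return {s: text for s, (_, text) in zip(sections, best_choice)}


# Hypothetical scored excerpts for a two-section template.
demo = {
    "Early life": [
        (0.9, "born in Boston and raised in Ohio"),
        (0.4, "attended school in Ohio as a child"),
    ],
    "Career": [
        (0.8, "born in Boston and raised in Ohio"),
        (0.7, "worked as an engineer for twenty years"),
    ],
}
chosen = select_excerpts(demo)
print(chosen)
```

In the sample data the top-scoring excerpt for each section is the same sentence, so the redundancy constraint forces the second section to fall back to its next-best candidate. A real system would hand the same objective and constraints to an ILP solver rather than enumerate every combination.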