AnnoTool : crowdsourcing for natural language corpus creation

Thesis: S.M., Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, June 2014.

Bibliographic Details
Main Author: Hayden, Katherine (Katherine Marie)
Other Authors: Catherine Havasi.
Format: Thesis
Language: eng
Published: Massachusetts Institute of Technology 2014
Subjects: Architecture. Program in Media Arts and Sciences.
Online Access: http://hdl.handle.net/1721.1/91819
collection MIT
first_indexed 2024-09-23T13:41:58Z
id mit-1721.1/91819
institution Massachusetts Institute of Technology
last_indexed 2024-09-23T13:41:58Z
record_format dspace
Record: mit-1721.1/91819, 2022-01-18T16:16:47Z
Alternate Titles: Anno Tool : crowdsourcing for natural language corpus creation; Crowdsourcing for natural language corpus creation
Department: Massachusetts Institute of Technology. Department of Architecture. Program in Media Arts and Sciences. Program in Media Arts and Sciences (Massachusetts Institute of Technology)
Notes: "September 2013." Cataloged from PDF version of thesis. Includes bibliographical references (pages [51]-[54]).
Abstract:
This thesis explores the extent to which untrained annotators can create annotated corpora of scientific texts. Currently the variety and quantity of annotated corpora are limited by the expense of hiring or training annotators. The expense of finding and hiring professionals increases as the task becomes more esoteric or requires a more specialized skill set. Training annotators is an investment in itself and is often difficult to justify. Undergraduate students or volunteers may not remain with a project for long enough after being trained, and graduate students' time may already be prioritized for other research goals. As demand increases for computer programs capable of interacting with users through natural language, producing annotated datasets with which to train these programs is becoming increasingly important.

This thesis presents an approach combining crowdsourcing with Luis von Ahn's "games with a purpose" paradigm. Crowdsourcing combines contributions from many participants in an online community. Games with a purpose incentivize voluntary contributions by providing an avenue for a task people are already incentivized to do, and collect data in the background. Here the desired data are annotations, and the target community is people who annotate text for professional or personal benefit, such as scientists, researchers, or members of the general public with an interest in science.

An annotation tool was designed in the form of a Google Chrome extension built specifically to work with articles from the open-access online scientific journal Public Library of Science (PLOS) ONE. A study was designed in which participants with no prior annotator training were given a brief introduction to the annotation tool and assigned three articles to annotate. The results of the study demonstrate considerable annotator agreement. The results of this thesis demonstrate that crowdsourcing annotations is feasible even for technically sophisticated texts, and they present a model of a platform that continuously gathers annotated corpora.

Statement of Responsibility: by Katherine Hayden.
Degree: S.M.
Dates: 2014-11-24T18:37:07Z; 2014-11-24T18:37:07Z; 2013; 2014
Identifiers: http://hdl.handle.net/1721.1/91819; 894221970
Rights: M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582
Extent: 49, 5 pages
File Format: application/pdf
topic Architecture. Program in Media Arts and Sciences.