An affordance-inspired tool for automated web page labeling and classification

Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2013.

Bibliographic Details
Main Author: Sittig, Karen Anne
Other Authors: Catherine Havasi and Kevin C. Gold.
Format: Thesis
Language: eng
Published: Massachusetts Institute of Technology 2014
Subjects: Electrical Engineering and Computer Science.
Online Access: http://hdl.handle.net/1721.1/85500
Alternate Title: Automated web navigation via affordance learning
Statement of Responsibility: by Karen Anne Sittig.
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Degree: M. Eng.
Dates: 2013; 2014-03-06T15:46:31Z
Identifier: 871002879
Physical Description: 60 pages; application/pdf
Notes: Cataloged from PDF version of thesis. Includes bibliographical references (pages 59-60).

Summary: Writing programs that are capable of completing complex tasks on web pages is difficult due to the inconsistent nature of the pages themselves. While there exist best practices for developing naming schemes for page elements, these schemes are not strictly enforced, making it difficult to develop a general-use automated system. Many pages must be hand-labeled if they are to be incorporated into an automated testing framework. In this thesis, I build an application that assists human users in classifying and labeling web pages. This system uses a gradient boosting classifier from the scikit-learn Python package to identify which of four tasks may be performed on a given web page. It also attempts to automatically label the input fields and buttons on the web page using a gradient boosting classifier. It outputs its results in a format that can be easily consumed by the LARIAT system at MIT Lincoln Laboratory, greatly reducing the human labor required to incorporate new web pages into the system.

Rights: M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582