Extracting fields from free-text

Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016.

Bibliographic Details
Main Author:	Cattori, Pedro
Other Authors:	Samuel Madden.
Format:	Thesis
Language:	eng
Published:	Massachusetts Institute of Technology 2016
Subjects:	Electrical Engineering and Computer Science.
Online Access:	http://hdl.handle.net/1721.1/106077

Abstract:	The Field Extraction Library (FEL) provides functions for named-entity extraction within free text. FEL models the content structure of the specified named entities rather than relying on brittle, context-specific separator logic. Users specify the names of the fields they wish to extract, which determine the number of states for an underlying Hidden Markov Model. The observable emission set is predetermined by FEL's tokenizer. Once the model topology is set, users provide training examples of the form x = raw text, y = {field1: val1, field2: val2, ...}. FEL learns the parameters of the underlying Hidden Markov Model by maximum-likelihood estimation on the training examples. FEL is designed to operate on small, sparse training data. As a result, users can provide few (fewer than 10) training examples to bootstrap the model. FEL offers 3 iterative mechanisms for scaling data quality as users provide guidance through additional feedback: (1) accept more training examples, (2) create landmark states, and (3) bridge related states with state bridges. FEL detects ambiguities both in its internal model and in the extraction results to prompt users for more feedback. Once the model yields acceptable result quality, users can extract fields into a table for easy querying and exporting.
Notes:	Cataloged from PDF version of thesis. Includes bibliographical references (pages 86-87).
Physical Description:	87 pages
File Format:	application/pdf
Date:	2016-12-22T16:28:01Z
Identifier:	965198310
Rights:	M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582
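
Illustration:	The approach summarized in this record's abstract (one Hidden Markov Model state per user-named field, a tokenizer-defined emission set, and maximum-likelihood training on a handful of labeled examples) can be sketched in a few lines of Python. This is a minimal sketch, not FEL's actual API: the function names, the `_other` background state, the coarse emission classes, the exact-match labeling step, and the add-one smoothing are all assumptions introduced here for illustration.

```python
import math
import re
from collections import defaultdict

OTHER = "_other"  # hypothetical background state for tokens outside any field


def tokenize(text):
    """Split text into word, number, and single-punctuation tokens."""
    return re.findall(r"[A-Za-z]+|\d+|[^\sA-Za-z\d]", text)


def emission(token):
    """Map a token to a coarse emission symbol (an assumed emission set)."""
    if token.isdigit():
        return "NUM"
    if token.isalpha():
        return "CAP" if token[0].isupper() else "LOW"
    return token  # punctuation is its own symbol


def label(tokens, fields):
    """Tag each token with the field whose value it belongs to, else OTHER."""
    tags = [OTHER] * len(tokens)
    for name, value in fields.items():
        vt = tokenize(value)
        for i in range(len(tokens) - len(vt) + 1):
            if tokens[i:i + len(vt)] == vt:
                tags[i:i + len(vt)] = [name] * len(vt)
                break
    return tags


def train(examples, field_names):
    """Maximum-likelihood estimation by counting starts, transitions, emissions."""
    states = list(field_names) + [OTHER]
    start, trans, emit, symbols = defaultdict(int), defaultdict(int), defaultdict(int), set()
    for text, fields in examples:
        tokens = tokenize(text)
        prev = None
        for tok, tag in zip(tokens, label(tokens, fields)):
            sym = emission(tok)
            symbols.add(sym)
            emit[tag, sym] += 1
            if prev is None:
                start[tag] += 1
            else:
                trans[prev, tag] += 1
            prev = tag
    return states, start, trans, emit, symbols


def viterbi(model, tokens):
    """Return the most likely state sequence under add-one-smoothed counts."""
    if not tokens:
        return []
    states, start, trans, emit, symbols = model
    n, v = len(states), len(symbols) + 1  # +1 reserves mass for unseen symbols

    def lp(counts, key, total, k):
        return math.log((counts.get(key, 0) + 1) / (total + k))

    start_total = sum(start.values())
    trans_total = {s: sum(trans.get((s, t), 0) for t in states) for s in states}
    emit_total = {s: sum(emit.get((s, y), 0) for y in symbols) for s in states}
    syms = [emission(t) for t in tokens]
    score = {s: lp(start, s, start_total, n) + lp(emit, (s, syms[0]), emit_total[s], v)
             for s in states}
    back = []
    for sym in syms[1:]:
        prev_score, score, ptr = score, {}, {}
        for s in states:
            best = max(states, key=lambda p: prev_score[p] + lp(trans, (p, s), trans_total[p], n))
            score[s] = (prev_score[best] + lp(trans, (best, s), trans_total[best], n)
                        + lp(emit, (s, sym), emit_total[s], v))
            ptr[s] = best
        back.append(ptr)
    path = [max(states, key=score.get)]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


def extract(model, text):
    """Decode the text and group tokens by the field state that produced them."""
    tokens = tokenize(text)
    out = defaultdict(list)
    for tok, tag in zip(tokens, viterbi(model, tokens)):
        if tag != OTHER:
            out[tag].append(tok)
    return {name: " ".join(toks) for name, toks in out.items()}


examples = [
    ("Alice is 30", {"name": "Alice", "age": "30"}),
    ("Bob is 25", {"name": "Bob", "age": "25"}),
    ("Carol is 41", {"name": "Carol", "age": "41"}),
]
model = train(examples, ["name", "age"])
print(extract(model, "Dave is 52"))  # {'name': 'Dave', 'age': '52'}
```

Because the model learns what a field's content looks like (a capitalized word, a number) and the order in which fields tend to appear, it can label "Dave" and "52" without ever having seen them and without any separator rules. The landmark states and state bridges mentioned in the abstract are not modeled in this sketch.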

Similar Items