Automatically learning optimal formula simplifiers and database entity matching rules

Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017.

Bibliographic Details
Main Author: Singh, Rohit, Ph. D. Massachusetts Institute of Technology
Other Authors: Armando Solar-Lezama.
Format: Thesis
Language:eng
Published: Massachusetts Institute of Technology 2018
Subjects:
Online Access:http://hdl.handle.net/1721.1/113938
_version_ 1826202187688050688
author Singh, Rohit, Ph. D. Massachusetts Institute of Technology
author2 Armando Solar-Lezama.
author_facet Armando Solar-Lezama.
Singh, Rohit, Ph. D. Massachusetts Institute of Technology
author_sort Singh, Rohit, Ph. D. Massachusetts Institute of Technology
collection MIT
description Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017.
first_indexed 2024-09-23T12:03:35Z
format Thesis
id mit-1721.1/113938
institution Massachusetts Institute of Technology
language eng
last_indexed 2024-09-23T12:03:35Z
publishDate 2018
publisher Massachusetts Institute of Technology
record_format dspace
spelling mit-1721.1/1139382019-04-12T07:40:19Z Automatically learning optimal formula simplifiers and database entity matching rules Singh, Rohit, Ph. D. Massachusetts Institute of Technology Armando Solar-Lezama. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. Electrical Engineering and Computer Science. Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 153-161). Traditionally, machine learning (ML) is used to find a function from data to optimize a numerical score. On the other hand, synthesis is traditionally used to find a function (or a program) that can be derived from a grammar and satisfies a logical specification. The boundary between ML and synthesis has been blurred by some recent work [56,90]. However, this interaction between ML and synthesis has not been fully explored. In this thesis, we focus on the problem of finding a function given large amounts of data such that the function satisfies a logical specification and also optimizes a numerical score over the input data. We present a framework to solve this problem in two impactful application domains: formula simplification in constraint solvers and database entity matching (EM). First, we present a system called Swapper based on our framework that can automatically generate code for efficient formula simplifiers specialized to a class of problems. Formula simplification is an important part of modern constraint solvers, and writing efficient simplifiers has largely been an arduous manual task. Evaluation of Swapper on multiple applications of the Sketch constraint solver showed 15-60% improvement over the existing hand-crafted simplifier in Sketch. Second, we present a system called EM-Synth based on our framework that generates as effective and more interpretable EM rules than the state-of-the-art techniques. Database entity matching is a critical part of data integration and cleaning, and it usually involves learning rules or classifiers from labeled examples. Evaluation of EM-Synth on multiple real-world datasets against other interpretable (shallow decision trees, SIFI [116]) and noninterpretable (SVM, deep decision trees) methods showed that EM-Synth generates more concise and interpretable rules without sacrificing too much accuracy. by Rohit Singh. Ph. D. 2018-03-02T21:40:03Z 2018-03-02T21:40:03Z 2017 2017 Thesis http://hdl.handle.net/1721.1/113938 1023862454 eng MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. http://dspace.mit.edu/handle/1721.1/7582 161 pages application/pdf Massachusetts Institute of Technology
spellingShingle Electrical Engineering and Computer Science.
Singh, Rohit, Ph. D. Massachusetts Institute of Technology
Automatically learning optimal formula simplifiers and database entity matching rules
title Automatically learning optimal formula simplifiers and database entity matching rules
title_full Automatically learning optimal formula simplifiers and database entity matching rules
title_fullStr Automatically learning optimal formula simplifiers and database entity matching rules
title_full_unstemmed Automatically learning optimal formula simplifiers and database entity matching rules
title_short Automatically learning optimal formula simplifiers and database entity matching rules
title_sort automatically learning optimal formula simplifiers and database entity matching rules
topic Electrical Engineering and Computer Science.
url http://hdl.handle.net/1721.1/113938
work_keys_str_mv AT singhrohitphdmassachusettsinstituteoftechnology automaticallylearningoptimalformulasimplifiersanddatabaseentitymatchingrules