Representing and querying regression models in a relational database management system
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007.
Main Author: | Thiagarajan, Arvind |
---|---|
Other Authors: | Samuel Madden and Hari Balakrishnan |
Format: | Thesis |
Language: | eng |
Published: | Massachusetts Institute of Technology, 2008 |
Subjects: | Electrical Engineering and Computer Science. |
Online Access: | http://hdl.handle.net/1721.1/42254 |
_version_ | 1826211622554697728 |
---|---|
author | Thiagarajan, Arvind |
author2 | Samuel Madden and Hari Balakrishnan. |
author_facet | Samuel Madden and Hari Balakrishnan. Thiagarajan, Arvind |
author_sort | Thiagarajan, Arvind |
collection | MIT |
description | Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007. |
first_indexed | 2024-09-23T15:08:54Z |
format | Thesis |
id | mit-1721.1/42254 |
institution | Massachusetts Institute of Technology |
language | eng |
last_indexed | 2024-09-23T15:08:54Z |
publishDate | 2008 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/42254 2019-04-11T12:06:53Z Representing and querying regression models in a relational database management system Thiagarajan, Arvind Samuel Madden and Hari Balakrishnan. Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. Electrical Engineering and Computer Science. Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007. Includes bibliographical references (p. 77-79). Curve fitting is a widely employed, useful modeling tool in several financial, scientific, engineering and data mining applications, and in applications like sensor networks that need to tolerate missing or noisy data. These applications need to both fit functions to their data using regression, and pose relational-style queries over regression models. Unfortunately, existing DBMSs are ill-suited for this task because they do not include support for creating, representing and querying functional data, short of brute-force discretization of functions into a collection of tuples. This thesis describes FunctionDB, a novel DBMS that extends the state of the art. FunctionDB treats functions output by regression as first-class citizens that can be queried declaratively and manipulated like traditional database relations. The key contributions of FunctionDB are a compact, algebraic representation for regression models as piecewise functions, and an algebraic query processor that executes declarative queries directly on this representation as combinations of algebraic operations like function inversion, zero finding and symbolic integration. FunctionDB is evaluated on two real-world data sets: measurements from a temperature sensor network, and traffic traces from cars driving on Boston roads. The results show that operating in the functional domain has substantial accuracy advantages (over 15% for some queries) and order-of-magnitude (10x-100x) performance gains over existing approaches that represent models as discrete collections of points. The thesis also describes an algorithm to maintain regression models online, as new raw data is inserted into the system. The algorithm supports a sustained insertion rate of the order of a million records per second, while generating models no less compact than a clairvoyant (offline) strategy. by Arvind Thiagarajan. S.M. 2008-09-03T15:05:17Z 2008-09-03T15:05:17Z 2007 2007 Thesis http://hdl.handle.net/1721.1/42254 231635736 eng M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582 79 p. application/pdf Massachusetts Institute of Technology |
spellingShingle | Electrical Engineering and Computer Science. Thiagarajan, Arvind Representing and querying regression models in a relational database management system |
title | Representing and querying regression models in a relational database management system |
title_full | Representing and querying regression models in a relational database management system |
title_fullStr | Representing and querying regression models in a relational database management system |
title_full_unstemmed | Representing and querying regression models in a relational database management system |
title_short | Representing and querying regression models in a relational database management system |
title_sort | representing and querying regression models in a relational database management system |
topic | Electrical Engineering and Computer Science. |
url | http://hdl.handle.net/1721.1/42254 |
work_keys_str_mv | AT thiagarajanarvind representingandqueryingregressionmodelsinarelationaldatabasemanagementsystem |
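
The abstract describes answering queries directly on a piecewise-function representation of a regression model, using operations such as function inversion and symbolic integration, rather than on a discretized table of points. The Python sketch below illustrates that general idea only. It assumes a piecewise-linear model, and every name in it (`Piece`, `intervals_above`, `integral`, `model`) is hypothetical; none of it is taken from FunctionDB or from the thesis.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Piece:
    """One linear piece y = slope * x + intercept, valid on [lo, hi].

    Hypothetical type for illustration; not FunctionDB's representation.
    """
    lo: float
    hi: float
    slope: float
    intercept: float


def intervals_above(pieces: List[Piece], threshold: float) -> List[Tuple[float, float]]:
    """Return the sub-intervals where the model is >= threshold.

    Each linear piece is inverted in closed form, so no sampling grid is
    needed. Matches that consist of a single point are dropped.
    """
    result = []
    for p in pieces:
        if p.slope == 0:
            # A constant piece is either entirely above the threshold or not.
            if p.intercept >= threshold:
                result.append((p.lo, p.hi))
            continue
        # Inversion: solve slope * x + intercept = threshold for x.
        x_star = (threshold - p.intercept) / p.slope
        if p.slope > 0:
            lo, hi = max(p.lo, x_star), p.hi   # above the threshold right of x_star
        else:
            lo, hi = p.lo, min(p.hi, x_star)   # above the threshold left of x_star
        if lo < hi:
            result.append((lo, hi))
    return result


def integral(pieces: List[Piece]) -> float:
    """Integrate the model over its whole domain using the antiderivative
    of each linear piece: slope * x**2 / 2 + intercept * x."""
    total = 0.0
    for p in pieces:
        total += 0.5 * p.slope * (p.hi ** 2 - p.lo ** 2) + p.intercept * (p.hi - p.lo)
    return total


# Example model (made-up numbers): rises from 0 to 20 on [0, 10],
# then falls from 20 to 10 on [10, 20].
model = [
    Piece(lo=0.0, hi=10.0, slope=2.0, intercept=0.0),
    Piece(lo=10.0, hi=20.0, slope=-1.0, intercept=30.0),
]

if __name__ == "__main__":
    print(intervals_above(model, 15.0))
    print(integral(model) / (model[-1].hi - model[0].lo))  # mean value over the domain
```

A discretized approach would instead evaluate the model on a grid and filter or sum the resulting tuples, so its answers would depend on the grid spacing. The closed-form versions above touch each piece once and return exact interval endpoints.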