A statistical learning framework for data mining of large-scale systems : algorithms, implementation, and applications

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Civil and Environmental Engineering, 2007.

Bibliographic Details
Main Author: Tsou, Ching-Huei, 1973-
Other Authors: John R. Williams.
Format: Thesis
Language: eng
Published: Massachusetts Institute of Technology 2008
Subjects: Civil and Environmental Engineering.
Online Access: http://dspace.mit.edu/handle/1721.1/38578
http://hdl.handle.net/1721.1/38578
Record ID: mit-1721.1/38578 (2019-04-10T16:58:16Z)
Department: Massachusetts Institute of Technology. Dept. of Civil and Environmental Engineering.
Notes: Includes bibliographical references (p. 102-104).
Abstract: A machine learning framework is presented that supports data mining and statistical modeling of systems that are monitored by large-scale sensor networks. The proposed algorithm is novel in that it takes both observations and domain knowledge into consideration and provides a mechanism that combines analytical modeling and inductive learning. An efficient solver is presented that allows the algorithm to handle large-scale problems. The solver uses a randomized kernel that incorporates domain knowledge into support vector machine learning. It also takes advantage of the sparseness of support vectors, which allows for parallelization and online training to further speed up the computation. The solver can be integrated into existing systems, embedded into databases, or exposed as a web service.
Understanding the data generated by large-scale systems presents several problems. First, statistical modeling approaches may either under-fit or over-fit the data and are sensitive to data quality. Second, learning is a computationally intensive process and often becomes intractable when the sample size exceeds several thousand. Third, learning algorithms need to be tuned to the specific problem in most engineering and business fields. Finally, no flexible learning framework is available.
This work addresses these problems by presenting a methodology that combines machine learning with domain knowledge, and an efficient framework that supports the algorithm. Benchmark and practical engineering problems are used to validate the methodology.
Statement of Responsibility: by Ching-Huei Tsou.
Degree: Ph.D.
Dates: 2007; 2008-11-10T19:49:09Z
Identifiers: http://dspace.mit.edu/handle/1721.1/38578; http://hdl.handle.net/1721.1/38578; 156281156
Rights: M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/38578 http://dspace.mit.edu/handle/1721.1/7582
Extent: 104 p.
File Format: application/pdf