A regularization framework for active learning from imbalanced data

Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010.

Bibliographic Details
Main Author:	Paskov, Hristo Spassimirov
Other Authors:	Tomaso A. Poggio and Lorenzo A. Rosasco.
Format:	Thesis
Language:	eng
Published:	Massachusetts Institute of Technology 2011
Subjects:	Electrical Engineering and Computer Science.
Online Access:	http://hdl.handle.net/1721.1/61177

Other Titles:	Multiclass extensions of Regularized Least Squares
Notes:	Cataloged from PDF version of thesis. Includes bibliographical references (p. 81-83).
Physical Description:	83 p.
Rights:	M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission.

Abstract

We consider the problem of building a viable multiclass classification system that minimizes the amount of training data it needs, is robust to noisy, imbalanced samples, and outputs confidence scores along with its predictions. These goals address critical steps along the entire classification pipeline: collecting data, training, and classifying. To this end, we investigate the merits of a classification framework that uses a robust algorithm known as Regularized Least Squares (RLS) as its basic classifier. We extend RLS to account for data imbalances, perform efficient active learning, and output confidence scores. Each of these extensions is a new result, and together with our other findings they yield a novel and effective classification system.

Our first set of results investigates various ways to handle multiclass data imbalances and ultimately leads to a derivation of a weighted version of RLS, both with and without an offset term. Weighting RLS provides an effective countermeasure to imbalanced data and allows the regularization parameter to be selected automatically through an exact and efficient calculation of the Leave One Out error.

Next, we present two methods that estimate multiclass confidence from an asymptotic analysis of RLS, and a third method that stems from a Bayesian interpretation of the classifier. We show that while the third method incorporates more information in its estimate, the asymptotic methods are more accurate and more resilient to imperfect choices of the kernel and regularization parameter.

Finally, we present an active learning extension of RLS (ARLS) that uses our weighting methods to overcome imbalanced data. ARLS is particularly adept at this task because of its intelligent selection scheme.
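The abstract does not give the thesis's weighting scheme or formulas, so the following NumPy sketch is only a rough illustration, under stated assumptions, of how weighting RLS can coexist with an exact, cheap Leave One Out computation. It assumes kernel RLS without an offset term, one-vs-all codes in {-1, +1} for the multiclass case, and inverse-class-frequency weights w_i; these choices and all function names (`class_balanced_weights`, `weighted_rls_loo`, `select_lambda`, etc.) are hypothetical, not the author's. Minimizing sum_i w_i (y_i - f(x_i))^2 + lam ||f||_K^2 gives fitted values F = H Y with H = K (K + lam W^{-1})^{-1}, and the exact LOO residual is (y_i - f(x_i)) / (1 - H_ii). A single eigendecomposition of W^{1/2} K W^{1/2} then yields H for every candidate lam.

```python
import numpy as np


def gaussian_kernel(X, Z, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and Z."""
    sq = (X ** 2).sum(1)[:, None] + (Z ** 2).sum(1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def class_balanced_weights(labels):
    """Hypothetical weighting: every class receives the same total weight."""
    classes, inverse, counts = np.unique(labels, return_inverse=True, return_counts=True)
    return len(labels) / (len(classes) * counts[inverse])


def one_vs_all_codes(labels):
    """Encode labels as an n x T matrix with +1 for the true class, -1 elsewhere."""
    classes = np.unique(labels)
    return classes, np.where(labels[:, None] == classes[None, :], 1.0, -1.0)


def weighted_rls_loo(K, Y, w, lambdas):
    """Weighted RLS (no offset) with exact LOO residuals for each lambda.

    Fitted values are F = H Y with H = K (K + lam W^{-1})^{-1}
                                   = W^{-1/2} Q S Q^T W^{1/2},
    where W^{1/2} K W^{1/2} = Q diag(evals) Q^T and S = evals / (evals + lam).
    """
    s = np.sqrt(w)
    evals, Q = np.linalg.eigh(s[:, None] * K * s[None, :])
    evals = np.clip(evals, 0.0, None)        # guard against tiny negative eigenvalues
    QtSY = Q.T @ (s[:, None] * Y)            # Q^T W^{1/2} Y, shared by all lambdas
    results = []
    for lam in lambdas:
        shrink = evals / (evals + lam)       # eigenvalues of M (M + lam I)^{-1}
        F = (Q @ (shrink[:, None] * QtSY)) / s[:, None]
        h = (Q ** 2) @ shrink                # diag(H); the W^{+-1/2} factors cancel
        loo_resid = (Y - F) / (1.0 - h)[:, None]
        results.append((lam, F, loo_resid))
    return results


def select_lambda(K, labels, lambdas):
    """Pick lambda minimizing the class-weighted LOO misclassification rate."""
    classes, Y = one_vs_all_codes(labels)
    w = class_balanced_weights(labels)
    best = None
    for lam, F, R in weighted_rls_loo(K, Y, w, lambdas):
        loo_pred = classes[np.argmax(Y - R, axis=1)]   # LOO outputs are Y - R
        err = np.sum(w * (loo_pred != labels)) / np.sum(w)
        if best is None or err < best[1]:
            best = (lam, err)
    return best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Imbalanced toy problem: 40 points in class 0, 8 in class 1.
    X = np.vstack([rng.normal(0.0, 1.0, (40, 2)), rng.normal(3.0, 1.0, (8, 2))])
    labels = np.array([0] * 40 + [1] * 8)
    K = gaussian_kernel(X, X, sigma=1.0)
    lam, err = select_lambda(K, labels, np.logspace(-4, 1, 12))
    print(f"selected lambda={lam:.3g}, weighted LOO error={err:.3f}")
```

The eigendecomposition is computed once, at O(n^3) cost, after which each additional lam costs only O(n^2) per output column, which is what makes scanning many regularization values affordable. Weighting the LOO error by the same class weights keeps the minority class from being ignored during model selection.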

Similar Items