Applications of empirical processes in learning theory : algorithmic stability and generalization bounds

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Brain and Cognitive Sciences, 2006.

Bibliographic Details
Main Author: Rakhlin, Alexander
Other Authors: Tomaso Poggio.
Format: Thesis
Language: eng
Published: Massachusetts Institute of Technology 2006
Subjects: Brain and Cognitive Sciences
Online Access: http://hdl.handle.net/1721.1/34564
author Rakhlin, Alexander
author2 Tomaso Poggio.
collection MIT
description Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Brain and Cognitive Sciences, 2006.
format Thesis
id mit-1721.1/34564
institution Massachusetts Institute of Technology
language eng
publishDate 2006
publisher Massachusetts Institute of Technology
record_format dspace
spelling mit-1721.1/34564 2019-04-10T12:27:41Z
Applications of empirical processes in learning theory : algorithmic stability and generalization bounds
Rakhlin, Alexander
Tomaso Poggio.
Massachusetts Institute of Technology. Dept. of Brain and Cognitive Sciences.
Brain and Cognitive Sciences.
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Brain and Cognitive Sciences, 2006.
Includes bibliographical references (p. 141-148).
Abstract: This thesis studies two key properties of learning algorithms: their generalization ability and their stability with respect to perturbations. To analyze these properties, we focus on concentration inequalities and tools from empirical process theory. We obtain theoretical results and demonstrate their applications to machine learning. First, we show how various notions of stability upper- and lower-bound the bias and variance of several estimators of the expected performance for general learning algorithms. A weak stability condition is shown to be equivalent to consistency of empirical risk minimization.
The second part of the thesis derives tight performance guarantees for greedy error minimization methods, a family of computationally tractable algorithms. In particular, we derive risk bounds for a greedy mixture density estimation procedure. We prove that, contrary to what is suggested in the literature, the number of terms in the mixture does not act as a bias-variance trade-off for the performance.
The third part of this thesis provides a solution to an open problem regarding the stability of Empirical Risk Minimization (ERM), an algorithm of central importance in learning theory. By studying the suprema of the empirical process, we prove that ERM over Donsker classes of functions is stable in the L1 norm. Hence, as the number of samples grows, it becomes less and less likely that a perturbation of o(√n) samples will result in a very different empirical minimizer. Asymptotic rates of this stability are proved under metric entropy assumptions on the function class. Through the use of a ratio limit inequality, we also prove stability of expected errors of empirical minimizers. Next, we investigate applications of the stability result. In particular, we focus on procedures that optimize an objective function, such as k-means and other clustering methods. We demonstrate that stability of clustering, just like stability of ERM, is closely related to the geometry of the class and the underlying measure. Furthermore, our result on stability of ERM delineates a phase transition between stability and instability of clustering methods.
In the last chapter, we prove a generalization of the bounded-difference concentration inequality for almost-everywhere smooth functions. This result can be utilized to analyze algorithms which are almost always stable. Next, we prove a phase transition in the concentration of almost-everywhere smooth functions. Finally, a tight concentration of empirical errors of empirical minimizers is shown under an assumption on the underlying space.
by Alexander Rakhlin.
Ph.D.
2006-11-07T12:58:36Z 2006 Thesis
http://hdl.handle.net/1721.1/34564
eng
M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582
148 p.
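For reference, the classical bounded-difference (McDiarmid) inequality that the last chapter generalizes can be stated as below. The notation (f, c_i, X_i) is ours, added for illustration; the thesis's extension to almost-everywhere smooth functions is not reproduced here.
```latex
% Bounded-difference (McDiarmid) inequality: if f satisfies, for every i,
%   |f(x_1,\dots,x_i,\dots,x_n) - f(x_1,\dots,x_i',\dots,x_n)| \le c_i,
% then for independent X_1,\dots,X_n and every t > 0,
\[
  \Pr\Bigl( \bigl| f(X_1,\dots,X_n) - \mathbb{E}\, f(X_1,\dots,X_n) \bigr| \ge t \Bigr)
  \;\le\; 2 \exp\!\Bigl( - \frac{2 t^2}{\sum_{i=1}^{n} c_i^2} \Bigr).
\]
```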
title Applications of empirical processes in learning theory : algorithmic stability and generalization bounds
topic Brain and Cognitive Sciences.
url http://hdl.handle.net/1721.1/34564
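The abstract's point about clustering stability can be probed numerically. The sketch below is ours, not the thesis's: it assumes scikit-learn's KMeans as the clustering routine and simply compares the fitted centers before and after perturbing m = o(√n) sample points, in the spirit of the L1-stability result for empirical minimizers.
```python
# Illustrative sketch (not from the thesis): empirically checking how much a
# k-means "empirical minimizer" moves when o(sqrt(n)) sample points are changed.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n, k = 10_000, 3

# Draw a sample from a simple, well-separated 3-component mixture on the line.
centers = np.array([-4.0, 0.0, 4.0])
X = (rng.choice(centers, size=n) + rng.normal(scale=1.0, size=n)).reshape(-1, 1)

# Perturb m = o(sqrt(n)) points: here m ~ n^(1/4), far below sqrt(n) = 100.
m = int(n ** 0.25)
X_pert = X.copy()
idx = rng.choice(n, size=m, replace=False)
X_pert[idx] = rng.normal(loc=10.0, scale=1.0, size=(m, 1))  # move them far away

def fitted_centers(data):
    """Return the sorted k-means centers (the fitted minimizer's parameters)."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(data)
    return np.sort(km.cluster_centers_.ravel())

c_orig, c_pert = fitted_centers(X), fitted_centers(X_pert)
# In a "stable" configuration (well-separated clusters), this gap is tiny.
print("max center displacement after perturbation:", np.abs(c_orig - c_pert).max())
```
With well-separated clusters as above, the displacement is negligible; placing the underlying measure near a symmetric, ambiguous configuration is the kind of geometry under which such stability can break down, which is the phase-transition phenomenon the abstract refers to.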