One-Pass Learning via Bridging Orthogonal Gradient Descent and Recursive Least-Squares

While deep neural networks are capable of achieving state-of-the-art performance in various domains, their training typically requires iterating for many passes over the dataset. However, due to computational and memory constraints and potential privacy concerns, storing and accessing all the data i...

Bibliographic Details
Main Author:	Min, Youngjae
Other Authors:	Azizan, Navid
Format:	Thesis
Published:	Massachusetts Institute of Technology 2025
Online Access:	https://hdl.handle.net/1721.1/158204 https://orcid.org/0000-0002-3737-1206

_version_	1824458092496027648
author	Min, Youngjae
author2	Azizan, Navid
author_facet	Azizan, Navid Min, Youngjae
author_sort	Min, Youngjae
collection	MIT
description	While deep neural networks are capable of achieving state-of-the-art performance in various domains, their training typically requires iterating for many passes over the dataset. However, due to computational and memory constraints and potential privacy concerns, storing and accessing all the data is impractical in many real-world scenarios where the data arrives in a stream. In this thesis, we investigate the problem of one-pass learning, in which a model is trained on sequentially arriving data without retraining on previous datapoints. Motivated by the increasing use of overparameterized models, we develop Orthogonal Recursive Fitting (ORFit), an algorithm for one-pass learning which seeks to perfectly fit every new datapoint while changing the parameters in a direction that causes the least change to the predictions on previous datapoints. By doing so, we bridge two seemingly distinct algorithms in adaptive filtering and machine learning, namely the recursive least-squares (RLS) algorithm and orthogonal gradient descent (OGD). Our algorithm uses the memory efficiently by exploiting the structure of the streaming data via an incremental principal component analysis (IPCA). Further, we show that, for overparameterized linear models, the parameter vector obtained by our algorithm is what stochastic gradient descent (SGD) would converge to in the standard multi-pass setting. Finally, we generalize the results to the nonlinear setting for highly overparameterized models, relevant for deep learning. Our experiments show the effectiveness of the proposed method compared to the baselines.
first_indexed	2025-02-19T04:20:24Z
format	Thesis
id	mit-1721.1/158204
institution	Massachusetts Institute of Technology
last_indexed	2025-02-19T04:20:24Z
publishDate	2025
publisher	Massachusetts Institute of Technology
record_format	dspace
spelling	mit-1721.1/158204 2025-02-13T19:04:19Z One-Pass Learning via Bridging Orthogonal Gradient Descent and Recursive Least-Squares Min, Youngjae Azizan, Navid Massachusetts Institute of Technology. Department of Aeronautics and Astronautics S.M. 2025-02-13T19:03:46Z 2025-02-13T19:03:46Z 2023-06 2025-02-06T13:35:36.888Z Thesis https://hdl.handle.net/1721.1/158204 https://orcid.org/0000-0002-3737-1206 In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology
spellingShingle	Min, Youngjae One-Pass Learning via Bridging Orthogonal Gradient Descent and Recursive Least-Squares
title	One-Pass Learning via Bridging Orthogonal Gradient Descent and Recursive Least-Squares
title_sort	one pass learning via bridging orthogonal gradient descent and recursive least squares
url	https://hdl.handle.net/1721.1/158204 https://orcid.org/0000-0002-3737-1206
work_keys_str_mv	AT minyoungjae onepasslearningviabridgingorthogonalgradientdescentandrecursiveleastsquares

Similar Items