Non-parametric deep learning with applications in active learning

Bibliographic Details
Main Author: Band, N
Other Authors: Kalaitzis, A
Format: Thesis
Language: English
Published: 2022
Subjects:
Description
Summary: We challenge a common assumption underlying most supervised deep learning: that a model makes a prediction depending only on its parameters and the features of a single input. To this end, we introduce a general-purpose deep learning architecture, Non-Parametric Transformers (NPTs), which takes the entire dataset as input instead of processing one datapoint at a time. Our approach uses self-attention to reason explicitly about relationships between datapoints, which can be seen as realizing non-parametric models using parametric attention mechanisms. Unlike conventional non-parametric models, however, we let the model learn end-to-end from the data how to make use of other datapoints for prediction. Empirically, NPTs solve complex reasoning tasks that traditional deep learning models cannot, and they exploit interactions between datapoints to achieve highly competitive results on tabular data.

We then turn our attention to in-context learning, a non-parametric method for few-shot learning recently popularized as language model (LM) prompting, which conditions predictions directly on a context set of examples. Combining in-context learning with information-based active learning, we introduce Active In-Context Learning, an algorithm for designing informative context sets that minimizes labeling costs and avoids the retraining normally required in active learning. Next, we demonstrate that our method can tractably compute two mutual information (MI)-based acquisition functions previously deemed intractable or approximated using variational inference. Lastly, we consider active learning for downstream tasks far from LM pretraining distributions. We develop Meta-NPTs, an in-context learner meta-trained on a distribution of tasks related to the downstream task of interest. Meta-NPTs implicitly ensemble predictions over a task posterior, quantifying uncertainty over plausible task identities. In early results, Active In-Context Learning improves the few-shot learning performance of OPT-30B on BoolQ, Meta-NPTs leverage uncertainty over tasks to perform Active In-Context Learning, and our MI-based acquisition functions often outperform the established predictive-entropy baseline.
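
The architectural idea behind NPTs, attention between datapoints so that the whole dataset informs each prediction, can be illustrated with a short sketch. This is a minimal illustration under assumptions, not the thesis implementation: the block structure, layer sizes, and the name NPTBlock are made up here for clarity.

```python
# Minimal sketch (assumptions, not the thesis code): a block that alternates
# attention between datapoints and attention between attributes, so that
# predictions for one row can depend on every other row of the dataset.
import torch
import torch.nn as nn

class NPTBlock(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        # Attention between datapoints: each row attends to all other rows.
        self.between_datapoints = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Attention between attributes: each feature attends to the row's other features.
        self.between_attributes = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n_datapoints, n_attributes, dim) -- the *entire* dataset is one input.
        # Between datapoints: treat each attribute column as a sequence of n rows.
        cols = x.permute(1, 0, 2)                       # (n_attributes, n_datapoints, dim)
        attn_dp, _ = self.between_datapoints(cols, cols, cols)
        x = self.norm1(x + attn_dp.permute(1, 0, 2))
        # Between attributes: standard self-attention within each datapoint.
        attn_attr, _ = self.between_attributes(x, x, x)
        return self.norm2(x + attn_attr)

# Usage: embed a whole (masked) table and predict missing entries from it.
dataset = torch.randn(128, 10, 32)    # 128 datapoints, 10 attributes, 32-dim embeddings
print(NPTBlock(32)(dataset).shape)    # torch.Size([128, 10, 32])
```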
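The active-learning side can be sketched in the same spirit. The key property used in the abstract is that an in-context learner can be re-conditioned on a hypothetical labelled example simply by appending it to the context, with no retraining, which is what makes an MI-style acquisition cheap to evaluate. The sketch below is a hedged illustration of one plausible such acquisition; the predict interface, function names, and the exact acquisition are assumptions, not the thesis's definitions.

```python
# Minimal sketch (assumptions, not the thesis code): scoring an unlabelled
# candidate for Active In-Context Learning with a mutual-information-style
# acquisition, evaluated by appending hypothetical labels to the context.
import numpy as np

def entropy(p: np.ndarray) -> np.ndarray:
    """Shannon entropy of categorical distributions along the last axis."""
    return -(p * np.log(np.clip(p, 1e-12, None))).sum(-1)

def mi_acquisition(predict, context, candidate_x, target_xs, class_probs):
    """Estimate I(y_candidate ; y_targets | candidate_x, target_xs, context).

    predict(context, xs) -> array of shape (len(xs), n_classes) of predictive probabilities.
    class_probs: current predictive distribution over the candidate's label.
    """
    # Marginal predictive entropy on the targets under the current context.
    h_marginal = entropy(predict(context, target_xs)).mean()

    # Expected entropy after observing each possible label for the candidate,
    # obtained by appending the hypothetical (x, y) pair to the context.
    h_conditional = 0.0
    for y, p_y in enumerate(class_probs):
        extended = list(context) + [(candidate_x, y)]
        h_conditional += p_y * entropy(predict(extended, target_xs)).mean()

    return h_marginal - h_conditional   # expected information gain (nats)

# An active-learning step would score every pool point this way, request a label
# for the argmax, and append the true (x, y) pair to the context -- no retraining.
```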