Measuring and Manipulating State Representations in Neural Language Models

Bibliographic Details
Main Author: Li, Belinda Zou
Other Authors: Andreas, Jacob
Format: Thesis
Published: Massachusetts Institute of Technology 2023
Online Access: https://hdl.handle.net/1721.1/150114
Description
Summary: Modern neural language models (LMs) are typically pre-trained with a self-supervised objective: they are presented with texts that have pieces withheld, and asked to generate the withheld portions of the text. By simply scaling up such training, LMs have achieved remarkable performance on many language reasoning benchmarks. However, sentences generated by LMs often still suffer from coherence errors: they describe events and situations inconsistent with the state of the world described by the preceding text. One account of the successes and failures of LM generation holds that LMs are simply modeling surface word co-occurrence statistics. However, we provide evidence for an alternative account (not mutually exclusive with the first): LMs represent and reason about the world they describe. In BART and T5 transformer LMs, we identify contextual word representations that function as models of entities and situations as they evolve throughout a discourse. These neural representations have functional similarities to linguistic models of dynamic semantics: they support a linear readout of each entity’s current properties and relations, and can be manipulated with predictable effects on language generation. Our results indicate that prediction in pretrained LMs is supported, at least in part, by dynamic representations of meaning and implicit simulation of entity state, and that this behavior can be learned with only text as training data. Consequently, when LMs fail to generate coherent text, the failure can be attributed either to errors in inferring state from context or to errors in generating next sentences consistent with the inferred state. We describe a procedure for distinguishing these two types of errors. In models with correctable errors of the first type, we show that targeted supervision can address them. We introduce two procedures for using explicit representations of world state as auxiliary supervision. These procedures efficiently improve LM coherence, in some cases providing the benefits of 1,000–9,000 training examples with only 500 state annotations.
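
The "linear readout" described in the abstract corresponds to training a linear probe on frozen contextual representations. The sketch below is a minimal illustration of that idea, not the thesis's actual experimental setup: it assumes a facebook/bart-base encoder, probes the hidden state at the final token of an entity mention, and uses a toy binary "open/closed" label standing in for an entity's state.

```python
# Minimal sketch of a linear probe over frozen LM encoder states.
# Assumptions for illustration (not the thesis's exact procedure): a
# facebook/bart-base encoder, probing the last token of the entity's final
# mention, and a toy binary property label in place of real state annotations.
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
encoder = AutoModel.from_pretrained("facebook/bart-base").encoder
encoder.eval()  # representations stay frozen; only the probe is trained


def entity_representation(context: str, entity: str) -> torch.Tensor:
    """Encoder state at the last token of the final mention of `entity`."""
    enc = tokenizer(context, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**enc).last_hidden_state[0]  # (seq_len, d_model)
    ent_ids = tokenizer(" " + entity, add_special_tokens=False)["input_ids"]
    ids = enc["input_ids"][0].tolist()
    # Locate occurrences of the entity's token span in the context.
    starts = [i for i in range(len(ids) - len(ent_ids) + 1)
              if ids[i:i + len(ent_ids)] == ent_ids]
    return hidden[starts[-1] + len(ent_ids) - 1]


# Toy supervision: (discourse, entity, is_open) triples.
examples = [
    ("You open the box and look inside.", "box", 1),
    ("You close the box and walk away.", "box", 0),
]
X = torch.stack([entity_representation(c, e) for c, e, _ in examples])
y = torch.tensor([label for _, _, label in examples], dtype=torch.float32)

probe = nn.Linear(X.shape[1], 1)  # the linear readout
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
for _ in range(200):
    optimizer.zero_grad()
    loss = nn.functional.binary_cross_entropy_with_logits(
        probe(X).squeeze(-1), y)
    loss.backward()
    optimizer.step()
```

With more (context, entity, state) triples, the same probe could be evaluated on held-out discourses; the manipulation experiments mentioned in the abstract additionally edit the probed representation and examine the effect on generation, a step not shown in this sketch.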