Cloze Distillation: Improving Neural Language Models with Human Next-Word Prediction

Bibliographic Details
Main Authors: Eisape, Tiwalayo; Zaslavsky, Noga; Levy, Roger
Format: Article
Language: English
Published: Association for Computational Linguistics (ACL) 2021
Online Access: https://hdl.handle.net/1721.1/138277
Description
Summary: Contemporary autoregressive language models (LMs) trained purely on corpus data have been shown to capture numerous features of human incremental processing. However, past work has also suggested dissociations between corpus probabilities and human next-word predictions. Here we evaluate several state-of-the-art language models for their match to human next-word predictions and to reading time behavior from eye movements. We then propose a novel method for distilling the linguistic information implicit in human linguistic predictions into pre-trained LMs: Cloze Distillation. We apply this method to a baseline neural LM and show potential improvement in reading time prediction and generalization to held-out human cloze data.
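
The summary describes distilling human next-word (cloze) predictions into a pre-trained LM. The sketch below is an illustrative reading of such an objective, not the paper's exact formulation: it assumes fine-tuning interpolates the usual corpus LM loss with a cross-entropy term that pushes the model's next-word distribution toward the empirical human cloze distribution at positions where cloze data exist. The names (`cloze_distillation_loss`, `cloze_dist`, `cloze_mask`) and the mixing weight `alpha` are hypothetical.

```python
# Hedged sketch of a cloze-distillation-style objective in PyTorch.
# Assumption: fine-tuning mixes (1) corpus next-word NLL with (2) cross-entropy
# against the empirical human cloze distribution at cloze-elicited positions.
import torch
import torch.nn.functional as F


def cloze_distillation_loss(logits, corpus_targets, cloze_dist, cloze_mask, alpha=0.5):
    """
    logits:         (batch, seq_len, vocab) next-word logits from the LM
    corpus_targets: (batch, seq_len) gold next-word ids from the corpus
    cloze_dist:     (batch, seq_len, vocab) empirical human cloze distributions
                    (rows at non-cloze positions are ignored via the mask)
    cloze_mask:     (batch, seq_len) 1.0 where human cloze data exist, else 0.0
    alpha:          illustrative weight interpolating the two loss terms
    """
    log_probs = F.log_softmax(logits, dim=-1)

    # Standard corpus LM term: negative log-likelihood of the attested next word.
    lm_loss = F.nll_loss(
        log_probs.flatten(0, 1), corpus_targets.flatten(), reduction="mean"
    )

    # Cloze term: cross-entropy between the human cloze distribution and the
    # model's predicted distribution, averaged over cloze-elicited positions.
    ce_per_pos = -(cloze_dist * log_probs).sum(dim=-1)            # (batch, seq_len)
    cloze_loss = (ce_per_pos * cloze_mask).sum() / cloze_mask.sum().clamp(min=1.0)

    return alpha * lm_loss + (1.0 - alpha) * cloze_loss
```

Under this reading, setting `alpha = 1.0` recovers ordinary corpus fine-tuning, while lower values weight the human cloze signal more heavily; the actual weighting and training setup are those reported in the paper.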