Cloze Distillation: Improving Neural Language Models with Human Next-Word Prediction
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Association for Computational Linguistics (ACL), 2021 |
| Online Access: | https://hdl.handle.net/1721.1/138277 |
| Summary: | Contemporary autoregressive language models (LMs) trained purely on corpus data have been shown to capture numerous features of human incremental processing. However, past work has also suggested dissociations between corpus probabilities and human next-word predictions. Here we evaluate several state-of-the-art language models for their match to human next-word predictions and to reading time behavior from eye movements. We then propose a novel method for distilling the linguistic information implicit in human linguistic predictions into pre-trained LMs: Cloze Distillation. We apply this method to a baseline neural LM and show potential improvement in reading time prediction and generalization to held-out human cloze data. |
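
The summary above describes distilling human next-word (cloze) predictions into a pre-trained LM. Below is a minimal sketch of that general idea, not the authors' released implementation: at positions where a human cloze distribution is available, the model's next-word distribution is pulled toward the human distribution with a cross-entropy term, interpolated with the standard corpus language-modeling loss. The choice of GPT-2 as the base model, the interpolation weight `lam`, the single-token treatment of cloze responses, and the toy cloze counts are all illustrative assumptions.

```python
# Sketch of cloze distillation: fine-tune a pretrained LM toward human
# next-word (cloze) distributions while keeping the ordinary LM objective.
# Hypothetical setup; GPT-2, `lam`, and the toy data are assumptions, not
# the paper's exact configuration.

import torch
import torch.nn.functional as F
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

lam = 0.5  # assumed weight between the cloze-distillation loss and the corpus LM loss


def distillation_step(context: str, cloze_counts: dict) -> float:
    """One update on a single context that has a human cloze distribution.

    `cloze_counts` maps candidate next words to the number of participants
    who produced them (toy data; real cloze norms cover many contexts).
    Each cloze response is assumed to map to a single token for simplicity.
    """
    enc = tokenizer(context, return_tensors="pt")
    out = model(**enc)
    next_logits = out.logits[0, -1]              # model's next-word logits
    log_probs = F.log_softmax(next_logits, dim=-1)

    # Build the human target distribution over the model's vocabulary.
    target = torch.zeros_like(log_probs)
    for word, count in cloze_counts.items():
        tok = tokenizer.encode(" " + word)[0]    # first subword token of the response
        target[tok] += count
    target = target / target.sum()

    # Cross-entropy of the model against the human cloze distribution.
    cloze_loss = -(target * log_probs).sum()

    # Standard next-token LM loss on the same context.
    lm_loss = model(**enc, labels=enc["input_ids"]).loss

    loss = lam * cloze_loss + (1.0 - lam) * lm_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Toy usage: 10 participants completed "The children went outside to ..."
distillation_step("The children went outside to", {"play": 7, "swim": 2, "eat": 1})
```

In this sketch the distillation term simply rewards probability mass on words that human participants actually produced, which is one straightforward way to realize the idea of matching an LM to human cloze data; the paper's own training details and baseline LM may differ.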