Why Does Surprisal From Larger Transformer-Based Language Models Provide a Poorer Fit to Human Reading Times?
Abstract: This work presents a linguistic analysis of why larger Transformer-based pre-trained language models with more parameters and lower perplexity nonetheless yield surprisal estimates that are less predictive of human reading times. First, regression analyses show a strictly m...
| Main Authors | Byung-Doh Oh, William Schuler |
|---|---|
| Format | Article |
| Language | English |
| Published | The MIT Press, 2023-01-01 |
| Series | Transactions of the Association for Computational Linguistics |
| Online Access | https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00548/115371/Why-Does-Surprisal-From-Larger-Transformer-Based |
Similar Items

- Comparison of Structural Parsers and Neural Language Models as Surprisal Estimators — by: Byung-Doh Oh, et al. Published: (2022-03-01)
- Speaker Input Variability Does Not Explain Why Larger Populations Have Simpler Languages — by: Mark Atkinson, et al. Published: (2015-01-01)
- Why Does the NOTION Trial Show Poorer than Expected Outcomes in the Surgical Arm? — by: Stefano Urso, et al. Published: (2022-01-01)
- Is larger eccentric utilization ratio associated with poorer rate of force development in squat jump? An exploratory study — by: Žiga Kozinc, et al. Published: (2024-12-01)
- Surprise! Why Insightful Solution Is Pleasurable — by: Anna Savinova, et al. Published: (2022-11-01)