Cause, Composition, and Structure in Language
From everyday communication to exploring new thoughts through writing, humans use language in a remarkably flexible, robust, and creative way. In this thesis, I present three case studies supporting the overarching hypothesis that linguistic knowledge in the human mind can be understood as hierarchically-structured causal generative models, within which a repertoire of compositional inference motifs supports efficient inference. I begin with a targeted case study showing how native speakers follow principles of noisy-channel inference in resolving subject-verb agreement mismatches such as "The gift for the kids are hidden under the bed". Results suggest that native speakers' inferences reflect both prior expectations and structure-sensitive conditioning of error probabilities consistent with the statistics of the language production environment. Second, I develop a more open-ended inferential challenge: completing fragmentary linguistic inputs such as "____ published won ____." into well-formed sentences. I use large-scale neural language models to compare two classes of models on this task: the task-specific fine-tuning approach standard in AI and NLP, versus an inferential approach involving the composition of two simple computational motifs; the inferential approach yields more human-like completions. Third, I show that incorporating hierarchical linguistic structure into one of these computational motifs, namely the autoregressive word prediction task, yields improvements in neural language model performance on targeted evaluations of models' grammatical capabilities. I conclude by suggesting future directions in understanding the form and content of these causal generative models of human language.
Main Author: | Qian, Peng |
---|---|
Other Authors: | Levy, Roger P. |
Format: | Thesis |
Published: | Massachusetts Institute of Technology, 2022 |
Online Access: | https://hdl.handle.net/1721.1/145598 https://orcid.org/0000-0002-6916-3057 |
author | Qian, Peng |
author2 | Levy, Roger P. |
collection | MIT |
description | From everyday communication to exploring new thoughts through writing, humans use language in a remarkably flexible, robust, and creative way. In this thesis, I present three case studies supporting the overarching hypothesis that linguistic knowledge in the human mind can be understood as hierarchically-structured causal generative models, within which a repertoire of compositional inference motifs supports efficient inference. I begin with a targeted case study showing how native speakers follow principles of noisy-channel inference in resolving subject-verb agreement mismatches such as "The gift for the kids are hidden under the bed". Results suggest that native speakers' inferences reflect both prior expectations and structure-sensitive conditioning of error probabilities consistent with the statistics of the language production environment. Second, I develop a more open-ended inferential challenge: completing fragmentary linguistic inputs such as "____ published won ____." into well-formed sentences. I use large-scale neural language models to compare two classes of models on this task: the task-specific fine-tuning approach standard in AI and NLP, versus an inferential approach involving the composition of two simple computational motifs; the inferential approach yields more human-like completions. Third, I show that incorporating hierarchical linguistic structure into one of these computational motifs, namely the autoregressive word prediction task, yields improvements in neural language model performance on targeted evaluations of models' grammatical capabilities. I conclude by suggesting future directions in understanding the form and content of these causal generative models of human language. |
department | Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences |
degree | Ph.D. |
date_issued | 2022-05 |
rights | In Copyright - Educational Use Permitted. Copyright MIT. http://rightsstatements.org/page/InC-EDU/1.0/ |