Language learning at scale: Data‐driven and model‐motivated analyses of lexical and morphological development

Bibliographic Details
Main Author: Braginsky, Mika
Other Authors: Gibson, Edward
Format: Thesis
Published: Massachusetts Institute of Technology 2022
Online Access: https://hdl.handle.net/1721.1/143365
Description
Summary: Studying how children learn language gives complementary insights into both children and language. Given that languages have the properties that they do, what does children’s ability to learn them tell us about cognitive development? And given that children have the cognitive capacities that they do, what does the learnability of language tell us about its properties? I present a series of large‐scale investigations of language learning, using dense datasets and computational models to support generalization across children, over development, and among languages, while distinguishing among theories. In the initial investigation (Chapter 2), we ask: what makes words harder or easier to learn, and what does that reveal about word learning mechanisms? We examined the factors that contribute to individual words’ learning trajectories, and the consistency of those factors across languages. This work establishes a framework for conducting analyses of large‐scale language learning data in a way that brings together disparate data sources, generalizes across languages, and tracks change over development. For subsequent investigations (Chapters 3‐4), I focus on the study of morphology learning, from various directions. Learning morphology is a particularly interesting problem because it straightforwardly involves both memorizing specific forms and generalizing beyond direct experience. It is thus a fruitful case study for the interplay between mechanisms of memory and inference, which is fundamental both in language and in cognition more generally. Chapter 3 applies the datasets and approaches developed in Chapter 1 to the domain of morphology, investigating how morphology learning relates to age, vocabulary development, and phonological structure. Chapter 4 delves further into morphology learning by applying the abstraction and rigor of computational modeling.
Finally, I propose a series of studies to use a novel data collection method to create a dense dataset on morphological development and evaluate theories of morphological development. Taken together, these studies synthesize empirical and computational methods to investigate multiple domains of language development at scale. Focusing on lexical and morphological development, they clarify and enhance the empirical and theoretical landscape of language learning.