The Acquisition of Noun and Verb Categories by Bootstrapping From a Few Known Words: A Computational Model

While many studies have shown that toddlers are able to detect syntactic regularities in speech, the learning mechanism allowing them to do this is still largely unclear. In this article, we use computational modeling to assess the plausibility of a context-based learning mechanism for the acquisiti...

Bibliographic Details
Main Authors:	Perrine Brusini, Olga Seminck, Pascal Amsili, Anne Christophe
Format:	Article
Language:	English
Published:	Frontiers Media S.A. 2021-08-01
Series:	Frontiers in Psychology
Subjects:	language development; acquisition of syntax; computational modeling; semantic seed; noun; verb
Online Access:	https://www.frontiersin.org/articles/10.3389/fpsyg.2021.661479/full

_version_	1819084944532570112
author	Perrine Brusini Olga Seminck Pascal Amsili Anne Christophe
author_facet	Perrine Brusini Olga Seminck Pascal Amsili Anne Christophe
author_sort	Perrine Brusini
collection	DOAJ
description	While many studies have shown that toddlers are able to detect syntactic regularities in speech, the learning mechanism allowing them to do this is still largely unclear. In this article, we use computational modeling to assess the plausibility of a context-based learning mechanism for the acquisition of nouns and verbs. We hypothesize that infants can assign basic semantic features, such as “is-an-object” and/or “is-an-action,” to the very first words they learn, then use these words, the semantic seed, to ground proto-categories of nouns and verbs. The contexts in which these words occur would then be exploited to bootstrap the noun and verb categories: unknown words are attributed to the class that has been observed most frequently in the corresponding context. To test our hypothesis, we designed a series of computational experiments which used French corpora of child-directed speech and different sizes of semantic seed. We partitioned these corpora into training and test sets: the model extracted the two-word contexts of the seed from the training sets, then used them to predict the syntactic category of content words from the test sets. This very simple algorithm proved to be highly efficient in a categorization task: even the smallest semantic seed (only 8 nouns and 1 verb known) yielded a very high precision (~90% of new nouns; ~80% of new verbs). Recall, in contrast, was low for small seeds and increased with seed size. Interestingly, we observed that the contexts used most often by the model featured function words, which is in line with what we know about infants' language development. Crucially, for the learning method we evaluated here, all initialization hypotheses are plausible and fit the developmental literature (semantic seed and ability to analyse contexts).
While this experiment cannot prove that this learning mechanism is indeed used by infants, it demonstrates the feasibility of a realistic learning hypothesis by using an algorithm that relies on very few computational and memory resources. Altogether, this supports the idea that a probabilistic, context-based mechanism can be very efficient for the acquisition of syntactic categories in infants.
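The mechanism summarized in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a "two-word context" means the two words immediately preceding the target word, pads utterance starts with a hypothetical `<s>` token, and uses a tiny invented toy corpus and seed. All function names are illustrative.

```python
from collections import Counter, defaultdict

START = "<s>"  # hypothetical padding token for utterance starts (assumption)


def left_context(utterance, i):
    """Return the two words preceding utterance[i] (assumed context definition)."""
    padded = [START, START] + utterance
    return (padded[i], padded[i + 1])


def train(utterances, seed):
    """For each two-word context, count how often it precedes a seed noun or verb."""
    counts = defaultdict(Counter)
    for utt in utterances:
        for i, word in enumerate(utt):
            if word in seed:
                counts[left_context(utt, i)][seed[word]] += 1
    return counts


def predict(counts, utterance, i):
    """Return the class seen most often in this context, or None if the context is unseen."""
    seen = counts.get(left_context(utterance, i))
    if not seen:
        return None
    return seen.most_common(1)[0][0]


def precision_recall(predictions, gold, category):
    """Precision and recall for one category over words keyed by identifier."""
    predicted = [k for k, p in predictions.items() if p == category]
    relevant = [k for k, g in gold.items() if g == category]
    hits = sum(1 for k in predicted if gold.get(k) == category)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy semantic seed and training utterances (invented for illustration only).
seed = {"chien": "noun", "balle": "noun", "mange": "verb"}
training = [
    ["regarde", "le", "chien"],
    ["il", "mange", "la", "balle"],
    ["tu", "vois", "la", "balle"],
]
counts = train(training, seed)

# Unknown words in test utterances: (utterance, index of the target word).
tests = {
    "chat": (["regarde", "le", "chat"], 2),
    "dort": (["il", "dort"], 1),
    "lapin": (["un", "petit", "lapin"], 2),  # context never seen with a seed word
}
predictions = {w: predict(counts, utt, i) for w, (utt, i) in tests.items()}
gold = {"chat": "noun", "dort": "verb", "lapin": "noun"}
noun_precision, noun_recall = precision_recall(predictions, gold, "noun")
```

In this toy run, words in unseen contexts receive no label, so they lower recall without affecting precision. That matches the qualitative pattern the description reports for small seeds, although the numbers here carry no empirical weight.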
first_indexed	2024-12-21T20:56:30Z
format	Article
id	doaj.art-e213f490862642b3bfb7e055f1101e0a
institution	Directory Open Access Journal
issn	1664-1078
language	English
last_indexed	2024-12-21T20:56:30Z
publishDate	2021-08-01
publisher	Frontiers Media S.A.
record_format	Article
series	Frontiers in Psychology
spelling	doaj.art-e213f490862642b3bfb7e055f1101e0a; 2022-12-21T18:50:34Z; eng; Frontiers Media S.A.; Frontiers in Psychology; 1664-1078; 2021-08-01; 12; 10.3389/fpsyg.2021.661479; 661479; The Acquisition of Noun and Verb Categories by Bootstrapping From a Few Known Words: A Computational Model; Perrine Brusini [0, 1], Olga Seminck [2], Pascal Amsili [3], Anne Christophe [4]; [0] Department of Psychological Sciences, University of Liverpool, Liverpool, United Kingdom; [1] Laboratoire de Sciences Cognitives et Psycholinguistique, Centre National de la Recherche Scientifique, École Normale Supérieure/PSL University, Paris, France; [2] Laboratoire Langues, Textes, Traitements Informatiques, Cognition (Lattice), Centre National de la Recherche Scientifique, École Normale Supérieure/PSL University, Université Sorbonne Nouvelle, Paris, France; [3] Laboratoire Langues, Textes, Traitements Informatiques, Cognition (Lattice), Centre National de la Recherche Scientifique, École Normale Supérieure/PSL University, Université Sorbonne Nouvelle, Paris, France; [4] Laboratoire de Sciences Cognitives et Psycholinguistique, Centre National de la Recherche Scientifique, École Normale Supérieure/PSL University, Paris, France; https://www.frontiersin.org/articles/10.3389/fpsyg.2021.661479/full; language development; acquisition of syntax; computational modeling; semantic seed; noun; verb
title	The Acquisition of Noun and Verb Categories by Bootstrapping From a Few Known Words: A Computational Model
title_sort	acquisition of noun and verb categories by bootstrapping from a few known words a computational model
topic	language development; acquisition of syntax; computational modeling; semantic seed; noun; verb
url	https://www.frontiersin.org/articles/10.3389/fpsyg.2021.661479/full

Similar Items