A corpus-based developmental investigation of linguistic complexity in children's writing

Bibliographic Details
Main Authors: Hsiao, Y; Dawson, NJ; Banerji, N; Nation, K
Format: Journal article
Language: English
Published: Elsevier 2024
Description: Writing proficiency is associated with linguistic complexity. We used measures of linguistic complexity to investigate the development of children's narrative writing using a large corpus of short stories (N > 100,000) written by children aged 5–13 in the UK. Linguistic complexity was assessed using both lexical (N = 30) and syntactic (N = 14) measures. Most measures were associated with age, with writing by older children showing greater lexical density, sophistication, and diversity than writing by younger children. Older children also used longer sentences, T-units, and clauses, and their writing showed a higher density of smaller syntactic units within larger ones. Principal Component Analysis identified several dimensions associated with complexity, with the first two dimensions capturing nearly 50% of the variance. Lexical diversity was mainly represented on the first dimension and syntactic complexity on the second. Across the age range, there was wider variation in syntactic complexity than in lexical diversity, suggesting that syntactic development is subject to more individual differences than the ability to use a diverse set of lexical items. Our findings quantify the nature and content of children's writing through mid-childhood, and we discuss the utility of analysing children's writing using a computational, data-driven approach.
Identifier: oxford-uuid:aa3a1d87-ea9e-4c37-be04-4ec1450276e4
Institution: University of Oxford