Crowdsourcing ratings for single lexical items

In this study, we investigate theoretical and practical issues connected to differentiating between core and peripheral vocabulary at different levels of linguistic proficiency using statistical approaches combined with crowdsourcing. We also investigate whether crowdsourcing second language learne...

Bibliographic Details
Main Authors: Elena Volodina, David Alfter, Therese Lindström Tiedemann
Format: Article
Language: English
Published: University of Ljubljana Press (Založba Univerze v Ljubljani) 2022-12-01
Series: Slovenščina 2.0: Empirične, aplikativne in interdisciplinarne raziskave
Online Access: https://journals.uni-lj.si/slovenscina2/article/view/11247
collection DOAJ
description In this study, we investigate theoretical and practical issues connected to differentiating between core and peripheral vocabulary at different levels of linguistic proficiency using statistical approaches combined with crowdsourcing. We also investigate whether crowdsourcing second language learners’ rankings can be used for assigning levels to unseen vocabulary. The study is performed on Swedish single-word items. The four hypotheses we examine are: (1) there is core vocabulary for each proficiency level, but this is only true until CEFR level B2 (upper-intermediate); (2) core vocabulary shows more systematicity in its behavior and usage, whereas peripheral items have more idiosyncratic behavior; (3) given that we have truly core items (also known as anchor items) for each level, we can place any new unseen item in relation to the identified core items by using a series of comparative judgment tasks, thereby assigning a “target” level to a previously unseen item; and (4) non-experts will perform on par with experts in a comparative judgment setting. The hypotheses have been largely confirmed. In relation to (1) and (2), our results show that there seems to be some systematicity in core vocabulary for early to mid-levels (A1-B1), while we find less systematicity for higher levels (B2-C1). In relation to (3), we suggest crowdsourcing word rankings using comparative judgment with known anchor words as a method to assign a “target” level to unseen words. With regard to (4), we confirm previous findings that non-experts, in our case language learners, can be used effectively for linguistic annotation tasks in a comparative judgment setting.
id doaj.art-0c53fd68a2794b6faff535bcc2cecc8f
institution Directory of Open Access Journals
issn 2335-2736
doi 10.4312/slo2.0.2022.2.5-61
volume 10
issue 2
affiliation Elena Volodina: University of Gothenburg, Sweden
David Alfter: University of Gothenburg, Sweden; Université Catholique de Louvain, Belgium
Therese Lindström Tiedemann: University of Helsinki, Finland
topic core vocabulary and language learning
non-expert crowdsourcing
single lexical items
CEFR levels
comparative judgment