CELER: A 365-Participant Corpus of Eye Movements in L1 and L2 English Reading

Bibliographic Details
Main Authors: Berzak, Yevgeni, Nakamura, Chie, Smith, Amelia, Weng, Emily, Katz, Boris, Flynn, Suzanne, Levy, Roger
Other Authors: Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences
Format: Article
Language: English
Published: MIT Press - Journals 2023
Online Access: https://hdl.handle.net/1721.1/150006
collection MIT
description Abstract: We present CELER (Corpus of Eye Movements in L1 and L2 English Reading), a broad coverage eye-tracking corpus for English. CELER comprises over 320,000 words, and eye-tracking data from 365 participants. Sixty-nine participants are L1 (first language) speakers, and 296 are L2 (second language) speakers from a wide range of English proficiency levels and five different native language backgrounds. As such, CELER has an order of magnitude more L2 participants than any currently available eye movements dataset with L2 readers. Each participant in CELER reads 156 newswire sentences from the Wall Street Journal (WSJ), in a new experimental design where half of the sentences are shared across participants and half are unique to each participant. We provide analyses that compare L1 and L2 participants with respect to standard reading time measures, as well as the effects of frequency, surprisal, and word length on reading times. These analyses validate the corpus and demonstrate some of its strengths. We envision CELER to enable new types of research on language processing and acquisition, and to facilitate interactions between psycholinguistics and natural language processing (NLP).
first_indexed 2024-09-23T15:15:03Z
id mit-1721.1/150006
institution Massachusetts Institute of Technology
last_indexed 2024-09-23T15:15:03Z
record_format dspace
spelling mit-1721.1/150006 2023-04-01T03:10:43Z
2023-03-30T13:20:43Z 2023-03-30T13:20:43Z 2022 2023-03-30T13:17:48Z
Article http://purl.org/eprint/type/JournalArticle
https://hdl.handle.net/1721.1/150006
Berzak, Yevgeni, Nakamura, Chie, Smith, Amelia, Weng, Emily, Katz, Boris et al. 2022. "CELER: A 365-Participant Corpus of Eye Movements in L1 and L2 English Reading." Open Mind, 6.
en
10.1162/OPMI_A_00054
Open Mind
Creative Commons Attribution 4.0 International license https://creativecommons.org/licenses/by/4.0/
application/pdf
MIT Press - Journals
MIT Press