Exploring Reusability and Reproducibility for a Research Infrastructure for L1 and L2 Learner Corpora

Up until today research in various educational and linguistic domains such as learner corpus research, writing research, or second language acquisition has produced a substantial amount of research data in the form of L1 and L2 learner corpora. However, the multitude of individual solutions combined...

Bibliographic Details
Main Authors: Alexander König, Jennifer-Carmen Frey, Egon W. Stemle
Format: Article
Language: English
Published: MDPI AG 2021-04-01
Series: Information
Subjects: learner corpus research, research infrastructures
Online Access: https://www.mdpi.com/2078-2489/12/5/199
author Alexander König
Jennifer-Carmen Frey
Egon W. Stemle
collection DOAJ
description Up until today research in various educational and linguistic domains such as learner corpus research, writing research, or second language acquisition has produced a substantial amount of research data in the form of L1 and L2 learner corpora. However, the multitude of individual solutions combined with domain-inherent obstacles in data sharing have so far hampered comparability, reusability and reproducibility of data and research results. In this article, we present work in creating a digital infrastructure for L1 and L2 learner corpora and populating it with data collected in the past. We embed our infrastructure efforts in the broader field of infrastructures for scientific research, drawing from technical solutions and frameworks from research data management, among which the FAIR guiding principles for data stewardship. We share our experiences from integrating some L1 and L2 learner corpora from concluded projects into the infrastructure while trying to ensure compliance with the FAIR principles and the standards we established for reproducibility, discussing how far research data that has been collected in the past can be made comparable, reusable and reproducible. Our results show that some basic needs for providing comparable and reusable data are covered by existing general infrastructure solutions and can be exploited for domain-specific infrastructures such as the one presented in this article. Other aspects need genuinely domain-driven approaches. The solutions found for the corpora in the presented infrastructure can only be a preliminary attempt, and further community involvement would be needed to provide templates and models acknowledged and promoted by the community. Furthermore, forward-looking data management would be needed starting from the beginning of new corpus creation projects to ensure that all requirements for FAIR data can be met.
first_indexed 2024-03-10T11:47:20Z
format Article
id doaj.art-905d666d30d942129ccc6058740aa869
institution Directory Open Access Journal
issn 2078-2489
language English
last_indexed 2024-03-10T11:47:20Z
publishDate 2021-04-01
publisher MDPI AG
record_format Article
series Information
spelling doaj.art-905d666d30d942129ccc6058740aa869
2023-11-21T17:58:14Z
eng
MDPI AG
Information
2078-2489
2021-04-01
12
5
199
10.3390/info12050199
Exploring Reusability and Reproducibility for a Research Infrastructure for L1 and L2 Learner Corpora
Alexander König 0
Jennifer-Carmen Frey 1
Egon W. Stemle 2
CLARIN ERIC, 3512 BS Utrecht, The Netherlands
Institute for Applied Linguistics, Eurac Research, 39100 Bolzano, Italy
Institute for Applied Linguistics, Eurac Research, 39100 Bolzano, Italy
Up until today research in various educational and linguistic domains such as learner corpus research, writing research, or second language acquisition has produced a substantial amount of research data in the form of L1 and L2 learner corpora. However, the multitude of individual solutions combined with domain-inherent obstacles in data sharing have so far hampered comparability, reusability and reproducibility of data and research results. In this article, we present work in creating a digital infrastructure for L1 and L2 learner corpora and populating it with data collected in the past. We embed our infrastructure efforts in the broader field of infrastructures for scientific research, drawing from technical solutions and frameworks from research data management, among which the FAIR guiding principles for data stewardship. We share our experiences from integrating some L1 and L2 learner corpora from concluded projects into the infrastructure while trying to ensure compliance with the FAIR principles and the standards we established for reproducibility, discussing how far research data that has been collected in the past can be made comparable, reusable and reproducible. Our results show that some basic needs for providing comparable and reusable data are covered by existing general infrastructure solutions and can be exploited for domain-specific infrastructures such as the one presented in this article. Other aspects need genuinely domain-driven approaches. The solutions found for the corpora in the presented infrastructure can only be a preliminary attempt, and further community involvement would be needed to provide templates and models acknowledged and promoted by the community. Furthermore, forward-looking data management would be needed starting from the beginning of new corpus creation projects to ensure that all requirements for FAIR data can be met.
https://www.mdpi.com/2078-2489/12/5/199
learner corpus research
research infrastructures
title Exploring Reusability and Reproducibility for a Research Infrastructure for L1 and L2 Learner Corpora
topic learner corpus research
research infrastructures
url https://www.mdpi.com/2078-2489/12/5/199