Taming our Wild Data

Bibliographic Details
Main Authors: Renske van Enschot, Wilbert Spooren, Antal van den Bosch, Christian Burgers, Liesbeth Degand, Jacqueline Evers-Vermeul, Florian Kunneman, Christine Liebrecht, Yvette Linders, Alfons Maes
Format: Article
Language: English
Published: openjournals.nl 2024-03-01
Series: Dutch Journal of Applied Linguistics
Subjects: discourse, quantitative content analysis, complex discourse data, hands-on procedures, intercoder reliability
Online Access: https://dujal.nl/article/view/16248
ISSN: 2211-7253
DOI: 10.51751/dujal16248
Collection: DOAJ
Affiliations: Renske van Enschot (Tilburg University, Department of Communication and Cognition); Wilbert Spooren (Centre for Language Studies, Radboud University); Antal van den Bosch (Institute for Language Sciences, Utrecht University); Christian Burgers (Amsterdam School of Communication Research (ASCoR), University of Amsterdam); Liesbeth Degand (Institute for Language and Communication, University of Louvain); Jacqueline Evers-Vermeul (Institute for Language Sciences, Utrecht University); Florian Kunneman (Dept. Computer Science, Social AI, VU University Amsterdam); Christine Liebrecht (Tilburg center for Cognition and Communication, Tilburg University); Yvette Linders (Centre for Language Studies, Radboud University); Alfons Maes (Tilburg center for Cognition and Communication, Tilburg University)

Description
Many research questions in the field of applied linguistics are answered by manually analyzing data collections or corpora: collections of spoken, written and/or visual communicative messages. In this kind of quantitative content analysis, the coding of subjective language data often leads to disagreement among raters. In this paper, we discuss causes of and solutions to disagreement problems in the analysis of discourse. We discuss crucial factors determining the quality and outcome of corpus analyses, and focus on the sometimes tense relation between reliability and validity. We evaluate formal assessments of intercoder reliability. We suggest a number of ways to improve the intercoder reliability, such as the precise specification of the variables and their coding categories and carving up the coding process into smaller substeps. The paper ends with a reflection on challenges for future work in discourse analysis, with special attention to big data and multimodal discourse.