Taming our Wild Data
Many research questions in the field of applied linguistics are answered by manually analyzing data collections or corpora: collections of spoken, written and/or visual communicative messages. In this kind of quantitative content analysis, the coding of subjective language data often leads to disagr...
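Main Authors: | Renske van Enschot, Wilbert Spooren, Antal van den Bosch, Christian Burgers, Liesbeth Degand, Jacqueline Evers-Vermeul, Florian Kunneman, Christine Liebrecht, Yvette Linders, Alfons Maes |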
---|---|
Format: | Article |
Language: | English |
Published: | openjournals.nl 2024-03-01 |
Series: | Dutch Journal of Applied Linguistics |
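Subjects: | discourse; quantitative content analysis; complex discourse data; hands-on procedures; intercoder reliability |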
Online Access: | https://dujal.nl/article/view/16248 |
_version_ | 1797198288899801088 |
---|---|
author | Renske van Enschot; Wilbert Spooren; Antal van den Bosch; Christian Burgers; Liesbeth Degand; Jacqueline Evers-Vermeul; Florian Kunneman; Christine Liebrecht; Yvette Linders; Alfons Maes |
author_facet | Renske van Enschot; Wilbert Spooren; Antal van den Bosch; Christian Burgers; Liesbeth Degand; Jacqueline Evers-Vermeul; Florian Kunneman; Christine Liebrecht; Yvette Linders; Alfons Maes |
author_sort | Renske van Enschot |
collection | DOAJ |
description | Many research questions in the field of applied linguistics are answered by manually analyzing data collections or corpora: collections of spoken, written and/or visual communicative messages. In this kind of quantitative content analysis, the coding of subjective language data often leads to disagreement among raters. In this paper, we discuss causes of and solutions to disagreement problems in the analysis of discourse. We discuss crucial factors determining the quality and outcome of corpus analyses, and focus on the sometimes tense relation between reliability and validity. We evaluate formal assessments of intercoder reliability. We suggest a number of ways to improve the intercoder reliability, such as the precise specification of the variables and their coding categories and carving up the coding process into smaller substeps. The paper ends with a reflection on challenges for future work in discourse analysis, with special attention to big data and multimodal discourse. |
first_indexed | 2024-04-24T06:57:29Z |
format | Article |
id | doaj.art-409f1c7a00e54fcd9ecb7082f8a67fde |
institution | Directory Open Access Journal |
issn | 2211-7253 |
language | English |
last_indexed | 2024-04-24T06:57:29Z |
publishDate | 2024-03-01 |
publisher | openjournals.nl |
record_format | Article |
series | Dutch Journal of Applied Linguistics |
spelling | doaj.art-409f1c7a00e54fcd9ecb7082f8a67fde; 2024-04-22T10:40:08Z; eng; openjournals.nl; Dutch Journal of Applied Linguistics; 2211-7253; 2024-03-01; 13; 10.51751/dujal16248; Taming our Wild Data; Renske van Enschot (0), Wilbert Spooren (1), Antal van den Bosch (2), Christian Burgers (3), Liesbeth Degand (4), Jacqueline Evers-Vermeul (5), Florian Kunneman (6), Christine Liebrecht (7), Yvette Linders (8), Alfons Maes (9); (0) Tilburg University, Department of Communication and Cognition; (1) Centre for Language Studies, Radboud University; (2) Institute for Language Sciences, Utrecht University; (3) Amsterdam School of Communication Research (ASCoR), University of Amsterdam; (4) Institute for Language and Communication, University of Louvain; (5) Institute for Language Sciences, Utrecht University; (6) Dept. Computer Science, Social AI, VU University Amsterdam; (7) Tilburg center for Cognition and Communication, Tilburg University; (8) Centre for Language Studies, Radboud University; (9) Tilburg center for Cognition and Communication, Tilburg University; Many research questions in the field of applied linguistics are answered by manually analyzing data collections or corpora: collections of spoken, written and/or visual communicative messages. In this kind of quantitative content analysis, the coding of subjective language data often leads to disagreement among raters. In this paper, we discuss causes of and solutions to disagreement problems in the analysis of discourse. We discuss crucial factors determining the quality and outcome of corpus analyses, and focus on the sometimes tense relation between reliability and validity. We evaluate formal assessments of intercoder reliability. We suggest a number of ways to improve the intercoder reliability, such as the precise specification of the variables and their coding categories and carving up the coding process into smaller substeps. The paper ends with a reflection on challenges for future work in discourse analysis, with special attention to big data and multimodal discourse.; https://dujal.nl/article/view/16248; discourse; quantitative content analysis; complex discourse data; hands-on procedures; intercoder reliability |
spellingShingle | Renske van Enschot; Wilbert Spooren; Antal van den Bosch; Christian Burgers; Liesbeth Degand; Jacqueline Evers-Vermeul; Florian Kunneman; Christine Liebrecht; Yvette Linders; Alfons Maes; Taming our Wild Data; Dutch Journal of Applied Linguistics; discourse; quantitative content analysis; complex discourse data; hands-on procedures; intercoder reliability |
title | Taming our Wild Data |
title_full | Taming our Wild Data |
title_fullStr | Taming our Wild Data |
title_full_unstemmed | Taming our Wild Data |
title_short | Taming our Wild Data |
title_sort | taming our wild data |
topic | discourse; quantitative content analysis; complex discourse data; hands-on procedures; intercoder reliability |
url | https://dujal.nl/article/view/16248 |
work_keys_str_mv | AT renskevanenschot tamingourwilddata AT wilbertspooren tamingourwilddata AT antalvandenbosch tamingourwilddata AT christianburgers tamingourwilddata AT liesbethdegand tamingourwilddata AT jacquelineeversvermeul tamingourwilddata AT floriankunneman tamingourwilddata AT christineliebrecht tamingourwilddata AT yvettelinders tamingourwilddata AT alfonsmaes tamingourwilddata |