A TEI Schema for the Representation of Computer-mediated Communication
The paper presents an XML schema for the representation of genres of computer-mediated communication (CMC) that is compliant with the encoding framework defined by the TEI. It was designed for the annotation of CMC documents in the project Deutsches Referenzkorpus zur internetbasierten Kommunikation...
Main Authors: | Michael Beißwenger, Maria Ermakova, Alexander Geyken, Lothar Lemnitzer, Angelika Storrer |
---|---|
Format: | Article |
Language: | deu |
Published: | Text Encoding Initiative Consortium, 2012-10-01 |
Series: | Journal of the Text Encoding Initiative |
Subjects: | |
Online Access: | http://journals.openedition.org/jtei/476 |
author | Michael Beißwenger, Maria Ermakova, Alexander Geyken, Lothar Lemnitzer, Angelika Storrer |
collection | DOAJ |
description | The paper presents an XML schema for the representation of genres of computer-mediated communication (CMC) that is compliant with the encoding framework defined by the TEI. It was designed for the annotation of CMC documents in the project Deutsches Referenzkorpus zur internetbasierten Kommunikation (DeRiK), which aims at building a corpus on language use in the most popular CMC genres on the German-speaking Internet. The focus of the schema is on those CMC genres which are written and dialogic, such as forums, bulletin boards, chats, instant messaging, wiki and weblog discussions, microblogging on Twitter, and conversation on “social network” sites. The schema provides a representation format for the main structural features of CMC discourse as well as elements for the annotation of those units regarded as “typical” for language use on the Internet. The schema introduces an element <posting>, which describes stretches of text that are sent to the server by a user at a certain point in time. Postings are the main constituting elements of threads and logfiles, which, in our schema, are the two main types of CMC macrostructures. For the microlevel of CMC documents (that is, the structure of the <posting> content), the schema introduces elements for selected features of Internet jargon such as emoticons, interaction words and addressing terms. It allows for easy anonymization of CMC data for purposes in which the annotated data are made publicly available and includes metadata which are necessary for referencing random excerpts from the data as references in dictionary entries or as results of corpus queries. Documentation of the schema as well as encoding examples can be retrieved from the web at http://www.empirikom.net/bin/view/Themen/CmcTEI. The schema is meant to be a core model for representing CMC that can be modified and extended by others according to their own specific perspectives on CMC data. It could be a first step towards an integration of features for the representation of CMC genres into a future new version of the TEI Guidelines. |
id | doaj.art-47a049392c9243fb8944e414c9bb56e5 |
institution | Directory of Open Access Journals |
issn | 2162-5603 |
doi | 10.4000/jtei.476 |
topic | computer-mediated communication, CMC, web genres, thread, logfile, forum |