Methodology and Experience of Building the Retrospective Corpus of Lithuanian Broadcast Media

The article introduces to the academic community the corpus of broadcast media (radio and television) compiled within the project Lithuanian Language: Ideals, Ideologies and Identity Shifts. The corpus includes about 63 hours of transcribed recordings from 1960 to 2010...


Bibliographic Details
Main Author: Laima Nevinskaitė
Format: Article
Language: deu
Published: Vilnius University Press 2013-10-01
Series: Taikomoji kalbotyra
Subjects:
Online Access: https://www.journals.vu.lt/taikomojikalbotyra/article/view/17259
_version_ 1819047838648107008
author Laima Nevinskaitė
author_facet Laima Nevinskaitė
author_sort Laima Nevinskaitė
collection DOAJ
description The article introduces to the academic community the corpus of broadcast media (radio and television) compiled within the project Lithuanian Language: Ideals, Ideologies and Identity Shifts. The corpus includes about 63 hours of transcribed recordings from 1960 to 2010. The article discusses the theoretical principles of corpus sampling, which were based on the criteria of time period and genre; examines the methodological issues encountered when constructing the sampling scheme; shares the practical experience of selecting and gathering recordings for inclusion in the corpus; and presents the actual structure of the corpus. One of the main requirements for any corpus is that it be representative and balanced. One possible way to achieve this is to distinguish objectively defined text types and to build the corpus along these lines. The composition of the broadcast media corpus was therefore designed on the basis of two criteria: period of broadcast media development and genre. Three periods are distinguished: Soviet (1960–1987), transitional (1988–1992) and contemporary (1993–now). With respect to genre, three groups of programs are included: talk programs (further subdivided into interviews, debates and talk shows); documentaries, features and journal programs; and information programs. The article discusses the problems encountered when trying to implement the corpus along these lines: the limited availability of materials, due to the technological peculiarities of different periods and organisational factors at archive institutions; the balance between the periods; and the comparability of genres, the differing degree of genre diversity across periods, and the continuity of genres. Finally, the composition and size of the corpus are presented (63 hours of recordings, about 350 thousand words).
The paper concludes that, although the limited availability of materials and the other problems discussed above mean that the corpus cannot be regarded as perfectly representative and balanced, it is sufficient for research into public language change. This was confirmed by tentative research studies based on it. The corpus meets the usual technical requirements: the transcriptions were made in the CLAN software developed within the CHILDES project; the recordings were transcribed, coded and morphologically annotated following the conventions of the CHILDES project; the speakers were assigned individual codes; and the transcriptions were linked to the sound/image files.
first_indexed 2024-12-21T11:06:44Z
format Article
id doaj.art-cc045db2ed054f0c8e09b03ff4ebf149
institution Directory Open Access Journal
issn 2029-8935
language deu
last_indexed 2024-12-21T11:06:44Z
publishDate 2013-10-01
publisher Vilnius University Press
record_format Article
series Taikomoji kalbotyra
spelling doaj.art-cc045db2ed054f0c8e09b03ff4ebf149; 2022-12-21T19:06:12Z; deu; Vilnius University Press; Taikomoji kalbotyra; 2029-8935; 2013-10-01; 2; 10.15388/TK.2013.17259; Methodology and Experience of Building the Retrospective Corpus of Lithuanian Broadcast Media; Laima Nevinskaitė (Institute of Lithuanian Language, Lithuania); https://www.journals.vu.lt/taikomojikalbotyra/article/view/17259; spoken public language; spoken mass media; genres of spoken mass media; corpus sampling; language corpus
spellingShingle Laima Nevinskaitė
Methodology and Experience of Building the Retrospective Corpus of Lithuanian Broadcast Media
Taikomoji kalbotyra
spoken public language
spoken mass media
genres of spoken mass media
corpus sampling
language corpus
title Methodology and Experience of Building the Retrospective Corpus of Lithuanian Broadcast Media
title_full Methodology and Experience of Building the Retrospective Corpus of Lithuanian Broadcast Media
title_fullStr Methodology and Experience of Building the Retrospective Corpus of Lithuanian Broadcast Media
title_full_unstemmed Methodology and Experience of Building the Retrospective Corpus of Lithuanian Broadcast Media
title_short Methodology and Experience of Building the Retrospective Corpus of Lithuanian Broadcast Media
title_sort methodology and experience of building the retrospective corpus of lithuanian broadcast media
topic spoken public language
spoken mass media
genres of spoken mass media
corpus sampling
language corpus
url https://www.journals.vu.lt/taikomojikalbotyra/article/view/17259
work_keys_str_mv AT laimanevinskaite methodologyandexperienceofbuildingtheretrospectivecorpusoflithuanianbroadcastmedia