Methodology and Experience of Building the Retrospective Corpus of Lithuanian Broadcast Media

The article introduces to the academic community the corpus of broadcast media (radio and television) compiled within the project Lithuanian Language: Ideals, Ideologies and Identity Shifts. The corpus includes about 63 hours of transcribed recordings from 1960 to 2010...

Bibliographic Details
Main Author:	Laima Nevinskaitė
Format:	Article
Language:	deu
Published:	Vilnius University Press 2013-10-01
Series:	Taikomoji kalbotyra
Subjects:	spoken public language; spoken mass media; genres of spoken mass media; corpus sampling; language corpus
Online Access:	https://www.journals.vu.lt/taikomojikalbotyra/article/view/17259

_version_	1830460116049068032
author	Laima Nevinskaitė
collection	DOAJ
description	The article introduces to the academic community the corpus of broadcast media (radio and television) compiled within the project Lithuanian Language: Ideals, Ideologies and Identity Shifts. The corpus includes about 63 hours of transcribed recordings from 1960 to 2010. The article discusses the theoretical principles of corpus sampling, which were based on the criteria of time period and genre; examines the methodological issues encountered when constructing the sampling scheme; shares the practical experience of selecting and gathering recordings for inclusion in the corpus; and presents the actual structure of the corpus. Two of the main requirements for any corpus are representativeness and balance. One possible way to achieve them is to distinguish objectively defined text types and to build the corpus along these lines. Thus the composition of the broadcast media corpus was designed on the basis of two criteria: period of broadcast media development and genre. Three periods are distinguished: Soviet (1960–1987), transitional (1988–1992) and contemporary (1993–now). With respect to genre, three groups of programs are included: talk programs (further subdivided into interview, debate and talk-show); documentaries, features and journal programs; and information programs. The article discusses the problems encountered when implementing this design: the limited availability of materials, owing to the technological peculiarities of different periods and organisational factors at archive institutions; the balance between the periods; genre comparability; the differing degree of genre diversity in different periods; and the continuity of genres. Finally, the composition and size of the corpus are presented (63 hours of recordings, about 350 thousand words).
The paper concludes that, although the limited availability of materials and the other problems discussed above mean that the corpus cannot be regarded as perfectly representative and balanced, it is sufficient for research into public language change. This was confirmed by tentative research studies based on it. The corpus meets the usual technical requirements: the transcriptions were made in the CLAN software developed within the CHILDES project; the recordings were transcribed, coded and morphologically annotated following CHILDES conventions; the speakers were assigned individual codes; and the transcriptions were linked to the sound/image files.
first_indexed	2024-12-21T11:06:44Z
format	Article
id	doaj.art-cc045db2ed054f0c8e09b03ff4ebf149
institution	Directory of Open Access Journals
issn	2029-8935
language	deu
last_indexed	2024-12-21T11:06:44Z
publishDate	2013-10-01
publisher	Vilnius University Press
record_format	Article
series	Taikomoji kalbotyra
spelling	doaj.art-cc045db2ed054f0c8e09b03ff4ebf149; 2022-12-21T19:06:12Z; deu; Vilnius University Press; Taikomoji kalbotyra; 2029-8935; 2013-10-01; 2; 10.15388/TK.2013.17259; Methodology and Experience of Building the Retrospective Corpus of Lithuanian Broadcast Media; Laima Nevinskaitė; Institute of Lithuanian Language, Lithuania; https://www.journals.vu.lt/taikomojikalbotyra/article/view/17259; spoken public language; spoken mass media; genres of spoken mass media; corpus sampling; language corpus
title	Methodology and Experience of Building the Retrospective Corpus of Lithuanian Broadcast Media
topic	spoken public language; spoken mass media; genres of spoken mass media; corpus sampling; language corpus
url	https://www.journals.vu.lt/taikomojikalbotyra/article/view/17259
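
The sampling design summarised in the description (three periods crossed with three genre groups) can be illustrated with a minimal Python sketch. It is not the author's tooling: the names `PERIODS`, `GENRE_GROUPS`, `period_for_year` and `sampling_cells` and the dictionary keys are hypothetical, and the contemporary period is capped at 2010 only because the description gives that as the last year of the recordings.

```python
from typing import List, Optional, Tuple

# Period boundaries as stated in the description. The description gives the
# last period as "1993-now"; it is capped at 2010 here (an assumption) because
# the recordings in the corpus run from 1960 to 2010.
PERIODS: List[Tuple[str, int, int]] = [
    ("Soviet", 1960, 1987),
    ("transitional", 1988, 1992),
    ("contemporary", 1993, 2010),
]

# Genre groups as listed in the description. The keys are hypothetical labels;
# only the talk group is described as having subtypes.
GENRE_GROUPS = {
    "talk": ["interview", "debate", "talk-show"],
    "documentary_feature_journal": [],
    "information": [],
}


def period_for_year(year: int) -> Optional[str]:
    """Return the period a broadcast year falls into, or None if out of range."""
    for name, start, end in PERIODS:
        if start <= year <= end:
            return name
    return None


def sampling_cells() -> List[Tuple[str, str]]:
    """Cross periods with genre groups to enumerate the cells of the design."""
    return [(period, genre) for period, _, _ in PERIODS for genre in GENRE_GROUPS]
```

Crossing the two criteria yields nine cells. According to the description, filling them evenly was the difficult part in practice, since the availability of materials and the diversity of genres differed between periods.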

Similar Items