CROATIAN ADULT SPOKEN LANGUAGE CORPUS (HrAL)

Interest in spoken-language corpora has increased over the past two decades, leading to the development of new corpora and the discovery of new facets of spoken language. These types of corpora represent the most comprehensive data source about the language of ordinary speakers. Such corpora are based on spontaneous, unscripted speech defined by a variety of styles, registers and dialects.

Bibliographic Details
Main Authors:	Jelena Kuvač Kraljević, Gordana Hržica
Format:	Article
Language:	Bulgarian
Published:	University of Rijeka. Faculty of Humanities and Social Sciences 2016-01-01
Series:	Fluminensia: Journal for Philological Research
Subjects:	Croatian Adult Spoken Language Corpus (HrAL); language sampling; spontaneous speech corpora
Online Access:	http://hrcak.srce.hr/file/256835

author	Jelena Kuvač Kraljević; Gordana Hržica
author_facet	Jelena Kuvač Kraljević; Gordana Hržica
author_sort	Jelena Kuvač Kraljević
collection	DOAJ
description	Interest in spoken-language corpora has increased over the past two decades, leading to the development of new corpora and the discovery of new facets of spoken language. These types of corpora represent the most comprehensive data source about the language of ordinary speakers. Such corpora are based on spontaneous, unscripted speech defined by a variety of styles, registers and dialects. The aim of this paper is to present the Croatian Adult Spoken Language Corpus (HrAL), its structure and its possible applications in different linguistic subfields. HrAL was built by sampling spontaneous conversations among 617 speakers from all Croatian counties, and it comprises more than 250,000 tokens and more than 100,000 types. Data were collected during three time slots: from 2010 to 2012, from 2014 to 2015 and during 2016. HrAL is now available within TalkBank, a large database of spoken-language corpora covering different languages (https://talkbank.org), in the Conversational Analyses corpora within the subsection titled Conversational Banks. Data were transcribed, coded and segmented using the transcription format Codes for Human Analysis of Transcripts (CHAT) and the Computerised Language Analysis (CLAN) suite of programmes within the TalkBank toolkit. Speech streams were segmented into communication units (C-units) based on syntactic criteria. Most transcripts were linked to their source audio recordings. TalkBank is freely and publicly accessible, i.e. all data stored in it can be shared by the wider community in accordance with the basic rules of TalkBank. HrAL provides information about spoken grammar and lexicon, discourse skills, error production and productivity in general. It may be useful for sociolinguistic research and studies of synchronic language changes in Croatian.
first_indexed	2024-12-18T15:07:49Z
format	Article
id	doaj.art-625af33d04a442be992c0e9f11353dff
institution	Directory Open Access Journal
issn	0353-4642 1848-9680
language	Bulgarian
last_indexed	2024-12-18T15:07:49Z
publishDate	2016-01-01
publisher	University of Rijeka. Faculty of Humanities and Social Sciences
record_format	Article
series	Fluminensia: Journal for Philological Research
spelling	doaj.art-625af33d04a442be992c0e9f11353dff; 2022-12-21T21:03:44Z; bul; University of Rijeka. Faculty of Humanities and Social Sciences; Fluminensia: Journal for Philological Research; 0353-4642; 1848-9680; 2016-01-01; vol. 28, no. 2, pp. 87-102; CROATIAN ADULT SPOKEN LANGUAGE CORPUS (HrAL); Jelena Kuvač Kraljević; Gordana Hržica; http://hrcak.srce.hr/file/256835; Croatian Adult Spoken Language Corpus (HrAL); language sampling; spontaneous speech corpora
title	CROATIAN ADULT SPOKEN LANGUAGE CORPUS (HrAL)
title_sort	croatian adult spoken language corpus hral
topic	Croatian Adult Spoken Language Corpus (HrAL); language sampling; spontaneous speech corpora
url	http://hrcak.srce.hr/file/256835

Similar Items