Study of Linguistic Semantics by Means of Formalisation of Queries to Corpus Data

The advantages of using linguistic corpus data in education and research are obvious and well covered in specialized literature. This tool considerably simplifies acquisition of linguistic data and their processing. Two main corpora have been built for the Tatar language by now, each in open access:...

Bibliographic Details
Main Authors:	A.M. Galieva, О.А. Nevzorova
Format:	Article
Language:	Russian
Published:	Kazan Federal University 2016-10-01
Series:	Učënye Zapiski Kazanskogo Universiteta: Seriâ Gumanitarnye Nauki
Subjects:	corpus; Tatar language; search query; grammar; semantics
Online Access:	https://kpfu.ru/portal/docs/F385642576/158_5_gum_10.pdf

_version_	1797733493316255744
author	A.M. Galieva; О.А. Nevzorova
author_facet	A.M. Galieva; О.А. Nevzorova
author_sort	A.M. Galieva
collection	DOAJ
description	The advantages of using linguistic corpus data in education and research are well covered in the specialized literature: a corpus considerably simplifies the acquisition and processing of linguistic data. Two main corpora have been built for the Tatar language so far, both in open access: the Corpus of Written Tatar, compiled at Kazan Federal University (http://search.corpus.tatar/en), and the Tatar National Corpus (http://corpus.antat.ru/?lang=en), developed by researchers of the Institute of Applied Semiotics, Tatarstan Academy of Sciences, Russia. These corpora are replenished hourly; the text collections are updated mainly with media texts, which provides a constant flow of fresh linguistic material. Tatar has complicated syntax and intricate agglutinative morphology, and corpus data are a reliable tool for enriching and deepening linguistic descriptions of the language. This paper is the first attempt to describe examples of complex queries to the search system of the “Tugan Tel” Tatar National Corpus; the queries are aimed at studying complicated phenomena of Tatar linguistic semantics. The authors proceed from the view that correctly formulated queries to the Corpus yield data from which conclusions can be drawn about theoretically relevant laws of the language system. The inventory of grammatical categories of the Tatar language, together with the affixes that express these categories, is treated as a key to language semantics. Using particular examples, the authors show that the search functionality of the Tatar National Corpus makes it possible to extract data that meet certain semantic criteria from semantically unstructured corpus data. Constructing such special samples of corpus data requires the ability to formulate complex queries in a special language designed for searching the corpus.
first_indexed	2024-03-12T12:29:53Z
format	Article
id	doaj.art-a49f93cb0cf7419c80156080629bb8f3
institution	Directory Open Access Journal
issn	2541-7738 2500-2171
language	Russian
last_indexed	2024-03-12T12:29:53Z
publishDate	2016-10-01
publisher	Kazan Federal University
record_format	Article
series	Učënye Zapiski Kazanskogo Universiteta: Seriâ Gumanitarnye Nauki
spelling	doaj.art-a49f93cb0cf7419c80156080629bb8f3; 2023-08-29T13:19:28Z; rus; Kazan Federal University; Učënye Zapiski Kazanskogo Universiteta: Seriâ Gumanitarnye Nauki; ISSN 2541-7738, 2500-2171; 2016-10-01; vol. 158, no. 5, pp. 1315-1324; Study of Linguistic Semantics by Means of Formalisation of Queries to Corpus Data; A.M. Galieva (Research Institute of Applied Semiotics, Tatarstan Academy of Sciences, Kazan, 420111 Russia); О.А. Nevzorova (Research Institute of Applied Semiotics, Tatarstan Academy of Sciences, Kazan, 420111 Russia; Kazan Federal University, Kazan, 420008 Russia); https://kpfu.ru/portal/docs/F385642576/158_5_gum_10.pdf; corpus; Tatar language; search query; grammar; semantics
title	Study of Linguistic Semantics by Means of Formalisation of Queries to Corpus Data
topic	corpus; Tatar language; search query; grammar; semantics
url	https://kpfu.ru/portal/docs/F385642576/158_5_gum_10.pdf

Similar Items