TEXT ANALYSIS SUBSYSTEM IN A SEARCH ENGINE FOR THE NATIONAL CORPUS OF THE CHUVASH LANGUAGE
This paper discusses the text analysis subsystem of a search engine. At this stage, the subsystem consists of the following components: a text tokenization component, a component that separates the text into sentences, and a component for morphological analysis of sentences. The following specia...
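
The first two components named in the abstract, tokenization driven by configurable rules and separation of the text into sentences, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the rule format, the `Token` class, and the function names `tokenize` and `split_sentences` are all assumptions made for illustration, and the morphological analysis component is left out.

```python
import re
from dataclasses import dataclass

# Hypothetical tokenization rules: (token kind, regular expression) pairs.
# Earlier rules take priority. The record does not describe the real
# configuration format, so this layout is an assumption.
DEFAULT_RULES = [
    ("word", r"[^\W\d_]+(?:-[^\W\d_]+)*"),  # letters, optionally hyphenated
    ("number", r"\d+(?:[.,]\d+)?"),         # integers and simple decimals
    ("punct", r"[^\w\s]"),                  # any single non-word, non-space character
]


@dataclass(frozen=True)
class Token:
    kind: str   # name of the rule that matched
    text: str   # matched substring
    start: int  # character offset in the source text


def tokenize(text, rules=DEFAULT_RULES):
    """Convert text into a list of tokens using the given rules."""
    pattern = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in rules))
    return [Token(m.lastgroup, m.group(), m.start()) for m in pattern.finditer(text)]


def split_sentences(tokens, terminators=(".", "!", "?")):
    """Group tokens into sentences, closing a sentence at each terminator."""
    sentences, current = [], []
    for tok in tokens:
        current.append(tok)
        if tok.kind == "punct" and tok.text in terminators:
            sentences.append(current)
            current = []
    if current:  # keep trailing tokens that lack a terminator
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    for sentence in split_sentences(tokenize("Салам! Эпир 2 кун.")):
        print([(tok.kind, tok.text) for tok in sentence])
```

Keeping the rules in a plain list means they could be loaded from a configuration file without changing the tokenizer itself. Splitting on terminator punctuation alone is a simplification that mishandles abbreviations and decimal points at line ends.
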
Main Authors: | Zheltov, P.V.; Zheltov, V.P.; Gubanov, A.R. |
---|---|
Format: | Article |
Language: | English |
Published: | Marina Sokolova Publishings, 2016-09-01 |
Series: | Russian Linguistic Bulletin |
Subjects: | indexing; query; text markup; text corpora; search engine |
Online Access: | http://rulb.org/wp-content/uploads/wpem/pdf_compilations/3(7)/3(7).pdf#page=62 |
_version_ | 1818670937681166336 |
---|---|
author | Zheltov, P.V. Zheltov, V.P. Gubanov, A.R. |
author_facet | Zheltov, P.V. Zheltov, V.P. Gubanov, A.R. |
author_sort | Zheltov, P.V. |
collection | DOAJ |
description | This paper discusses the text analysis subsystem of a search engine. At this stage, the subsystem consists of the following components: a text tokenization component, a component that separates the text into sentences, and a component for morphological analysis of sentences. The paper also describes special data structures, in the form of a set of classes, obtained as a result of the operation of the search engine components. The text tokenization component converts the text into a set of tokens. To define the rules of tokenization the configuration. |
first_indexed | 2024-12-17T07:16:03Z |
format | Article |
id | doaj.art-525f64d779834d8282c62ddf7c1e25b0 |
institution | Directory Open Access Journal |
issn | 2313-0288 2411-2968 |
language | English |
last_indexed | 2024-12-17T07:16:03Z |
publishDate | 2016-09-01 |
publisher | Marina Sokolova Publishings |
record_format | Article |
series | Russian Linguistic Bulletin |
spelling | doaj.art-525f64d779834d8282c62ddf7c1e25b02022-12-21T21:58:54ZengMarina Sokolova PublishingsRussian Linguistic Bulletin2313-02882411-29682016-09-0120163 (7)616310.18454/RULB.7.367.36ПОДСИСТЕМА АНАЛИЗА ТЕКСТОВ В ПОИСКОВИКЕ ДЛЯ НАЦИОНАЛЬНОГО КОРПУСА ЧУВАШСКОГО ЯЗЫКАZheltov, P.V.0Zheltov, V.P.1Gubanov, A.R.2Chuvash State University named after I.N. UlyanovChuvash State University named after I.N. UlyanovChuvash State University named after I.N. UlyanovText analysis subsystem in a search engine is discussed in this paper. At this stage, text analysis subsystem consists of the following features: components of text tokenization; component of separation of sentences in the text; components of morphological analysis of sentences. The following special data structures in the form of a set of classes described in the obtained as a result of operation of search engine components. Text tokenization component converts the text into a set of tokens. To define the rules of tokenization the configuration.http://rulb.org/wp-content/uploads/wpem/pdf_compilations/3(7)/3(7).pdf#page=62indexingquerytext markuptext corporasearch engineиндексированиезапросразметка текстатекстовый корпуспоисковик |
spellingShingle | Zheltov, P.V. Zheltov, V.P. Gubanov, A.R. ПОДСИСТЕМА АНАЛИЗА ТЕКСТОВ В ПОИСКОВИКЕ ДЛЯ НАЦИОНАЛЬНОГО КОРПУСА ЧУВАШСКОГО ЯЗЫКА Russian Linguistic Bulletin indexing query text markup text corpora search engine индексирование запрос разметка текста текстовый корпус поисковик |
title | ПОДСИСТЕМА АНАЛИЗА ТЕКСТОВ В ПОИСКОВИКЕ ДЛЯ НАЦИОНАЛЬНОГО КОРПУСА ЧУВАШСКОГО ЯЗЫКА |
title_full | ПОДСИСТЕМА АНАЛИЗА ТЕКСТОВ В ПОИСКОВИКЕ ДЛЯ НАЦИОНАЛЬНОГО КОРПУСА ЧУВАШСКОГО ЯЗЫКА |
title_fullStr | ПОДСИСТЕМА АНАЛИЗА ТЕКСТОВ В ПОИСКОВИКЕ ДЛЯ НАЦИОНАЛЬНОГО КОРПУСА ЧУВАШСКОГО ЯЗЫКА |
title_full_unstemmed | ПОДСИСТЕМА АНАЛИЗА ТЕКСТОВ В ПОИСКОВИКЕ ДЛЯ НАЦИОНАЛЬНОГО КОРПУСА ЧУВАШСКОГО ЯЗЫКА |
title_short | ПОДСИСТЕМА АНАЛИЗА ТЕКСТОВ В ПОИСКОВИКЕ ДЛЯ НАЦИОНАЛЬНОГО КОРПУСА ЧУВАШСКОГО ЯЗЫКА |
title_sort | подсистема анализа текстов в поисковике для национального корпуса чувашского языка |
topic | indexing query text markup text corpora search engine индексирование запрос разметка текста текстовый корпус поисковик |
url | http://rulb.org/wp-content/uploads/wpem/pdf_compilations/3(7)/3(7).pdf#page=62 |
work_keys_str_mv | AT zheltovpv podsistemaanalizatekstovvpoiskovikedlânacionalʹnogokorpusačuvašskogoâzyka AT zheltovvp podsistemaanalizatekstovvpoiskovikedlânacionalʹnogokorpusačuvašskogoâzyka AT gubanovar podsistemaanalizatekstovvpoiskovikedlânacionalʹnogokorpusačuvašskogoâzyka |