TEXT ANALYSIS SUBSYSTEM IN A SEARCH ENGINE FOR THE NATIONAL CORPUS OF THE CHUVASH LANGUAGE

Bibliographic Details
Main Authors: Zheltov, P.V., Zheltov, V.P., Gubanov, A.R.
Format: Article
Language: English
Published: Marina Sokolova Publishings, 2016-09-01
Series: Russian Linguistic Bulletin
Online Access: http://rulb.org/wp-content/uploads/wpem/pdf_compilations/3(7)/3(7).pdf#page=62
Description
Summary: This paper discusses the text analysis subsystem of a search engine. At its current stage, the subsystem consists of the following components: text tokenization, sentence segmentation, and morphological analysis of sentences. The paper also describes special data structures, implemented as a set of classes, that hold the results produced by the search engine components. The tokenization component converts the text into a set of tokens; the tokenization rules are defined through a configuration.
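The summary does not specify the authors' rule format or class design. Purely as an illustration, the following Python sketch assumes that tokenization rules are regular expressions keyed by token type and that a sentence ends at a terminal punctuation token. The names `RULES`, `SENTENCE_END`, `Token`, `tokenize` and `split_sentences` are hypothetical, and the sketch is not the method described in the paper; it only shows how a rule-driven tokenizer can feed a naive sentence splitter.

```python
import re
from dataclasses import dataclass

# Hypothetical rule set: (token type, regular expression). It stands in for the
# tokenization configuration mentioned in the summary; order matters, because
# earlier alternatives win when several could match at the same position.
RULES = [
    ("WORD", r"[^\W\d_]+(?:-[^\W\d_]+)*"),  # Unicode letters, incl. Cyrillic ӑ, ӗ, ҫ, ӳ
    ("NUMBER", r"\d+(?:[.,]\d+)?"),
    ("PUNCT", r'[.!?…,;:()"«»-]'),
]

# Punctuation assumed to close a sentence (a deliberately naive choice).
SENTENCE_END = {".", "!", "?", "…"}


@dataclass
class Token:
    text: str   # surface form
    kind: str   # name of the rule that produced the token
    start: int  # character offset in the source text


def tokenize(text, rules=RULES):
    """Turn text into a list of Tokens; characters no rule matches are skipped."""
    pattern = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in rules))
    return [Token(m.group(), m.lastgroup, m.start()) for m in pattern.finditer(text)]


def split_sentences(tokens):
    """Group tokens into sentences, closing one after each sentence-final mark."""
    sentences, current = [], []
    for tok in tokens:
        current.append(tok)
        if tok.kind == "PUNCT" and tok.text in SENTENCE_END:
            sentences.append(current)
            current = []
    if current:  # trailing tokens without final punctuation
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    for sentence in split_sentences(tokenize("Чӑваш чӗлхи пуян. Вӑл 2016 ҫулта!")):
        print([(t.text, t.kind) for t in sentence])
```

Because the rules are joined into a single alternation, reordering or editing the rule list changes tokenization without touching the code, which is one plausible reason to keep such rules in a configuration. Characters that no rule matches, such as whitespace, are simply skipped.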
ISSN: 2313-0288, 2411-2968