TEXT ANALYSIS SUBSYSTEM IN A SEARCH ENGINE FOR THE NATIONAL CORPUS OF THE CHUVASH LANGUAGE

Bibliographic Details
Main Authors:	Zheltov, P.V., Zheltov, V.P., Gubanov, A.R.
Format:	Article
Language:	English
Published:	Marina Sokolova Publishings 2016-09-01
Series:	Russian Linguistic Bulletin
Subjects:	indexing; query; text markup; text corpora; search engine; индексирование; запрос; разметка текста; текстовый корпус; поисковик
Online Access:	http://rulb.org/wp-content/uploads/wpem/pdf_compilations/3(7)/3(7).pdf#page=62

Description
Summary:	This paper discusses the text analysis subsystem of a search engine. At this stage, the subsystem consists of the following components: a text tokenization component; a component for separating the text into sentences; and a component for morphological analysis of sentences. Special data structures, in the form of a set of classes, describe the data obtained as a result of the operation of the search engine components. The text tokenization component converts the text into a set of tokens. To define the rules of tokenization the configuration.
ISSN:	2313-0288 2411-2968
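
Illustrative sketch (not taken from the paper): the summary describes a tokenization component, a sentence separation component, and result data structures expressed as classes. The Python sketch below shows one way such a pipeline might be arranged. The class names Token and Sentence, the TOKEN_RULES table, and the rule that a sentence ends at ".", "!" or "?" are all assumptions made for illustration; the paper's actual classes and configuration format are not given in this record. Morphological analysis is omitted.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Token:
    """Hypothetical result class for the tokenization component."""
    text: str
    kind: str   # "word", "number" or "punct"
    start: int  # character offset in the source text


@dataclass
class Sentence:
    """Hypothetical result class for the sentence separation component."""
    tokens: list = field(default_factory=list)


# Assumed stand-in for the tokenization configuration: (name, regex) pairs.
# [^\W\d_] matches any Unicode letter, so Cyrillic text is covered.
TOKEN_RULES = [
    ("word", r"[^\W\d_]+(?:-[^\W\d_]+)*"),
    ("number", r"\d+(?:[.,]\d+)?"),
    ("punct", r"[^\w\s]"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_RULES)
)


def tokenize(text):
    """Convert raw text into a list of Token objects."""
    return [
        Token(m.group(), m.lastgroup, m.start())
        for m in TOKEN_RE.finditer(text)
    ]


def split_sentences(tokens):
    """Group tokens into sentences, closing one at '.', '!' or '?'."""
    sentences, current = [], Sentence()
    for token in tokens:
        current.tokens.append(token)
        if token.kind == "punct" and token.text in ".!?":
            sentences.append(current)
            current = Sentence()
    if current.tokens:  # keep a trailing sentence with no end mark
        sentences.append(current)
    return sentences
```

For example, split_sentences(tokenize(text)) returns a list of Sentence objects whose tokens keep their offsets into the original text, which is the kind of information an indexing step could use later. A real system would need more careful rules for abbreviations, initials, and decimal numbers than this simple end-mark test.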

Similar Items