APPROACH OF PROCESSING, CLASSIFICATION AND DETECTION OF NEW CLASSES AND ANOMALIES IN HETEROGENIOUS AND DIFFERENT STREAMS OF DATA


Bibliographic Details
Main Author: R. A. Bagutdinov
Format: Article
Language: Russian
Published: Dagestan State Technical University 2019-05-01
Series: Вестник Дагестанского государственного технического университета: Технические науки
Subjects: data flow, data mining, heterogeneous data, multiscale data, data processing
Online Access: https://vestnik.dgtu.ru/jour/article/view/593
_version_ 1826560646695616512
author R. A. Bagutdinov
collection DOAJ
description Objectives. The aim of the study is to find effective methods and approaches for processing heterogeneous data streams and for handling the problems of infinite length, concept evolution, and concept drift. A heterogeneous data stream can have infinite length and contain structured or unstructured data. Processing a heterogeneous, multi-scale data stream is a major challenge for researchers. Most research focuses on the problems of infinite length and concept drift.
Method. New class detection strategies are classified as parametric or non-parametric. This work takes a non-parametric approach. The classifier is an ensemble of three models. Partitioning yields a different number of classes in each fragment. Classes are determined by applying K-Medoid clustering to each fragment. K-Medoid clustering is better suited to data sets that contain anomalies.
Result. The developed algorithm can process heterogeneous and multi-scale data. Each instance present in the model belongs to exactly one class. Experiments were performed on four samples of stream data, each 2000 rows long. Pre-processing revealed multi-valued attributes in the data set.
Conclusion. This paper presents an effective approach for processing heterogeneous data streams and handling the problems of infinite length, concept evolution, and concept drift. The approach relies on a string-matching measure rather than a distance measure to handle the four data stream problems. The false-positive rate of the developed algorithm is low enough to be considered insignificant. The approach does not misclassify an instance of a new class as an existing class, and it can effectively handle functional evolution.
first_indexed 2024-03-12T03:05:35Z
format Article
id doaj.art-e3f4b2596b2441178e54b8cc9ccb6453
institution Directory Open Access Journal
issn 2073-6185
2542-095X
language Russian
last_indexed 2025-03-14T09:19:36Z
publishDate 2019-05-01
publisher Dagestan State Technical University
record_format Article
series Вестник Дагестанского государственного технического университета: Технические науки
spelling doaj.art-e3f4b2596b2441178e54b8cc9ccb6453
2025-03-02T11:46:05Z
Volume 45, issue 3, pages 85-93
DOI: 10.21822/2073-6185-2018-45-3-85-93
R. A. Bagutdinov, Tomsk Polytechnic University
title APPROACH OF PROCESSING, CLASSIFICATION AND DETECTION OF NEW CLASSES AND ANOMALIES IN HETEROGENIOUS AND DIFFERENT STREAMS OF DATA
topic data flow
data mining
heterogeneous data
multiscale data
data processing
url https://vestnik.dgtu.ru/jour/article/view/593
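
Illustrative sketch. The abstract describes clustering each stream fragment with K-Medoid and comparing instances by string matching rather than by distance, but it gives no implementation details. The Python sketch below shows one way such a step could look. Everything in it is an assumption made for illustration and is not taken from the article: the record layout (tuples of strings), the mismatch-fraction measure, the simple alternating K-Medoid loop, the function names, and the 0.5 threshold.

```python
import random
from typing import List, Sequence, Tuple

Record = Tuple[str, ...]  # hypothetical layout: one string per attribute


def mismatch(a: Record, b: Record) -> float:
    """Fraction of attribute positions where two records differ.

    Assumes non-empty records of equal length. 0.0 means identical.
    """
    return sum(x != y for x, y in zip(a, b)) / len(a)


def k_medoids(records: Sequence[Record], k: int,
              max_iter: int = 20, seed: int = 0) -> List[Record]:
    """Simple alternating K-Medoid clustering of one fragment.

    Returns k medoids, each of which is an actual record of the fragment.
    """
    rng = random.Random(seed)
    medoids = rng.sample(list(records), k)
    for _ in range(max_iter):
        # Assignment step: attach every record to its closest medoid.
        clusters: List[List[Record]] = [[] for _ in range(k)]
        for r in records:
            nearest = min(range(k), key=lambda i: mismatch(r, medoids[i]))
            clusters[nearest].append(r)
        # Update step: the new medoid is the member with the smallest
        # total mismatch to the other members of its cluster.
        new_medoids = []
        for i, members in enumerate(clusters):
            if not members:  # keep the old medoid for an empty cluster
                new_medoids.append(medoids[i])
                continue
            new_medoids.append(
                min(members, key=lambda c: sum(mismatch(c, m) for m in members))
            )
        if new_medoids == medoids:  # converged
            break
        medoids = new_medoids
    return medoids


def is_outlier(record: Record, medoids: Sequence[Record],
               threshold: float = 0.5) -> bool:
    """Flag a record whose best match to any medoid is worse than threshold."""
    return min(mismatch(record, m) for m in medoids) > threshold


if __name__ == "__main__":
    fragment = [
        ("tcp", "http", "ok"), ("tcp", "http", "ok"), ("tcp", "http", "err"),
        ("udp", "dns", "ok"), ("udp", "dns", "ok"), ("udp", "dns", "err"),
    ]
    medoids = k_medoids(fragment, k=2)
    print(medoids)
    print(is_outlier(("icmp", "ping", "lost"), medoids))
```

In this sketch a flagged record is only a candidate anomaly or new-class instance. How the article's ensemble of three models confirms a new class is not stated in the abstract, so that part is left out.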