A meta-feature selection method based on the Auto-sklearn framework

In recent years, the task of selecting and tuning machine learning algorithms has been increasingly solved using automated frameworks. This is motivated by the fact that when dealing with large amounts of data, classical methods are not efficient in terms of time and quality. This paper discusses th...

Bibliographic Details
Main Authors:	Nikita I. Kulin, Sergey B. Muravyov
Format:	Article
Language:	English
Published:	Saint Petersburg National Research University of Information Technologies, Mechanics and Optics (ITMO University) 2021-10-01
Series:	Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki
Subjects:	automl; automated machine learning; machine learning; meta-learning; classification
Online Access:	https://ntv.ifmo.ru/file/article/20747.pdf

_version_	1818400408228331520
author	Nikita I. Kulin; Sergey B. Muravyov
author_facet	Nikita I. Kulin; Sergey B. Muravyov
author_sort	Nikita I. Kulin
collection	DOAJ
description	In recent years, the selection and tuning of machine learning algorithms has increasingly been handled by automated frameworks, because classical methods are inefficient in both time and quality when dealing with large amounts of data. This paper discusses the Auto-sklearn framework as one of the best solutions for automated selection and tuning of machine learning algorithms. It investigates a problem in the Auto-sklearn 1.0 solution, which is based on Bayesian optimization and meta-learning, and proposes a new method based on meta-database optimization to address it. The method first uses the BIRCH clustering algorithm to separate datasets into groups; the selection criteria are the silhouette measure and the minimum number of initial Bayesian optimization configurations. A random forest model is then trained on the set of meta-features and the resulting cluster labels, and the important meta-features are selected from the entire set. The resulting set of important meta-features is used to find the initial Bayesian optimization configurations. The described method significantly speeds up the search for the best machine learning algorithm for classification tasks. Experiments were conducted on datasets from OpenML to compare Auto-sklearn 1.0, Auto-sklearn 2.0, and a new version that uses the proposed method. According to the experimental results and Wilcoxon T-criterion statistical tests, the new version is faster than the original versions, outperforms Auto-sklearn 1.0, and competes with Auto-sklearn 2.0. The proposed method helps reduce the time needed to find the best solution for machine learning tasks. Optimizing such frameworks is worthwhile for saving time and other resources, especially when working with large amounts of data.
first_indexed	2024-12-14T07:36:06Z
format	Article
id	doaj.art-499f782081c24fd19c19196a4d113fd7
institution	Directory of Open Access Journals
issn	2226-1494 2500-0373
language	English
last_indexed	2024-12-14T07:36:06Z
publishDate	2021-10-01
publisher	Saint Petersburg National Research University of Information Technologies, Mechanics and Optics (ITMO University)
record_format	Article
series	Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki
spelling	doaj.art-499f782081c24fd19c19196a4d113fd7; 2022-12-21T23:11:12Z; eng; Saint Petersburg National Research University of Information Technologies, Mechanics and Optics (ITMO University); Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki; ISSN 2226-1494, 2500-0373; 2021-10-01; vol. 21, no. 5, pp. 702-708; DOI 10.17586/2226-1494-2021-21-5-702-708; A meta-feature selection method based on the Auto-sklearn framework; Nikita I. Kulin (https://orcid.org/0000-0002-3952-6080), Student, ITMO University, Saint Petersburg, 197101, Russian Federation; Sergey B. Muravyov (https://orcid.org/0000-0002-4251-1744), PhD, Assistant, ITMO University, Saint Petersburg, 197101, Russian Federation; https://ntv.ifmo.ru/file/article/20747.pdf; automl; automated machine learning; machine learning; meta-learning; classification
title	A meta-feature selection method based on the Auto-sklearn framework
topic	automl; automated machine learning; machine learning; meta-learning; classification
url	https://ntv.ifmo.ru/file/article/20747.pdf
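
The meta-feature selection steps summarized in the description field (BIRCH clustering of datasets, a silhouette-based choice of clustering, then a random forest trained on the meta-features and cluster labels) can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation. The function name select_meta_features, the candidate cluster counts, the top_n cutoff, the standardization step, and the synthetic demo matrix are all assumptions made for illustration, and the second selection criterion named in the abstract (the minimum number of initial Bayesian optimization configurations) is omitted because the record does not say how it is computed.

```python
import numpy as np
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def select_meta_features(meta_features, candidate_k=(2, 3, 4, 5), top_n=5, seed=0):
    """Hypothetical helper: rank meta-features by how well they explain
    BIRCH cluster labels. Rows are datasets, columns are meta-features."""
    # Assumption: standardize so that no single meta-feature dominates distances.
    X = StandardScaler().fit_transform(meta_features)

    # Step 1: cluster the datasets with BIRCH for several cluster counts and
    # keep the clustering with the highest silhouette score.
    best_score, best_labels = -1.0, None
    for k in candidate_k:
        labels = Birch(n_clusters=k).fit_predict(X)
        if len(np.unique(labels)) < 2:
            continue  # silhouette is undefined for a single cluster
        score = silhouette_score(X, labels)
        if score > best_score:
            best_score, best_labels = score, labels
    if best_labels is None:
        raise ValueError("BIRCH did not produce at least two clusters")

    # Step 2: train a random forest to predict the cluster label from the
    # meta-features, then rank meta-features by impurity-based importance.
    forest = RandomForestClassifier(n_estimators=200, random_state=seed)
    forest.fit(X, best_labels)
    ranked = np.argsort(forest.feature_importances_)[::-1]
    return ranked[:top_n], best_labels, best_score


if __name__ == "__main__":
    # Synthetic stand-in for a meta-database: 4 structured columns plus
    # 6 pure-noise columns. Real meta-features would come from the datasets.
    rng = np.random.default_rng(0)
    structured, _ = make_blobs(n_samples=120, n_features=4, centers=3, random_state=0)
    demo = np.hstack([structured, rng.normal(size=(120, 6))])
    selected, labels, score = select_meta_features(demo, top_n=4)
    print("selected meta-feature columns:", selected)
    print("silhouette of chosen clustering:", round(float(score), 3))
```

In this sketch the returned column indices would then restrict which meta-features are used when looking up similar datasets to seed Bayesian optimization; that lookup step is not shown because the record does not describe it in detail.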

Similar Items