Hierarchical ensemble learning method in diversified dataset analysis

Recent advances in ensemble machine learning methods, such as random forest algorithms, have enabled significant progress in the analysis of large datasets. However, these algorithms learn only from the features already present in the data, which places an upper limit on accuracy no matter how...


Bibliographic Details
Main Authors: Liu, Zeyuan; Li, Xinlong
Other Authors: Nanyang Business School
Format: Journal Article
Language: English
Published: 2022
Subjects: Business::Information technology; Categorical Variables; Classification Accuracy
Online Access: https://hdl.handle.net/10356/161502
_version_ 1826109620860485632
author Liu, Zeyuan
Li, Xinlong
author2 Nanyang Business School
author_facet Nanyang Business School
Liu, Zeyuan
Li, Xinlong
author_sort Liu, Zeyuan
collection NTU
description Recent advances in ensemble machine learning methods, such as random forest algorithms, have enabled significant progress in the analysis of large datasets. However, these algorithms learn only from the features already present in the data, which places an upper limit on accuracy no matter how good the algorithm is. Moreover, classification accuracy is especially low when one class of observations makes up a much smaller proportion of the training dataset than the other classes. The aim of the present study is to design a hierarchical classifier that extracts new features with ensemble machine learning regressors and statistical methods as part of the overall learning process. In stage 1, the categorical variables are summarized by a random forest algorithm, which creates a new variable through regression analysis, while the remaining numerical variables serve as the input to a factor analysis (FA) that calculates the factor values of each observation. In stage 2, all the features are learned by a random forest classifier. The method is applied to diversified datasets consisting of both categorical and numerical variables. The experimental results show that classification accuracy increased by 8.61%. The method also significantly improves the classification accuracy for observations that make up a low proportion of the training dataset.
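The two stages described above can be sketched roughly as follows. This is a minimal illustration with scikit-learn, not the authors' implementation. Several details are assumptions that the abstract does not specify: the function names fit_two_stage and predict_two_stage are hypothetical, the categorical columns are one-hot encoded, the regression target in stage 1 is taken to be the integer class label, out-of-fold predictions are used so that the new training feature does not leak the label, and stage 2 sees only the derived features.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import cross_val_predict
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def fit_two_stage(X_cat, X_num, y, n_factors=2, seed=0):
    """Fit the two-stage sketch and return the fitted parts in a dict."""
    # Stage 1a: one-hot encode the categorical columns (assumed encoding).
    enc = OneHotEncoder(handle_unknown="ignore").fit(X_cat)
    C = enc.transform(X_cat)

    # Assumption: the regressor's target is the integer class label.
    reg = RandomForestRegressor(n_estimators=100, random_state=seed)
    # Out-of-fold predictions form the new training feature without leaking y.
    cat_feature = cross_val_predict(reg, C, y, cv=5)
    # Refit on all rows so the regressor can score unseen data later.
    reg.fit(C, y)

    # Stage 1b: factor analysis on the standardized numerical columns.
    scaler = StandardScaler().fit(X_num)
    fa = FactorAnalysis(n_components=n_factors, random_state=seed)
    factors = fa.fit_transform(scaler.transform(X_num))

    # Stage 2: random forest classifier on the derived features only
    # (assumption: the original columns are not passed through).
    Z = np.column_stack([cat_feature, factors])
    clf = RandomForestClassifier(n_estimators=200, random_state=seed).fit(Z, y)
    return {"enc": enc, "reg": reg, "scaler": scaler, "fa": fa, "clf": clf}


def predict_two_stage(model, X_cat, X_num):
    """Apply both stages to new rows and return predicted class labels."""
    cat_feature = model["reg"].predict(model["enc"].transform(X_cat))
    factors = model["fa"].transform(model["scaler"].transform(X_num))
    return model["clf"].predict(np.column_stack([cat_feature, factors]))
```

Calling fit_two_stage on a training split and predict_two_stage on a held-out split gives predictions that can be scored per class, which is where the abstract reports the gain for low-proportion observations. The number of factors and the forest sizes here are placeholders, not values from the paper.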
first_indexed 2024-10-01T02:20:54Z
format Journal Article
id ntu-10356/161502
institution Nanyang Technological University
language English
last_indexed 2024-10-01T02:20:54Z
publishDate 2022
record_format dspace
spelling ntu-10356/161502 2023-05-19T07:31:19Z
Published version 2022-09-06T04:24:27Z 2022-09-06T04:24:27Z 2021 Journal Article
Liu, Z. & Li, X. (2021). Hierarchical ensemble learning method in diversified dataset analysis. Journal of Physics: Conference Series, 2078(1), 012027. https://dx.doi.org/10.1088/1742-6596/2078/1/012027
1742-6588 https://hdl.handle.net/10356/161502 10.1088/1742-6596/2078/1/012027 2-s2.0-85120488752 1 2078 012027 en Journal of Physics: Conference Series
© 2021 The Authors. Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI. Published under licence by IOP Publishing Ltd. application/pdf
title Hierarchical ensemble learning method in diversified dataset analysis
topic Business::Information technology
Categorical Variables
Classification Accuracy
url https://hdl.handle.net/10356/161502