Hierarchical ensemble learning method in diversified dataset analysis

Bibliographic Details
Main Authors: Liu, Zeyuan, Li, Xinlong
Other Authors: Nanyang Business School
Format: Journal Article
Language: English
Published: 2022
Subjects:
Online Access: https://hdl.handle.net/10356/161502
Description
Summary: Remarkable advances in ensemble machine learning methods, such as random forest algorithms, have enabled significant progress in the analysis of large datasets. However, these algorithms use only the existing features during learning, which imposes an upper limit on accuracy no matter how good the algorithms are. Moreover, classification accuracy is especially low when one type of observation makes up a much smaller proportion of the training dataset than the other types. The aim of the present study is to design a hierarchical classifier that tries to extract new features using ensemble machine learning regressors and statistical methods within the overall machine learning process. In stage 1, all the categorical variables are characterized by a random forest algorithm to create a new variable through regression analysis, while the remaining numerical variables serve as the sample for a factor analysis (FA) process that calculates the factor values of each observation. Then, in stage 2, all the features are learned by a random forest classifier. Diversified datasets consisting of categorical and numerical variables are used with the method. The experimental results show that classification accuracy increased by 8.61%. The method also significantly improves the classification accuracy of observations that have a low proportion in the training dataset.
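
The two-stage procedure described in the summary can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation. Several details are assumptions that the record does not state: the regression target in stage 1 is taken to be the numerically encoded class label, the categorical variables are one-hot encoded first, out-of-fold predictions are used on the training rows to limit label leakage, the number of factors is set to 2, and stage 2 is trained on the derived features only. The toy data and the function names `make_toy_data` and `hierarchical_fit_predict` are hypothetical.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def make_toy_data(n=400, seed=0):
    """Hypothetical mixed dataset: 2 categorical and 4 numerical variables."""
    rng = np.random.default_rng(seed)
    cat = rng.integers(0, 3, size=(n, 2))
    num = rng.normal(size=(n, 4))
    score = (cat[:, 0] == 2) * 1.5 + num[:, 0] + 0.5 * num[:, 1]
    score = score + rng.normal(scale=0.5, size=n)
    y = (score > 1.5).astype(int)  # class 1 is the minority class
    return cat, num, y


def hierarchical_fit_predict(cat_tr, num_tr, y_tr, cat_te, num_te,
                             n_factors=2, seed=0):
    # Stage 1a: summarize the categorical variables as one new variable
    # with a random forest regressor (assumed target: the encoded label).
    enc = OneHotEncoder(handle_unknown="ignore")
    c_tr = enc.fit_transform(cat_tr)
    c_te = enc.transform(cat_te)
    reg = RandomForestRegressor(n_estimators=100, random_state=seed)
    # Out-of-fold predictions for training rows (assumption, limits leakage).
    cat_feat_tr = cross_val_predict(reg, c_tr, y_tr, cv=5)
    reg.fit(c_tr, y_tr)
    cat_feat_te = reg.predict(c_te)

    # Stage 1b: factor analysis on the standardized numerical variables.
    scaler = StandardScaler()
    fa = FactorAnalysis(n_components=n_factors, random_state=seed)
    f_tr = fa.fit_transform(scaler.fit_transform(num_tr))
    f_te = fa.transform(scaler.transform(num_te))

    # Stage 2: random forest classifier on the derived features.
    x_tr = np.column_stack([cat_feat_tr, f_tr])
    x_te = np.column_stack([cat_feat_te, f_te])
    clf = RandomForestClassifier(n_estimators=200, random_state=seed)
    clf.fit(x_tr, y_tr)
    return clf.predict(x_te)


cat, num, y = make_toy_data()
idx_tr, idx_te = train_test_split(
    np.arange(len(y)), test_size=0.25, stratify=y, random_state=0
)
pred = hierarchical_fit_predict(
    cat[idx_tr], num[idx_tr], y[idx_tr], cat[idx_te], num[idx_te]
)
accuracy = float((pred == y[idx_te]).mean())
print(f"held-out accuracy on toy data: {accuracy:.3f}")
```

The accuracy printed here reflects only the synthetic toy data and says nothing about the 8.61% improvement reported in the summary. If the original variables should also feed stage 2, they could be appended to `x_tr` and `x_te` with another `np.column_stack` call.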