Structure Preserving Unsupervised Feature Selection Based on Autoencoder and Manifold Regularization

Bibliographic Details
Main Authors: YANG Lei, JIANG Ai-lian, QIANG Yan
Format: Article
Language: zho
Published: Editorial office of Computer Science 2021-08-01
Series: Jisuanji kexue
Subjects:
Online Access: http://www.jsjkx.com/fileup/1002-137X/PDF/1002-137X-2021-8-53.pdf
Description
Summary: High-dimensional data contain many redundant and irrelevant features, which seriously degrade the efficiency and quality of data mining and the generalization performance of machine learning. Feature selection has therefore become an important research direction in computer science. This paper proposes an unsupervised feature selection algorithm that exploits the non-linear learning ability of the autoencoder. First, based on the reconstruction error of the autoencoder, individual features that are important for data reconstruction are selected. Second, the feature weights are used to select the feature subset that contributes most to the reconstruction of the other features. Manifold learning is introduced to capture the local and non-local structure of the original data space, and L2,1-norm sparse regularization is applied to the feature weights to make them sparser, so that more distinctive features are selected. Finally, a new objective function is constructed and optimized with a gradient descent algorithm. Experiments are conducted on six typical data sets of different types, and the proposed algorithm is compared with five commonly used unsupervised feature selection algorithms. The experimental results verify that the proposed algorithm effectively selects important features and significantly improves classification accuracy and clustering accuracy.
ISSN: 1002-137X
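
The summary mentions L2,1 sparse regularization on the feature weights but does not give the objective function itself. The sketch below only illustrates the general idea of the L2,1 norm and of ranking features by the row norms of a weight matrix, as is common in methods of this kind. It assumes a weight matrix W with one row per input feature; the function names and the example values are hypothetical and are not taken from the paper.

```python
import math

def row_norms(W):
    """L2 norm of each row of W; row i is assumed to hold the weights of input feature i."""
    return [math.sqrt(sum(w * w for w in row)) for row in W]

def l21_norm(W):
    """L2,1 norm: the sum of the row-wise L2 norms. Penalizing it pushes whole rows toward zero."""
    return sum(row_norms(W))

def select_features(W, k):
    """Indices of the k rows with the largest L2 norm, largest first."""
    norms = row_norms(W)
    order = sorted(range(len(W)), key=lambda i: norms[i], reverse=True)
    return order[:k]

# Hypothetical weight matrix: 4 input features, 2 hidden units (illustrative values only).
W = [
    [3.0, 4.0],  # row norm 5
    [0.0, 0.0],  # row norm 0, a feature the sparsity penalty has switched off
    [1.0, 0.0],  # row norm 1
    [0.0, 2.0],  # row norm 2
]

print(l21_norm(W))            # 8.0
print(select_features(W, 2))  # [0, 3]
```

In a full method of the kind the summary describes, this penalty would be one term of the objective alongside the autoencoder reconstruction error and the manifold regularizer, all minimized together by gradient descent; those terms are omitted here because the record does not specify them.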