Summary: | Dimensionality reduction is an essential preprocessing step for data mining. Principal component analysis (PCA) is the most classical dimension reduction method, and a variety of methods extend it. However, all of these methods require at least one transposition and quadrature operation on the original high-dimensional matrix, and the reduced data lose the meaning of the original data, which inevitably makes further analysis of classification or clustering results difficult. In this paper we develop a novel algorithm named DRWPCA. It does not map the original data to a space of other dimensions for processing; instead, it achieves dimension reduction by analyzing the correlation between dimensions, and therefore retains the physical meaning of the original data set. It uses mathematical statistics to obtain the correlation coefficient, or degree of correlation, between attributes. Based on a statistical analysis of these degrees of correlation, highly correlated features are removed to reduce the dimension. DRWPCA is inspired by the correlation coefficient, one of the numerical characteristics of random variables, and by the sliding window model used for traffic control in network engineering. Experimental results demonstrate that DRWPCA provides promising accuracy and a stronger ability to reduce dimension, and that it preserves the original information of the data.
|
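The general idea of removing highly correlated attributes can be illustrated with a minimal sketch. This is not the DRWPCA algorithm itself: the sliding window component is omitted because the summary does not describe how it is applied, and the function names `pearson` and `drop_correlated`, the greedy keep-or-drop rule, the `threshold` value of 0.9, and the toy data are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption: a constant column is treated as uncorrelated.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(columns, threshold=0.9):
    """Return the names of the attributes that are kept.

    columns: dict mapping attribute name -> list of values.
    An attribute is dropped when its absolute correlation with any
    already-kept attribute exceeds the (hypothetical) threshold.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy data (invented): height_in is a rescaled copy of height_cm.
data = {
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "weight_kg": [70.0, 52.0, 88.0, 61.0, 75.0],
}
print(drop_correlated(data))  # ['height_cm', 'weight_kg']
```

Because the surviving attributes are original columns rather than linear combinations of them, their physical meaning is unchanged, which is the property the summary contrasts with PCA.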