Summary: | This paper presents a new feature selection framework based on the ℓ₀-norm, in which data are summarized by the moments of their class-conditional densities. However, the discontinuity of the ℓ₀-norm makes it difficult to find the optimal solution. We combine a suitable approximation of the ℓ₀-norm with a bound on the misclassification probability involving the mean and covariance of the dataset to derive a robust difference of convex functions (DC) program formulation, which is solved effectively with the DC optimization algorithm. A kernelized version of this problem is also presented. Experimental results on both real and synthetic datasets show that the proposed formulations select fewer features than the traditional Minimax Probability Machine and the ℓ₁-norm state.
|