HOFS: Higher order mutual information approximation for feature selection in R
Feature selection is the process of choosing a subset of relevant features so that the quality of predictive models can be improved. An extensive body of work exists on information-theoretic feature selection based on maximizing Mutual Information (MI) between subsets of features and class labels. Current methods use a lower order approximation that treats the joint entropy as a sum of single-variable entropies; this leads to locally optimal selections and misses correlated (multi-way), non-local feature combinations. In this article we present a higher order MI-based approximation technique called Higher Order Feature Selection (HOFS), implemented in R. Instead of producing a single list of features, our method produces a ranked collection of feature subsets that maximizes MI, giving better insight (a feature ranking) into which features work best together, due to their underlying interdependence. We demonstrate that the proposed method outperforms existing feature selection approaches while keeping similar running times and computational complexity.
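The distinction the abstract draws is between scoring features one at a time (the lower order approximation, which treats the joint entropy H(X1,...,Xk) as the sum of single-variable entropies H(Xi)) and scoring whole subsets jointly. The base-R sketch below is illustrative only and does not use the HOFS package API, whose function names are not shown in this record; it estimates I(S;Y) = H(S) + H(Y) - H(S,Y) from contingency tables and contrasts the two scores on a synthetic XOR example, where neither feature is informative alone but the pair jointly determines the label.

```r
## Illustrative base-R sketch only -- NOT the HOFS package API.
## Empirical (plug-in) entropy and mutual information for discrete data.

# Shannon entropy in nats; `x` may be a vector or a data frame
# whose columns are cross-tabulated to give a joint entropy.
H <- function(x) {
  p <- table(x)
  p <- p / sum(p)
  p <- p[p > 0]
  -sum(p * log(p))
}

# Mutual information I(S; Y) = H(S) + H(Y) - H(S, Y) between a
# feature subset S (a data frame) and a class label y.
MI <- function(S, y) H(S) + H(y) - H(cbind(S, y = y))

# Synthetic XOR example: y depends on the PAIR (x1, x2), not on
# either feature alone; x3 is pure noise.
set.seed(1)
n  <- 5000
x1 <- rbinom(n, 1, 0.5)
x2 <- rbinom(n, 1, 0.5)
x3 <- rbinom(n, 1, 0.5)
y  <- as.integer(xor(x1, x2))
X  <- data.frame(x1, x2, x3)

# Lower order (single-feature) scores: all close to 0, so a
# one-at-a-time ranking cannot distinguish x1, x2 from noise.
sapply(names(X), function(f) MI(X[, f, drop = FALSE], y))

# Higher order (joint) subset scores: the interacting pair stands out.
MI(X[, c("x1", "x2")], y)  # ~ log(2) = 0.693 nats
MI(X[, c("x1", "x3")], y)  # ~ 0
```

Exhaustive joint scoring as above scales combinatorially in the subset size; per the abstract, HOFS instead approximates the higher order MI, keeping running times comparable to existing methods while returning a ranked collection of feature subsets rather than a single list.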
Main Authors: | Krzysztof Gajowniczek; Jialin Wu; Soumyajit Gupta; Chandrajit Bajaj
---|---
Format: | Article
Language: | English
Published: | Elsevier, 2022-07-01
Series: | SoftwareX (Vol. 19, Article 101148)
ISSN: | 2352-7110
Subjects: | Feature selection; Machine learning; Mutual information; Higher order approximation
Online Access: | http://www.sciencedirect.com/science/article/pii/S2352711022000930
Author Affiliations: | Krzysztof Gajowniczek (corresponding author): Department of Artificial Intelligence, Institute of Information Technology, Warsaw University of Life Sciences-SGGW, 02-776 Warsaw, Poland; Jialin Wu, Soumyajit Gupta, Chandrajit Bajaj: Department of Computer Science, Oden Institute of Computational Engineering and Sciences, University of Texas at Austin, Austin, TX 78712, USA