Summary: Owing to their excellent classification performance and expressivity, Bayesian network classifiers (BNCs) have attracted great attention ever since the success of Naive Bayes (NB). Information theory has provided the mathematical basis for the rapid development of BNCs. In this paper we propose an entropy function H<sub>B</sub>(D), which corresponds to the optimal number of bits needed to encode the training data D under the network structure B and thus roughly measures the amount of information contained in D. Each factor in H<sub>B</sub>(D) explicitly represents a statement about causal relationships. An efficient heuristic search strategy is introduced to minimize H<sub>B</sub>(D) and explore the optimal topology of the BNC. Our extensive experimental evaluation on 40 datasets reveals that this out-of-core algorithm achieves classification performance competitive with state-of-the-art learners such as tree-augmented Naive Bayes (TAN), the k-dependence Bayesian classifier (KDB), support vector machines, logistic regression, and neural networks.
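The abstract does not give the exact factorization of H<sub>B</sub>(D), so the following is only a minimal sketch under the standard minimum-description-length reading of "optimal number of bits": H<sub>B</sub>(D) = −∑<sub>d∈D</sub> log<sub>2</sub> P<sub>B</sub>(d), where P<sub>B</sub> factorizes over each variable given its parents in B. The function name, data layout, and Laplace smoothing below are illustrative assumptions, not the paper's definition.

```python
import math
from collections import Counter

def entropy_bits(data, parents):
    """Sketch of H_B(D): total bits to encode `data` under a Bayesian
    network whose structure is `parents`, a dict mapping each variable
    index to a tuple of its parent indices.  Conditional probabilities
    are estimated from `data` itself with Laplace smoothing.
    (Illustrative only -- the paper's exact definition may differ.)"""
    bits = 0.0
    for var, pa in parents.items():
        # Empirical counts for the factor P(X_var | X_pa).
        joint = Counter((tuple(row[p] for p in pa), row[var]) for row in data)
        marg = Counter(tuple(row[p] for p in pa) for row in data)
        card = len({row[var] for row in data})  # cardinality of X_var
        # Each instance contributes a code length of -log2 P(x_var | x_pa).
        for row in data:
            pa_val = tuple(row[p] for p in pa)
            prob = (joint[(pa_val, row[var])] + 1) / (marg[pa_val] + card)
            bits += -math.log2(prob)
    return bits

# Toy usage: three binary variables, Naive-Bayes-like structure rooted at X0.
data = [(0, 0, 1), (0, 1, 1), (1, 1, 0), (1, 0, 0), (0, 0, 1)]
structure = {0: (), 1: (0,), 2: (0,)}
print(f"H_B(D) ~ {entropy_bits(data, structure):.2f} bits")
```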
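The paper's heuristic search strategy is likewise not detailed in the abstract; as a stand-in, the sketch below uses a generic greedy hill-climber that repeatedly adds the acyclic parent edge giving the largest reduction in entropy_bits from the sketch above, reusing its `data`. The move set, `max_parents` cap, and stopping rule are all assumptions.

```python
def creates_cycle(parents, child, cand):
    """True if adding the edge cand -> child would create a directed
    cycle, i.e. child is already an ancestor of cand."""
    stack, seen = list(parents[cand]), set()
    while stack:
        v = stack.pop()
        if v == child:
            return True
        if v not in seen:
            seen.add(v)
            stack.extend(parents[v])
    return False

def greedy_search(data, n_vars, max_parents=2):
    """Hypothetical hill-climbing search: at each step, add the single
    edge that most reduces entropy_bits; stop when no addition helps."""
    parents = {v: () for v in range(n_vars)}
    while True:
        base = entropy_bits(data, parents)
        best = None
        for child in range(n_vars):
            if len(parents[child]) >= max_parents:
                continue
            for cand in range(n_vars):
                if cand == child or cand in parents[child]:
                    continue
                if creates_cycle(parents, child, cand):
                    continue
                trial = dict(parents)  # tuples are immutable, shallow copy suffices
                trial[child] = parents[child] + (cand,)
                score = entropy_bits(data, trial)
                if score < base and (best is None or score < best[0]):
                    best = (score, child, cand)
        if best is None:
            return parents
        parents[best[1]] = parents[best[1]] + (best[2],)

# Continuing the toy example above:
print(greedy_search(data, n_vars=3))
```

Note that scoring pure code length tends to favor denser structures; an MDL-style penalty for encoding the structure itself would normally be added, which the `max_parents` cap here only crudely approximates.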