Dirichlet Bayesian network scores and the maximum relative entropy principle
A classic approach for learning Bayesian networks from data is to identify a maximum a posteriori (MAP) network structure. In the case of discrete Bayesian networks, MAP networks are selected by maximising one of several possible Bayesian–Dirichlet (BD) scores; the most famous is the Bayesian–Dirichlet equivalent uniform (BDeu) score from Heckerman et al. (Mach Learn 20(3):197–243, 1995). The key properties of BDeu arise from its uniform prior over the parameters of each local distribution in the network: it makes structure learning computationally efficient, it does not require the elicitation of prior knowledge from experts, and it satisfies score equivalence. In this paper we will review the derivation and the properties of BD scores, and of BDeu in particular, and we will link them to the corresponding entropy estimates to study them from an information-theoretic perspective. To this end, we will work in the context of the foundational work of Giffin and Caticha (Proceedings of the 27th international workshop on Bayesian inference and maximum entropy methods in science and engineering, pp 74–84, 2007), who showed that Bayesian inference can be framed as a particular case of the maximum relative entropy principle. We will use this connection to show that BDeu should not be used for structure learning from sparse data, since it violates the maximum relative entropy principle, and that it is also problematic from a more classic Bayesian model selection perspective, because it produces Bayes factors that are sensitive to the value of its only hyperparameter. Using a large simulation study, we found in our previous work (Scutari, J Mach Learn Res (Proc Track PGM 2016) 52:438–448, 2016) that the Bayesian–Dirichlet sparse (BDs) score seems to provide better accuracy in structure learning; in this paper we further show that BDs does not suffer from the issues above, and we recommend using it instead of BDeu for sparse data. Finally, we will show that these issues are in fact different aspects of the same problem and a consequence of the distributional assumptions of the prior.
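For reference, the BD family mentioned in the abstract scores a candidate structure through a product of local Dirichlet–multinomial marginal likelihoods. The sketch below follows the standard formulation in Heckerman et al. (1995); the notation (n_{ijk} for the observed counts of the k-th value of variable X_i under its j-th parent configuration, α for the imaginary sample size, r_i and q_i for the numbers of states and parent configurations of X_i) is supplied here for illustration and is not part of this record.

$$
\mathrm{BD}(G, \mathcal{D}; \alpha) \;=\;
\prod_{i=1}^{N} \prod_{j=1}^{q_i}
\left[
  \frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + n_{ij})}
  \prod_{k=1}^{r_i}
  \frac{\Gamma(\alpha_{ijk} + n_{ijk})}{\Gamma(\alpha_{ijk})}
\right],
\qquad
\alpha_{ij} = \sum_{k} \alpha_{ijk},
\quad
n_{ij} = \sum_{k} n_{ijk}.
$$

BDeu is the special case $\alpha_{ijk} = \alpha / (r_i q_i)$, i.e. a uniform prior over the parameters of each local distribution; BDs (Scutari 2016) instead divides $\alpha$ only among the parent configurations actually observed in the data, which the abstract argues makes it preferable to BDeu for sparse data.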
Main Author: | Scutari, M
---|---
Format: | Journal article
Published: | Springer, 2018