Dirichlet Bayesian network scores and the maximum relative entropy principle
A classic approach for learning Bayesian networks from data is to identify a maximum a posteriori (MAP) network structure. In the case of discrete Bayesian networks, MAP networks are selected by maximising one of several possible Bayesian–Dirichlet (BD) scores; the most famous is the Bayesian–Dirichlet equivalent uniform (BDeu) score from Heckerman et al. (Mach Learn 20(3):197–243, 1995). The key properties of BDeu arise from its uniform prior over the parameters of each local distribution in the network: it makes structure learning computationally efficient, it does not require the elicitation of prior knowledge from experts, and it satisfies score equivalence. In this paper we will review the derivation and the properties of BD scores, and of BDeu in particular, and we will link them to the corresponding entropy estimates to study them from an information-theoretic perspective. To this end, we will work in the context of the foundational work of Giffin and Caticha (Proceedings of the 27th international workshop on Bayesian inference and maximum entropy methods in science and engineering, pp 74–84, 2007), who showed that Bayesian inference can be framed as a particular case of the maximum relative entropy principle. We will use this connection to show that BDeu should not be used for structure learning from sparse data, since it violates the maximum relative entropy principle, and that it is also problematic from a more classic Bayesian model selection perspective, because it produces Bayes factors that are sensitive to the value of its only hyperparameter. Using a large simulation study, we found in our previous work (Scutari, J Mach Learn Res (Proc Track PGM 2016) 52:438–448, 2016) that the Bayesian–Dirichlet sparse (BDs) score seems to provide better accuracy in structure learning; in this paper we further show that BDs does not suffer from the issues above, and we recommend using it instead of BDeu for sparse data. Finally, we will show that these issues are in fact different aspects of the same problem and a consequence of the distributional assumptions of the prior.
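For reference, the BD family mentioned in the abstract scores a candidate structure through a product of local Dirichlet–multinomial marginal likelihoods. The sketch below follows the standard formulation in Heckerman et al. (1995); the notation (n_{ijk} for the observed counts of the k-th value of variable X_i under its j-th parent configuration, α for the imaginary sample size, r_i and q_i for the numbers of states and parent configurations of X_i) is supplied here for illustration and is not part of this record.

$$
\mathrm{BD}(G, \mathcal{D}; \alpha) \;=\;
\prod_{i=1}^{N} \prod_{j=1}^{q_i}
\left[
  \frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + n_{ij})}
  \prod_{k=1}^{r_i}
  \frac{\Gamma(\alpha_{ijk} + n_{ijk})}{\Gamma(\alpha_{ijk})}
\right],
\qquad
\alpha_{ij} = \sum_{k} \alpha_{ijk},
\quad
n_{ij} = \sum_{k} n_{ijk}.
$$

BDeu is the special case $\alpha_{ijk} = \alpha / (r_i q_i)$, i.e. a uniform prior over the parameters of each local distribution; BDs (Scutari 2016) instead divides $\alpha$ only among the parent configurations actually observed in the data, which the abstract argues makes it preferable to BDeu for sparse data.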
Main Author: | Scutari, M
---|---
Format: | Journal article
Published: | Springer, 2018