Learning high-dimensional Markov forest distributions: Analysis of error rates

The problem of learning forest-structured discrete graphical models from i.i.d. samples is considered. An algorithm based on pruning of the Chow-Liu tree through adaptive thresholding is proposed. It is shown that this algorithm is both structurally consistent and risk consistent and the error proba...

Full description

Bibliographic Details
Main Authors:	Tan, Vincent Yan Fu, Anandkumar, Animashree, Willsky, Alan S.
Other Authors:	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format:	Article
Language:	en_US
Published:	MIT Press 2011
Online Access:	http://hdl.handle.net/1721.1/66514 https://orcid.org/0000-0003-0149-5888

_version_	1826197336599035904
author	Tan, Vincent Yan Fu Anandkumar, Animashree Willsky, Alan S.
author2	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
author_facet	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Tan, Vincent Yan Fu Anandkumar, Animashree Willsky, Alan S.
author_sort	Tan, Vincent Yan Fu
collection	MIT
description	The problem of learning forest-structured discrete graphical models from i.i.d. samples is considered. An algorithm based on pruning of the Chow-Liu tree through adaptive thresholding is proposed. It is shown that this algorithm is both structurally consistent and risk consistent and the error probability of structure learning decays faster than any polynomial in the number of samples under fixed model size. For the high-dimensional scenario where the size of the model d and the number of edges k scale with the number of samples n, sufficient conditions on (n,d,k) are given for the algorithm to satisfy structural and risk consistencies. In addition, the extremal structures for learning are identified; we prove that the independent (resp., tree) model is the hardest (resp., easiest) to learn using the proposed algorithm in terms of error rates for structure learning.
first_indexed	2024-09-23T10:46:28Z
format	Article
id	mit-1721.1/66514
institution	Massachusetts Institute of Technology
language	en_US
last_indexed	2024-09-23T10:46:28Z
publishDate	2011
publisher	MIT Press
record_format	dspace
spelling	mit-1721.1/665142022-09-30T22:53:15Z Learning high-dimensional Markov forest distributions: Analysis of error rates Tan, Vincent Yan Fu Anandkumar, Animashree Willsky, Alan S. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Massachusetts Institute of Technology. Laboratory for Information and Decision Systems Willsky, Alan S. Tan, Vincent Yan Fu Willsky, Alan S. The problem of learning forest-structured discrete graphical models from i.i.d. samples is considered. An algorithm based on pruning of the Chow-Liu tree through adaptive thresholding is proposed. It is shown that this algorithm is both structurally consistent and risk consistent and the error probability of structure learning decays faster than any polynomial in the number of samples under fixed model size. For the high-dimensional scenario where the size of the model d and the number of edges k scale with the number of samples n, sufficient conditions on (n,d,k) are given for the algorithm to satisfy structural and risk consistencies. In addition, the extremal structures for learning are identified; we prove that the independent (resp., tree) model is the hardest (resp., easiest) to learn using the proposed algorithm in terms of error rates for structure learning. 2011-10-20T15:01:40Z 2011-10-20T15:01:40Z 2011-05 2011-02 Article http://purl.org/eprint/type/JournalArticle 1532-4435 1533-7928 http://hdl.handle.net/1721.1/66514 Tan, Vincent Y.F., Animashree Anandkumar and Alan S. Willsky. "Learning High-Dimensional Markov Forest Distributions: Analysis of Error Rates." Journal of Machine Learning Research, 12 (2011) 1617-1653. https://orcid.org/0000-0003-0149-5888 en_US http://jmlr.csail.mit.edu/papers/volume12/tan11a/tan11a.pdf Journal of Machine Learning Research Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. application/pdf MIT Press MIT Press
spellingShingle	Tan, Vincent Yan Fu Anandkumar, Animashree Willsky, Alan S. Learning high-dimensional Markov forest distributions: Analysis of error rates
title	Learning high-dimensional Markov forest distributions: Analysis of error rates
title_full	Learning high-dimensional Markov forest distributions: Analysis of error rates
title_fullStr	Learning high-dimensional Markov forest distributions: Analysis of error rates
title_full_unstemmed	Learning high-dimensional Markov forest distributions: Analysis of error rates
title_short	Learning high-dimensional Markov forest distributions: Analysis of error rates
title_sort	learning high dimensional markov forest distributions analysis of error rates
url	http://hdl.handle.net/1721.1/66514 https://orcid.org/0000-0003-0149-5888
work_keys_str_mv	AT tanvincentyanfu learninghighdimensionalmarkovforestdistributionsanalysisoferrorrates AT anandkumaranimashree learninghighdimensionalmarkovforestdistributionsanalysisoferrorrates AT willskyalans learninghighdimensionalmarkovforestdistributionsanalysisoferrorrates

Learning high-dimensional Markov forest distributions: Analysis of error rates

Similar Items