Summary: | <p>Bayesian modelling is a natural fit for tasks in computational linguistics, since it provides interpretable structures, useful control through priors, and coherent management of uncertainty. However, exact Bayesian inference is intractable for many models of practical interest. Developing approximate Bayesian inference algorithms that are both accurate and efficient remains a fundamental challenge, especially in computational linguistics, where datasets are large and growing and models contain complex latent structures.</p> <p>Collapsed variational inference (CVI) is an important milestone that combines the efficiency of variational inference (VI) with the accuracy of Markov chain Monte Carlo (MCMC) (Teh et al., 2006). However, its previous applications were limited to bag-of-words models, whose hidden variables are conditionally independent given the parameters. In computational linguistics, by contrast, the dependencies between hidden variables are crucial for modelling the underlying syntactic and semantic relations. To broaden the application domain of CVI and to address the inference challenge above, we investigate the application of CVI to computational linguistics.</p> <p>This thesis makes three contributions. First, we solve a number of inference challenges arising from hidden variable dependencies and derive a set of new CVI algorithms for two ubiquitous and foundational models in computational linguistics: hidden Markov models (HMMs) and probabilistic context-free grammars (PCFGs). We also propose CVI for hierarchical Dirichlet process (HDP) HMMs, which are Bayesian nonparametric extensions of HMMs.</p> <p>Second, along the way we propose a set of novel algorithmic techniques. These apply generally to a wide variety of probabilistic graphical models in the conjugate exponential family, and also to computational linguistic models that use non-conjugate HDP constructions.
Our work therefore represents a step towards bridging the gap between increasingly rich Bayesian models in computational linguistics and recent advances in approximate Bayesian inference.</p> <p>Third, we empirically evaluate our proposed CVI algorithms and their stochastic versions on a range of computational linguistic tasks, including part-of-speech induction and grammar induction, among others. The experimental results consistently show that, with our techniques for handling hidden variable dependencies, the empirical advantages of VI and MCMC can be combined across a much larger domain of CVI applications.</p>
|