Summary: | <p>Learning distributed representations of natural language has become common practice in Natural Language Processing (NLP). Non-contextual embeddings map each token in the vocabulary to a low-dimensional real-valued vector. Although these representations perform competitively on word-level tasks, e.g. measuring word similarity, they are context-independent and fail to distinguish the meanings of a word across different contexts. This limitation gives rise to contextual embeddings, where each token is associated with a representation that is a function of the entire input sequence. Contextual embeddings have been shown to better capture complex characteristics of word use and to model polysemous words.</p>
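<p>As a brief illustration of this distinction (not part of the thesis), the sketch below contrasts a static embedding-table lookup with a contextual encoder. It assumes the Hugging Face transformers library and uses bert-base-uncased purely as a stand-in contextual model; the example sentences and the focus on the word "bank" are illustrative choices.</p>
<pre><code>
import torch
from transformers import AutoModel, AutoTokenizer

# Stand-in contextual encoder; any BERT-style model would do.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentences = [
    "She sat on the bank of the river.",
    "He deposited the cheque at the bank.",
]

contextual = []
for sent in sentences:
    inputs = tokenizer(sent, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (seq_len, hidden_dim)
    # Position of the word "bank" in the input (+1 skips the [CLS] token).
    idx = tokenizer.tokenize(sent).index("bank") + 1
    contextual.append(hidden[idx])

# Non-contextual lookup: one row of the embedding table per vocabulary id,
# identical no matter which sentence the word appears in.
bank_id = tokenizer.convert_tokens_to_ids("bank")
static = model.get_input_embeddings().weight[bank_id]

sim = torch.nn.functional.cosine_similarity(contextual[0], contextual[1], dim=0)
print(f"cosine similarity of the two contextual 'bank' vectors: {sim.item():.3f}")
print(f"static 'bank' vector is shared across contexts: shape {tuple(static.shape)}")
</code></pre>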
<p>In this thesis, we follow up on this line of research on contextual embeddings and explore better methods of conditioning on context for natural language processing. Specifically, we study how to incorporate contextual information for Natural Language Generation (NLG). We explore three sources of context: the local context of each input sequence (Chapter 3), global context from the training data (Chapter 4), and external context from the Web (Chapter 5). In Chapter 3, we use the local context of each input sequence to better model word polysemy. We introduce a technique called translation-counterfactual word replacement, which leverages both local context and alignment to augment machine translation data. In Chapter 4, we retrieve relevant global context about a particular entity from the training data. We present a memory-augmented approach that conditions an autoregressive language model on this global context for more coherent and logical generation. In Chapter 5, we pre-train a noisy channel model with external context (e.g. Reddit posts from the Web) for task-oriented dialogue and show that noisy channel decoding produces better responses than direct decoding. Finally, Chapter 6 summarizes our findings and outlines directions for future work.</p>
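<p>For concreteness, the sketch below shows the noisy channel objective in its simplest n-best reranking form: candidate responses (typically proposed by a direct model) are rescored by how well each response explains the dialogue context, log p(context | response), plus a fluency prior log p(response), instead of the direct score log p(response | context) alone. The function names, interpolation weights, and placeholder scorers are illustrative assumptions, not the thesis implementation.</p>
<pre><code>
from typing import Callable, List

def noisy_channel_rerank(
    context: str,
    candidates: List[str],
    log_p_channel: Callable[[str, str], float],  # log p(context | response)
    log_p_prior: Callable[[str], float],         # log p(response)
    lam: float = 1.0,                            # channel weight (assumed hyperparameter)
    beta: float = 1.0,                           # prior weight (assumed hyperparameter)
) -> str:
    """Return the candidate maximising the noisy channel score.

    Direct decoding ranks candidates by log p(response | context) only;
    the noisy channel objective instead asks how well the response
    explains the context and how fluent the response is on its own.
    """
    def score(response: str) -> float:
        return lam * log_p_channel(context, response) + beta * log_p_prior(response)

    return max(candidates, key=score)

# Toy usage with placeholder scorers (real trained models would supply these).
if __name__ == "__main__":
    best = noisy_channel_rerank(
        context="I'd like to book a table for two tonight.",
        candidates=["Sure, what time would you like?", "I like pizza."],
        log_p_channel=lambda c, r: 0.0 if "time" in r else -5.0,  # placeholder
        log_p_prior=lambda r: -0.1 * len(r.split()),              # placeholder
    )
    print(best)
</code></pre>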
|