Summary: | Machine learning has a wide variety of applications in the field of natural language processing (NLP). One such application is fine-tuning large pre-trained models for a wide range of downstream tasks. In this work, we propose methods to enhance these large language models by infusing them with information found in commonsense knowledge bases. Commonsense is basic knowledge about the world that humans are expected to have and that is needed for efficient communication. Often, to understand a text, a person must use commonsense to make implicit inferences based on what is explicitly stated. We harness the power of relational graph convolutional networks (RGCNs) to encode meaningful commonsense information from graphs and introduce three simple methods to inject this knowledge into the contextual language representations of transformer-based language models. We show that the representations learned by the RGCN are useful for the task of link prediction in a commonsense knowledge base. Additionally, we show that the methods we introduce for combining representations of structured commonsense information with a transformer-based language model yield promising results on a downstream information retrieval task and, for most types of combinations, give better performance than a baseline transformer-based language model. Lastly, we show that the representations learned by an RGCN, although trained on considerably less data, still prove useful in a downstream information retrieval task when combined with a transformer-based language model.