Application of Natural Language Processing to Unstructured Data: A Case Study of Climate Change

Bibliographic Details
Main Author: Ceylan, Ceylan
Other Authors: Kim, Sang-Gook
Format: Thesis
Published: Massachusetts Institute of Technology 2022
Online Access: https://hdl.handle.net/1721.1/144647
Description
Summary: With the ascension of the Information Age and the widespread use of the Internet, a plethora of knowledge exists on numerous areas of interest. The resurgence of big data and machine learning has raised high hopes that designers can learn from past successes and failures. However, when the available data is a mixture of textual, numerical, and graphical forms, the currently popular deep learning tools cannot be applied directly. The question today is how to represent this heterogeneous data and how to find the relevant information in a huge repository of data efficiently. My study of data preparation is part of a larger group effort to apply Artificial Intelligence based Natural Language Processing models to large corpora of technical design documentation, such as climate change reports, which in turn enables the retrieval of accurate information via semantic search. The methodology successfully retrieved suitable answers to the user’s questions without requiring the user to read hundreds of pages of reports. Additionally, the query process brought up figures and tables that provided meaningful context for the answers, via associative data linking performed during the data reading and embedding process.