Incorporating contexts to open information extraction

Bibliographic Details
Main Author: Dong, Kuicai
Other Authors: Sun Aixin
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2024
Subjects: Computer and Information Science; Open information extraction; Natural language processing
Online Access: https://hdl.handle.net/10356/174529

Abstract

Open Information Extraction (OpenIE) is a critical NLP task that aims to extract structured relational tuples from unstructured, open-domain text. The technique suits many open-world natural language understanding scenarios, such as question answering, knowledge base/graph construction, explicit reasoning, and text summarization. Unlike closed Information Extraction (IE) tasks, which rely on a predefined ontology schema in predictable domains, OpenIE aims to extract succinct but meaningful entities and relations in open form. As a result, the relations, subjects, and objects of extracted tuples take more flexible forms, which makes evaluation challenging. Pattern learning for OpenIE is also difficult because gold-standard training data is scarce: existing OpenIE models are trained in either an unsupervised or a distantly supervised manner, so the patterns they learn are inferior to gold-standard ones.

In this thesis, we introduce several novel approaches to tackle the challenges in pattern learning for OpenIE. The key theme of these approaches is to use various types of context to improve OpenIE.

Firstly, we propose to improve OpenIE with document-level context. As a new task, we introduce DocOIE, the first expert-annotated dataset for evaluating document-level OpenIE systems. In this setting, we present DocIE, a neural OpenIE system that leverages document-level context for relational tuple extraction.

Secondly, we study how to improve OpenIE with additional syntactic information as external context. We design a novel strategy to map phrase-level relations in the constituency tree into word-level relations, and to enhance each word’s representation with constituency path information. We then propose SMiLe-OIE, the first neural OpenIE system that incorporates heterogeneous syntactic information through GCN encoders and multi-view learning.

Thirdly, we study how to improve the efficiency and adaptability of OpenIE. We propose a novel notion, Sentence as Chunk sequence (SaC), as an intermediate layer for OpenIE, together with Chunk-OIE, an end-to-end learning model that (i) represents a sentence as a SaC and (ii) extracts tuples based on the SaC. Through data analysis against gold tuples, we show that chunks provide a suitable granularity of token spans for OpenIE.

Finally, we propose and study a new research task that examines the reliability of OpenIE by linking speculation detection and OpenIE. Specifically, we propose to detect tuple-level speculation, which aligns well with the goal of OpenIE to extract only factual information. We then propose SpecTup, a baseline model for detecting tuple-level speculation that leverages both semantic (BERT) and syntactic (Sub-Dependency-Graph) representations.

Although the problems of OpenIE have been established and investigated, this thesis contributes several pivotal ideas and concepts that could further improve OpenIE, and it sheds light on promising avenues for future research.
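
The abstract mentions mapping phrase-level constituency relations to word-level relations and enriching each word with constituency path information, but it does not spell out the procedure. The following is only a minimal sketch under stated assumptions: the nested-tuple tree encoding, the example parse, and the helper names `constituency_paths` and `lowest_shared_phrase` are hypothetical. It reads a word's constituency path as its root-to-leaf sequence of node labels, and takes the lowest phrase dominating two words as one plausible word-level relation.

```python
from itertools import count
from typing import List, Optional, Tuple, Union

# Hypothetical encoding: an internal node is (phrase_label, [children]) and a
# leaf is (pos_tag, word). This is not the thesis's actual data format.
Tree = Tuple[str, Union[str, List["Tree"]]]
PathStep = Tuple[int, str]  # (unique node id, node label)
WordPath = Tuple[str, List[PathStep]]


def constituency_paths(tree: Tree) -> List[WordPath]:
    """Return (word, root-to-leaf path) for every word, in sentence order."""
    ids = count()
    out: List[WordPath] = []

    def walk(node: Tree, path: List[PathStep]) -> None:
        label, children = node
        here = path + [(next(ids), label)]
        if isinstance(children, str):  # leaf: the second field is the word
            out.append((children, here))
        else:
            for child in children:
                walk(child, here)

    walk(tree, [])
    return out


def lowest_shared_phrase(paths: List[WordPath], i: int, j: int) -> str:
    """Label of the deepest tree node that dominates both word i and word j."""
    shared: Optional[PathStep] = None
    for a, b in zip(paths[i][1], paths[j][1]):
        if a != b:  # node ids differ, so the two paths have diverged
            break
        shared = a
    assert shared is not None  # both paths start at the same root
    return shared[1]


# Hypothetical parse of "Obama visited Paris".
tree: Tree = ("S", [("NP", [("NNP", "Obama")]),
                    ("VP", [("VBD", "visited"), ("NP", [("NNP", "Paris")])])])
paths = constituency_paths(tree)
for word, path in paths:
    print(word, [label for _, label in path])
# Obama ['S', 'NP', 'NNP']
# visited ['S', 'VP', 'VBD']
# Paris ['S', 'VP', 'NP', 'NNP']
print(lowest_shared_phrase(paths, 1, 2))  # VP
```

Node ids rather than labels are compared because sibling phrases often share a label (two NP children of one VP, for instance), and a label-only comparison would wrongly merge them into one node.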
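
To make the Sentence as Chunk sequence (SaC) idea and the chunk-granularity analysis against gold tuples more concrete, here is a minimal sketch, assuming a sentence's chunks are given as contiguous token spans that tile it. The `ChunkedSentence` class, the `alignment_rate` function, the example chunks, and the gold spans are all hypothetical; they are not Chunk-OIE's chunker or data, and the alignment measure is only one plausible way to check whether tuple elements fall on chunk boundaries.

```python
from dataclasses import dataclass
from typing import List, Tuple

Span = Tuple[int, int]  # token offsets [start, end), end exclusive


@dataclass
class ChunkedSentence:
    """Hypothetical SaC container: tokens plus chunk spans that tile them."""

    tokens: List[str]
    chunks: List[Span]

    def __post_init__(self) -> None:
        # Require contiguous, non-empty chunks that cover every token in order.
        expected = 0
        for start, end in self.chunks:
            if start != expected or end <= start:
                raise ValueError(f"chunk {(start, end)} breaks the tiling")
            expected = end
        if expected != len(self.tokens):
            raise ValueError("chunks do not cover every token")

    def text(self, span: Span) -> str:
        return " ".join(self.tokens[span[0]:span[1]])

    def is_chunk_aligned(self, span: Span) -> bool:
        """True if the span begins at a chunk start and ends at a chunk end."""
        starts = {s for s, _ in self.chunks}
        ends = {e for _, e in self.chunks}
        return span[0] < span[1] and span[0] in starts and span[1] in ends


def alignment_rate(sentence: ChunkedSentence, spans: List[Span]) -> float:
    """Fraction of tuple-element spans whose boundaries coincide with chunks."""
    if not spans:
        return 0.0
    return sum(sentence.is_chunk_aligned(s) for s in spans) / len(spans)


# Hypothetical example: hand-written NP/VP chunks, not the output of a chunker.
tokens = ["The", "new", "president", "visited", "Paris", "last", "week"]
sent = ChunkedSentence(tokens, chunks=[(0, 3), (3, 4), (4, 5), (5, 7)])

# A hypothetical gold tuple (subject, relation, object) as token spans.
gold = [(0, 3), (3, 4), (4, 5)]
print([sent.text(s) for s in gold])  # ['The new president', 'visited', 'Paris']
print(alignment_rate(sent, gold))  # 1.0
print(alignment_rate(sent, gold + [(2, 4)]))  # 0.75
```

Because the chunks tile the sentence, a span that starts at a chunk start and ends at a chunk end is exactly a run of whole chunks, so checking the two boundary sets is enough; a span such as "president visited" cuts through a chunk and is counted as misaligned.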

Additional Details
School: School of Computer Science and Engineering
Contact: AXSun@ntu.edu.sg
Degree: Doctor of Philosophy
Date available: 2024-04-01
DOI: 10.32657/10356/174529
Citation: Dong, K. (2024). Incorporating contexts to open information extraction. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/174529
License: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
File format: application/pdf