Incorporating contexts to open information extraction

Open Information Extraction (OpenIE) is a critical NLP task that aims to extract structured relational tuples from unstructured open-domain text. The technique is well suited to many open-world natural language understanding scenarios, such as question answering, knowledge base/graph construction, explicit...


Bibliographic Details
Main Author:	Dong, Kuicai
Other Authors:	Sun Aixin
Format:	Thesis-Doctor of Philosophy
Language:	English
Published:	Nanyang Technological University 2024
Subjects:	Computer and Information Science; Open information extraction; Natural language processing
Online Access:	https://hdl.handle.net/10356/174529

_version_	1826119907176087552
author	Dong, Kuicai
author2	Sun Aixin
author_facet	Sun Aixin Dong, Kuicai
author_sort	Dong, Kuicai
collection	NTU
description	Open Information Extraction (OpenIE) is a critical NLP task that aims to extract structured relational tuples from unstructured open-domain text. The technique is well suited to many open-world natural language understanding scenarios, such as question answering, knowledge base/graph construction, explicit reasoning, and text summarization. Unlike closed Information Extraction (IE) tasks, which have pre-defined ontology schemas in predictable domains, OpenIE aims to extract succinct but meaningful entities/relations in open form. As a result, the formats of the relations and subjects/objects in the extracted tuples are more flexible, which makes evaluation challenging. Pattern learning for OpenIE is also challenging, because gold-standard training data is insufficient. Existing OpenIE models are trained in either an unsupervised or a distantly supervised way, so the learnt patterns are inferior to gold-standard ones. In this thesis, we introduce several novel approaches to tackle the challenges in the pattern learning of OpenIE. The key theme of our approaches is to utilize various types of context to improve OpenIE. Firstly, we propose to improve OpenIE with document-level context. As a new task, we introduce DocOIE, the first expert-annotated dataset for evaluating document-level OpenIE systems. In this setting, we present a neural OpenIE system named DocIE that can leverage document-level contexts for relational tuple extraction. Secondly, we study how to improve OpenIE with additional syntactic information as external context. We design a novel strategy to map phrase-level relations in a constituency tree into word-level relations, and to enhance each word’s representation with constituency path information. We then propose SMiLe-OIE, the first neural OpenIE system that incorporates heterogeneous syntactic information through GCN encoders and multi-view learning. Thirdly, we study how to improve the efficiency and adaptability of OpenIE. Accordingly, we propose a novel notion of Sentence as Chunk sequence (SaC) as an intermediate layer for OpenIE. We also propose Chunk-OIE, an end-to-end learning model that (i) represents a sentence as a SaC, and (ii) extracts tuples based on the SaC. Through data analysis against gold tuples, we show that chunks provide a suitable granularity of token spans for OpenIE. Finally, we propose and study a new research task to examine the reliability of OpenIE, by linking speculation detection and OpenIE. Formally, we propose to detect tuple-level speculation, which aligns well with the goal of OpenIE to extract only factual information. Then, we propose SpecTup, a baseline model to detect tuple-level speculation. SpecTup leverages both semantic (BERT) and syntactic (Sub-Dependency-Graph) representations. All in all, although the problems of OpenIE have been established and investigated, this thesis contributes several pivotal ideas/concepts that could further improve OpenIE. Additionally, the thesis sheds light on promising avenues for future research in OpenIE.
first_indexed	2024-10-01T05:07:47Z
format	Thesis-Doctor of Philosophy
id	ntu-10356/174529
institution	Nanyang Technological University
language	English
last_indexed	2024-10-01T05:07:47Z
publishDate	2024
publisher	Nanyang Technological University
record_format	dspace
spelling	ntu-10356/174529 2024-05-03T02:58:52Z Incorporating contexts to open information extraction Dong, Kuicai Sun Aixin School of Computer Science and Engineering AXSun@ntu.edu.sg Computer and Information Science Open information extraction Natural language processing Doctor of Philosophy 2024-04-01T06:00:06Z 2024-04-01T06:00:06Z 2024 Thesis-Doctor of Philosophy Dong, K. (2024). Incorporating contexts to open information extraction. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/174529 https://hdl.handle.net/10356/174529 10.32657/10356/174529 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University
spellingShingle	Computer and Information Science Open information extraction Natural language processing Dong, Kuicai Incorporating contexts to open information extraction
title	Incorporating contexts to open information extraction
title_full	Incorporating contexts to open information extraction
title_fullStr	Incorporating contexts to open information extraction
title_full_unstemmed	Incorporating contexts to open information extraction
title_short	Incorporating contexts to open information extraction
title_sort	incorporating contexts to open information extraction
topic	Computer and Information Science; Open information extraction; Natural language processing
url	https://hdl.handle.net/10356/174529
work_keys_str_mv	AT dongkuicai incorporatingcontextstoopeninformationextraction
