Populating Web-Scale Knowledge Graphs Using Distantly Supervised Relation Extraction and Validation

In this paper, we propose a fully automated system to extend knowledge graphs using external information from web-scale corpora. The designed system leverages a deep-learning-based technology for relation extraction that can be trained by a distantly supervised approach. In addition, the system uses...

Bibliographic Details
Main Authors: Sarthak Dash, Michael R. Glass, Alfio Gliozzo, Mustafa Canim, Gaetano Rossiello
Format: Article
Language: English
Published: MDPI AG 2021-08-01
Series: Information
Subjects: information extraction, knowledge graphs, deep learning
Online Access: https://www.mdpi.com/2078-2489/12/8/316
author Sarthak Dash
Michael R. Glass
Alfio Gliozzo
Mustafa Canim
Gaetano Rossiello
collection DOAJ
description In this paper, we propose a fully automated system to extend knowledge graphs using external information from web-scale corpora. The designed system leverages a deep-learning-based technology for relation extraction that can be trained by a distantly supervised approach. In addition, the system uses a deep learning approach for knowledge base completion by utilizing the global structure information of the induced KG to further refine the confidence of the newly discovered relations. The designed system does not require any effort for adaptation to new languages and domains, as it does not use any hand-labeled data, NLP analytics, or inference rules. Our experiments, performed on a popular academic benchmark, demonstrate that the suggested system boosts the performance of relation extraction by a wide margin, reporting error reductions of 50%, resulting in relative improvement of up to 100%. Furthermore, a web-scale experiment conducted to extend DBpedia with knowledge from Common Crawl shows that our system is not only scalable but also does not require any adaptation cost, while yielding a substantial accuracy gain.
first_indexed 2024-03-10T08:43:58Z
format Article
id doaj.art-7bdd82f8c8b746c1a3965e3af319567e
institution Directory Open Access Journal
issn 2078-2489
language English
last_indexed 2024-03-10T08:43:58Z
publishDate 2021-08-01
publisher MDPI AG
record_format Article
series Information
spelling doaj.art-7bdd82f8c8b746c1a3965e3af319567e
2023-11-22T08:06:00Z
eng
MDPI AG
Information
2078-2489
2021-08-01
Volume 12, Issue 8, Article 316
10.3390/info12080316
IBM Research AI, IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA (affiliation of all five authors)
title Populating Web-Scale Knowledge Graphs Using Distantly Supervised Relation Extraction and Validation
topic information extraction
knowledge graphs
deep learning
url https://www.mdpi.com/2078-2489/12/8/316