Minimally Supervised Relation Identification from Wikipedia Articles

Wikipedia is composed of millions of articles, each of which describes a particular real-world entity in various languages. Since the articles are contributed and edited by a large population of diverse experts with no specific authority, Wikipedia can be seen as a naturally occurring body o...


Bibliographic Details
Main Authors: Heung-Seon Oh, Yuchul Jung
Format: Article
Language: English
Published: Korea Institute of Science and Technology Information 2018-12-01
Series: Journal of Information Science Theory and Practice
Subjects: relation identification, Wikipedia mining, unsupervised clustering
Online Access: https://doi.org/10.1633/JISTaP.2018.6.4.3
author Heung-Seon Oh
Yuchul Jung
collection DOAJ
description Wikipedia is composed of millions of articles, each of which describes a particular real-world entity in various languages. Since the articles are contributed and edited by a large population of diverse experts with no specific authority, Wikipedia can be seen as a naturally occurring body of human knowledge. In this paper, we propose a method to automatically identify key entities and relations in Wikipedia articles, which can be used for automatic ontology construction. In contrast to previous approaches to entity and relation extraction and/or identification from text, our goal is to capture naturally occurring entities and relations from Wikipedia while minimizing the artificiality often introduced when constructing training and testing data. The titles of the articles and the anchored phrases in their text are regarded as entities, and their types are automatically classified with minimal training. We attempt to automatically detect and identify possible relations among the entities by clustering, without training data, as opposed to the relation extraction approach, which focuses on improving accuracy in selecting one of several target relations for a given pair of entities. While relation extraction with supervised learning requires a significant amount of annotation effort for a predefined set of relations, our approach attempts to discover relations as they occur naturally. Unlike other unsupervised relation identification work, in which automatically identified relations are evaluated against correct relations determined a priori by human judges, we evaluate the appropriateness of the naturally occurring clusters of relations involving person-artifact and person-organization entities, and of their relation names.
first_indexed 2024-12-24T11:01:24Z
format Article
id doaj.art-943be1988b5e4c4c90e8e3c883b82e88
institution Directory Open Access Journal
issn 2287-9099
2287-4577
language English
last_indexed 2024-12-24T11:01:24Z
publishDate 2018-12-01
publisher Korea Institute of Science and Technology Information
record_format Article
series Journal of Information Science Theory and Practice
spelling doaj.art-943be1988b5e4c4c90e8e3c883b82e88; 2022-12-21T16:58:43Z; eng; Korea Institute of Science and Technology Information; Journal of Information Science Theory and Practice; ISSN 2287-9099, 2287-4577; 2018-12-01; vol. 6, no. 4, pp. 28-38; https://doi.org/10.1633/JISTaP.2018.6.4.3; Minimally Supervised Relation Identification from Wikipedia Articles; Heung-Seon Oh (Korea University of Technology and Education, Cheonan, Korea); Yuchul Jung (Kumoh National Institute of Technology, Gumi); relation identification; Wikipedia mining; unsupervised clustering
title Minimally Supervised Relation Identification from Wikipedia Articles
topic relation identification
Wikipedia mining
unsupervised clustering
url https://doi.org/10.1633/JISTaP.2018.6.4.3