HDSKG: Harvesting domain specific knowledge graph from content of webpages

Knowledge graphs are useful in many domains, such as search result ranking, recommendation, and exploratory search. A knowledge graph integrates structural information about concepts across multiple information sources and links these concepts together. The extraction of domain specific relation triples (subje...

Bibliographic Details
Main Authors:	Zhao, Xuejiao; Xing, Zhenchang; Kabir, Muhammad Ashad; Sawada, Naoya; Li, Jing; Lin, Shang-Wei
Other Authors:	School of Computer Science and Engineering
Format:	Conference Paper
Language:	English
Published:	2017
Subjects:	Knowledge graph; Structural information extraction
Online Access:	https://hdl.handle.net/10356/83091 http://hdl.handle.net/10220/42426

author	Zhao, Xuejiao; Xing, Zhenchang; Kabir, Muhammad Ashad; Sawada, Naoya; Li, Jing; Lin, Shang-Wei
author2	School of Computer Science and Engineering
collection	NTU
description	Knowledge graphs are useful in many domains, such as search result ranking, recommendation, and exploratory search. A knowledge graph integrates structural information about concepts across multiple information sources and links these concepts together. The extraction of domain specific relation triples (subject, verb phrase, object) is one of the important techniques for domain specific knowledge graph construction. In this research, an automatic method named HDSKG is proposed to discover domain specific concepts and their relation triples from the content of webpages. We combine a dependency parser with a rule-based method to chunk candidate relation triples, then extract advanced features of these candidates to estimate their domain relevance with a machine learning algorithm. To evaluate our method, we apply HDSKG to Stack Overflow (a Q&A website about computer programming). As a result, we construct a knowledge graph of the software engineering domain with 35279 relation triples, 44800 concepts, and 9660 unique verb phrases. The experimental results show that both the precision and recall of HDSKG (0.78 and 0.7 respectively) are much higher than those of openIE (0.11 and 0.6 respectively). The method is particularly effective on complex sentences. Furthermore, with the self-training technique used in the classifier, HDSKG can easily be applied to other domains with less training data.
first_indexed	2025-02-19T03:46:42Z
format	Conference Paper
id	ntu-10356/83091
institution	Nanyang Technological University
language	English
last_indexed	2025-02-19T03:46:42Z
publishDate	2017
record_format	dspace
spelling	ntu-10356/83091; 2020-03-07T11:48:45Z; 2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER); Rolls-Royce@NTU Corporate Lab; NTU-UBC Research Centre of Excellence in Active Living for the Elderly
NRF (Natl Research Foundation, S’pore) Accepted version 2017-05-15T08:27:32Z 2019-12-06T15:11:40Z 2017-05-15T08:27:32Z 2019-12-06T15:11:40Z 2017-02-01 2017 Conference Paper Zhao, X., Xing, Z., Kabir, M. A., Sawada, N., Li, J., & Lin, S.-W. (2017). HDSKG: Harvesting domain specific knowledge graph from content of webpages. 2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER), 56-67. 978-1-5090-5501-2 https://hdl.handle.net/10356/83091 http://hdl.handle.net/10220/42426 10.1109/SANER.2017.7884609 200438 en © 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: [https://doi.org/10.1109/SANER.2017.7884609]. 12 p. application/pdf
title	HDSKG: Harvesting domain specific knowledge graph from content of webpages
topic	Knowledge graph; Structural information extraction
url	https://hdl.handle.net/10356/83091 http://hdl.handle.net/10220/42426

Similar Items