Triplet extraction leveraging sentence transformers and dependency parsing

Knowledge Graphs are a tool to structure (entity, relation, entity) triples. One possible way to construct these knowledge graphs is by extracting triples from unstructured text. The aim when doing this is to maximise the number of useful triples while minimising the triples containing no or useless...

Bibliographic Details
Main Authors:	Stuart Gallina Ottersen, Flávio Pinheiro, Fernando Bação
Format:	Article
Language:	English
Published:	Elsevier 2024-03-01
Series:	Array
Subjects:	Triplet extraction; NLP; Natural language processing; Knowledge Graph
Online Access:	http://www.sciencedirect.com/science/article/pii/S2590005623000590

_version_	1827321177609076736
author	Stuart Gallina Ottersen; Flávio Pinheiro; Fernando Bação
collection	DOAJ
description	Knowledge Graphs are a tool to structure (entity, relation, entity) triples. One possible way to construct these knowledge graphs is by extracting triples from unstructured text. The aim when doing this is to maximise the number of useful triples while minimising the number of triples containing no or useless information. Most previous work in this field uses supervised learning techniques that can be expensive, both computationally and because they require labelled data. Existing unsupervised methods, meanwhile, often produce an excessive number of low-value triples, rely on empirical rules when extracting triples, or struggle with the order of the entities relative to the relation. To address these issues, this paper proposes a new model, Unsupervised Dependency parsing Aided Semantic Triple Extraction (UDASTE), which leverages sentence structure and allows restrictive triple relation types to be defined, generating high-quality triples while removing the need to map extracted triples to relation schemas. This is done by leveraging pre-trained language models. UDASTE is compared with two baseline models on three datasets and outperforms the baselines on all three. Its limitations and possible further work are discussed, as is the implementation of the model in a computational intelligence context.
first_indexed	2024-03-08T17:07:10Z
format	Article
id	doaj.art-ac4cc6ef1f134b39afae9d52fc458629
institution	Directory of Open Access Journals
issn	2590-0056
language	English
last_indexed	2024-04-25T01:00:48Z
publishDate	2024-03-01
publisher	Elsevier
record_format	Article
series	Array
spelling	doaj.art-ac4cc6ef1f134b39afae9d52fc458629; 2024-03-11T04:11:04Z; eng; Elsevier; Array; 2590-0056; 2024-03-01; 21; 100334; Triplet extraction leveraging sentence transformers and dependency parsing; Stuart Gallina Ottersen (corresponding author; NOVA IMS, Campus de Campolide, 1070-312, Lisbon, Portugal); Flávio Pinheiro (NOVA IMS, Campus de Campolide, 1070-312, Lisbon, Portugal); Fernando Bação (NOVA IMS, Campus de Campolide, 1070-312, Lisbon, Portugal); http://www.sciencedirect.com/science/article/pii/S2590005623000590; Triplet extraction; NLP; Natural language processing; Knowledge Graph
title	Triplet extraction leveraging sentence transformers and dependency parsing
topic	Triplet extraction; NLP; Natural language processing; Knowledge Graph
url	http://www.sciencedirect.com/science/article/pii/S2590005623000590

Similar Items