Triplet extraction leveraging sentence transformers and dependency parsing

Knowledge Graphs are a tool to structure (entity, relation, entity) triples. One possible way to construct these knowledge graphs is by extracting triples from unstructured text. The aim when doing this is to maximise the number of useful triples while minimising the triples containing no or useless...

Bibliographic Details
Main Authors: Stuart Gallina Ottersen, Flávio Pinheiro, Fernando Bação
Format: Article
Language: English
Published: Elsevier 2024-03-01
Series: Array
Subjects: Triplet extraction; NLP; Natural language processing; Knowledge Graph
Online Access: http://www.sciencedirect.com/science/article/pii/S2590005623000590
_version_ 1827321177609076736
author Stuart Gallina Ottersen
Flávio Pinheiro
Fernando Bação
collection DOAJ
description Knowledge Graphs are a tool to structure (entity, relation, entity) triples. One possible way to construct these knowledge graphs is by extracting triples from unstructured text. The aim when doing this is to maximise the number of useful triples while minimising the number of triples that contain little or no useful information. Most previous work in this field uses supervised learning techniques, which can be expensive both computationally and because they require labelled data. Existing unsupervised methods, in turn, often produce an excessive number of low-value triples, rely on empirical rules when extracting triples, or struggle with the order of the entities relative to the relation. To address these issues, this paper proposes a new model, Unsupervised Dependency parsing Aided Semantic Triple Extraction (UDASTE), which leverages sentence structure and allows restrictive triple relation types to be defined, generating high-quality triples while removing the need to map extracted triples to relation schemas. The model does this by leveraging pre-trained language models. UDASTE is compared with two baseline models on three datasets and outperforms the baselines on all three. Its limitations and possible further work are discussed, along with the implementation of the model in a computational intelligence context.
first_indexed 2024-03-08T17:07:10Z
format Article
id doaj.art-ac4cc6ef1f134b39afae9d52fc458629
institution Directory Open Access Journal
issn 2590-0056
language English
last_indexed 2024-04-25T01:00:48Z
publishDate 2024-03-01
publisher Elsevier
record_format Article
series Array
spelling doaj.art-ac4cc6ef1f134b39afae9d52fc458629 | 2024-03-11T04:11:04Z | eng | Elsevier | Array | 2590-0056 | 2024-03-01 | 21 | 100334 | Triplet extraction leveraging sentence transformers and dependency parsing | Stuart Gallina Ottersen (corresponding author; NOVA IMS, Campus de Campolide, 1070-312, Lisbon, Portugal) | Flávio Pinheiro (NOVA IMS, Campus de Campolide, 1070-312, Lisbon, Portugal) | Fernando Bação (NOVA IMS, Campus de Campolide, 1070-312, Lisbon, Portugal) | http://www.sciencedirect.com/science/article/pii/S2590005623000590 | Triplet extraction | NLP | Natural language processing | Knowledge Graph
title Triplet extraction leveraging sentence transformers and dependency parsing
topic Triplet extraction
NLP
Natural language processing
Knowledge Graph
url http://www.sciencedirect.com/science/article/pii/S2590005623000590