Triplet extraction leveraging sentence transformers and dependency parsing
Knowledge Graphs are a tool to structure (entity, relation, entity) triples. One possible way to construct these knowledge graphs is by extracting triples from unstructured text. The aim when doing this is to maximise the number of useful triples while minimising the triples containing no or useless...
Main Authors: | Stuart Gallina Ottersen, Flávio Pinheiro, Fernando Bação |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier 2024-03-01 |
Series: | Array |
Subjects: | Triplet extraction, NLP, Natural language processing, Knowledge Graph |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2590005623000590 |
_version_ | 1827321177609076736 |
---|---|
author | Stuart Gallina Ottersen Flávio Pinheiro Fernando Bação |
author_facet | Stuart Gallina Ottersen Flávio Pinheiro Fernando Bação |
author_sort | Stuart Gallina Ottersen |
collection | DOAJ |
description | Knowledge Graphs are a tool to structure (entity, relation, entity) triples. One possible way to construct these knowledge graphs is by extracting triples from unstructured text. The aim when doing this is to maximise the number of useful triples while minimising the triples containing no or useless information. Most previous work in this field uses supervised learning techniques, which can be expensive both computationally and because they require labelled data. Existing unsupervised methods, meanwhile, often produce an excessive number of low-value triples, rely on empirical rules when extracting triples, or struggle with the order of the entities relative to the relation. To address these issues, this paper proposes a new model, Unsupervised Dependency parsing Aided Semantic Triple Extraction (UDASTE), which leverages sentence structure and allows restrictive triple relation types to be defined, generating high-quality triples while removing the need to map extracted triples to relation schemas. This is done by leveraging pre-trained language models. UDASTE is compared with two baseline models on three datasets and outperforms the baselines on all three. Its limitations and possible further work are discussed, along with the implementation of the model in a computational intelligence context. |
first_indexed | 2024-03-08T17:07:10Z |
format | Article |
id | doaj.art-ac4cc6ef1f134b39afae9d52fc458629 |
institution | Directory Open Access Journal |
issn | 2590-0056 |
language | English |
last_indexed | 2024-04-25T01:00:48Z |
publishDate | 2024-03-01 |
publisher | Elsevier |
record_format | Article |
series | Array |
spelling | doaj.art-ac4cc6ef1f134b39afae9d52fc458629; 2024-03-11T04:11:04Z; eng; Elsevier; Array; 2590-0056; 2024-03-01; 21; 100334; Triplet extraction leveraging sentence transformers and dependency parsing; Stuart Gallina Ottersen (corresponding author; NOVA IMS, Campus de Campolide, 1070-312, Lisbon, Portugal); Flávio Pinheiro (NOVA IMS, Campus de Campolide, 1070-312, Lisbon, Portugal); Fernando Bação (NOVA IMS, Campus de Campolide, 1070-312, Lisbon, Portugal); http://www.sciencedirect.com/science/article/pii/S2590005623000590; Triplet extraction; NLP; Natural language processing; Knowledge Graph |
title | Triplet extraction leveraging sentence transformers and dependency parsing |
topic | Triplet extraction NLP Natural language processing Knowledge Graph |
url | http://www.sciencedirect.com/science/article/pii/S2590005623000590 |