Exploiting lexical patterns for knowledge graph construction from unstructured text in Spanish

Bibliographic Details
Main Authors: Ana B. Rios-Alvarado, Jose L. Martinez-Rodriguez, Andrea G. Garcia-Perez, Tania Y. Guerrero-Melendez, Ivan Lopez-Arevalo, Jose Luis Gonzalez-Compean
Format: Article
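Language: English
Published: Springer 2022-08-01
Series: Complex & Intelligent Systems
Subjects: Knowledge graphs, Lexical patterns, Knowledge representation, Spanish RDF graph, Knowledge linking
Online Access: https://doi.org/10.1007/s40747-022-00805-7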
author Ana B. Rios-Alvarado
Jose L. Martinez-Rodriguez
Andrea G. Garcia-Perez
Tania Y. Guerrero-Melendez
Ivan Lopez-Arevalo
Jose Luis Gonzalez-Compean
collection DOAJ
description Knowledge graphs (KGs) are useful data structures for the integration, retrieval, dissemination, and inference of information in various domains. One of the main challenges in building KGs is the extraction of named entities (nodes) and their relations (edges), particularly from unstructured text, which carries no semantic descriptions. Generating KGs from texts written in Spanish is a research challenge because the existing structures, models, and strategies designed for other languages are not compatible with this setting. This paper proposes a method to design and construct KGs from unstructured text in Spanish. We define lexical patterns to extract named entities and taxonomic, non-taxonomic, equivalence, and composition relations. Next, named entities are linked to and enriched with DBpedia resources through a strategy based on SPARQL queries. Finally, OWL properties are defined from the predicate relations to create Resource Description Framework (RDF) triples. We evaluated the proposed method to determine the degree to which elements are extracted from the input text and to assess their quality with standard information retrieval measures. The evaluation showed that the method can feasibly extract RDF triples from datasets in the general and computer science domains. Results were competitive with an existing approach from the literature.
first_indexed 2024-04-09T16:19:54Z
format Article
id doaj.art-ef8c716bd38c44e694e4b4d0d8a159db
institution Directory Open Access Journal
issn 2199-4536
2198-6053
language English
last_indexed 2024-04-09T16:19:54Z
publishDate 2022-08-01
publisher Springer
record_format Article
series Complex & Intelligent Systems
spelling doaj.art-ef8c716bd38c44e694e4b4d0d8a159db
2023-04-23T11:32:51Z
eng
Springer
Complex & Intelligent Systems, ISSN 2199-4536, 2198-6053
2022-08-01, vol. 9, no. 2, pp. 1281-1297
10.1007/s40747-022-00805-7
Ana B. Rios-Alvarado (Faculty of Engineering and Science, Autonomous University of Tamaulipas)
Jose L. Martinez-Rodriguez (UAM-Rodhe, Autonomous University of Tamaulipas)
Andrea G. Garcia-Perez (Faculty of Engineering and Science, Autonomous University of Tamaulipas)
Tania Y. Guerrero-Melendez (Faculty of Engineering and Science, Autonomous University of Tamaulipas)
Ivan Lopez-Arevalo (Cinvestav-Tamaulipas)
Jose Luis Gonzalez-Compean (Cinvestav-Tamaulipas)
title Exploiting lexical patterns for knowledge graph construction from unstructured text in Spanish
topic Knowledge graphs
Lexical patterns
Knowledge representation
Spanish RDF graph
Knowledge linking
url https://doi.org/10.1007/s40747-022-00805-7
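
Illustrative sketch of the pipeline summarized in the description field: lexical patterns over Spanish sentences yield (subject, relation, object) tuples, which are then serialized as RDF triples. This is a minimal Python sketch under stated assumptions, not the authors' implementation. The three regular expressions, the relation names (isA, sameAs, hasPart), the http://example.org/kg/ namespace, and the shape of the DBpedia lookup query are all hypothetical; the record does not list the actual patterns, OWL properties, or SPARQL queries used in the paper.

```python
import re
from urllib.parse import quote

# Hypothetical namespace for illustration only (not from the paper).
BASE = "http://example.org/kg/"

# Hypothetical lexical patterns, one per relation family named in the abstract.
# More specific patterns come first; the first match wins.
PATTERNS = [
    ("hasPart", re.compile(r"(?P<s>.+?) está compuest[oa] (?:por|de) (?P<o>.+)")),
    ("sameAs", re.compile(r"(?P<s>.+?) también es conocid[oa] como (?P<o>.+)")),
    ("isA", re.compile(r"(?P<s>.+?) es una? (?P<o>.+)")),
]


def extract_triples(text):
    """Return (subject, relation, object) tuples found by the patterns."""
    triples = []
    for sentence in re.split(r"[.!?]+", text):
        sentence = sentence.strip()
        if not sentence:
            continue
        for relation, pattern in PATTERNS:
            match = pattern.fullmatch(sentence)
            if match:
                triples.append((match.group("s"), relation, match.group("o")))
                break
    return triples


def to_uri(name):
    """Build a URI in the hypothetical namespace, percent-encoding the name."""
    return BASE + quote(name.replace(" ", "_"))


def to_ntriple(triple):
    """Serialize one tuple as an N-Triples line."""
    subject, relation, obj = triple
    return f"<{to_uri(subject)}> <{to_uri(relation)}> <{to_uri(obj)}> ."


def dbpedia_lookup_query(label, lang="es"):
    """Build (but do not run) a SPARQL query that looks up a resource by label.

    The query shape is an assumption; it only illustrates label-based linking.
    """
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        f'SELECT ?resource WHERE {{ ?resource rdfs:label "{escaped}"@{lang} }} LIMIT 1'
    )


if __name__ == "__main__":
    sample = (
        "Python es un lenguaje. Linux está compuesto por módulos. "
        "JavaScript también es conocido como JS."
    )
    for triple in extract_triples(sample):
        print(to_ntriple(triple))
    print(dbpedia_lookup_query("Python"))
```

Anchoring each pattern to a whole sentence keeps the sketch predictable, at the cost of missing relations embedded in longer sentences; a fuller system would likely match over part-of-speech tags rather than raw strings.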