Exploiting lexical patterns for knowledge graph construction from unstructured text in Spanish
Abstract Knowledge graphs (KGs) are useful data structures for the integration, retrieval, dissemination, and inference of information in various information domains. One of the main challenges in building KGs is the extraction of named entities (nodes) and their relations (edges), particularly when...
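Main Authors: | Ana B. Rios-Alvarado, Jose L. Martinez-Rodriguez, Andrea G. Garcia-Perez, Tania Y. Guerrero-Melendez, Ivan Lopez-Arevalo, Jose Luis Gonzalez-Compean |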
---|---|
Format: | Article |
Language: | English |
Published: | Springer, 2022-08-01 |
Series: | Complex & Intelligent Systems |
Subjects: | Knowledge graphs; Lexical patterns; Knowledge representation; Spanish RDF graph; Knowledge linking |
Online Access: | https://doi.org/10.1007/s40747-022-00805-7 |
_version_ | 1797840758097575936 |
---|---|
author | Ana B. Rios-Alvarado; Jose L. Martinez-Rodriguez; Andrea G. Garcia-Perez; Tania Y. Guerrero-Melendez; Ivan Lopez-Arevalo; Jose Luis Gonzalez-Compean |
author_sort | Ana B. Rios-Alvarado |
collection | DOAJ |
description | Abstract: Knowledge graphs (KGs) are useful data structures for the integration, retrieval, dissemination, and inference of information in various domains. One of the main challenges in building KGs is the extraction of named entities (nodes) and their relations (edges), particularly when processing unstructured text, which carries no semantic descriptions. Generating KGs from texts written in Spanish is a research challenge because the existing structures, models, and strategies designed for other languages are not readily applicable to Spanish. This paper proposes a method to design and construct KGs from unstructured text in Spanish. We defined lexical patterns to extract named entities and (non-)taxonomic, equivalence, and composition relations. Next, named entities are linked and enriched with DBpedia resources through a strategy based on SPARQL queries. Finally, OWL properties are defined from the predicate relations to create Resource Description Framework (RDF) triples. We evaluated the proposed method to determine the extent to which elements are extracted from the input text and to assess their quality through standard information retrieval measures. The evaluation showed that the method can feasibly extract RDF triples from datasets in the general and computer science domains. The results were competitive with an existing approach from the literature. |
first_indexed | 2024-04-09T16:19:54Z |
format | Article |
id | doaj.art-ef8c716bd38c44e694e4b4d0d8a159db |
institution | Directory Open Access Journal |
issn | 2199-4536 2198-6053 |
language | English |
last_indexed | 2024-04-09T16:19:54Z |
publishDate | 2022-08-01 |
publisher | Springer |
record_format | Article |
series | Complex & Intelligent Systems |
spelling | doaj.art-ef8c716bd38c44e694e4b4d0d8a159db; 2023-04-23T11:32:51Z; eng; Springer; Complex & Intelligent Systems; 2199-4536; 2198-6053; 2022-08-01; 9; 2; 1281; 1297; 10.1007/s40747-022-00805-7; Exploiting lexical patterns for knowledge graph construction from unstructured text in Spanish; Ana B. Rios-Alvarado (Faculty of Engineering and Science, Autonomous University of Tamaulipas); Jose L. Martinez-Rodriguez (UAM-Rodhe, Autonomous University of Tamaulipas); Andrea G. Garcia-Perez (Faculty of Engineering and Science, Autonomous University of Tamaulipas); Tania Y. Guerrero-Melendez (Faculty of Engineering and Science, Autonomous University of Tamaulipas); Ivan Lopez-Arevalo (Cinvestav-Tamaulipas); Jose Luis Gonzalez-Compean (Cinvestav-Tamaulipas); https://doi.org/10.1007/s40747-022-00805-7; Knowledge graphs; Lexical patterns; Knowledge representation; Spanish RDF graph; Knowledge linking |
title | Exploiting lexical patterns for knowledge graph construction from unstructured text in Spanish |
topic | Knowledge graphs; Lexical patterns; Knowledge representation; Spanish RDF graph; Knowledge linking |
url | https://doi.org/10.1007/s40747-022-00805-7 |