Topic identification from news blog in Spanish language

A large amount of news currently exists in digital format and needs to be classified or labeled automatically according to its content. LDA is an unsupervised technique that automatically creates topics based on the words in documents. The present work aims to apply LDA in order to analyze and extra...

Bibliographic Details
Main Authors: Lizbeth Pacheco-Guevara, Ruth Reátegui, Priscila Valdiviezo-Díaz
Format: Article
Language: English
Published: Universidad Técnica de Manabí 2022-05-01
Series: Informática y Sistemas
Subjects: LDA, topic modeling, news, blog
Online Access: https://revistas.utm.edu.ec/index.php/Informaticaysistemas/article/view/4514
author Lizbeth Pacheco-Guevara
Ruth Reátegui
Priscila Valdiviezo-Díaz
collection DOAJ
description A large amount of news currently exists in digital format and needs to be classified or labeled automatically according to its content. Latent Dirichlet Allocation (LDA) is an unsupervised technique that automatically creates topics based on the words in documents. The present work applies LDA to analyze and extract topics from digital news in Spanish. A total of 198 digital news items were collected from a university news blog. Data pre-processing and vector-space representation were carried out, and values of k (the number of topics) were selected based on a coherence metric. A TF-IDF matrix combined with unigrams and bigrams produced topics with a variety of terms, related to university activities such as study programs, research, and projects for innovation and social responsibility. Furthermore, in the manual validation process, the terms in the topics corresponded with the hashtags written by the communication professionals.
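The vector-space representation named in the description (a TF-IDF matrix over unigrams and bigrams) can be illustrated with a minimal sketch. This is not the authors' code: the three toy documents, the function names `ngrams` and `tfidf_matrix`, whitespace tokenization, and the smoothed IDF variant log((1 + N) / (1 + df)) + 1 are all assumptions made for illustration. Fitting LDA and selecting k by coherence are not shown.

```python
import math
from collections import Counter


def ngrams(tokens, n_max=2):
    """Return all n-grams from length 1 to n_max as space-joined strings."""
    terms = []
    for n in range(1, n_max + 1):
        terms.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return terms


def tfidf_matrix(docs, n_max=2):
    """Build a dense TF-IDF matrix (list of rows) over unigrams and bigrams."""
    # Naive preprocessing (assumption): lowercase and split on whitespace.
    tokenized = [ngrams(doc.lower().split(), n_max) for doc in docs]
    vocab = sorted({term for terms in tokenized for term in terms})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for terms in tokenized for term in set(terms))
    # Smoothed IDF (assumption): one common variant among several.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for terms in tokenized:
        tf = Counter(terms)  # raw term counts within this document
        rows.append([tf[t] * idf[t] for t in vocab])
    return vocab, rows


# Toy Spanish documents invented for this sketch (not from the 198-item corpus).
docs = [
    "proyecto de investigación universitaria",
    "programa de estudio universitario",
    "proyecto de innovación social",
]
vocab, matrix = tfidf_matrix(docs)
```

Each row of `matrix` is one document and each column one unigram or bigram from `vocab`. A term that occurs in every document, such as "de", gets the lowest weight, while a bigram such as "proyecto de" is kept as a term of its own.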
first_indexed 2024-04-09T21:11:11Z
format Article
id doaj.art-be24305013674ef9b2f6e48b3c2563f4
institution Directory Open Access Journal
issn 2550-6730
language English
last_indexed 2024-04-09T21:11:11Z
publishDate 2022-05-01
publisher Universidad Técnica de Manabí
record_format Article
series Informática y Sistemas
spelling doaj.art-be24305013674ef9b2f6e48b3c2563f4
2023-03-28T21:04:47Z
eng
Universidad Técnica de Manabí
Informática y Sistemas
2550-6730
2022-05-01
6
1
22
34
10.33936/isrtic.v6i1.4514
3370
Topic identification from news blog in Spanish language
Lizbeth Pacheco-Guevara 0
Ruth Reátegui 1
Priscila Valdiviezo-Díaz 2
Universidad Técnica Particular de Loja
Universidad Técnica Particular de Loja
Universidad Técnica Particular de Loja
A large amount of news currently exists in digital format and needs to be classified or labeled automatically according to its content. Latent Dirichlet Allocation (LDA) is an unsupervised technique that automatically creates topics based on the words in documents. The present work applies LDA to analyze and extract topics from digital news in Spanish. A total of 198 digital news items were collected from a university news blog. Data pre-processing and vector-space representation were carried out, and values of k (the number of topics) were selected based on a coherence metric. A TF-IDF matrix combined with unigrams and bigrams produced topics with a variety of terms, related to university activities such as study programs, research, and projects for innovation and social responsibility. Furthermore, in the manual validation process, the terms in the topics corresponded with the hashtags written by the communication professionals.
https://revistas.utm.edu.ec/index.php/Informaticaysistemas/article/view/4514
LDA, topic modeling, news, blog
title Topic identification from news blog in Spanish language
topic LDA, topic modeling, news, blog
url https://revistas.utm.edu.ec/index.php/Informaticaysistemas/article/view/4514