Topic identification from news blog in Spanish language
A large amount of digital news currently exists that needs to be classified or labeled automatically according to its content. LDA is an unsupervised technique that automatically creates topics based on the words in documents. The present work aims to apply LDA in order to analyze and extra...
Main Authors: | Lizbeth Pacheco-Guevara, Ruth Reátegui, Priscila Valdiviezo-Díaz |
---|---|
Format: | Article |
Language: | English |
Published: | Universidad Técnica de Manabí, 2022-05-01 |
Series: | Informática y Sistemas |
Subjects: | lda, modelado de tópicos, noticias, blog |
Online Access: | https://revistas.utm.edu.ec/index.php/Informaticaysistemas/article/view/4514 |
_version_ | 1797858238248517632 |
---|---|
author | Lizbeth Pacheco-Guevara, Ruth Reátegui, Priscila Valdiviezo-Díaz |
author_facet | Lizbeth Pacheco-Guevara, Ruth Reátegui, Priscila Valdiviezo-Díaz |
author_sort | Lizbeth Pacheco-Guevara |
collection | DOAJ |
description | A large amount of digital news currently exists that needs to be classified or labeled automatically according to its content. LDA is an unsupervised technique that automatically creates topics based on the words in documents. The present work aims to apply LDA in order to analyze and extract topics from digital news in the Spanish language. A total of 198 digital news items were collected from a university news blog. Data pre-processing and vector-space representation were carried out, and k values were selected based on the coherence metric. A TF-IDF matrix and a combination of unigrams and bigrams produce topics with a variety of terms and topics related to university activities such as study programs, research, and projects for innovation and social responsibility. Furthermore, manual validation showed that the terms in the topics correspond to the hashtags written by the communication professionals. |
first_indexed | 2024-04-09T21:11:11Z |
format | Article |
id | doaj.art-be24305013674ef9b2f6e48b3c2563f4 |
institution | Directory Open Access Journal |
issn | 2550-6730 |
language | English |
last_indexed | 2024-04-09T21:11:11Z |
publishDate | 2022-05-01 |
publisher | Universidad Técnica de Manabí |
record_format | Article |
series | Informática y Sistemas |
spelling | doaj.art-be24305013674ef9b2f6e48b3c2563f4; 2023-03-28T21:04:47Z; eng; Universidad Técnica de Manabí; Informática y Sistemas; 2550-6730; 2022-05-01; 6; 1; 22; 34; 10.33936/isrtic.v6i1.4514; 3370; Topic identification from news blog in Spanish language; Lizbeth Pacheco-Guevara (0), Ruth Reátegui (1), Priscila Valdiviezo-Díaz (2); Universidad Técnica Particular de Loja, Universidad Técnica Particular de Loja, Universidad Técnica Particular de Loja; A large amount of digital news currently exists that needs to be classified or labeled automatically according to its content. LDA is an unsupervised technique that automatically creates topics based on the words in documents. The present work aims to apply LDA in order to analyze and extract topics from digital news in the Spanish language. A total of 198 digital news items were collected from a university news blog. Data pre-processing and vector-space representation were carried out, and k values were selected based on the coherence metric. A TF-IDF matrix and a combination of unigrams and bigrams produce topics with a variety of terms and topics related to university activities such as study programs, research, and projects for innovation and social responsibility. Furthermore, manual validation showed that the terms in the topics correspond to the hashtags written by the communication professionals.; https://revistas.utm.edu.ec/index.php/Informaticaysistemas/article/view/4514; lda, modelado de tópicos, noticias, blog |
title | Topic identification from news blog in Spanish language |
title_sort | topic identification from news blog in spanish language |
topic | lda, modelado de tópicos, noticias, blog |
url | https://revistas.utm.edu.ec/index.php/Informaticaysistemas/article/view/4514 |
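
The description above mentions a TF-IDF matrix built from a combination of unigrams and bigrams. The following is a minimal sketch of that representation step only, not the authors' code: the whitespace tokenizer, the smoothed IDF formula, the function names `ngrams` and `tfidf_matrix`, and the toy headlines are all assumptions made for illustration. Fitting LDA and selecting k by coherence are not shown.

```python
import math
from collections import Counter


def ngrams(tokens, n_max=2):
    """Return all n-grams up to n_max (unigrams and bigrams by default) as strings."""
    out = []
    for n in range(1, n_max + 1):
        out.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return out


def tfidf_matrix(docs, n_max=2):
    """Build a dense TF-IDF matrix: one row per document, one column per term."""
    # Assumed tokenizer: lowercase and split on whitespace (no stop-word removal).
    term_lists = [ngrams(doc.lower().split(), n_max) for doc in docs]
    vocab = sorted({t for terms in term_lists for t in terms})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(t for terms in term_lists for t in set(terms))
    # Assumed smoothed IDF variant: ln((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for terms in term_lists:
        tf = Counter(terms)  # raw term counts within this document
        rows.append([tf[t] * idf[t] for t in vocab])
    return vocab, rows


# Hypothetical toy headlines, not taken from the paper's corpus.
docs = [
    "proyecto de investigación universitaria",
    "nuevo programa de estudio",
    "proyecto de innovación social",
]
vocab, matrix = tfidf_matrix(docs)
```

In this sketch, a bigram such as `proyecto de` gets its own column next to the unigrams `proyecto` and `de`, so a topic model fitted on `matrix` can surface multi-word terms alongside single words.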