Social media mining: a genetic based multiobjective clustering approach to topic modelling

Social media mining is the process of collecting large datasets from user-generated content and extracting and analyzing social media interactions to recognize meaningful patterns in individual and social behavior. Every day, more content related to social media is generated by social media users (...


Bibliographic Details
Main Authors: Rayner Alfred, Loo Yew Jie, Joe Henry Obit, Yuto Lim, Haviluddin Haviluddin, Azreen Azman
Format: Article
Language: English
Published: International Association of Engineers 2021
Subjects: QA Mathematics; T Technology (General)
Online Access: https://eprints.ums.edu.my/id/eprint/30053/1/Social%20media%20mining%2C%20a%20genetic%20based%20multiobjective%20clustering%20approach%20to%20topic%20modelling-Abstract.pdf
https://eprints.ums.edu.my/id/eprint/30053/2/Social%20Media%20Mining%2C%20A%20Genetic%20Based%20Multiobjective%20Clustering%20Approach%20to%20Topic%20Modelling.pdf
_version_ 1825714288508010496
author Rayner Alfred
Loo Yew Jie
Joe Henry Obit
Yuto Lim
Haviluddin Haviluddin
Azreen Azman
author_sort Rayner Alfred
collection UMS
description Social media mining is the process of collecting large datasets from user-generated content and of extracting and analyzing social media interactions to recognize meaningful patterns in individual and social behavior. Every day, users of social media platforms (e.g., Facebook, Twitter) generate more content. As the components of big data continue to expand, the task of extracting useful information becomes critical. Topic extraction refers to the process of extracting the main topics from a pool of news feeds, and a typical way to perform it is through clustering. Clustering organizes a group of patterns or objects into clusters, which allows high-dimensional data to be presented to humans in a comprehensible fashion. Although effective, the k-means clustering algorithm depends heavily on the initial centroids and the number of clusters, k. Recently, several effective supervised and unsupervised machine learning methods have been developed in the domain of topic extraction. However, little work has been conducted on applying multiobjective algorithms to topic extraction. Most of these algorithms are not optimized; those that are use only a single-objective method and may underperform on real-world problems, which are typically multiobjective in nature. This paper investigates the effects of using a multiobjective genetic algorithm (MOGA) based clustering technique to cluster texts for topic extraction. The technique is designed around the structure and purity of the clusters in order to determine the optimal initial centroids and the number of clusters, k. The mapping percentages between the predefined and produced clusters are then used to assess the performance of the proposed algorithm. The best mapping percentage, 62.7, is obtained with the proposed algorithm when k = 15 and outperforms the generic k-means. The top five most representative words from each cluster are also extracted and validated by computing the number of tweets related to the predefined tags.
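The evaluation described in the abstract can be illustrated with a minimal sketch. It rests on assumptions not confirmed by this record: each produced cluster is mapped to the predefined tag that occurs most often inside it (a purity-style score), and "representative" words are taken to be the most frequent tokens in a cluster. The function names `mapping_percentage` and `top_words` and the toy data are hypothetical, and the paper may define both measures differently.

```python
from collections import Counter, defaultdict


def mapping_percentage(predefined, produced):
    """Percentage of items that agree with the tag their cluster maps to.

    Assumption: a produced cluster is mapped to its majority predefined tag.
    This is a purity-style reading, not necessarily the paper's definition.
    """
    if len(predefined) != len(produced):
        raise ValueError("label lists must have the same length")
    if not predefined:
        raise ValueError("label lists must be non-empty")
    # Count how often each predefined tag appears in each produced cluster.
    tags_by_cluster = defaultdict(Counter)
    for tag, cluster in zip(predefined, produced):
        tags_by_cluster[cluster][tag] += 1
    # Items carrying the majority tag of their cluster count as matched.
    matched = sum(max(counts.values()) for counts in tags_by_cluster.values())
    return 100.0 * matched / len(predefined)


def top_words(texts, produced, n=5):
    """Most frequent whitespace-separated tokens per cluster (assumed measure)."""
    words_by_cluster = defaultdict(Counter)
    for text, cluster in zip(texts, produced):
        words_by_cluster[cluster].update(text.lower().split())
    return {
        cluster: [word for word, _ in counts.most_common(n)]
        for cluster, counts in words_by_cluster.items()
    }


# Toy data invented for illustration; not taken from the paper.
predefined = ["sports", "sports", "politics", "politics", "tech", "tech"]
produced = [0, 0, 1, 0, 2, 2]
score = mapping_percentage(predefined, produced)

texts = ["goal goal match", "goal win"]
words = top_words(texts, [0, 0], n=2)
```

On the toy labels, five of the six items carry the majority tag of their cluster, so `score` is five sixths expressed as a percentage. A tf-idf weighting could replace the raw counts in `top_words` if frequent but uninformative tokens dominate.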
first_indexed 2024-03-06T03:09:34Z
format Article
id ums.eprints-30053
institution Universiti Malaysia Sabah
language English
last_indexed 2024-03-06T03:09:34Z
publishDate 2021
publisher International Association of Engineers
record_format dspace
spelling ums.eprints-30053 2021-07-23T03:59:35Z https://eprints.ums.edu.my/id/eprint/30053/ Social media mining: a genetic based multiobjective clustering approach to topic modelling Rayner Alfred Loo Yew Jie Joe Henry Obit Yuto Lim Haviluddin Haviluddin Azreen Azman QA Mathematics T Technology (General) International Association of Engineers 2021 Article PeerReviewed text en https://eprints.ums.edu.my/id/eprint/30053/1/Social%20media%20mining%2C%20a%20genetic%20based%20multiobjective%20clustering%20approach%20to%20topic%20modelling-Abstract.pdf text en https://eprints.ums.edu.my/id/eprint/30053/2/Social%20Media%20Mining%2C%20A%20Genetic%20Based%20Multiobjective%20Clustering%20Approach%20to%20Topic%20Modelling.pdf Rayner Alfred and Loo Yew Jie and Joe Henry Obit and Yuto Lim and Haviluddin Haviluddin and Azreen Azman (2021) Social media mining: a genetic based multiobjective clustering approach to topic modelling. IAENG International Journal of Computer Science, 48. ISSN 1819-656X http://www.iaeng.org/IJCS/issues_v48/issue_1/IJCS_48_1_04.pdf
title Social media mining: a genetic based multiobjective clustering approach to topic modelling
topic QA Mathematics
T Technology (General)
url https://eprints.ums.edu.my/id/eprint/30053/1/Social%20media%20mining%2C%20a%20genetic%20based%20multiobjective%20clustering%20approach%20to%20topic%20modelling-Abstract.pdf
https://eprints.ums.edu.my/id/eprint/30053/2/Social%20Media%20Mining%2C%20A%20Genetic%20Based%20Multiobjective%20Clustering%20Approach%20to%20Topic%20Modelling.pdf