Enhancing topic clustering for Arabic security news based on k‐means and topic modelling

Abstract The internet has become one of the main sources of news spread as it unleashed the information dissemination space, where the news websites express opinions on entities while also reporting on recent or unusual security risks. Recently, many research studies have focused on sentimental refl...

Bibliographic Details
Main Authors: Adel R. Alharbi, Mohammad Hijji, Amer Aljaedi
Format: Article
Language:English
Published: Wiley 2021-11-01
Series:IET Networks
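Subjects:computational linguistics; data analysis; Internet; natural language processing; pattern clustering; text analysis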
Online Access:https://doi.org/10.1049/ntw2.12017
author Adel R. Alharbi
Mohammad Hijji
Amer Aljaedi
collection DOAJ
description Abstract The internet has become one of the main channels for spreading news because it opened up the space for information dissemination, and news websites express opinions on entities while also reporting on recent or unusual security risks. Recently, many research studies have focused on the sentiment reflected in people's views and impressions, using natural language processing and analytical linguistics. We therefore collected a corpus from popular Arabic websites that publish articles on recent security issues, and we provide lightweight preprocessing techniques in which the data are transformed into a term matrix. We also present an intensive lexical‐driven data analysis with visualised data views, and our topic modelling technique can effectively extract significant topics from all the text collected from the different websites. Our experiments validate the k‐means clustering algorithm with and without the latent Dirichlet allocation topic modelling method, and we adopted various validation techniques to measure the topic clustering internally and externally. The experimental results show that our proposed combined method achieves a high Rand index of 87.2% with a large number of topics and clusters.
first_indexed 2024-04-12T07:52:31Z
format Article
id doaj.art-9c37dcf5ebbe4cbe9b0d6fd2970c44e4
institution Directory Open Access Journal
issn 2047-4954
2047-4962
language English
last_indexed 2024-04-12T07:52:31Z
publishDate 2021-11-01
publisher Wiley
record_format Article
series IET Networks
spelling doaj.art-9c37dcf5ebbe4cbe9b0d6fd2970c44e4
2022-12-22T03:41:34Z
eng
Wiley
IET Networks
2047-4954
2047-4962
2021-11-01
volume 10, issue 6, pages 278-294
10.1049/ntw2.12017
Enhancing topic clustering for Arabic security news based on k‐means and topic modelling
Adel R. Alharbi (College of Computing and Information Technology, University of Tabuk, Tabuk, Saudi Arabia)
Mohammad Hijji (College of Computing and Information Technology, University of Tabuk, Tabuk, Saudi Arabia)
Amer Aljaedi (College of Computing and Information Technology, University of Tabuk, Tabuk, Saudi Arabia)
https://doi.org/10.1049/ntw2.12017
computational linguistics; data analysis; Internet; natural language processing; pattern clustering; text analysis
title Enhancing topic clustering for Arabic security news based on k‐means and topic modelling
topic computational linguistics
data analysis
Internet
natural language processing
pattern clustering
text analysis
url https://doi.org/10.1049/ntw2.12017
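
Note on the external validation measure: the abstract reports an index rate of 87.2% for the combined k‐means and topic modelling method. Assuming this refers to the Rand index (the record's text reads "round index"), the sketch below shows how such a score could be computed for any clustering against reference labels. It is a minimal illustration, not the authors' code; the function name `rand_index` and the toy labels are hypothetical and do not come from the article.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Return the fraction of item pairs on which two labelings agree.

    A pair agrees when both labelings put the two items together,
    or both labelings keep them apart.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true == same_pred:
            agree += 1
        total += 1
    if total == 0:
        raise ValueError("need at least two items")
    return agree / total


# Hypothetical toy data: reference topics for five articles and the
# cluster ids some clustering run assigned to them (not from the paper).
reference = ["malware", "malware", "phishing", "phishing", "breach"]
clusters = [0, 0, 1, 2, 2]
score = rand_index(reference, clusters)
print(f"Rand index: {score:.3f}")
```

Cluster ids only need to be consistent within one labeling; the measure compares pairs of items, so renaming the clusters does not change the score.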