Enhancing topic clustering for Arabic security news based on k‐means and topic modelling
Abstract The internet has become one of the main sources of news spread as it unleashed the information dissemination space, where the news websites express opinions on entities while also reporting on recent or unusual security risks. Recently, many research studies have focused on sentimental refl...
Main Authors: | Adel R. Alharbi, Mohammad Hijji, Amer Aljaedi |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley 2021-11-01 |
Series: | IET Networks |
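Subjects: | computational linguistics, data analysis, Internet, natural language processing, pattern clustering, text analysis |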
Online Access: | https://doi.org/10.1049/ntw2.12017 |
_version_ | 1811221012124008448 |
---|---|
author | Adel R. Alharbi, Mohammad Hijji, Amer Aljaedi |
collection | DOAJ |
description | Abstract: The internet has become one of the main channels for spreading news because it has opened up the space for information dissemination, and news websites both express opinions on entities and report on recent or unusual security risks. Recently, many research studies have focused on the sentiment reflected in people's views and impressions, using natural language processing and analytical linguistics. We therefore collected a corpus from popular Arabic websites that publish articles on recent security issues, and we provide lightweight preprocessing techniques in which the data is transformed into a term matrix. We also present an intensive lexical‐driven data analysis with visualised data views, as our topic modelling technique can effectively extract significant topics from all the text collected from the different websites. Our experiments evaluate the k‐means clustering algorithm with and without the latent Dirichlet allocation topic modelling method, and we adopted various validation techniques to measure the topic clustering internally and externally. As the experimental results show, our proposed combined method achieves a high Rand index of 87.2% with a large number of topics and clusters. |
first_indexed | 2024-04-12T07:52:31Z |
format | Article |
id | doaj.art-9c37dcf5ebbe4cbe9b0d6fd2970c44e4 |
institution | Directory of Open Access Journals |
issn | 2047-4954 2047-4962 |
language | English |
last_indexed | 2024-04-12T07:52:31Z |
publishDate | 2021-11-01 |
publisher | Wiley |
record_format | Article |
series | IET Networks |
spelling | doaj.art-9c37dcf5ebbe4cbe9b0d6fd2970c44e4 ; 2022-12-22T03:41:34Z ; eng ; Wiley ; IET Networks ; 2047-4954 ; 2047-4962 ; 2021-11-01 ; vol. 10, no. 6, pp. 278-294 ; 10.1049/ntw2.12017 ; Enhancing topic clustering for Arabic security news based on k‐means and topic modelling ; Adel R. Alharbi, Mohammad Hijji, Amer Aljaedi (all: College of Computing and Information Technology, University of Tabuk, Tabuk, Saudi Arabia) ; https://doi.org/10.1049/ntw2.12017 ; computational linguistics ; data analysis ; Internet ; natural language processing ; pattern clustering ; text analysis |
title | Enhancing topic clustering for Arabic security news based on k‐means and topic modelling |
topic | computational linguistics, data analysis, Internet, natural language processing, pattern clustering, text analysis |
url | https://doi.org/10.1049/ntw2.12017 |
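
Note on the external validation score: the 87.2% figure quoted in the abstract is most plausibly a Rand index, which is the fraction of document pairs on which the clustering and a reference labelling agree (both place the pair together, or both place it apart). The sketch below is a minimal illustration under that assumption, not the authors' implementation; the function name `rand_index` and the toy labels are hypothetical and do not come from the article.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Return the fraction of item pairs on which two labelings agree.

    A pair counts as an agreement when both labelings put the two items
    in the same group, or both put them in different groups.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        raise ValueError("need at least two items")
    agree = 0
    for i, j in pairs:
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true == same_pred:
            agree += 1
    return agree / len(pairs)


# Hypothetical toy data: five articles with reference topics and
# cluster ids such as a k-means run might assign.
reference = ["malware", "malware", "phishing", "phishing", "breach"]
clusters = [0, 0, 1, 2, 2]
score = rand_index(reference, clusters)
print(f"Rand index: {score:.3f}")
```

The score depends only on pairwise agreement, so cluster ids need not match the reference label names. This is what allows an unsupervised clustering to be scored against a labelled reference.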