Dataset of Arabic spam and ham tweets
Main Authors: | Sanaa Kaddoura, Safaa Henno |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2024-02-01 |
Series: | Data in Brief |
Subjects: | Twitter; Labelled data; Classification; Machine learning; Deep learning; Cybersecurity |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2352340923009472 |
_version_ | 1827353697252802560 |
---|---|
author | Sanaa Kaddoura Safaa Henno |
collection | DOAJ |
description | This data article provides a dataset of 132,421 posts and their associated information collected from Twitter. The data has two classes, ham and spam, where ham denotes clean, non-spam tweets. The main goal of the dataset is to support research on automatically classifying whether a post is spam. The data is in Arabic only, which makes it particularly useful to researchers in Arabic natural language processing (NLP), given the scarcity of resources for this language. The data is publicly available so that researchers can use it as a benchmark in Arabic NLP research. The dataset was collected using the Twitter REST API between January 27, 2021, and March 10, 2021, with an ad hoc crawler written in Python. Researchers in cybersecurity, NLP, data science, and social network analysis can benefit from this dataset. |
first_indexed | 2024-03-08T03:30:14Z |
format | Article |
id | doaj.art-5542bfcc0aae4d998cdd9a264876563a |
institution | Directory of Open Access Journals |
issn | 2352-3409 |
language | English |
last_indexed | 2024-03-08T03:30:14Z |
publishDate | 2024-02-01 |
publisher | Elsevier |
record_format | Article |
series | Data in Brief |
spelling | doaj.art-5542bfcc0aae4d998cdd9a264876563a; 2024-02-11T05:10:30Z; eng; Elsevier; Data in Brief; 2352-3409; 2024-02-01; 52; 109904; Dataset of Arabic spam and ham tweets; Sanaa Kaddoura (corresponding author), Zayed University, Abu Dhabi, UAE; Safaa Henno, Zayed University, Abu Dhabi, UAE; http://www.sciencedirect.com/science/article/pii/S2352340923009472; Twitter; Labelled data; Classification; Machine learning; Deep learning; Cybersecurity |
title | Dataset of Arabic spam and ham tweets |
topic | Twitter Labelled data Classification Machine learning Deep learning Cybersecurity |
url | http://www.sciencedirect.com/science/article/pii/S2352340923009472 |