ANAD: Arabic news article dataset

In this paper, we present a modern standard Arabic dataset based on Arabic news articles collected over a one-year period from 01/01/2021 to 12/31/2021. In total, from 12 Arabic news websites, over 500,000 articles were collected, the selection of which was driven by a variety of topics, including s...


Bibliographic Details
Main Authors: Mohammed Altamimi, Abdulaziz M. Alayba
Format: Article
Language: English
Published: Elsevier 2023-10-01
Series: Data in Brief
Subjects: Arabic news articles; Data analysis; Classification; Natural language processing (NLP)
Online Access: http://www.sciencedirect.com/science/article/pii/S2352340923005607
_version_ 1797660464332668928
author Mohammed Altamimi
Abdulaziz M. Alayba
author_facet Mohammed Altamimi
Abdulaziz M. Alayba
author_sort Mohammed Altamimi
collection DOAJ
description In this paper, we present a modern standard Arabic dataset based on Arabic news articles collected over a one-year period from 01/01/2021 to 12/31/2021. In total, from 12 Arabic news websites, over 500,000 articles were collected, the selection of which was driven by a variety of topics, including sports, economies, local news, politics, tech, tourism, entertainment, cars, health, and art. The development of this dataset will enable data scientists to explore and experiment effectively in the field of natural language processing, and the dataset can also be used to develop machine learning and deep learning models to classify articles according to topic. The dataset is available for download at https://github.com/alaybaa/ArabicArticlesDataset/tree/main.
first_indexed 2024-03-11T18:31:11Z
format Article
id doaj.art-ab153bffb8964010a788149ca0dec8f0
institution Directory Open Access Journal
issn 2352-3409
language English
last_indexed 2024-03-11T18:31:11Z
publishDate 2023-10-01
publisher Elsevier
record_format Article
series Data in Brief
spelling doaj.art-ab153bffb8964010a788149ca0dec8f0
2023-10-13T11:04:42Z
eng
Elsevier
Data in Brief
2352-3409
2023-10-01
50
109460
ANAD: Arabic news article dataset
Mohammed Altamimi 0
Abdulaziz M. Alayba 1
Corresponding authors.; Department of Information and Computer Science, College of Computer Science and Engineering, University of Ha'il, Ha'il, 81481, Saudi Arabia
Corresponding authors.; Department of Information and Computer Science, College of Computer Science and Engineering, University of Ha'il, Ha'il, 81481, Saudi Arabia
In this paper, we present a modern standard Arabic dataset based on Arabic news articles collected over a one-year period from 01/01/2021 to 12/31/2021. In total, from 12 Arabic news websites, over 500,000 articles were collected, the selection of which was driven by a variety of topics, including sports, economies, local news, politics, tech, tourism, entertainment, cars, health, and art. The development of this dataset will enable data scientists to explore and experiment effectively in the field of natural language processing, and the dataset can also be used to develop machine learning and deep learning models to classify articles according to topic. The dataset is available for download at https://github.com/alaybaa/ArabicArticlesDataset/tree/main.
http://www.sciencedirect.com/science/article/pii/S2352340923005607
Arabic news articles
Data analysis
Classification
Natural language processing (NLP)
spellingShingle Mohammed Altamimi
Abdulaziz M. Alayba
ANAD: Arabic news article dataset
Data in Brief
Arabic news articles
Data analysis
Classification
Natural language processing (NLP)
title ANAD: Arabic news article dataset
title_full ANAD: Arabic news article dataset
title_fullStr ANAD: Arabic news article dataset
title_full_unstemmed ANAD: Arabic news article dataset
title_short ANAD: Arabic news article dataset
title_sort anad arabic news article dataset
topic Arabic news articles
Data analysis
Classification
Natural language processing (NLP)
url http://www.sciencedirect.com/science/article/pii/S2352340923005607
work_keys_str_mv AT mohammedaltamimi anadarabicnewsarticledataset
AT abdulazizmalayba anadarabicnewsarticledataset