Attributes Reduction in Big Data


Bibliographic Details
Main Authors: Waleed Albattah, Rehan Ullah Khan, Khalil Khan
Format: Article
Language: English
Published: MDPI AG 2020-07-01
Series: Applied Sciences
Subjects: attributes sampling; content-based filtering; Support Vector Machines; machine learning
Online Access: https://www.mdpi.com/2076-3417/10/14/4901
author Waleed Albattah
Rehan Ullah Khan
Khalil Khan
collection DOAJ
description Processing big data requires substantial computing resources, so big data processing is a challenge not only for algorithms but also for the computing infrastructure. This article analyzes a large amount of data from different points of view. One perspective is the processing of reduced collections of big data with fewer computing resources. To that end, the study analyzed 40 GB of data to test various strategies for reducing data processing. The goal is to reduce the data without compromising detection and model learning in machine learning. Several alternatives were analyzed, and it was found that in many cases and types of settings, data can be reduced to some extent without compromising detection efficiency. Tests with 200 attributes showed that, with a performance loss of only 4%, more than 80% of the data could be ignored. The results of the study thus provide useful insights into big data analytics.
first_indexed 2024-03-10T18:25:18Z
format Article
id doaj.art-0c25f8846cf84fcbb09209efd98045be
institution Directory Open Access Journal
issn 2076-3417
language English
last_indexed 2024-03-10T18:25:18Z
publishDate 2020-07-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj.art-0c25f8846cf84fcbb09209efd98045be
2023-11-20T07:02:19Z
eng
MDPI AG
Applied Sciences
2076-3417
2020-07-01
10(14), 4901
10.3390/app10144901
Attributes Reduction in Big Data
Waleed Albattah, Department of Information Technology, College of Computer, Qassim University, Buraydah, Saudi Arabia
Rehan Ullah Khan, Department of Information Technology, College of Computer, Qassim University, Buraydah, Saudi Arabia
Khalil Khan, Department of Electrical Engineering, University of Azad Jammu and Kashmir, Muzaffarabad 13100, Pakistan
https://www.mdpi.com/2076-3417/10/14/4901
attributes sampling
content-based filtering
Support Vector Machines
machine learning
title Attributes Reduction in Big Data
topic attributes sampling
content-based filtering
Support Vector Machines
machine learning
url https://www.mdpi.com/2076-3417/10/14/4901