Comparative Study of Classification Techniques For Large Scale Data - Case Study

Bibliographic Details
Main Authors: Nigar M.Shafiq Surameery, Dana Lattef Hussein
Format: Article
Language: English
Published: Sulaimani Polytechnic University 2017-08-01
Series: Kurdistan Journal of Applied Research
Subjects:
Online Access: http://kjar.spu.edu.iq/index.php/kjar/article/view/67
Description
Summary: Massive datasets generated in many applications present both opportunities and challenges. In particular, scalable mining of such large-scale datasets is a challenging problem that has attracted recent research. The present study analyses classification techniques using the WEKA machine learning workbench on a large-scale dataset from the protein structure prediction field. The dataset had already been partitioned into training and test sets using the ten-fold cross-validation methodology. Nine different methods were tested in the experiment. The results showed that it is not practical to test more than one classifier from the tree family in the same experiment. In addition, using the NaiveBayes classifier with the default properties of the attribute selection filter is very time-consuming. Finally, varying the parameters of the attribute selection should be prioritized to obtain more accurate results.
ISSN: 2411-7684, 2411-7706