On experimenting large dataset for visualization using distributed learning and tree plotting techniques

Bibliographic Details
Main Authors: Olanrewaju V. Johnson, Olayinka T. Jinadu, Olomi I. Aladesote
Format: Article
Language: English
Published: Elsevier 2020-07-01
Series: Scientific African
Online Access: http://www.sciencedirect.com/science/article/pii/S2468227620302040
Description
Summary: Visualization, one of the major fields that make up data science, has played a significant role in data exploration. With visualization at the center of every data analysis and application, exploratory analysis has proved to be the basis on which data analysts comparatively implement what-if scenarios before and after processing. Visualizing the interesting patterns generated by models is very helpful for fast decision-making, model tuning and optimization. However, conventional methods such as histograms, pie charts, box plots and bar graphs are in most cases not adequate to effectively convey the interesting patterns to be mined in large datasets. This paper therefore presents a tree-plot approach that makes use of an in-memory node mechanism from the h2o package to place the large dataset in memory. A Gradient Boosted Model (GBM) from the same package was implemented as the underlying learning algorithm to build the tree model, while the modeled trees were plotted using plotting techniques from data.tree. The execution process time, AUC, MSE and RMSE results obtained provide a basis for evaluating how well the model was trained on the data and for visualizing the modeled tree. The results further substantiate how a learning algorithm can work with a plotting method at less computational cost.
ISSN: 2468-2276
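The summary evaluates the GBM with AUC, MSE and RMSE. The sketch below shows what those three metrics measure. It assumes binary labels coded 0/1 and predicted class probabilities, and it uses the standard textbook definitions. It is not the h2o implementation or the authors' code. The function names `mse`, `rmse` and `auc` and the sample values are hypothetical and chosen only for illustration.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error between 0/1 labels and predicted probabilities."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


def auc(y_true, scores):
    """Area under the ROC curve (Mann-Whitney formulation).

    This is the probability that a randomly chosen positive example is
    scored above a randomly chosen negative one, with ties counted as 1/2.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:          # compare every positive score ...
        for n in neg:      # ... against every negative score
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy labels and predicted probabilities
    y = [0, 0, 1, 1]
    p = [0.1, 0.4, 0.35, 0.8]
    print(mse(y, p), rmse(y, p), auc(y, p))
```

For the toy inputs, MSE is 0.158125 and AUC is 0.75, because three of the four positive/negative pairs are ranked correctly. The pairwise AUC loop grows quadratically with the number of examples. On large datasets, a rank-based computation after sorting the scores is the usual choice.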