A New Big Data Benchmark for OLAP Cube Design Using Data Pre-Aggregation Techniques

In recent years, several new technologies have enabled OLAP processing over Big Data sources. Among these technologies, we highlight those that allow data pre-aggregation because of their demonstrated performance in data querying. This is the case of Apache Kylin, a Hadoop-based technology that supp...


Bibliographic Details
Main Authors:	Roberto Tardío, Alejandro Maté, Juan Trujillo
Format:	Article
Language:	English
Published:	MDPI AG 2020-12-01
Series:	Applied Sciences
Subjects:	OLAP; big data; benchmarking; data warehousing
Online Access:	https://www.mdpi.com/2076-3417/10/23/8674

author	Roberto Tardío; Alejandro Maté; Juan Trujillo
collection	DOAJ
description	In recent years, several new technologies have enabled OLAP processing over Big Data sources. Among these technologies, we highlight those that allow data pre-aggregation because of their demonstrated performance in data querying. This is the case of Apache Kylin, a Hadoop-based technology that supports sub-second queries over fact tables with billions of rows combined with ultra-high-cardinality dimensions. However, taking advantage of data pre-aggregation techniques to design analytic models for Big Data OLAP is not a trivial task. It requires very advanced knowledge of the underlying technologies and of user querying patterns. A poor OLAP cube design significantly affects several key performance metrics, including: (i) the analytic capabilities of the cube (time and ability to provide an answer to a query), (ii) the size of the OLAP cube, and (iii) the time required to build the OLAP cube. Therefore, in this paper we (i) propose a benchmark to help Big Data OLAP designers choose the most suitable cube design for their goals, (ii) identify and describe the main requirements and trade-offs for effectively designing a Big Data OLAP cube that takes advantage of data pre-aggregation techniques, and (iii) validate our benchmark in a case study.
first_indexed	2024-03-10T14:20:08Z
format	Article
id	doaj.art-1b14c662e2244c7fb77948879a28342b
institution	Directory of Open Access Journals
issn	2076-3417
language	English
last_indexed	2024-03-10T14:20:08Z
publishDate	2020-12-01
publisher	MDPI AG
record_format	Article
series	Applied Sciences
spelling	doaj.art-1b14c662e2244c7fb77948879a28342b | 2023-11-20T23:27:22Z | eng | MDPI AG | Applied Sciences | ISSN 2076-3417 | 2020-12-01 | vol. 10, no. 23, art. 8674 | doi:10.3390/app10238674 | A New Big Data Benchmark for OLAP Cube Design Using Data Pre-Aggregation Techniques | Roberto Tardío (Stratebi Business Solutions Ltd., 28020 Madrid, Spain); Alejandro Maté (Lucentia Lab Ltd., University of Alicante, 03690 Alicante, Spain); Juan Trujillo (Lucentia Lab Ltd., University of Alicante, 03690 Alicante, Spain) | https://www.mdpi.com/2076-3417/10/23/8674 | OLAP; big data; benchmarking; data warehousing
title	A New Big Data Benchmark for OLAP Cube Design Using Data Pre-Aggregation Techniques
topic	OLAP; big data; benchmarking; data warehousing
url	https://www.mdpi.com/2076-3417/10/23/8674


Similar Items