A New Big Data Benchmark for OLAP Cube Design Using Data Pre-Aggregation Techniques

In recent years, several new technologies have enabled OLAP processing over Big Data sources. Among these technologies, we highlight those that allow data pre-aggregation because of their demonstrated performance in data querying. This is the case of Apache Kylin, a Hadoop based technology that supp...

Bibliographic Details
Main Authors: Roberto Tardío, Alejandro Maté, Juan Trujillo
Format: Article
Language: English
Published: MDPI AG 2020-12-01
Series: Applied Sciences
Subjects: OLAP, big data, benchmarking, data warehousing
Online Access: https://www.mdpi.com/2076-3417/10/23/8674
author Roberto Tardío
Alejandro Maté
Juan Trujillo
collection DOAJ
description In recent years, several new technologies have enabled OLAP processing over Big Data sources. Among these technologies, we highlight those that allow data pre-aggregation because of their demonstrated performance in data querying. This is the case of Apache Kylin, a Hadoop-based technology that supports sub-second queries over fact tables with billions of rows combined with ultra-high-cardinality dimensions. However, taking advantage of data pre-aggregation techniques to design analytic models for Big Data OLAP is not a trivial task. It requires very advanced knowledge of the underlying technologies and of user querying patterns. A wrong design of the OLAP cube significantly alters several key performance metrics, including (i) the analytic capabilities of the cube (the time and ability to provide an answer to a query), (ii) the size of the OLAP cube, and (iii) the time required to build the OLAP cube. Therefore, in this paper we (i) propose a benchmark to help Big Data OLAP designers choose the most suitable cube design for their goals, (ii) identify and describe the main requirements and trade-offs for effectively designing a Big Data OLAP cube that takes advantage of data pre-aggregation techniques, and (iii) validate our benchmark in a case study.
first_indexed 2024-03-10T14:20:08Z
format Article
id doaj.art-1b14c662e2244c7fb77948879a28342b
institution Directory Open Access Journal
issn 2076-3417
language English
last_indexed 2024-03-10T14:20:08Z
publishDate 2020-12-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj.art-1b14c662e2244c7fb77948879a28342b
2023-11-20T23:27:22Z
eng
MDPI AG
Applied Sciences
2076-3417
2020-12-01
Volume 10, Issue 23, Article 8674
10.3390/app10238674
A New Big Data Benchmark for OLAP Cube Design Using Data Pre-Aggregation Techniques
Roberto Tardío, Stratebi Business Solutions Ltd., 28020 Madrid, Spain
Alejandro Maté, Lucentia Lab Ltd., University of Alicante, 03690 Alicante, Spain
Juan Trujillo, Lucentia Lab Ltd., University of Alicante, 03690 Alicante, Spain
https://www.mdpi.com/2076-3417/10/23/8674
OLAP
big data
benchmarking
data warehousing
title A New Big Data Benchmark for OLAP Cube Design Using Data Pre-Aggregation Techniques
topic OLAP
big data
benchmarking
data warehousing
url https://www.mdpi.com/2076-3417/10/23/8674