A New Big Data Benchmark for OLAP Cube Design Using Data Pre-Aggregation Techniques
In recent years, several new technologies have enabled OLAP processing over Big Data sources. Among these technologies, we highlight those that allow data pre-aggregation because of their demonstrated performance in data querying. This is the case of Apache Kylin, a Hadoop-based technology that supp...
Main Authors: | Roberto Tardío, Alejandro Maté, Juan Trujillo |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2020-12-01 |
Series: | Applied Sciences |
Subjects: | OLAP; big data; benchmarking; data warehousing |
Online Access: | https://www.mdpi.com/2076-3417/10/23/8674 |
_version_ | 1797545730639921152 |
---|---|
author | Roberto Tardío; Alejandro Maté; Juan Trujillo |
author_facet | Roberto Tardío; Alejandro Maté; Juan Trujillo |
author_sort | Roberto Tardío |
collection | DOAJ |
description | In recent years, several new technologies have enabled OLAP processing over Big Data sources. Among these technologies, we highlight those that allow data pre-aggregation because of their demonstrated performance in data querying. This is the case of Apache Kylin, a Hadoop-based technology that supports sub-second queries over fact tables with billions of rows combined with ultra-high-cardinality dimensions. However, taking advantage of data pre-aggregation techniques to design analytic models for Big Data OLAP is not a trivial task. It requires very advanced knowledge of the underlying technologies and user querying patterns. A poor design of the OLAP cube significantly alters several key performance metrics, including (i) the analytic capabilities of the cube (time and ability to provide an answer to a query), (ii) the size of the OLAP cube, and (iii) the time required to build the OLAP cube. Therefore, in this paper we (i) propose a benchmark to help Big Data OLAP designers choose the most suitable cube design for their goals, (ii) identify and describe the main requirements and trade-offs for effectively designing a Big Data OLAP cube that takes advantage of data pre-aggregation techniques, and (iii) validate our benchmark in a case study. |
first_indexed | 2024-03-10T14:20:08Z |
format | Article |
id | doaj.art-1b14c662e2244c7fb77948879a28342b |
institution | Directory of Open Access Journals |
issn | 2076-3417 |
language | English |
last_indexed | 2024-03-10T14:20:08Z |
publishDate | 2020-12-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj.art-1b14c662e2244c7fb77948879a28342b; 2023-11-20T23:27:22Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2020-12-01; 10; 23; 8674; 10.3390/app10238674; A New Big Data Benchmark for OLAP Cube Design Using Data Pre-Aggregation Techniques; Roberto Tardío (Stratebi Business Solutions Ltd., 28020 Madrid, Spain); Alejandro Maté (Lucentia Lab Ltd., University of Alicante, 03690 Alicante, Spain); Juan Trujillo (Lucentia Lab Ltd., University of Alicante, 03690 Alicante, Spain); https://www.mdpi.com/2076-3417/10/23/8674; OLAP; big data; benchmarking; data warehousing |
title | A New Big Data Benchmark for OLAP Cube Design Using Data Pre-Aggregation Techniques |
title_sort | new big data benchmark for olap cube design using data pre aggregation techniques |
topic | OLAP; big data; benchmarking; data warehousing |
url | https://www.mdpi.com/2076-3417/10/23/8674 |