A layered aggregate engine for analytics workloads
This paper introduces LMFAO (Layered Multiple Functional Aggregate Optimization), an in-memory optimization and execution engine for batches of aggregates over the input database. The primary motivation for this work stems from the observation that for a variety of analytics over databases, their da...
Main Authors: | Schleich, M; Olteanu, D; Abo Khamis, M; Ngo, H; Nguyen, L |
---|---|
Format: | Conference item |
Published: | Association for Computing Machinery, 2019 |
_version_ | 1797054173234069504 |
---|---|
author | Schleich, M; Olteanu, D; Abo Khamis, M; Ngo, H; Nguyen, L |
author_facet | Schleich, M; Olteanu, D; Abo Khamis, M; Ngo, H; Nguyen, L |
author_sort | Schleich, M |
collection | OXFORD |
description | This paper introduces LMFAO (Layered Multiple Functional Aggregate Optimization), an in-memory optimization and execution engine for batches of aggregates over the input database. The primary motivation for this work stems from the observation that for a variety of analytics over databases, their data-intensive tasks can be decomposed into group-by aggregates over the join of the input database relations. We exemplify the versatility and competitiveness of LMFAO for a handful of widely used analytics: learning ridge linear regression, classification trees, regression trees, and the structure of Bayesian networks using Chow-Liu trees; and data cubes used for exploration in data warehousing. LMFAO consists of several layers of logical and code optimizations that systematically exploit sharing of computation, parallelism, and code specialization. We conducted two types of performance benchmarks. In experiments with four datasets, LMFAO outperforms by several orders of magnitude on one hand, a commercial database system and MonetDB for computing batches of aggregates, and on the other hand, TensorFlow, Scikit, R, and AC/DC for learning a variety of models over databases. |
first_indexed | 2024-03-06T18:53:30Z |
format | Conference item |
id | oxford-uuid:1109582d-c96f-44f5-ae31-b5189057080a |
institution | University of Oxford |
last_indexed | 2024-03-06T18:53:30Z |
publishDate | 2019 |
publisher | Association for Computing Machinery |
record_format | dspace |
spelling | oxford-uuid:1109582d-c96f-44f5-ae31-b5189057080a; 2022-03-26T09:59:54Z; A layered aggregate engine for analytics workloads; Conference item; http://purl.org/coar/resource_type/c_5794; uuid:1109582d-c96f-44f5-ae31-b5189057080a; Symplectic Elements at Oxford; Association for Computing Machinery; 2019; Schleich, M; Olteanu, D; Abo Khamis, M; Ngo, H; Nguyen, L |
title | A layered aggregate engine for analytics workloads |