A layered aggregate engine for analytics workloads

Bibliographic Details
Main Authors: Schleich, M; Olteanu, D; Abo Khamis, M; Ngo, H; Nguyen, L
Format: Conference item
Published: Association for Computing Machinery 2019
Description: This paper introduces LMFAO (Layered Multiple Functional Aggregate Optimization), an in-memory optimization and execution engine for batches of aggregates over the input database. The primary motivation for this work stems from the observation that, for a variety of analytics over databases, the data-intensive tasks can be decomposed into group-by aggregates over the join of the input database relations. We exemplify the versatility and competitiveness of LMFAO for a handful of widely used analytics: learning ridge linear regression, classification trees, regression trees, and the structure of Bayesian networks using Chow-Liu trees; and data cubes used for exploration in data warehousing. LMFAO consists of several layers of logical and code optimizations that systematically exploit sharing of computation, parallelism, and code specialization. We conducted two types of performance benchmarks. In experiments with four datasets, LMFAO outperforms by several orders of magnitude, on the one hand, a commercial database system and MonetDB for computing batches of aggregates and, on the other hand, TensorFlow, Scikit, R, and AC/DC for learning a variety of models over databases.
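
To make the decomposition claim in the description concrete, the following is a minimal sketch, not LMFAO's implementation. It assumes a hypothetical toy database with two relations, `sales` and `items`, joined on `item`, where `item` is a key of `items`. It computes the batch of sums that an ordinary least-squares fit of `units` on `price` needs (a simpler stand-in for the ridge regression the description mentions), first by iterating over the join result and then by aggregating `sales` per item before combining with `items`. All relation, function, and field names are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical toy relations (illustrative only).
sales = [("s1", "apple", 3.0), ("s1", "pear", 1.0), ("s2", "apple", 5.0)]  # (store, item, units)
items = [("apple", 2.0), ("pear", 4.0)]  # (item, price); item is assumed unique

KEYS = ("count", "sum_price", "sum_units", "sum_price_sq", "sum_price_units")


def aggregates_over_join(sales, items):
    """Compute the batch of sums by visiting every tuple of the join."""
    price_of = dict(items)
    agg = dict.fromkeys(KEYS, 0.0)
    for _store, item, units in sales:
        if item not in price_of:
            continue  # no join partner
        price = price_of[item]
        agg["count"] += 1
        agg["sum_price"] += price
        agg["sum_units"] += units
        agg["sum_price_sq"] += price * price
        agg["sum_price_units"] += price * units
    return agg


def aggregates_pushed_down(sales, items):
    """Aggregate sales per item first, then combine once per item with items."""
    per_item = defaultdict(lambda: [0, 0.0])  # item -> [row count, sum of units]
    for _store, item, units in sales:
        per_item[item][0] += 1
        per_item[item][1] += units
    agg = dict.fromkeys(KEYS, 0.0)
    for item, price in items:
        if item not in per_item:
            continue  # no join partner
        cnt, sum_units = per_item[item]
        agg["count"] += cnt
        agg["sum_price"] += cnt * price
        agg["sum_units"] += sum_units
        agg["sum_price_sq"] += cnt * price * price
        agg["sum_price_units"] += price * sum_units
    return agg


def least_squares_slope(agg):
    """Slope of units ~ price, computed from the aggregates alone."""
    n = agg["count"]
    num = n * agg["sum_price_units"] - agg["sum_price"] * agg["sum_units"]
    den = n * agg["sum_price_sq"] - agg["sum_price"] ** 2
    return num / den
```

Both functions return the same sums. The second touches each `items` row once instead of once per matching sale, which hints at why sharing partial aggregates across a batch can pay off; the model parameter is then derived from the sums without revisiting the data.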
Record ID: oxford-uuid:1109582d-c96f-44f5-ae31-b5189057080a
Institution: University of Oxford