Tuplex: robust, efficient analytics when Python rules

© 2019 VLDB Endowment. Spark became the de facto industry standard as an execution engine for data preparation, cleaning, distributed machine learning, streaming, and warehousing over raw data. However, with the success of Python the landscape is shifting again; there is a strong demand for tools whi...

Bibliographic Details
Main Authors: Spiegelberg, Leonhard F, Kraska, Tim
Format: Article
Language: English
Published: VLDB Endowment 2021
Online Access: https://hdl.handle.net/1721.1/132284
_version_ 1826205377007452160
author Spiegelberg, Leonhard F
Kraska, Tim
author_facet Spiegelberg, Leonhard F
Kraska, Tim
author_sort Spiegelberg, Leonhard F
collection MIT
description © 2019 VLDB Endowment. Spark became the de facto industry standard as an execution engine for data preparation, cleaning, distributed machine learning, streaming, and warehousing over raw data. However, with the success of Python the landscape is shifting again; there is a strong demand for tools which better integrate with the Python landscape and do not suffer from Spark's impedance mismatch. In this paper, we demonstrate Tuplex (short for tuples and exceptions), a Python-native data preparation framework that allows users to develop and deploy pipelines faster and more robustly while providing bare-metal execution times through code compilation whenever possible.
first_indexed 2024-09-23T13:12:01Z
format Article
id mit-1721.1/132284
institution Massachusetts Institute of Technology
language English
last_indexed 2024-09-23T13:12:01Z
publishDate 2021
publisher VLDB Endowment
record_format dspace
spelling mit-1721.1/132284 2021-09-21T03:31:07Z Tuplex: robust, efficient analytics when Python rules Spiegelberg, Leonhard F Kraska, Tim © 2019 VLDB Endowment. Spark became the de facto industry standard as an execution engine for data preparation, cleaning, distributed machine learning, streaming, and warehousing over raw data. However, with the success of Python the landscape is shifting again; there is a strong demand for tools which better integrate with the Python landscape and do not suffer from Spark's impedance mismatch. In this paper, we demonstrate Tuplex (short for tuples and exceptions), a Python-native data preparation framework that allows users to develop and deploy pipelines faster and more robustly while providing bare-metal execution times through code compilation whenever possible. 2021-09-20T18:21:39Z 2021-09-20T18:21:39Z 2021-01-11T16:52:56Z Article http://purl.org/eprint/type/ConferencePaper https://hdl.handle.net/1721.1/132284 en 10.14778/3352063.3352109 Proceedings of the VLDB Endowment Creative Commons Attribution-NonCommercial-NoDerivs License http://creativecommons.org/licenses/by-nc-nd/4.0/ application/pdf VLDB Endowment VLDB Endowment
spellingShingle Spiegelberg, Leonhard F
Kraska, Tim
Tuplex: robust, efficient analytics when Python rules
title Tuplex: robust, efficient analytics when Python rules
title_full Tuplex: robust, efficient analytics when Python rules
title_fullStr Tuplex: robust, efficient analytics when Python rules
title_full_unstemmed Tuplex: robust, efficient analytics when Python rules
title_short Tuplex: robust, efficient analytics when Python rules
title_sort tuplex robust efficient analytics when python rules
url https://hdl.handle.net/1721.1/132284
work_keys_str_mv AT spiegelbergleonhardf tuplexrobustefficientanalyticswhenpythonrules
AT kraskatim tuplexrobustefficientanalyticswhenpythonrules