Tuplex: robust, efficient analytics when Python rules
© 2019 VLDB Endowment. Spark became the de facto industry standard as an execution engine for data preparation, cleaning, distributed machine learning, streaming, and warehousing over raw data. However, with the success of Python the landscape is shifting again; there is a strong demand for tools whi...
Main Authors: | Spiegelberg, Leonhard F; Kraska, Tim |
---|---|
Format: | Article |
Language: | English |
Published: | VLDB Endowment, 2021 |
Online Access: | https://hdl.handle.net/1721.1/132284 |
_version_ | 1826205377007452160 |
---|---|
author | Spiegelberg, Leonhard F; Kraska, Tim |
author_facet | Spiegelberg, Leonhard F; Kraska, Tim |
author_sort | Spiegelberg, Leonhard F |
collection | MIT |
description | © 2019 VLDB Endowment. Spark became the de facto industry standard as an execution engine for data preparation, cleaning, distributed machine learning, streaming, and warehousing over raw data. However, with the success of Python the landscape is shifting again; there is a strong demand for tools which integrate better with the Python landscape and do not have Spark's impedance mismatch. In this paper, we demonstrate Tuplex (short for tuples and exceptions), a Python-native data preparation framework that allows users to develop and deploy pipelines faster and more robustly while providing bare-metal execution times through code compilation whenever possible. |
first_indexed | 2024-09-23T13:12:01Z |
format | Article |
id | mit-1721.1/132284 |
institution | Massachusetts Institute of Technology |
language | English |
last_indexed | 2024-09-23T13:12:01Z |
publishDate | 2021 |
publisher | VLDB Endowment |
record_format | dspace |
spelling | mit-1721.1/1322842021-09-21T03:31:07Z Tuplex: robust, efficient analytics when Python rules Spiegelberg, Leonhard F Kraska, Tim © 2019 VLDB Endowment. Spark became the defacto industry standard as an execution engine for data preparation, cleaning, distributed machine learning, streaming and, warehousing over raw data. However, with the success of Python the landscape is shifting again; there is a strong demand for tools which better integrate with the Python landscape and do not have the impedance mismatch like Spark. In this paper, we demonstrate Tuplex (short for tuples and exceptions), a Pythonnative data preparation framework that allows users to develop and deploy pipelines faster and more robustly while providing bare-metal execution times through code compilation whenever possible. 2021-09-20T18:21:39Z 2021-09-20T18:21:39Z 2021-01-11T16:52:56Z Article http://purl.org/eprint/type/ConferencePaper https://hdl.handle.net/1721.1/132284 en 10.14778/3352063.3352109 Proceedings of the VLDB Endowment Creative Commons Attribution-NonCommercial-NoDerivs License http://creativecommons.org/licenses/by-nc-nd/4.0/ application/pdf VLDB Endowment VLDB Endowment |
title | Tuplex: robust, efficient analytics when Python rules |
url | https://hdl.handle.net/1721.1/132284 |