MacroBase: Prioritizing Attention in Fast Data

© 2018 Association for Computing Machinery. As data volumes continue to rise, manual inspection is becoming increasingly untenable. In response, we present MacroBase, a data analytics engine that prioritizes end-user attention in high-volume fast data streams. MacroBase enables efficient, accurate, a...

Bibliographic Details
Main Authors: Abuzaid, Firas, Bailis, Peter, Ding, Jialin, Gan, Edward, Madden, Samuel, Narayanan, Deepak, Rong, Kexin, Suri, Sahaana
Other Authors: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format: Article
Language: English
Published: Association for Computing Machinery (ACM) 2021
Online Access: https://hdl.handle.net/1721.1/135069
collection MIT
description © 2018 Association for Computing Machinery. As data volumes continue to rise, manual inspection is becoming increasingly untenable. In response, we present MacroBase, a data analytics engine that prioritizes end-user attention in high-volume fast data streams. MacroBase enables efficient, accurate, and modular analyses that highlight and aggregate important and unusual behavior, acting as a search engine for fast data. MacroBase is able to deliver order-of-magnitude speedups over alternatives by optimizing the combination of explanation (i.e., feature selection) and classification tasks and by leveraging a new reservoir sampler and heavy-hitters sketch specialized for fast data streams. As a result, MacroBase delivers accurate results at speeds of up to 2M events per second per query on a single core. The system has delivered meaningful results in production, including at a telematics company monitoring hundreds of thousands of vehicles.
id mit-1721.1/135069
institution Massachusetts Institute of Technology
Other Affiliation: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Originally published: 2018
Journal: ACM Transactions on Database Systems
DOI: 10.1145/3276463
Type: Article (http://purl.org/eprint/type/JournalArticle)
File format: application/pdf
License: Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/
Record timestamps: 2019-06-18T17:06:52Z, 2021-10-27T20:10:35Z, 2023-01-20T21:23:05Z