Fault-tolerance and load management in a distributed stream processing system

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, February 2006.

Bibliographic Details
Main Author: Balazinska, Magdalena
Other Authors: Hari Balakrishnan.
Format: Thesis
Language: eng
Published: Massachusetts Institute of Technology 2007
Subjects: Electrical Engineering and Computer Science.
Online Access: http://hdl.handle.net/1721.1/35287
_version_ 1826206169938526208
author Balazinska, Magdalena
author2 Hari Balakrishnan.
author_facet Hari Balakrishnan.
Balazinska, Magdalena
author_sort Balazinska, Magdalena
collection MIT
description Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, February 2006.
first_indexed 2024-09-23T13:25:11Z
format Thesis
id mit-1721.1/35287
institution Massachusetts Institute of Technology
language eng
last_indexed 2024-09-23T13:25:11Z
publishDate 2007
publisher Massachusetts Institute of Technology
record_format dspace
spelling mit-1721.1/35287 2019-04-10T12:50:24Z Fault-tolerance and load management in a distributed stream processing system Balazinska, Magdalena Hari Balakrishnan. Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. Electrical Engineering and Computer Science. Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, February 2006. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (p. 187-199). Advances in monitoring technology (e.g., sensors) and an increased demand for online information processing have given rise to a new class of applications that require continuous, low-latency processing of large-volume data streams. These "stream processing applications" arise in many areas such as sensor-based environment monitoring, financial services, network monitoring, and military applications. Because traditional database management systems are ill-suited for high-volume, low-latency stream processing, new systems, called stream processing engines (SPEs), have been developed. Furthermore, because stream processing applications are inherently distributed, and because distribution can improve performance and scalability, researchers have also proposed and developed distributed SPEs. In this dissertation, we address two challenges faced by a distributed SPE: (1) fault-tolerant operation in the face of node failures, network failures, and network partitions, and (2) federated load management. For fault-tolerance, we present a replication-based scheme, called Delay, Process, and Correct (DPC), that masks most node and network failures. When network partitions occur, DPC addresses the traditional availability-consistency trade-off by maintaining, when possible, a desired availability specified by the application or user, but eventually also delivering the correct results. While maintaining the desired availability bounds, DPC also strives to minimize the number of inaccurate results that must later be corrected. In contrast to previous proposals for fault tolerance in SPEs, DPC simultaneously supports a variety of applications that differ in their preferred trade-off between availability and consistency. For load management, we present a Bounded-Price Mechanism (BPM) that enables autonomous participants to collaboratively handle their load without individually owning the resources necessary for peak operation. BPM is based on contracts that participants negotiate offline. At runtime, participants move load only to partners with whom they have a contract and pay each other the contracted price. We show that BPM provides incentives that foster participation and leads to good system-wide load distribution. In contrast to earlier proposals based on computational economies, BPM is lightweight, enables participants to develop and exploit preferential relationships, and provides stability and predictability. Although motivated by stream processing, BPM is general and can be applied to any federated system. We have implemented both schemes in the Borealis distributed stream processing engine. They will be available with the next release of the system. by Magdalena Balazinska. Ph.D. 2007-01-10T15:34:31Z 2007-01-10T15:34:31Z 2005 2006 Thesis http://hdl.handle.net/1721.1/35287 72680369 eng M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582 199 p. 6212416 bytes 7581442 bytes application/pdf application/pdf application/pdf Massachusetts Institute of Technology
spellingShingle Electrical Engineering and Computer Science.
Balazinska, Magdalena
Fault-tolerance and load management in a distributed stream processing system
title Fault-tolerance and load management in a distributed stream processing system
topic Electrical Engineering and Computer Science.
url http://hdl.handle.net/1721.1/35287
work_keys_str_mv AT balazinskamagdalena faulttoleranceandloadmanagementinadistributedstreamprocessingsystem