Fault-tolerance and load management in a distributed stream processing system
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, February 2006.
Main Author: | Balazinska, Magdalena |
---|---|
Other Authors: | Hari Balakrishnan |
Format: | Thesis |
Language: | eng |
Published: | Massachusetts Institute of Technology 2007 |
Subjects: | Electrical Engineering and Computer Science |
Online Access: | http://hdl.handle.net/1721.1/35287 |
_version_ | 1826206169938526208 |
---|---|
author | Balazinska, Magdalena |
author2 | Hari Balakrishnan. |
author_facet | Hari Balakrishnan. Balazinska, Magdalena |
author_sort | Balazinska, Magdalena |
collection | MIT |
description | Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, February 2006. |
first_indexed | 2024-09-23T13:25:11Z |
format | Thesis |
id | mit-1721.1/35287 |
institution | Massachusetts Institute of Technology |
language | eng |
last_indexed | 2024-09-23T13:25:11Z |
publishDate | 2007 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/352872019-04-10T12:50:24Z Fault-tolerance and load management in a distributed stream processing system Balazinska, Magdalena Hari Balakrishnan. Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. Electrical Engineering and Computer Science. Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, February 2006. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (p. 187-199). Advances in monitoring technology (e.g., sensors) and an increased demand for online information processing have given rise to a new class of applications that require continuous, low-latency processing of large-volume data streams. These "stream processing applications" arise in many areas such as sensor-based environment monitoring, financial services, network monitoring, and military applications. Because traditional database management systems are ill-suited for high-volume, low-latency stream processing, new systems, called stream processing engines (SPEs), have been developed. Furthermore, because stream processing applications are inherently distributed, and because distribution can improve performance and scalability, researchers have also proposed and developed distributed SPEs. In this dissertation, we address two challenges faced by a distributed SPE: (1) fault-tolerant operation in the face of node failures, network failures, and network partitions, and (2) federated load management. For fault-tolerance, we present a replication-based scheme, called Delay, Process, and Correct (DPC), that masks most node and network failures. When network partitions occur, DPC addresses the traditional availability-consistency trade-off by maintaining, when possible, a desired availability specified by the application or user, but eventually also delivering the correct results. While maintaining the desired availability bounds, DPC also strives to minimize the number of inaccurate results that must later be corrected. In contrast to previous proposals for fault tolerance in SPEs, DPC simultaneously supports a variety of applications that differ in their preferred trade-off between availability and consistency. For load management, we present a Bounded-Price Mechanism (BPM) that enables autonomous participants to collaboratively handle their load without individually owning the resources necessary for peak operation. BPM is based on contracts that participants negotiate offline. At runtime, participants move load only to partners with whom they have a contract and pay each other the contracted price. We show that BPM provides incentives that foster participation and leads to good system-wide load distribution. In contrast to earlier proposals based on computational economies, BPM is lightweight, enables participants to develop and exploit preferential relationships, and provides stability and predictability. Although motivated by stream processing, BPM is general and can be applied to any federated system. We have implemented both schemes in the Borealis distributed stream processing engine. They will be available with the next release of the system. by Magdalena Balazinska. Ph.D. 2007-01-10T15:34:31Z 2007-01-10T15:34:31Z 2005 2006 Thesis http://hdl.handle.net/1721.1/35287 72680369 eng M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582 199 p. 6212416 bytes 7581442 bytes application/pdf application/pdf application/pdf Massachusetts Institute of Technology |
spellingShingle | Electrical Engineering and Computer Science. Balazinska, Magdalena Fault-tolerance and load management in a distributed stream processing system |
title | Fault-tolerance and load management in a distributed stream processing system |
title_full | Fault-tolerance and load management in a distributed stream processing system |
title_fullStr | Fault-tolerance and load management in a distributed stream processing system |
title_full_unstemmed | Fault-tolerance and load management in a distributed stream processing system |
title_short | Fault-tolerance and load management in a distributed stream processing system |
title_sort | fault tolerance and load management in a distributed stream processing system |
topic | Electrical Engineering and Computer Science. |
url | http://hdl.handle.net/1721.1/35287 |
work_keys_str_mv | AT balazinskamagdalena faulttoleranceandloadmanagementinadistributedstreamprocessingsystem |