Automatic Reconfiguration for Large-Scale Reliable Storage Systems

Byzantine-fault-tolerant replication enhances the availability and reliability of Internet services that store critical state and preserve it despite attacks or software errors. However, existing Byzantine-fault-tolerant storage systems either assume a static set of replicas, or have limitations in...

Bibliographic Details
Main Authors:	Rodrigues, Rodrigo; Liskov, Barbara H.; Chen, Kathryn; Liskov, Moses; Schultz, David
Other Authors:	Massachusetts Institute of Technology. System Design and Management Program
Format:	Article
Language:	en_US
Published:	Institute of Electrical and Electronics Engineers (IEEE) 2012
Online Access:	http://hdl.handle.net/1721.1/72134 https://orcid.org/0000-0002-5914-1866

author	Rodrigues, Rodrigo Liskov, Barbara H. Chen, Kathryn Liskov, Moses Schultz, David
author2	Massachusetts Institute of Technology. System Design and Management Program
author_facet	Massachusetts Institute of Technology. System Design and Management Program Rodrigues, Rodrigo Liskov, Barbara H. Chen, Kathryn Liskov, Moses Schultz, David
author_sort	Rodrigues, Rodrigo
collection	MIT
description	Byzantine-fault-tolerant replication enhances the availability and reliability of Internet services that store critical state and preserve it despite attacks or software errors. However, existing Byzantine-fault-tolerant storage systems either assume a static set of replicas, or have limitations in how they handle reconfigurations (e.g., in terms of the scalability of the solutions or the consistency levels they provide). This can be problematic in long-lived, large-scale systems where system membership is likely to change during the system lifetime. In this paper, we present a complete solution for dynamically changing system membership in a large-scale Byzantine-fault-tolerant system. We present a service that tracks system membership and periodically notifies other system nodes of membership changes. The membership service runs mostly automatically, to avoid human configuration errors; is itself Byzantine-fault-tolerant and reconfigurable; and provides applications with a sequence of consistent views of the system membership. We demonstrate the utility of this membership service by using it in a novel distributed hash table called dBQS that provides atomic semantics even across changes in replica sets. dBQS is interesting in its own right because its storage algorithms extend existing Byzantine quorum protocols to handle changes in the replica set, and because it differs from previous DHTs by providing Byzantine fault tolerance and offering strong semantics. We implemented the membership service and dBQS. Our results show that the approach works well in practice: the membership service is able to manage a large system and the cost to change the system membership is low.
first_indexed	2024-09-23T10:27:42Z
format	Article
id	mit-1721.1/72134
institution	Massachusetts Institute of Technology
language	en_US
last_indexed	2024-09-23T10:27:42Z
publishDate	2012
publisher	Institute of Electrical and Electronics Engineers (IEEE)
record_format	dspace
spelling	mit-1721.1/72134 2022-09-27T09:37:50Z Automatic Reconfiguration for Large-Scale Reliable Storage Systems Rodrigues, Rodrigo Liskov, Barbara H. Chen, Kathryn Liskov, Moses Schultz, David Massachusetts Institute of Technology. System Design and Management Program Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory Liskov, Barbara H. Liskov, Barbara H. Chen, Kathryn Schultz, David 2012-08-15T13:10:09Z 2012-08-15T13:10:09Z 2010-09 2010-01 Article http://purl.org/eprint/type/JournalArticle 1545-5971 http://hdl.handle.net/1721.1/72134 Rodrigues, Rodrigo et al. “Automatic Reconfiguration for Large-Scale Reliable Storage Systems.” IEEE Transactions on Dependable and Secure Computing 9.2 (2010): 145–158. https://orcid.org/0000-0002-5914-1866 en_US http://dx.doi.org/10.1109/tdsc.2010.52 IEEE Transactions on Dependable and Secure Computing Creative Commons Attribution-Noncommercial-Share Alike 3.0 http://creativecommons.org/licenses/by-nc-sa/3.0/ application/pdf Institute of Electrical and Electronics Engineers (IEEE) Other University Web Domain
title	Automatic Reconfiguration for Large-Scale Reliable Storage Systems
url	http://hdl.handle.net/1721.1/72134 https://orcid.org/0000-0002-5914-1866
