Efficient replication of large data objects

Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2003.

Bibliographic Details
Main Author: Fan, Rui, 1977-
Other Authors: Nancy A. Lynch.
Format: Thesis
Language:eng
Published: Massachusetts Institute of Technology 2006
Subjects:
Online Access:http://hdl.handle.net/1721.1/29581
_version_ 1826210079690457088
author Fan, Rui, 1977-
author2 Nancy A. Lynch.
author_facet Nancy A. Lynch.
Fan, Rui, 1977-
author_sort Fan, Rui, 1977-
collection MIT
description Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2003.
first_indexed 2024-09-23T14:42:07Z
format Thesis
id mit-1721.1/29581
institution Massachusetts Institute of Technology
language eng
last_indexed 2024-09-23T14:42:07Z
publishDate 2006
publisher Massachusetts Institute of Technology
record_format dspace
spelling mit-1721.1/295812019-04-12T15:56:09Z Efficient replication of large data objects Fan, Rui, 1977- Nancy A. Lynch. Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. Electrical Engineering and Computer Science. Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2003. Includes bibliographical references (p. 77-78). Replication is an important technique for improving the reliability and scalability of data services. The primary problem encountered in replication is the trade-off between amount of replication, performance, and consistency. A rule of thumb states that any replication algorithm must sacrifice at least one of these criteria. In this thesis, we investigate replicating large data objects, such as files, whose size is large compared to metadata used by the replication algorithm. With this assumption, we present a distributed replication algorithm which simultaneously achieves a high replication factor, nearly optimal performance, and strong data consistency. Furthermore, our algorithm makes only basic assumptions about its environment. Our algorithm works in any asynchronous, reliable message-passing network, without relying on higher level functions such as distributed locking or group communication. Our algorithm is suitable for implementation in both LAN and WAN settings. This thesis is divided into two parts. In the first part, we formally state the assumptions and guarantees of our replication algorithm in terms of its trace properties. We then formally implement our algorithm in the IOA modeling language. We also give rigorous proofs of the algorithm's correctness and its performance analysis. The main idea of our algorithm is to separately maintain copies of the data, and information about the locations of the up-to-date copies. Our algorithm then mostly performs cheap operations on the location information, and avoids expensive operations on the actual data. The second part of this thesis presents two lower bounds on the costs of data replication. The first lower bound gives the minimum number of writes that must occur during a read operation. The second lower bound states that for a certain class of efficient replication algorithms, the replicas must use storage proportional to the maximum number of concurrent writers. The motivation for these lower bounds was certain algorithmic techniques we used in our replication algorithm. The lower bounds suggest that these techniques are necessary. The lower bounds are also of independent interest. by Rui Fan. S.M. 2006-03-24T16:04:29Z 2006-03-24T16:04:29Z 2003 2003 Thesis http://hdl.handle.net/1721.1/29581 52758605 eng M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582 78 p. 3363563 bytes 3363369 bytes application/pdf application/pdf application/pdf Massachusetts Institute of Technology
spellingShingle Electrical Engineering and Computer Science.
Fan, Rui, 1977-
Efficient replication of large data objects
title Efficient replication of large data objects
title_full Efficient replication of large data objects
title_fullStr Efficient replication of large data objects
title_full_unstemmed Efficient replication of large data objects
title_short Efficient replication of large data objects
title_sort efficient replication of large data objects
topic Electrical Engineering and Computer Science.
url http://hdl.handle.net/1721.1/29581
work_keys_str_mv AT fanrui1977 efficientreplicationoflargedataobjects