The storage vs repair bandwidth trade-off for multiple failures in clustered storage networks


Bibliographic Details
Main Authors: Medard, Muriel; Abdrashitov, Vitaly; Prakash, N.
Other Authors: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format: Article
Language: English
Published: Institute of Electrical and Electronics Engineers (IEEE) 2019
Online Access: https://hdl.handle.net/1721.1/121591
Description
Summary: We study the trade-off between storage overhead and inter-cluster repair bandwidth in clustered storage systems, while recovering from multiple node failures within a cluster. A cluster is a collection of m nodes, and there are n clusters. For data collection, we download the entire content from any k clusters. For repair of t ≥ 2 nodes within a cluster, we take help from ℓ local nodes, as well as d helper clusters. We characterize the optimal trade-off under functional repair, and also under exact repair for the minimum storage and minimum inter-cluster bandwidth (MBR) operating points. Our bounds show the following interesting facts: 1) When t | (m - ℓ), the trade-off is the same as that under t = 1, and thus there is no advantage in jointly repairing multiple nodes. 2) When t ∤ (m - ℓ), the optimal file size at the MBR point under exact repair can be strictly less than that under functional repair. 3) Unlike the case of t = 1, increasing the number of local helper nodes does not necessarily increase the system capacity under functional repair.