On the weakest failure detector ever

Many problems in distributed computing are impossible to solve when no information about process failures is available. It is common to ask what information about failures is necessary and sufficient to circumvent some specific impossibility, e.g., consensus, atomic commit, mutual exclusion, etc. Th...

Full description

Bibliographic Details
Main Authors: Kuznetsov, Petr, Herlihy, Maurice, Newport, Calvin Charles, Lynch, Nancy Ann, Guerraoui, Rachid
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format: Article
Language:en_US
Published: Springer Berlin Heidelberg 2010
Online Access:http://hdl.handle.net/1721.1/51039
https://orcid.org/0000-0003-3045-265X
_version_ 1811080887042834432
author Kuznetsov, Petr
Herlihy, Maurice
Newport, Calvin Charles
Lynch, Nancy Ann
Guerraoui, Rachid
author2 Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
author_facet Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Kuznetsov, Petr
Herlihy, Maurice
Newport, Calvin Charles
Lynch, Nancy Ann
Guerraoui, Rachid
author_sort Kuznetsov, Petr
collection MIT
description Many problems in distributed computing are impossible to solve when no information about process failures is available. It is common to ask what information about failures is necessary and sufficient to circumvent some specific impossibility, e.g., consensus, atomic commit, mutual exclusion, etc. This paper asks what information about failures is necessary to circumvent any impossibility and sufficient to circumvent some impossibility. In other words, what is the minimal yet non-trivial failure information. We present an abstraction, denoted $${\Upsilon}$$ , that provides very little information about failures. In every run of the distributed system, $${\Upsilon}$$ eventually informs the processes that some set of processes in the system cannot be the set of correct processes in that run. Although seemingly weak, for it might provide random information for an arbitrarily long period of time, and it eventually excludes only one set of processes (among many) that is not the set of correct processes in the current run, $${\Upsilon}$$ still captures non-trivial failure information. We show that $${\Upsilon}$$ is sufficient to circumvent the fundamental wait-free set-agreement impossibility. While doing so, (a) we disprove previous conjectures about the weakest failure detector to solve set-agreement and (b) we prove that solving set-agreement with registers is strictly weaker than solving n + 1-process consensus using n-process consensus. We show that $${\Upsilon}$$ is the weakest stable non-trivial failure detector: any stable failure detector that circumvents some wait-free impossibility provides at least as much information about failures as $${\Upsilon}$$ does. Our results are generalized, from the wait-free to the f-resilient case, through an abstraction $${\Upsilon^f}$$ that we introduce and prove minimal to solve any problem that cannot be solved in an f-resilient manner, and yet sufficient to solve f-resilient f-set-agreement.
first_indexed 2024-09-23T11:38:27Z
format Article
id mit-1721.1/51039
institution Massachusetts Institute of Technology
language en_US
last_indexed 2024-09-23T11:38:27Z
publishDate 2010
publisher Springer Berlin Heidelberg
record_format dspace
spelling mit-1721.1/510392022-10-01T04:59:12Z On the weakest failure detector ever Kuznetsov, Petr Herlihy, Maurice Newport, Calvin Charles Lynch, Nancy Ann Guerraoui, Rachid Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Lynch, Nancy Ann Lynch, Nancy Ann Guerraoui, Rachid Newport, Calvin Charles Many problems in distributed computing are impossible to solve when no information about process failures is available. It is common to ask what information about failures is necessary and sufficient to circumvent some specific impossibility, e.g., consensus, atomic commit, mutual exclusion, etc. This paper asks what information about failures is necessary to circumvent any impossibility and sufficient to circumvent some impossibility. In other words, what is the minimal yet non-trivial failure information. We present an abstraction, denoted $${\Upsilon}$$ , that provides very little information about failures. In every run of the distributed system, $${\Upsilon}$$ eventually informs the processes that some set of processes in the system cannot be the set of correct processes in that run. Although seemingly weak, for it might provide random information for an arbitrarily long period of time, and it eventually excludes only one set of processes (among many) that is not the set of correct processes in the current run, $${\Upsilon}$$ still captures non-trivial failure information. We show that $${\Upsilon}$$ is sufficient to circumvent the fundamental wait-free set-agreement impossibility. While doing so, (a) we disprove previous conjectures about the weakest failure detector to solve set-agreement and (b) we prove that solving set-agreement with registers is strictly weaker than solving n + 1-process consensus using n-process consensus. We show that $${\Upsilon}$$ is the weakest stable non-trivial failure detector: any stable failure detector that circumvents some wait-free impossibility provides at least as much information about failures as $${\Upsilon}$$ does. Our results are generalized, from the wait-free to the f-resilient case, through an abstraction $${\Upsilon^f}$$ that we introduce and prove minimal to solve any problem that cannot be solved in an f-resilient manner, and yet sufficient to solve f-resilient f-set-agreement. 2010-01-29T18:51:35Z 2010-01-29T18:51:35Z 2009-01 2008-11 Article http://purl.org/eprint/type/SubmittedJournalArticle 1432-0452 0178-2770 http://hdl.handle.net/1721.1/51039 Guerraoui, Rachid et al. “On the weakest failure detector ever.” Distributed Computing 21.5 (2009): 353-366. https://orcid.org/0000-0003-3045-265X en_US http://dx.doi.org/10.1007/s00446-009-0079-3 Distributed Computing Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. application/pdf Springer Berlin Heidelberg Joanne Hanley
spellingShingle Kuznetsov, Petr
Herlihy, Maurice
Newport, Calvin Charles
Lynch, Nancy Ann
Guerraoui, Rachid
On the weakest failure detector ever
title On the weakest failure detector ever
title_full On the weakest failure detector ever
title_fullStr On the weakest failure detector ever
title_full_unstemmed On the weakest failure detector ever
title_short On the weakest failure detector ever
title_sort on the weakest failure detector ever
url http://hdl.handle.net/1721.1/51039
https://orcid.org/0000-0003-3045-265X
work_keys_str_mv AT kuznetsovpetr ontheweakestfailuredetectorever
AT herlihymaurice ontheweakestfailuredetectorever
AT newportcalvincharles ontheweakestfailuredetectorever
AT lynchnancyann ontheweakestfailuredetectorever
AT guerraouirachid ontheweakestfailuredetectorever