Summary: | The concept of partial synchrony in a distributed system is introduced. Partial synchrony lies between the cases of a synchronous system and an asynchronous system. In a synchronous system, there is a known fixed upper bound Δ on the time required for a message to be sent from one processor to another and a known fixed upper bound Φ on the relative speeds of different processors. In an asynchronous system, no fixed upper bounds Δ and Φ exist. In one version of partial synchrony, fixed bounds Δ and Φ exist but they are not known a priori. The problem is to design protocols which work correctly in the partially synchronous system regardless of the actual values of the bounds Δ and Φ. In another version of partial synchrony, the bounds are known but they are only guaranteed to hold starting at some unknown time T, and protocols must be designed to work correctly regardless of when the time T occurs. Fault-tolerant consensus protocols are given for various cases of partial synchrony and various fault models. Lower bounds are also given which show in many cases that our protocols are optimal with respect to the number of faults tolerated. Our consensus protocols for partially synchronous processors use new protocols for fault-tolerant "distributed clocks" which allow partially synchronous processors to reach some approximately common notion of time.
|
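
The two versions of partial synchrony described in the summary can be illustrated on a finite trace of message delays. The following is a minimal sketch and not taken from the paper: the `Message` record, the function names, and the sample trace are all hypothetical, time is modelled as integer steps, and only the message-delay bound Δ is checked (the processor-speed bound Φ is left out).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """Hypothetical record of one message in an execution trace."""
    sent_at: int       # time step at which the message was sent
    delivered_at: int  # time step at which the message was received


def delay(m: Message) -> int:
    """Time the message spent in transit."""
    return m.delivered_at - m.sent_at


def is_synchronous(trace: list[Message], delta: int) -> bool:
    """Synchronous case: the known bound delta holds for every message."""
    return all(delay(m) <= delta for m in trace)


def holds_from(trace: list[Message], delta: int, t: int) -> bool:
    """Second version: delta is known but only required to hold for
    messages sent at or after time t (t is unknown to the protocol)."""
    return all(delay(m) <= delta for m in trace if m.sent_at >= t)


def smallest_bound(trace: list[Message]) -> int:
    """First version: some bound exists but is not known in advance.
    For a finite trace, the smallest valid bound is the largest delay."""
    return max((delay(m) for m in trace), default=0)


# Illustrative trace (made-up numbers): slow early on, fast from step 10.
trace = [Message(0, 9), Message(2, 30), Message(10, 12), Message(11, 14)]
```

With this trace, a bound of 3 steps fails as a synchronous bound but holds for every message sent from step 10 onward, which is the situation the second version of partial synchrony allows. A protocol, of course, cannot run such a check itself, since it knows neither T nor (in the first version) the bound.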