Optimal Reissue Policies for Reducing Tail Latency

Interactive services send redundant requests to multiple different replicas to meet stringent tail latency requirements. These addi- tional (reissue) requests mitigate the impact of non-deterministic delays within the system and thus increase the probability of re- ceiving an on-time response. There...

Full description

Bibliographic Details
Main Authors:	Elnikety, Sameh, Kaler, Timothy, He, Yuxiong
Other Authors:	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format:	Article
Language:	en_US
Published:	2018
Online Access:	http://hdl.handle.net/1721.1/116708 https://orcid.org/0000-0002-3831-8255

_version_	1811084840372535296
author	Elnikety, Sameh Kaler, Timothy He, Yuxiong
author2	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
author_facet	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Elnikety, Sameh Kaler, Timothy He, Yuxiong
author_sort	Elnikety, Sameh
collection	MIT
description	Interactive services send redundant requests to multiple different replicas to meet stringent tail latency requirements. These addi- tional (reissue) requests mitigate the impact of non-deterministic delays within the system and thus increase the probability of re- ceiving an on-time response. There are two existing approaches of using reissue requests to reduce tail latency. (1) Reissue requests immediately to one or more replicas, which multiplies the load and runs the risk of overloading the system. (2) Reissue requests if not completed after a fixed delay. The delay helps to bound the number of extra reissue requests, but it also reduces the chance for those requests to respond before a tail latency target. We introduce a new family of reissue policies, Single-Time / Random ( SingleR ), that reissue requests after a delay d with probability q . SingleR employs randomness to bound the reissue rate, while allowing requests to be reissued early enough so they have sufficient time to respond, exploiting the benefits of both immediate and delayed reissue of prior work. We formally prove, within a simplified analytical model, that SingleR is optimal even when compared to more complex policies that reissue multiple times. To use SingleR for interactive services, we provide efficient algorithms for calculating optimal reissue delay and probability from response time logs through data-driven approach. We apply itera- tive adaptation for systems with load-dependent queuing delays. The key advantage of this data-driven approach is its wide applica- bility and effectiveness to systems with various design choices and workload properties. We evaluated SingleR policies thoroughly. We use simulation to illustrate its internals and demonstrate its robustness to a wide range of workloads. We conduct system experiments on the Re- dis key-value store and Lucene search server. The results show that for utilizations ranging from 40 - 60% , SingleR reduces the 99 th-percentile latency of Redis by 30 - 70% by reissuing only 2% of requests, and the 99 th-percentile latency of Lucene by 15 - 25% by reissuing 1% only.
first_indexed	2024-09-23T12:58:25Z
format	Article
id	mit-1721.1/116708
institution	Massachusetts Institute of Technology
language	en_US
last_indexed	2024-09-23T12:58:25Z
publishDate	2018
record_format	dspace
spelling	mit-1721.1/1167082022-09-28T11:14:59Z Optimal Reissue Policies for Reducing Tail Latency Elnikety, Sameh Kaler, Timothy He, Yuxiong Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Massachusetts Institute of Technology. Department of Mathematics Kaler, Tim Kaler, Timothy He, Yuxiong Interactive services send redundant requests to multiple different replicas to meet stringent tail latency requirements. These addi- tional (reissue) requests mitigate the impact of non-deterministic delays within the system and thus increase the probability of re- ceiving an on-time response. There are two existing approaches of using reissue requests to reduce tail latency. (1) Reissue requests immediately to one or more replicas, which multiplies the load and runs the risk of overloading the system. (2) Reissue requests if not completed after a fixed delay. The delay helps to bound the number of extra reissue requests, but it also reduces the chance for those requests to respond before a tail latency target. We introduce a new family of reissue policies, Single-Time / Random ( SingleR ), that reissue requests after a delay d with probability q . SingleR employs randomness to bound the reissue rate, while allowing requests to be reissued early enough so they have sufficient time to respond, exploiting the benefits of both immediate and delayed reissue of prior work. We formally prove, within a simplified analytical model, that SingleR is optimal even when compared to more complex policies that reissue multiple times. To use SingleR for interactive services, we provide efficient algorithms for calculating optimal reissue delay and probability from response time logs through data-driven approach. We apply itera- tive adaptation for systems with load-dependent queuing delays. The key advantage of this data-driven approach is its wide applica- bility and effectiveness to systems with various design choices and workload properties. We evaluated SingleR policies thoroughly. We use simulation to illustrate its internals and demonstrate its robustness to a wide range of workloads. We conduct system experiments on the Re- dis key-value store and Lucene search server. The results show that for utilizations ranging from 40 - 60% , SingleR reduces the 99 th-percentile latency of Redis by 30 - 70% by reissuing only 2% of requests, and the 99 th-percentile latency of Lucene by 15 - 25% by reissuing 1% only. 2018-07-02T13:49:18Z 2018-07-02T13:49:18Z 2017-07 Article http://purl.org/eprint/type/ConferencePaper 9781450345934 http://hdl.handle.net/1721.1/116708 Kaler, Tim, Yuxiong He, and Sameh Elnikety. “Optimal Reissue Policies for Reducing Tail Latency.” Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures - SPAA ’17 (2017). https://orcid.org/0000-0002-3831-8255 en_US https://doi.org/10.1145/3087556.3087566 Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures - SPAA '17 Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. application/pdf Kaler
spellingShingle	Elnikety, Sameh Kaler, Timothy He, Yuxiong Optimal Reissue Policies for Reducing Tail Latency
title	Optimal Reissue Policies for Reducing Tail Latency
title_full	Optimal Reissue Policies for Reducing Tail Latency
title_fullStr	Optimal Reissue Policies for Reducing Tail Latency
title_full_unstemmed	Optimal Reissue Policies for Reducing Tail Latency
title_short	Optimal Reissue Policies for Reducing Tail Latency
title_sort	optimal reissue policies for reducing tail latency
url	http://hdl.handle.net/1721.1/116708 https://orcid.org/0000-0002-3831-8255
work_keys_str_mv	AT elniketysameh optimalreissuepoliciesforreducingtaillatency AT kalertimothy optimalreissuepoliciesforreducingtaillatency AT heyuxiong optimalreissuepoliciesforreducingtaillatency

Optimal Reissue Policies for Reducing Tail Latency

Similar Items