SLA-Based Adaptation Schemes in Distributed Stream Processing Engines

With the upswing in the volume of data, online information, and large-scale cloud applications, big data analytics has become mainstream in research communities in both industry and academia. This prompted the emergence and development of real-time distributed stream processing...


Bibliographic Details
Main Authors:	Muhammad Hanif, Eunsam Kim, Sumi Helal, Choonhwa Lee
Format:	Article
Language:	English
Published:	MDPI AG 2019-03-01
Series:	Applied Sciences
Subjects:	big data distributed computing modern stream processing engine SLA watermarking cloud computing
Online Access:	http://www.mdpi.com/2076-3417/9/6/1045

collection	DOAJ
description	With the upswing in the volume of data, online information, and large-scale cloud applications, big data analytics has become mainstream in research communities in both industry and academia. This prompted the emergence and development of real-time distributed stream processing frameworks, such as Flink, Storm, Spark, and Samza. These frameworks allow complex queries on streaming data to be distributed across multiple worker nodes in a cluster. Few of these stream processing frameworks provide fundamental support for controlling the latency and throughput of the system as well as the correctness of the results. However, none has the ability to handle them on the fly at runtime. We present a well-informed and efficient adaptive watermarking and dynamic buffering timeout mechanism for distributed streaming frameworks. It is designed to increase the overall throughput of the system by making the watermarks adaptive to the incoming workload stream, and to scale the buffering timeout dynamically for each task tracker on the fly while maintaining the Service Level Agreement (SLA)-based end-to-end latency of the system. This work focuses on tuning the parameters of the system (such as window correctness, buffering timeout, and so on) based on the prediction of incoming workloads, and assesses whether a given workload will breach an SLA using output metrics including latency, throughput, and correctness of both intermediate and final results. We used Apache Flink as our testbed distributed processing engine for this work. However, the proposed mechanism can be applied to other streaming frameworks as well. Our results on the testbed model indicate that the proposed system outperforms the status quo of stream processing. With the inclusion of learning models like naïve Bayes, multilayer perceptron (MLP), and sequential minimal optimization (SMO), the system shows more progress in terms of keeping the SLA intact as well as quality of service (QoS).
first_indexed	2024-12-22T19:32:45Z
format	Article
id	doaj.art-ebdddf06867b403cb0704df145dc200c
institution	Directory Open Access Journal
issn	2076-3417
language	English
last_indexed	2024-12-22T19:32:45Z
publishDate	2019-03-01
publisher	MDPI AG
record_format	Article
series	Applied Sciences
spelling	doaj.art-ebdddf06867b403cb0704df145dc200c; 2022-12-21T18:15:03Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2019-03-01; vol. 9, no. 6, 1045; 10.3390/app9061045; app9061045; SLA-Based Adaptation Schemes in Distributed Stream Processing Engines; Muhammad Hanif (0), Eunsam Kim (1), Sumi Helal (2), Choonhwa Lee (3); (0) Division of Computer Science and Engineering, Hanyang University, Seoul 133-791, Korea; (1) Department of Computer Engineering, Hongik University, Seoul 121-791, Korea; (2) School of Computing and Communications, Lancaster University, Lancaster, UK; (3) Division of Computer Science and Engineering, Hanyang University, Seoul 133-791, Korea; http://www.mdpi.com/2076-3417/9/6/1045; big data; distributed computing; modern stream processing engine; SLA; watermarking; cloud computing
title	SLA-Based Adaptation Schemes in Distributed Stream Processing Engines
topic	big data distributed computing modern stream processing engine SLA watermarking cloud computing
url	http://www.mdpi.com/2076-3417/9/6/1045


Similar Items