Learned scheduling for database management systems

Bibliographic Details
Main Author: Ukyab, Tenzin Samten
Other Authors: Kraska, Tim
Format: Thesis
Published: Massachusetts Institute of Technology 2022
Online Access: https://hdl.handle.net/1721.1/139086
Description
Summary: Parallel database management systems need efficient job scheduling. Current systems rely on simple heuristics that ignore the characteristics of database workloads. We therefore created an effective scheduler that uses machine learning techniques, such as reinforcement learning and neural networks, and requires no human intervention beyond specifying an objective, such as reducing average job completion time. We use existing training techniques for job schedulers with dependency constraints, but specialize the model for database workloads by using features specific to database queries, such as node operator type. In addition, we represent pipelining scheduling opportunities between operator tasks. With further training time, our learned scheduler will be able to improve average job completion time compared with heuristic schedulers such as FIFO and fair scheduling.
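Illustrative note: the objective named in the summary, average job completion time, and the two heuristic baselines can be made concrete with a small sketch. The code below is not taken from the thesis. It assumes a single executor, jobs that all arrive at time 0, and an idealized equal-share (processor sharing) model for fair scheduling. The function names and the example workload are hypothetical.

```python
from typing import List


def fifo_completion_times(durations: List[float]) -> List[float]:
    """Run jobs one at a time in arrival order (FIFO)."""
    completions = []
    clock = 0.0
    for d in durations:
        clock += d  # each job waits for all earlier jobs to finish
        completions.append(clock)
    return completions


def fair_completion_times(durations: List[float]) -> List[float]:
    """Idealized fair sharing: all unfinished jobs get an equal share.

    Assumption: the executor is perfectly divisible. Completion times
    are returned in order of increasing job duration, not arrival order.
    """
    ordered = sorted(durations)
    n = len(ordered)
    completions = []
    clock = 0.0
    done_work = 0.0  # work already received by every still-active job
    for i, d in enumerate(ordered):
        active = n - i  # jobs still sharing the executor
        clock += (d - done_work) * active
        done_work = d
        completions.append(clock)
    return completions


def average(values: List[float]) -> float:
    return sum(values) / len(values)


if __name__ == "__main__":
    jobs = [8, 2, 2]  # hypothetical workload: one long job ahead of two short ones
    print("FIFO average JCT:", average(fifo_completion_times(jobs)))
    print("Fair average JCT:", average(fair_completion_times(jobs)))
```

Under these assumptions, the long job at the head of the FIFO queue delays both short jobs, while fair sharing lets the short jobs finish earlier. Neither heuristic looks at query structure such as operator type or pipelining opportunities, which is the information the summary says the learned scheduler uses.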