Using Application-Level Thread Progress Information to Manage Power and Performance

© 2017 IEEE. Power and thermal limitations make it impossible to run all cores on a multicore system at their maximum frequency. Therefore, modern systems require careful power management. These systems must manage complex tradeoffs between energy, power, and frequency, choosing which cores to accel...

Full description

Bibliographic Details
Main Authors: Neuman, Sabrina M, Miller, Jason E, Sanchez, Daniel, Devadas, Srinivas
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format: Article
Language:English
Published: Institute of Electrical and Electronics Engineers (IEEE) 2021
Online Access:https://hdl.handle.net/1721.1/137122
Description
Summary:© 2017 IEEE. Power and thermal limitations make it impossible to run all cores on a multicore system at their maximum frequency. Therefore, modern systems require careful power management. These systems must manage complex tradeoffs between energy, power, and frequency, choosing which cores to accelerate to achieve good performance while maintaining energy efficiency or operating under a power budget. Navigating these tradeoffs is especially hard with multi-threaded applications, where performance depends on the relative progress of parallel worker threads between synchronization points. Prior work on chip-level power management for multi-threaded applications has largely relied on indirect heuristics and metrics calculated from low-level performance counters to estimate each thread's progress. However, these indirect metrics are often inaccurate. Instead, we propose to gather progress information directly from software itself. We present ThreadBeats, a simple application-level annotation framework that directly and accurately conveys thread progress information to hardware. We design DVFS controllers that exploit ThreadBeats information for two purposes: (i) improving performance by equalizing thread progress and (ii) minimizing runtime under a power budget constraint. These controllers reduce wait time at barriers by 77% on average and improve energy-delay product under a power budget by 23% over prior work.