Thread Scheduling Mechanisms for Multiple-Context Parallel Processors


Bibliographic Details
Main Author: Fiske, James A. Stuart
Language: en_US
Published: 2004
Online Access: http://hdl.handle.net/1721.1/7063
Description: Scheduling tasks to use the available processor resources efficiently is crucial to minimizing the runtime of applications on shared-memory parallel processors. One factor that contributes to poor processor utilization is the idle time caused by long-latency operations, such as remote memory references or processor synchronization operations. One way of tolerating this latency is to use a processor with multiple hardware contexts that can rapidly switch to executing another thread of computation whenever a long-latency operation occurs, increasing processor utilization by overlapping computation with communication. Although multiple contexts are effective for tolerating latency, this effectiveness can be limited by memory and network bandwidth, by cache interference among the contexts, and by critical tasks sharing processor resources with less critical tasks. This thesis presents techniques that increase the effectiveness of multiple contexts by intelligently scheduling threads to make more efficient use of processor pipeline, bandwidth, and cache resources. It proposes thread prioritization as a fundamental mechanism for directing the thread schedule on a multiple-context processor. A priority is assigned to each thread, either statically or dynamically, and is used by the thread scheduler to decide which threads to load into the contexts and which context to switch to on a context switch. We develop a multiple-context model that integrates both cache and network effects and show how thread prioritization can both maintain high processor utilization and limit the increase in critical-path runtime caused by multithreading. The model also shows that, to be effective in bandwidth-limited applications, thread prioritization must be extended to prioritize memory requests.
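The scheduling decision described above can be illustrated with a small software model. This is only a sketch of the idea, not the thesis's hardware design: the class name, fields, and heap-based loading policy are illustrative assumptions. Threads wait in a priority queue, the highest-priority waiting threads are loaded into the fixed number of hardware contexts, and on a context switch the scheduler picks the highest-priority loaded context that is not stalled on a long-latency operation.

```python
import heapq


class MultiContextScheduler:
    """Toy model of priority-directed scheduling on a multiple-context
    processor (illustrative, not the thesis's hardware implementation)."""

    def __init__(self, num_contexts):
        self.num_contexts = num_contexts
        self.contexts = []   # threads currently loaded into hardware contexts
        self.unloaded = []   # max-heap (via negated priority) of waiting threads

    def add_thread(self, name, priority):
        # Higher priority = more critical; heapq is a min-heap, so negate.
        heapq.heappush(self.unloaded, (-priority, name))

    def load_contexts(self):
        # Load the highest-priority waiting threads into free contexts.
        while self.unloaded and len(self.contexts) < self.num_contexts:
            neg_prio, name = heapq.heappop(self.unloaded)
            self.contexts.append(
                {"name": name, "priority": -neg_prio, "stalled": False}
            )

    def context_switch(self):
        # Switch to the highest-priority context not stalled on a
        # long-latency operation; return None if every context is stalled.
        ready = [c for c in self.contexts if not c["stalled"]]
        return max(ready, key=lambda c: c["priority"]) if ready else None
```

For example, with two contexts and threads of priority 10, 5, and 1, the two highest-priority threads are loaded; the priority-10 thread runs first, and when it stalls on a remote reference the scheduler falls back to the priority-5 context rather than idling.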
We show how simple hardware can prioritize both the running of threads in the multiple contexts and the issuing of requests to the local memory and the network. Simulation experiments demonstrate thread prioritization in a variety of applications. Thread prioritization can improve the performance of synchronization primitives by minimizing the processor cycles wasted in spinning and devoting more cycles to critical threads. It can be combined with other techniques to improve cache performance and minimize interference between different working sets in the cache. For applications that are critical-path limited, thread prioritization can improve performance by devoting processor resources preferentially to critical threads. These experimental results show that thread prioritization is a mechanism that can implement a wide range of scheduling policies.
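The synchronization benefit can be sketched in software as well. This is a hedged illustration, not the thesis's mechanism: the function name and the dictionary-based thread record are assumptions. A thread spinning on a lock lowers its own priority, so a priority-directed scheduler gives those otherwise wasted spin cycles to more critical threads; the original priority is restored once the lock is acquired.

```python
import threading


def acquire_with_priority_drop(lock, thread, spin_priority=0):
    """Illustrative sketch: deprioritize a thread while it spins on a lock,
    so the scheduler devotes cycles to critical threads instead, and restore
    its priority once the lock is held."""
    saved = thread["priority"]
    thread["priority"] = spin_priority         # low priority while spinning
    while not lock.acquire(blocking=False):    # spin; scheduler favors others
        pass
    thread["priority"] = saved                 # critical again inside the lock
```

Under a scheduler like the one above, the spinning thread would sort below any runnable thread with positive priority, so spinning consumes cycles only when nothing more critical is ready.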
Institution: Massachusetts Institute of Technology
Report: AITR-1545. Original date: 1995-06-01; deposited: 2004-10-20. Available as PostScript (3195889 bytes) and PDF (3161096 bytes) at http://hdl.handle.net/1721.1/7063.