Thread Scheduling Mechanisms for Multiple-Context Parallel Processors


Bibliographic Details
Main Author: Fiske, James A. Stuart
Language: en_US
Published: 2004
Online Access: http://hdl.handle.net/1721.1/7063
Description: Scheduling tasks to use the available processor resources efficiently is crucial to minimizing the runtime of applications on shared-memory parallel processors. One factor that contributes to poor processor utilization is the idle time caused by long-latency operations, such as remote memory references or processor synchronization operations. One way of tolerating this latency is to use a processor with multiple hardware contexts that can rapidly switch to executing another thread of computation whenever a long-latency operation occurs, increasing processor utilization by overlapping computation with communication. Although multiple contexts are effective for tolerating latency, this effectiveness can be limited by memory and network bandwidth, by cache interference among the contexts, and by critical tasks sharing processor resources with less critical tasks. This thesis presents techniques that increase the effectiveness of multiple contexts by intelligently scheduling threads to make more efficient use of processor pipeline, bandwidth, and cache resources. It proposes thread prioritization as a fundamental mechanism for directing the thread schedule on a multiple-context processor. A priority is assigned to each thread, either statically or dynamically, and is used by the thread scheduler to decide which threads to load into the contexts and which context to switch to on a context switch. We develop a multiple-context model that integrates both cache and network effects and show how thread prioritization can both maintain high processor utilization and limit the increase in critical-path runtime caused by multithreading. The model also shows that, to be effective in bandwidth-limited applications, thread prioritization must be extended to prioritize memory requests.
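The scheduling decision described above can be illustrated with a small software model. This is only a sketch of the idea, not the thesis's hardware design: the class name, fields, and heap-based loading policy are illustrative assumptions. Threads wait in a priority queue, the highest-priority waiting threads are loaded into the fixed number of hardware contexts, and on a context switch the scheduler picks the highest-priority loaded context that is not stalled on a long-latency operation.

```python
import heapq


class MultiContextScheduler:
    """Toy model of priority-directed scheduling on a multiple-context
    processor (illustrative, not the thesis's hardware implementation)."""

    def __init__(self, num_contexts):
        self.num_contexts = num_contexts
        self.contexts = []   # threads currently loaded into hardware contexts
        self.unloaded = []   # max-heap (via negated priority) of waiting threads

    def add_thread(self, name, priority):
        # Higher priority = more critical; heapq is a min-heap, so negate.
        heapq.heappush(self.unloaded, (-priority, name))

    def load_contexts(self):
        # Load the highest-priority waiting threads into free contexts.
        while self.unloaded and len(self.contexts) < self.num_contexts:
            neg_prio, name = heapq.heappop(self.unloaded)
            self.contexts.append(
                {"name": name, "priority": -neg_prio, "stalled": False}
            )

    def context_switch(self):
        # Switch to the highest-priority context not stalled on a
        # long-latency operation; return None if every context is stalled.
        ready = [c for c in self.contexts if not c["stalled"]]
        return max(ready, key=lambda c: c["priority"]) if ready else None
```

For example, with two contexts and threads of priority 10, 5, and 1, the two highest-priority threads are loaded; the priority-10 thread runs first, and when it stalls on a remote reference the scheduler falls back to the priority-5 context rather than idling.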
We show how simple hardware can prioritize both the running of threads in the multiple contexts and the issuing of requests to the local memory and the network. Simulation experiments demonstrate thread prioritization in a variety of applications. Thread prioritization can improve the performance of synchronization primitives by minimizing the processor cycles wasted in spinning and devoting more cycles to critical threads. It can be combined with other techniques to improve cache performance and minimize interference between different working sets in the cache. For applications that are critical-path limited, thread prioritization can improve performance by devoting processor resources preferentially to critical threads. These experimental results show that thread prioritization is a mechanism that can implement a wide range of scheduling policies.
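The synchronization benefit can be sketched in software as well. This is a hedged illustration, not the thesis's mechanism: the function name and the dictionary-based thread record are assumptions. A thread spinning on a lock lowers its own priority, so a priority-directed scheduler gives those otherwise wasted spin cycles to more critical threads; the original priority is restored once the lock is acquired.

```python
import threading


def acquire_with_priority_drop(lock, thread, spin_priority=0):
    """Illustrative sketch: deprioritize a thread while it spins on a lock,
    so the scheduler devotes cycles to critical threads instead, and restore
    its priority once the lock is held."""
    saved = thread["priority"]
    thread["priority"] = spin_priority         # low priority while spinning
    while not lock.acquire(blocking=False):    # spin; scheduler favors others
        pass
    thread["priority"] = saved                 # critical again inside the lock
```

Under a scheduler like the one above, the spinning thread would sort below any runnable thread with positive priority, so spinning consumes cycles only when nothing more critical is ready.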
Institution: Massachusetts Institute of Technology
Report: AITR-1545. Original date: 1995-06-01; deposited: 2004-10-20. Available as PostScript (3195889 bytes) and PDF (3161096 bytes) at http://hdl.handle.net/1721.1/7063.