Compilation techniques for short-vector instructions

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006.

Bibliographic Details
Main Author:	Larsen, Samuel (Samuel Barton), 1975-
Other Authors:	Saman P. Amarasinghe.
Format:	Thesis
Language:	eng
Published:	Massachusetts Institute of Technology 2007
Subjects:	Electrical Engineering and Computer Science.
Online Access:	http://hdl.handle.net/1721.1/37890

Physical Description:	133 p. (application/pdf)
Notes:	Includes bibliographical references (p. 127-133).
Date Available:	2007-07-18
Identifier:	131316320
Rights:	M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See http://dspace.mit.edu/handle/1721.1/7582 for inquiries about permission.

Abstract

Multimedia extensions are nearly ubiquitous in today's general-purpose processors. These extensions consist primarily of a set of short-vector instructions that apply the same opcode to a vector of operands. This design introduces a data-parallel component to processors that exploit instruction-level parallelism, and presents an opportunity for increased performance. In fact, ignoring a processor's vector opcodes can leave a significant portion of the available resources unused. For software developers to find short-vector instructions generally useful, the compiler must target these extensions with complete transparency and consistent performance. This thesis develops compiler techniques to target short-vector instructions automatically and efficiently.

One important aspect of compilation is the effective management of memory alignment. As with scalar loads and stores, vector references are typically more efficient when accessing aligned regions. In many cases, the compiler can glean no alignment information and must emit conservative code sequences. In response, I introduce a range of compiler techniques for detecting and enforcing aligned references. In my benchmark suite, the most practical method ensures alignment for roughly 75% of dynamic memory references.

This thesis also introduces selective vectorization, a technique for balancing computation across a processor's scalar and vector resources. Current approaches for targeting short-vector instructions directly adopt vectorizing technology first developed for supercomputers. Traditional vectorization, however, can degrade performance because it fails to account for a processor's scalar execution resources. I formulate selective vectorization in the context of software pipelining. My approach creates software pipelines with shorter initiation intervals and therefore higher performance. In contrast to conventional methods, selective vectorization operates on a low-level intermediate representation, which allows the algorithm to measure accurately the performance trade-offs of code selection alternatives. A key aspect of selective vectorization is its ability to manage the communication of operands between vector and scalar instructions. Even when operand transfer is expensive, the technique is sophisticated enough to achieve significant performance gains.

I evaluate selective vectorization on a set of SPEC FP benchmarks. On a realistic VLIW processor model, the approach achieves whole-program speedups of up to 1.35x over existing approaches. For individual loops, it provides speedups of up to 1.75x.
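The abstract does not say which alignment techniques the thesis develops, so the following is only a generic illustration of one common way a compiler can enforce alignment: peeling scalar iterations off the front of a loop until a reference reaches a vector boundary. It is a minimal Python sketch that assumes 16-byte vector registers, an array base address that is known (or tested at run time), and a single reference to align; `peel_count`, `split_loop`, and `VECTOR_BYTES` are hypothetical names, not taken from the thesis.

```python
VECTOR_BYTES = 16  # assumption: 128-bit vector registers


def peel_count(base_addr: int, elem_size: int,
               vector_bytes: int = VECTOR_BYTES) -> int:
    """Scalar iterations to run before a[i] lands on a vector boundary.

    Requires the base address to be element-aligned; otherwise no number
    of whole-element steps can reach a vector boundary, and the compiler
    must keep conservative (unaligned) code.
    """
    if vector_bytes % elem_size != 0:
        raise ValueError("vector width must be a multiple of the element size")
    if base_addr % elem_size != 0:
        raise ValueError("base address is not element-aligned; peeling cannot help")
    misalignment = base_addr % vector_bytes
    if misalignment == 0:
        return 0
    return (vector_bytes - misalignment) // elem_size


def split_loop(n: int, base_addr: int, elem_size: int,
               vector_bytes: int = VECTOR_BYTES):
    """Split iterations [0, n) into (prologue, vector body, epilogue) ranges.

    The prologue is the scalar peel, the body covers whole vectors of
    aligned accesses, and the epilogue handles the leftover iterations.
    """
    lanes = vector_bytes // elem_size
    peel = min(peel_count(base_addr, elem_size, vector_bytes), n)
    body = (n - peel) // lanes * lanes
    return (0, peel), (peel, peel + body), (peel + body, n)


if __name__ == "__main__":
    # 4-byte elements starting 4 bytes past a 16-byte boundary.
    pro, body, epi = split_loop(10, 0x1004, 4)
    print(pro, body, epi)  # (0, 3) (3, 7) (7, 10)
    first_vector_addr = 0x1004 + body[0] * 4
    print(first_vector_addr % VECTOR_BYTES == 0)  # True
```

For an array of 4-byte elements starting at 0x1004, three scalar iterations bring the next access to 0x1010, after which the loop body can use aligned vector loads. Peeling aligns only references that share the same misalignment; other references in the same loop still need unaligned or realignment sequences, which is one reason a compiler without alignment information falls back to conservative code.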
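The abstract describes selective vectorization only at a high level. To show why a mixed scalar/vector assignment can shorten the initiation interval (II) of a software-pipelined loop, the sketch below computes the standard resource-constrained lower bound on II (often called ResMII) on a hypothetical machine and tries every split. Assumptions, none taken from the thesis: the loop body is unrolled by the vector length, its operations are independent and interchangeable (so no operand transfers are modeled), and every instruction occupies one fully pipelined unit for one cycle. `ceil_div`, `res_ii`, and `best_partition` are illustrative names.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for non-negative a and positive b."""
    return -(-a // b)


def res_ii(n_vector_ops: int, n_scalar_ops: int, vl: int,
           scalar_units: int, vector_units: int) -> int:
    """Resource-bound II, in cycles, for one loop body unrolled vl times.

    Each operation kept scalar becomes vl scalar instructions; each
    vectorized operation becomes a single vector instruction.
    """
    scalar_cycles = ceil_div(n_scalar_ops * vl, scalar_units)
    vector_cycles = ceil_div(n_vector_ops, vector_units)
    return max(scalar_cycles, vector_cycles)


def best_partition(n_ops: int, vl: int, scalar_units: int,
                   vector_units: int) -> tuple[int, int]:
    """Return (number of vectorized ops, II) minimizing the resource bound.

    Because the operations are assumed interchangeable and independent,
    only the count of vectorized operations matters, so trying every
    count from 0 to n_ops is exhaustive.
    """
    return min(
        ((k, res_ii(k, n_ops - k, vl, scalar_units, vector_units))
         for k in range(n_ops + 1)),
        key=lambda pair: pair[1],
    )


if __name__ == "__main__":
    # Hypothetical machine: 2 scalar units, 1 vector unit, 2 lanes.
    print("all scalar:", res_ii(0, 6, 2, 2, 1))       # 6
    print("all vector:", res_ii(6, 0, 2, 2, 1))       # 6
    print("selective: ", best_partition(6, 2, 2, 1))  # (3, 3)
```

With six operations, two scalar units, one vector unit, and two lanes, vectorizing everything or nothing both yield an II of 6 cycles per unrolled iteration, while vectorizing three operations yields 3, because both kinds of units stay busy. The thesis's formulation also accounts for the cost of moving operands between vector and scalar instructions, which this sketch deliberately omits.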
