Format Abstractions for the Compilation of Sparse Tensor Algebra

Tensors are commonly used to represent data in many domains, including data analytics, machine learning, science, and engineering. Many highly-optimized libraries and compilers have been developed for efficiently computing on dense tensors. However, existing libraries and compilers are limited in th...


Bibliographic Details
Main Author:	Chou, Stephen
Other Authors:	Amarasinghe, Saman
Format:	Thesis
Published:	Massachusetts Institute of Technology 2023
Online Access:	https://hdl.handle.net/1721.1/147453

author	Chou, Stephen
author2	Amarasinghe, Saman
collection	MIT
description	Tensors are commonly used to represent data in many domains, including data analytics, machine learning, science, and engineering. Many highly optimized libraries and compilers have been developed for efficiently computing on dense tensors. However, existing libraries and compilers are limited in their ability to support real-world applications that work with sparse tensors, which contain mostly zeros. In particular, there exist countless specialized formats for storing sparse tensors in memory, each suited to specific types of applications and data. Because different formats often use very different data structures to store nonzeros, computing with sparse tensors stored in different formats can require vastly dissimilar code, all of which is difficult to implement by hand and nontrivial to generate automatically. Existing libraries and compilers must therefore limit the set of computations and formats that they directly support, sacrificing usability and performance as a result. In this dissertation, I describe how to build a compiler that supports efficiently computing on sparse tensors that may be stored in a wide variety of formats. I first show how many commonly used sparse tensor formats, from array-based formats like CSR, COO, and DIA to formats that store nonzeros using pointer-based data structures like linked lists, BSTs, and C-trees, can all be expressed as compositions of per-dimension formats. I further show how such per-dimension formats can be precisely defined by implementing a common set of abstractions that capture how their underlying data structures store nonzeros in memory and how these data structures can be efficiently accessed or constructed. I then demonstrate how, with such specifications of per-dimension formats at hand, a compiler can generate code to efficiently compute on tensors that are stored in any of the aforementioned (and countless other) formats.
We have implemented our technique in the TACO sparse tensor algebra compiler, which is the first compiler to generate code that computes any basic tensor algebra expression with sparse tensors that may be stored in arbitrary formats. Our technique generates code whose performance is competitive with, if not better than, that of equivalent code in hand-optimized libraries and frameworks.
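The idea of expressing a format as a composition of per-dimension formats can be illustrated with a small sketch. The class and function names below (DenseLevel, CompressedLevel, spmv) are hypothetical and chosen for illustration only; this is not TACO's API or the dissertation's actual abstraction. It assumes the usual pos/crd array layout for a compressed dimension, and shows one kernel running unchanged over a CSR matrix (dense rows, compressed columns) and a fully dense matrix (dense rows, dense columns).

```python
# Illustrative sketch only: hypothetical names, not TACO's actual API.
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class DenseLevel:
    """Stores every coordinate in [0, size); position = parent_pos * size + coord."""
    size: int

    def coords(self, parent_pos: int) -> Iterator[Tuple[int, int]]:
        for i in range(self.size):
            yield i, parent_pos * self.size + i


@dataclass
class CompressedLevel:
    """Stores only the coordinates of nonzeros, segmented per parent by `pos`."""
    pos: List[int]  # pos[p] .. pos[p + 1] bounds the segment for parent p
    crd: List[int]  # coordinates of the stored entries

    def coords(self, parent_pos: int) -> Iterator[Tuple[int, int]]:
        for p in range(self.pos[parent_pos], self.pos[parent_pos + 1]):
            yield self.crd[p], p


def spmv(row_level, col_level, vals, x):
    """Compute y = A @ x using only the per-level `coords` interface."""
    y = [0.0] * row_level.size
    for i, row_pos in row_level.coords(0):
        for j, val_pos in col_level.coords(row_pos):
            y[i] += vals[val_pos] * x[j]
    return y


x = [1.0, 2.0, 3.0]

# The matrix [[5, 0, 0], [0, 0, 7], [0, 3, 0]] stored as CSR:
# a dense row dimension composed with a compressed column dimension.
csr_result = spmv(
    DenseLevel(size=3),
    CompressedLevel(pos=[0, 1, 2, 3], crd=[0, 2, 1]),
    [5.0, 7.0, 3.0],
    x,
)

# The same matrix stored densely: two dense dimensions, zeros stored explicitly.
dense_result = spmv(
    DenseLevel(size=3),
    DenseLevel(size=3),
    [5.0, 0.0, 0.0, 0.0, 0.0, 7.0, 0.0, 3.0, 0.0],
    x,
)
```

Because the kernel touches each dimension only through coords, swapping the column dimension's format changes the storage without changing the kernel. A compiler of the kind the abstract describes would instead generate specialized loops for each format combination, rather than dispatching through an interface at run time as this sketch does.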
first_indexed	2024-09-23T16:43:21Z
format	Thesis
id	mit-1721.1/147453
institution	Massachusetts Institute of Technology
last_indexed	2024-09-23T16:43:21Z
publishDate	2023
publisher	Massachusetts Institute of Technology
record_format	dspace
spelling	mit-1721.1/147453 2023-01-20T03:02:29Z Format Abstractions for the Compilation of Sparse Tensor Algebra Chou, Stephen Amarasinghe, Saman Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Ph.D. 2023-01-19T19:51:34Z 2023-01-19T19:51:34Z 2022-09 2022-10-19T19:07:52.557Z Thesis https://hdl.handle.net/1721.1/147453 0000-0002-5048-7131 In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology
title	Format Abstractions for the Compilation of Sparse Tensor Algebra
url	https://hdl.handle.net/1721.1/147453

Similar Items