ProgGen: Automatic Dataset Generation for the Halide Domain Specific Language

Compilers use cost models to choose between different optimization opportunities, and increasingly these cost models are developed using data-driven techniques. Compilers for general-purpose languages rely on large real-world program datasets to train their cost models. However, cost models for domain-specific languages often have to use program generators due to a lack of large datasets of real-world programs.

Bibliographic Details
Main Author: Holbrook, Zachary
Other Authors: Amarasinghe, Saman
Format: Thesis
Published: Massachusetts Institute of Technology 2022
Online Access: https://hdl.handle.net/1721.1/139199
_version_ 1811073017333153792
author Holbrook, Zachary
author2 Amarasinghe, Saman
author_facet Amarasinghe, Saman
Holbrook, Zachary
author_sort Holbrook, Zachary
collection MIT
description Compilers use cost models to choose between different optimization opportunities, and increasingly these cost models are developed using data-driven techniques. Compilers for general-purpose languages rely on large real-world program datasets to train their cost models. However, cost models for domain-specific languages often have to use program generators due to a lack of large datasets of real-world programs. Program dataset generators are typically written by hand to generate programs in a randomly guided way. However, writing a program generator is time-consuming and requires considerable tuning to produce programs with realistic computation patterns in the desired domain. This thesis presents ProgGen, a program generator inspired by genetic programming for automatically generating program datasets used in training compiler cost models. Starting from a small initial set of programs in the desired domain, ProgGen automatically produces a program dataset for that domain. I compare ProgGen with the random program generator used in the Halide Autoscheduler [1]. While the Halide random program generator performs better in the image processing and neural network domains it was designed for, ProgGen is competitive in video processing and linear algebra domains. Because ProgGen is automatic, it can also generate programs in new domains with far less engineering time.
first_indexed 2024-09-23T09:27:21Z
format Thesis
id mit-1721.1/139199
institution Massachusetts Institute of Technology
last_indexed 2024-09-23T09:27:21Z
publishDate 2022
publisher Massachusetts Institute of Technology
record_format dspace
spelling mit-1721.1/1391992022-01-15T04:00:28Z ProgGen: Automatic Dataset Generation for the Halide Domain Specific Language Holbrook, Zachary Amarasinghe, Saman Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Compilers use cost models to choose between different optimization opportunities, and increasingly these cost models are developed using data-driven techniques. Compilers for general-purpose languages rely on large real-world program datasets to train their cost models. However, cost models for domain-specific languages often have to use program generators due to a lack of large datasets of real-world programs. Program dataset generators are typically manually constructed or handwritten to generate programs in a randomly guided way. However, writing a program generator is time-consuming and requires considerable tuning to produce programs with realistic computation patterns in the desired domain. This thesis presents ProgGen, a program generator inspired by genetic programming for automatically generating program datasets used in training compiler cost models. ProgGen automatically produces program datasets in different domains by starting with a small initial set of programs in the desired domain. I compare ProgGen with the random program generator used in the Halide Autoscheduler [1]. While the Halide random program generator performs better in the image processing and neural network domains it was designed for, ProgGen is competitive in video processing and linear algebra domains. Due to the automatic nature of ProgGen, ProgGen can also generate programs in new domains with far less engineering time. M.Eng. 2022-01-14T14:56:15Z 2022-01-14T14:56:15Z 2021-06 2021-06-17T20:13:19.168Z Thesis https://hdl.handle.net/1721.1/139199 In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology
spellingShingle Holbrook, Zachary
ProgGen: Automatic Dataset Generation for the Halide Domain Specific Language
title ProgGen: Automatic Dataset Generation for the Halide Domain Specific Language
title_full ProgGen: Automatic Dataset Generation for the Halide Domain Specific Language
title_fullStr ProgGen: Automatic Dataset Generation for the Halide Domain Specific Language
title_full_unstemmed ProgGen: Automatic Dataset Generation for the Halide Domain Specific Language
title_short ProgGen: Automatic Dataset Generation for the Halide Domain Specific Language
title_sort proggen automatic dataset generation for the halide domain specific language
url https://hdl.handle.net/1721.1/139199
work_keys_str_mv AT holbrookzachary proggenautomaticdatasetgenerationforthehalidedomainspecificlanguage