ProgGen: Automatic Dataset Generation for the Halide Domain Specific Language
Main Author: | Holbrook, Zachary |
---|---|
Other Authors: | Amarasinghe, Saman |
Format: | Thesis |
Published: | Massachusetts Institute of Technology, 2022 |
Online Access: | https://hdl.handle.net/1721.1/139199 |
_version_ | 1811073017333153792 |
---|---|
author | Holbrook, Zachary |
author2 | Amarasinghe, Saman |
author_facet | Amarasinghe, Saman Holbrook, Zachary |
author_sort | Holbrook, Zachary |
collection | MIT |
description | Compilers use cost models to choose between different optimization opportunities, and increasingly these cost models are developed using data-driven techniques. Compilers for general-purpose languages rely on large real-world program datasets to train their cost models. However, cost models for domain-specific languages often have to use program generators because large datasets of real-world programs are not available. These program generators are typically handwritten to produce programs in a randomly guided way. Writing one is time-consuming and requires considerable tuning to produce programs with realistic computation patterns in the desired domain.
This thesis presents ProgGen, a program generator inspired by genetic programming that automatically generates program datasets for training compiler cost models. ProgGen produces program datasets in different domains by starting from a small initial set of programs in the desired domain. I compare ProgGen with the random program generator used in the Halide Autoscheduler [1]. While the Halide random program generator performs better in the image processing and neural network domains it was designed for, ProgGen is competitive in the video processing and linear algebra domains. Because ProgGen is automatic, it can also generate programs in new domains with far less engineering time. |
first_indexed | 2024-09-23T09:27:21Z |
format | Thesis |
id | mit-1721.1/139199 |
institution | Massachusetts Institute of Technology |
last_indexed | 2024-09-23T09:27:21Z |
publishDate | 2022 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/139199 2022-01-15T04:00:28Z ProgGen: Automatic Dataset Generation for the Halide Domain Specific Language Holbrook, Zachary Amarasinghe, Saman Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science M.Eng. 2022-01-14T14:56:15Z 2022-01-14T14:56:15Z 2021-06 2021-06-17T20:13:19.168Z Thesis https://hdl.handle.net/1721.1/139199 In Copyright - Educational Use Permitted Copyright MIT http://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology |
spellingShingle | Holbrook, Zachary ProgGen: Automatic Dataset Generation for the Halide Domain Specific Language |
title | ProgGen: Automatic Dataset Generation for the Halide Domain Specific Language |
title_full | ProgGen: Automatic Dataset Generation for the Halide Domain Specific Language |
title_fullStr | ProgGen: Automatic Dataset Generation for the Halide Domain Specific Language |
title_full_unstemmed | ProgGen: Automatic Dataset Generation for the Halide Domain Specific Language |
title_short | ProgGen: Automatic Dataset Generation for the Halide Domain Specific Language |
title_sort | proggen automatic dataset generation for the halide domain specific language |
url | https://hdl.handle.net/1721.1/139199 |
work_keys_str_mv | AT holbrookzachary proggenautomaticdatasetgenerationforthehalidedomainspecificlanguage |