Code Summarization and Program Synthesis with Large Language Models
Automatic source code summarization and generation are naturally complementary operations because they bridge the gap between natural-language text and executable programs, allowing users to move between the two modes. Even though large language models have become increasingly popular, it is unclea...
Main Author: | Lam, Kelly |
---|---|
Other Authors: | Cafarella, Michael |
Format: | Thesis |
Published: | Massachusetts Institute of Technology, 2024 |
Online Access: | https://hdl.handle.net/1721.1/156757 |
_version_ | 1811071934139465728 |
---|---|
author | Lam, Kelly |
author2 | Cafarella, Michael |
author_facet | Cafarella, Michael Lam, Kelly |
author_sort | Lam, Kelly |
collection | MIT |
description | Automatic source code summarization and generation are naturally complementary operations because they bridge the gap between natural-language text and executable programs, allowing users to move between the two modes. Even though large language models have become increasingly popular, it is unclear how effective they are at code summarization and generation, especially for longer source code segments or more complicated generation prompts. In this thesis, we formalize the automatic code summarization and generation problems, identify cases where large language models perform poorly, propose techniques to correct the initial poor results, and evaluate our results against appropriate baselines using suitable evaluation metrics. |
first_indexed | 2024-09-23T08:58:16Z |
format | Thesis |
id | mit-1721.1/156757 |
institution | Massachusetts Institute of Technology |
last_indexed | 2024-09-23T08:58:16Z |
publishDate | 2024 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/1567572024-09-17T03:59:59Z Code Summarization and Program Synthesis with Large Language Models Lam, Kelly Cafarella, Michael Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Automatic source code summarization and generation are naturally complementary operations because they bridge the gap between natural-language text and executable programs, allowing users to move between the two modes. Even though large language models have become increasingly popular, it is unclear how effective they are at code summarization and generation, especially for longer source code segments or more complicated generation prompts. In this thesis, we formalize the automatic code summarization and generation problems, identify cases where large language models perform poorly, propose techniques to correct the initial poor results, and evaluate our results against appropriate baselines using suitable evaluation metrics. M.Eng. 2024-09-16T13:47:17Z 2024-09-16T13:47:17Z 2024-05 2024-07-11T14:37:06.306Z Thesis https://hdl.handle.net/1721.1/156757 In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology |
spellingShingle | Lam, Kelly Code Summarization and Program Synthesis with Large Language Models |
title | Code Summarization and Program Synthesis with Large Language Models |
url | https://hdl.handle.net/1721.1/156757 |
work_keys_str_mv | AT lamkelly codesummarizationandprogramsynthesiswithlargelanguagemodels |