Code Summarization and Program Synthesis with Large Language Models

Automatic source code summarization and generation are naturally complementary operations because they bridge the gap between natural-language text and executable programs, allowing users to flow between the two modes. Even though large language models have become increasingly popular, it is unclea...

Bibliographic Details
Main Author:	Lam, Kelly
Other Authors:	Cafarella, Michael
Format:	Thesis
Published:	Massachusetts Institute of Technology 2024
Online Access:	https://hdl.handle.net/1721.1/156757

_version_	1811071934139465728
author	Lam, Kelly
author2	Cafarella, Michael
author_facet	Cafarella, Michael Lam, Kelly
author_sort	Lam, Kelly
collection	MIT
description	Automatic source code summarization and generation are naturally complementary operations because they bridge the gap between natural-language text and executable programs, allowing users to flow between the two modes. Even though large language models have become increasingly popular, it is unclear how effective they are at code summarization and generation, especially as we examine longer source code segments or more complicated prompts for generation. In this thesis, we will formalize the automatic code summarization and generation problems, identify some cases where large language models can perform poorly, propose some techniques to correct the initial bad results, and evaluate our results against appropriate baselines using suitable evaluation metrics.
first_indexed	2024-09-23T08:58:16Z
format	Thesis
id	mit-1721.1/156757
institution	Massachusetts Institute of Technology
last_indexed	2024-09-23T08:58:16Z
publishDate	2024
publisher	Massachusetts Institute of Technology
record_format	dspace
spelling	mit-1721.1/156757 2024-09-17T03:59:59Z Code Summarization and Program Synthesis with Large Language Models Lam, Kelly Cafarella, Michael Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Automatic source code summarization and generation are naturally complementary operations because they bridge the gap between natural-language text and executable programs, allowing users to flow between the two modes. Even though large language models have become increasingly popular, it is unclear how effective they are at code summarization and generation, especially as we examine longer source code segments or more complicated prompts for generation. In this thesis, we will formalize the automatic code summarization and generation problems, identify some cases where large language models can perform poorly, propose some techniques to correct the initial bad results, and evaluate our results against appropriate baselines using suitable evaluation metrics. M.Eng. 2024-09-16T13:47:17Z 2024-09-16T13:47:17Z 2024-05 2024-07-11T14:37:06.306Z Thesis https://hdl.handle.net/1721.1/156757 In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology
spellingShingle	Lam, Kelly Code Summarization and Program Synthesis with Large Language Models
title	Code Summarization and Program Synthesis with Large Language Models
title_sort	code summarization and program synthesis with large language models
url	https://hdl.handle.net/1721.1/156757
work_keys_str_mv	AT lamkelly codesummarizationandprogramsynthesiswithlargelanguagemodels
