Annotating videos that teach MS Excel and predicting mouse / keyboard actions

Bibliographic Details
Main Author:	Tan, Genson Yao Jie
Other Authors:	Li Boyang
Format:	Final Year Project (FYP)
Language:	English
Published:	Nanyang Technological University 2024
Subjects:	Computer and Information Science; Large language models; Prompt engineering
Online Access:	https://hdl.handle.net/10356/175233

_version_	1824454784191561728
author	Tan, Genson Yao Jie
author2	Li Boyang
author_facet	Li Boyang Tan, Genson Yao Jie
author_sort	Tan, Genson Yao Jie
collection	NTU
description	This research paper explores the extraction of specific sentences from natural language as a foundational step towards building an Artificial Intelligence system for automating Microsoft Excel. The focus is on leveraging language models capable of extracting intention and procedure sentences from transcripts collected from YouTube. Such a model can significantly reduce the labour of manual annotation and, consequently, make it possible to acquire a sufficiently large dataset for training a model tailored to the specific domain of procedure prediction. The research methodology involves exploring the limitations of fine-tuning Flan-T5 for this task, and applying prompt engineering to a Large Language Model (LLM) such as Llama 2 as an alternative method. The experiments are conducted on the Google Colab platform, which offers access to at most 15 GB of VRAM. The paper centres on understanding the behaviour of Llama 2 and how it responds to different prompting techniques for information extraction. Data extracted from individual transcripts can be returned as English sentences or in a structured format such as JSON. The model's extraction quality is then evaluated against a dataset manually labelled by human annotators. This approach offers a straightforward and accessible way to acquire large databases of structured knowledge derived from unstructured text with very limited computational resources.
first_indexed	2025-02-19T03:27:49Z
format	Final Year Project (FYP)
id	ntu-10356/175233
institution	Nanyang Technological University
language	English
last_indexed	2025-02-19T03:27:49Z
publishDate	2024
publisher	Nanyang Technological University
record_format	dspace
spelling	ntu-10356/175233 2024-04-26T15:41:54Z Annotating videos that teach MS Excel and predicting mouse / keyboard actions Tan, Genson Yao Jie Li Boyang School of Computer Science and Engineering boyang.li@ntu.edu.sg Computer and Information Science Large language models Prompt engineering Bachelor's degree 2024-04-21T23:42:30Z 2024-04-21T23:42:30Z 2024 Final Year Project (FYP) Tan, G. Y. J. (2024). Annotating videos that teach MS Excel and predicting mouse / keyboard actions. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/175233 https://hdl.handle.net/10356/175233 en SCSE23-0709 application/pdf Nanyang Technological University
spellingShingle	Computer and Information Science Large language models Prompt engineering Tan, Genson Yao Jie Annotating videos that teach MS Excel and predicting mouse / keyboard actions
title	Annotating videos that teach MS Excel and predicting mouse / keyboard actions
title_sort	annotating videos that teach ms excel and predicting mouse keyboard actions
topic	Computer and Information Science Large language models Prompt engineering
url	https://hdl.handle.net/10356/175233
work_keys_str_mv	AT tangensonyaojie annotatingvideosthatteachmsexcelandpredictingmousekeyboardactions
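
The description mentions prompting a language model to return intention and procedure sentences as JSON and then comparing the output with human annotations. The record gives neither the prompts nor the metrics, so the following is only a minimal sketch under stated assumptions: the prompt wording, the key names "intention" and "procedure", and the exact-match precision/recall/F1 scoring are all hypothetical, and no model call is included.

```python
import json

# Hypothetical prompt template; the wording is illustrative, not taken from the project.
PROMPT_TEMPLATE = (
    "Read the transcript of an Excel tutorial below.\n"
    "Return a JSON object with two keys: \"intention\" (sentences stating "
    "what the speaker wants to achieve) and \"procedure\" (sentences "
    "describing concrete steps). Copy sentences verbatim.\n\n"
    "Transcript:\n{transcript}\n\nJSON:"
)


def build_prompt(transcript: str) -> str:
    """Fill the (assumed) template with one transcript."""
    return PROMPT_TEMPLATE.format(transcript=transcript.strip())


def parse_response(text: str) -> dict:
    """Pull the first JSON object out of raw model output.

    Models often wrap JSON in extra chatter, so decoding starts at the
    first '{' and ignores anything after the object ends.
    Returns empty lists if nothing parseable is found.
    """
    empty = {"intention": [], "procedure": []}
    start = text.find("{")
    if start == -1:
        return empty
    try:
        obj, _ = json.JSONDecoder().raw_decode(text[start:])
    except json.JSONDecodeError:
        return empty
    result = {}
    for key in empty:
        value = obj.get(key)
        # Keep only string entries; anything else is treated as missing.
        result[key] = [s for s in value if isinstance(s, str)] if isinstance(value, list) else []
    return result


def normalise(sentence: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(sentence.lower().split())


def score(predicted: list[str], gold: list[str]) -> dict:
    """Exact-match precision, recall and F1 (an assumed metric, not the project's)."""
    pred = {normalise(s) for s in predicted}
    ref = {normalise(s) for s in gold}
    hits = len(pred & ref)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

In such a setup, the string from `build_prompt` would be sent to whichever model is being tested, the reply passed through `parse_response`, and each list scored against the annotators' labels with `score`. Exact matching is strict; a looser comparison such as token overlap may be more appropriate when the model paraphrases.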
