Embodied object hunt

Bibliographic Details
Main Author:	Kam, Rainer I-Wen
Other Authors:	Cham Tat Jen
Format:	Final Year Project (FYP)
Language:	English
Published:	Nanyang Technological University 2024
Subjects:	Computer and Information Science
Online Access:	https://hdl.handle.net/10356/175084

School:	School of Computer Science and Engineering
Degree:	Bachelor's degree
Project Code:	SCSE23-0037
Date Deposited:	2024-04-19
Citation:	Kam, R. I. (2024). Embodied object hunt. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/175084

Full description

This study investigates the use of multimodal encoders in the Embodied Object Hunt task. It is motivated by recent joint multimodal encoders such as CLIP, which can extract features shared between images and text. This ability makes them well suited to tasks that combine imagery and text, such as the Embodied Object Hunt, which uses visual observations and textual input prompts. The study also explores supplementing agent learning with intrinsic curiosity rewards, which encourage agents to explore their environment and thereby support learning. It compares agents trained with CLIP embeddings and intrinsic curiosity against agents trained without them, and analyzes the key differences between their training results. As an exploratory study, its results can help gauge the effectiveness and feasibility of different approaches to training embodied agents and serve as a basis for future improvements.
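The abstract does not say how the intrinsic curiosity reward is computed. One common formulation, the forward model of the Intrinsic Curiosity Module (ICM), rewards the agent in proportion to how poorly it predicts the features of its next observation. The Python sketch below assumes that formulation. It uses a linear forward model over fixed feature vectors (for example, frozen CLIP embeddings) and omits ICM's inverse model. `ForwardCuriosity`, `eta`, and `lr` are hypothetical names for illustration, not the project's code.

```python
from typing import List

Vector = List[float]


class ForwardCuriosity:
    """Curiosity bonus from the prediction error of a linear forward model.

    Hypothetical sketch: phi(s) is assumed to come from a frozen encoder
    (for example, CLIP image embeddings). ICM's inverse model, which learns
    the features, is omitted here.
    """

    def __init__(self, feat_dim: int, num_actions: int,
                 lr: float = 0.1, eta: float = 1.0) -> None:
        self.num_actions = num_actions
        self.lr = lr    # step size for the forward model's online SGD update
        self.eta = eta  # scale of the intrinsic reward
        in_dim = feat_dim + num_actions
        # Linear forward model W: [phi(s_t), one_hot(a_t)] -> predicted phi(s_{t+1}).
        self.w = [[0.0] * in_dim for _ in range(feat_dim)]

    def _input(self, phi: Vector, action: int) -> Vector:
        # Concatenate the state features with a one-hot encoding of the action.
        one_hot = [1.0 if a == action else 0.0 for a in range(self.num_actions)]
        return list(phi) + one_hot

    def predict(self, phi: Vector, action: int) -> Vector:
        x = self._input(phi, action)
        return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in self.w]

    def reward(self, phi: Vector, action: int, phi_next: Vector) -> float:
        """Return eta/2 * ||predicted - actual||^2, then update the forward model once."""
        x = self._input(phi, action)
        err = [p - t for p, t in zip(self.predict(phi, action), phi_next)]
        bonus = 0.5 * self.eta * sum(e * e for e in err)
        # The gradient of 0.5 * ||W x - phi_next||^2 with respect to W[i][j] is err[i] * x[j].
        for i, e in enumerate(err):
            for j, x_j in enumerate(x):
                self.w[i][j] -= self.lr * e * x_j
        return bonus
```

Repeating the same transition drives the bonus down as the forward model learns to predict it, while an unfamiliar transition yields a larger bonus. That is the mechanism by which such a reward pushes an agent toward unexplored parts of its environment.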
