Neural image and video captioning

In today’s digital age, the proliferation of visual content has underscored the critical importance of multimedia comprehension and interpretation. Video uses images and sound to convey information. This project introduces a novel approach to video captioning, leveraging the synergies between Machin...

Bibliographic Details
Main Author: Lam, Ting En
Other Authors: Hanwang Zhang
Format: Final Year Project (FYP)
Language: English
Published: Nanyang Technological University 2024
Subjects: Computer and Information Science
Online Access: https://hdl.handle.net/10356/175286
_version_ 1826129488988078080
author Lam, Ting En
author2 Hanwang Zhang
author_facet Hanwang Zhang
Lam, Ting En
author_sort Lam, Ting En
collection NTU
description In today’s digital age, the proliferation of visual content has underscored the critical importance of multimedia comprehension and interpretation. Video uses images and sound to convey information. This project introduces a novel approach to video captioning, leveraging the synergies between Machine Learning, Computer Vision and Natural Language Processing to bridge the gap between human and computer understanding of visual content by generating descriptive captions. In this project, the effectiveness of various image captioning models is evaluated to identify optimal frameworks for textual description generation. Subsequently, a video captioning model capable of generating multimodal captions for video content is developed. The proposed image and video captioning models are evaluated using standard metrics and a human evaluation study. Additionally, the models are deployed in a user-friendly application. Overall, this study seeks to improve video captioning performance and foster further advancements in this field.
first_indexed 2024-10-01T07:41:27Z
format Final Year Project (FYP)
id ntu-10356/175286
institution Nanyang Technological University
language English
last_indexed 2024-10-01T07:41:27Z
publishDate 2024
publisher Nanyang Technological University
record_format dspace
spelling ntu-10356/175286 2024-04-26T15:43:34Z Neural image and video captioning Lam, Ting En Hanwang Zhang School of Computer Science and Engineering hanwangzhang@ntu.edu.sg Computer and Information Science In today’s digital age, the proliferation of visual content has underscored the critical importance of multimedia comprehension and interpretation. Video uses images and sound to convey information. This project introduces a novel approach to video captioning, leveraging the synergies between Machine Learning, Computer Vision and Natural Language Processing to bridge the gap between human and computer understanding of visual content by generating descriptive captions. In this project, the effectiveness of various image captioning models is evaluated to identify optimal frameworks for textual description generation. Subsequently, a video captioning model capable of generating multimodal captions for video content is developed. The proposed image and video captioning models are evaluated using standard metrics and a human evaluation study. Additionally, the models are deployed in a user-friendly application. Overall, this study seeks to improve video captioning performance and foster further advancements in this field. Bachelor's degree 2024-04-22T08:35:17Z 2024-04-22T08:35:17Z 2024 Final Year Project (FYP) Lam, T. E. (2024). Neural image and video captioning. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/175286 https://hdl.handle.net/10356/175286 en SCSE23-0211 application/pdf Nanyang Technological University
spellingShingle Computer and Information Science
Lam, Ting En
Neural image and video captioning
title Neural image and video captioning
title_full Neural image and video captioning
title_fullStr Neural image and video captioning
title_full_unstemmed Neural image and video captioning
title_short Neural image and video captioning
title_sort neural image and video captioning
topic Computer and Information Science
url https://hdl.handle.net/10356/175286
work_keys_str_mv AT lamtingen neuralimageandvideocaptioning