From an image to a text description of the image

This project presents an implementation of a search function that allows users to search for a particular object of interest using only textual information. The main idea is to train a very deep neural network architecture that generates a useful description for each video frame, with a strong focus on exploring different types of image captioning models and their differences. The network consists of a Convolutional Neural Network (CNN) that learns features from an image, and a Long Short-Term Memory (LSTM) unit that predicts the sequence of words from the features learnt by the CNN. The project does not caption video live; instead, it pre-processes the video into frames and generates an appropriate caption for each frame, after which the user can conduct the textual search.
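As a rough illustration of the CNN-plus-LSTM pipeline described above, the toy sketch below (plain NumPy, untrained random weights, a made-up five-word vocabulary and tiny layer sizes — all assumptions for illustration, not the project's actual model) shows how an image feature vector can condition an LSTM that emits a caption word by word, and how pre-computed frame captions can then be searched textually:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary; a real model learns thousands of words from caption data.
VOCAB = ["<start>", "<end>", "a", "person", "walking"]
FEAT, HID = 8, 16          # image-feature and LSTM hidden sizes (toy values)
V = len(VOCAB)

# Stand-in for the CNN encoder: a fixed linear projection of the image.
# A real system would use features from a pretrained CNN instead.
W_enc = rng.normal(scale=0.1, size=(FEAT, 32 * 32))

def encode(image):
    """Map a flattened 32x32 grayscale frame to a feature vector."""
    return W_enc @ image.reshape(-1)

# One LSTM cell, written out explicitly. Input x is [word embedding; feature].
E = rng.normal(scale=0.1, size=(V, FEAT))               # word embeddings
W = rng.normal(scale=0.1, size=(4 * HID, FEAT * 2 + HID))
b = np.zeros(4 * HID)
W_out = rng.normal(scale=0.1, size=(V, HID))            # hidden -> vocab logits

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c):
    """Standard LSTM update: input, forget, output gates and cell candidate."""
    z = W @ np.concatenate([x, h]) + b
    i, f, o, g = np.split(z, 4)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

def caption(image, max_len=5):
    """Greedy decoding: feed the image feature at every step, pick argmax word."""
    feat = encode(image)
    h, c = np.zeros(HID), np.zeros(HID)
    word = VOCAB.index("<start>")
    out = []
    for _ in range(max_len):
        x = np.concatenate([E[word], feat])
        h, c = lstm_step(x, h, c)
        word = int(np.argmax(W_out @ h))
        if VOCAB[word] == "<end>":
            break
        out.append(VOCAB[word])
    return " ".join(out)

frame = rng.random((32, 32))        # a fake video frame
print(caption(frame))               # weights are untrained, so output is arbitrary

# Textual search over pre-computed frame captions (the project's end goal):
captions = {i: caption(rng.random((32, 32))) for i in range(3)}
hits = [i for i, c in captions.items() if "person" in c]
```

The search step is deliberately simple substring matching over a dictionary of frame-index-to-caption entries; the point is only that once every frame carries a generated caption, object search reduces to plain text search.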


Bibliographic Details
Main Author: Thian, Ronald Chuan Yan
Other Authors: Chng Eng Siong
Format: Final Year Project (FYP)
Language: English
Published: 2017
Subjects: DRNTU::Engineering::Computer science and engineering
Online Access:http://hdl.handle.net/10356/72777
Institution: Nanyang Technological University, School of Computer Science and Engineering
Degree: Bachelor of Engineering (Computer Science)
Record ID: ntu-10356/72777
Accessioned: 2017-11-13
Physical Description: 62 p., application/pdf