Learning deep networks for video object segmentation


Bibliographic Details
Main Author: Lim, Jun Rong
Other Authors: Lin Guosheng
Format: Final Year Project (FYP)
Language: English
Published: Nanyang Technological University, 2024
Subjects: Computer and Information Science; Video object segmentation; Deep neural network
Online Access:https://hdl.handle.net/10356/175018
description The Segment Anything Model (SAM) is an image segmentation model that has gained significant traction due to its powerful zero-shot transfer performance on unseen data distributions, as well as its applicability to downstream tasks. It supports a broad range of input prompts, such as points, boxes, and automatic mask generation. Traditional Video Object Segmentation (VOS) methods require strongly labelled training data consisting of densely annotated pixel-level segmentation masks, which are both expensive and time-consuming to obtain. We explore using only weakly labelled bounding-box annotations, turning the training process into a weakly supervised one. In this paper, we present BoxSAM, a novel method that combines SAM with a single-object tracker and monocular depth mapping to tackle VOS. BoxSAM leverages a robust bounding-box-based object tracker and point-augmentation techniques derived from attention maps to generate an object mask, which is then deconflicted using depth maps. The proposed method achieves 81.8 on DAVIS 2017 and 70.5 on YouTube-VOS 2018, comparing favourably to other methods not trained on video segmentation data.
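The depth-map deconfliction step mentioned in the abstract could be sketched as follows. This is purely an illustration, not the thesis's actual implementation: the function name `deconflict_masks`, the convention that smaller depth values mean closer to the camera, and the use of per-object mean depth to decide which object wins a contested pixel are all assumptions.

```python
import numpy as np

def deconflict_masks(masks, depth):
    """Resolve overlaps between per-object masks using a depth map.

    masks: (N, H, W) boolean array, one candidate mask per object
           (e.g. as produced by prompting SAM with each tracked box).
    depth: (H, W) float array; smaller values assumed closer to camera.

    Pixels claimed by more than one object are assigned to the object
    with the smallest mean depth over its mask (i.e. the nearest one).
    """
    masks = masks.astype(bool)
    # Mean depth per object over its own mask; empty masks sort last.
    mean_depth = np.array(
        [depth[m].mean() if m.any() else np.inf for m in masks]
    )
    out = masks.copy()
    overlap = masks.sum(axis=0) > 1  # pixels claimed by 2+ objects
    if overlap.any():
        claimed = np.zeros(depth.shape, dtype=bool)
        # Nearest object gets first pick of the contested pixels.
        for i in np.argsort(mean_depth):
            keep = masks[i] & overlap & ~claimed
            claimed |= masks[i] & overlap
            out[i] = (masks[i] & ~overlap) | keep
    return out
```

For example, if two candidate masks overlap over one column of pixels and one object is nearer on average, that object keeps the contested column while the other object's mask is trimmed, leaving every pixel assigned to at most one object.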
spelling ntu-10356/175018 (last updated 2024-04-19T15:46:14Z)
School: School of Computer Science and Engineering
Supervisor contact: gslin@ntu.edu.sg
Subjects: Computer and Information Science; Video object segmentation; Deep neural network
Degree: Bachelor's degree
Dates: 2024-04-18T08:11:04Z (deposited); 2024 (issued)
Format: Final Year Project (FYP), application/pdf
Citation: Lim, J. R. (2024). Learning deep networks for video object segmentation. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/175018
Language: en
Project code: SCSE23-0332
Publisher: Nanyang Technological University