Unsupervised Learning from Videos for Object Discovery in Single Images

This paper proposes a method for discovering the primary objects in single images by learning from videos in a purely unsupervised manner—the learning process is based on videos, but the generated network is able to discover objects from a single input image. The rough idea is that an image typicall...


Bibliographic Details
Main Authors: Dong Zhao, Baoqing Ding, Yulin Wu, Lei Chen, Hongchao Zhou
Format: Article
Language: English
Published: MDPI AG 2020-12-01
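Series: Symmetry
Subjects: unsupervised learning; sparsity representation; object discovery; foreground; background; segmentation mask
Online Access: https://www.mdpi.com/2073-8994/13/1/38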
_version_ 1797543224425840640
author Dong Zhao
Baoqing Ding
Yulin Wu
Lei Chen
Hongchao Zhou
collection DOAJ
description This paper proposes a method for discovering the primary objects in single images by learning from videos in a purely unsupervised manner: training uses videos, but the resulting network discovers objects from a single input image. The rough idea is that an image typically consists of multiple object instances (such as the foreground and background) that undergo spatial transformations across video frames and can be sparsely represented. By exploring the sparse representation of a video with a neural network, one may learn the features of each object instance without any labels, and these features can be used to discover, recognize, or distinguish object instances in a single image. The paper considers a relatively simple scenario in which each image roughly consists of a foreground and a background. The proposed method uses encoder-decoder structures to sparsely represent the foreground, background, and segmentation mask, from which the original images are reconstructed. The feed-forward network trained on videos is applied to object discovery in single images, unlike previous co-segmentation methods, which require videos or collections of images as input at inference time. Experimental results on various object segmentation benchmarks demonstrate that the proposed method extracts primary objects accurately and robustly, which suggests that unsupervised image learning tasks can benefit from the sparsity of images and the inter-frame structure of videos.
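The description states that the foreground, background, and segmentation mask together reconstruct the original image, but it does not give the combination rule. The following is a minimal sketch that assumes standard per-pixel alpha blending, which may differ from the authors' formulation; the function names `compose` and `reconstruction_error` and the toy arrays are hypothetical and exist only for illustration.

```python
import numpy as np


def compose(foreground, background, mask):
    """Blend two (H, W, C) layers with a soft (H, W) mask.

    Assumption: alpha blending, mask * fg + (1 - mask) * bg.
    The record's description does not confirm this exact rule.
    """
    mask = np.clip(mask, 0.0, 1.0)[..., None]  # (H, W) -> (H, W, 1)
    return mask * foreground + (1.0 - mask) * background


def reconstruction_error(image, reconstruction):
    """Mean squared error between an image and its reconstruction."""
    return float(np.mean((image - reconstruction) ** 2))


# Toy layers: a white foreground, a black background,
# and a mask that selects the central 2x2 block.
h, w = 4, 4
foreground = np.ones((h, w, 3))
background = np.zeros((h, w, 3))
mask = np.zeros((h, w))
mask[1:3, 1:3] = 1.0

image = compose(foreground, background, mask)
error = reconstruction_error(image, compose(foreground, background, mask))
```

In a trained model the three layers would come from decoders rather than fixed arrays, and a reconstruction error of this kind could serve as a label-free training signal. That reading is consistent with the description but is not spelled out in this record.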
first_indexed 2024-03-10T13:42:36Z
format Article
id doaj.art-5d6ec544660a466cba3cab2373ea0dd4
institution Directory of Open Access Journals
issn 2073-8994
language English
last_indexed 2024-03-10T13:42:36Z
publishDate 2020-12-01
publisher MDPI AG
record_format Article
series Symmetry
spelling doaj.art-5d6ec544660a466cba3cab2373ea0dd4 | 2023-11-21T02:54:13Z | eng | MDPI AG | Symmetry | 2073-8994 | 2020-12-01 | 13(1), 38 | 10.3390/sym13010038 | Unsupervised Learning from Videos for Object Discovery in Single Images | Dong Zhao, Baoqing Ding, Yulin Wu, Lei Chen, Hongchao Zhou (all: School of Information Science and Engineering, Shandong University, Qingdao 266237, China) | https://www.mdpi.com/2073-8994/13/1/38 | unsupervised learning; sparsity representation; object discovery; foreground; background; segmentation mask
title Unsupervised Learning from Videos for Object Discovery in Single Images
topic unsupervised learning
sparsity representation
object discovery
foreground
background
segmentation mask
url https://www.mdpi.com/2073-8994/13/1/38