Unsupervised Learning from Videos for Object Discovery in Single Images
This paper proposes a method for discovering the primary objects in single images by learning from videos in a purely unsupervised manner—the learning process is based on videos, but the generated network is able to discover objects from a single input image. The rough idea is that an image typicall...
Main Authors: | Dong Zhao, Baoqing Ding, Yulin Wu, Lei Chen, Hongchao Zhou |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2020-12-01 |
Series: | Symmetry |
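Subjects: | unsupervised learning; sparsity representation; object discovery; foreground; background; segmentation mask |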
Online Access: | https://www.mdpi.com/2073-8994/13/1/38 |
_version_ | 1797543224425840640 |
---|---|
author | Dong Zhao Baoqing Ding Yulin Wu Lei Chen Hongchao Zhou |
collection | DOAJ |
description | This paper proposes a method for discovering the primary objects in single images by learning from videos in a purely unsupervised manner: training uses videos, but the resulting network discovers objects from a single input image. The rough idea is that an image typically consists of multiple object instances (such as the foreground and background) that undergo spatial transformations across video frames and can be sparsely represented. By exploring the sparsity representation of a video with a neural network, one may learn the features of each object instance without any labels, and these features can be used to discover, recognize, or distinguish object instances in a single image. In this paper, we consider a relatively simple scenario, where each image roughly consists of a foreground and a background. Our proposed method uses encoder-decoder structures to sparsely represent the foreground, background, and segmentation mask, which are then used to reconstruct the original images. We apply the feed-forward network trained on videos to object discovery in single images, unlike previous co-segmentation methods, which require videos or collections of images as input at inference time. Experimental results on various object segmentation benchmarks demonstrate that the proposed method extracts primary objects accurately and robustly, which suggests that unsupervised image learning tasks can benefit from the sparsity of images and the inter-frame structure of videos. |
first_indexed | 2024-03-10T13:42:36Z |
format | Article |
id | doaj.art-5d6ec544660a466cba3cab2373ea0dd4 |
institution | Directory Open Access Journal |
issn | 2073-8994 |
language | English |
last_indexed | 2024-03-10T13:42:36Z |
publishDate | 2020-12-01 |
publisher | MDPI AG |
record_format | Article |
series | Symmetry |
spelling | doaj.art-5d6ec544660a466cba3cab2373ea0dd4; 2023-11-21T02:54:13Z; eng; MDPI AG; Symmetry; ISSN 2073-8994; 2020-12-01; vol. 13, no. 1, article 38; DOI 10.3390/sym13010038; Unsupervised Learning from Videos for Object Discovery in Single Images; Dong Zhao, Baoqing Ding, Yulin Wu, Lei Chen, Hongchao Zhou (all: School of Information Science and Engineering, Shandong University, Qingdao 266237, China); https://www.mdpi.com/2073-8994/13/1/38; unsupervised learning; sparsity representation; object discovery; foreground; background; segmentation mask |
title | Unsupervised Learning from Videos for Object Discovery in Single Images |
topic | unsupervised learning sparsity representation object discovery foreground background segmentation mask |
url | https://www.mdpi.com/2073-8994/13/1/38 |