Summary: | This paper presents a novel algorithm for unsupervised video object segmentation (UVOS) in unconstrained scenarios. Although a large variety of methods have been proposed in the literature, segmenting generic objects remains challenging: different methods tend to perform well in different situations, and no single method outperforms the others in all cases. To address this, we propose to solve UVOS in a crowd-sourcing setting. We claim that superior results can be achieved by appropriately aggregating the predictions of multiple imperfect methods. Specifically, we propose a latent regression algorithm for ensemble-based segmentation that jointly labels the pixels in a sequence and learns an adaptive weight for each single method in the ensemble. The pixel labellings serve as the regression targets (pseudo ground truth) and thus drive the weight learning, while the learnt weights provide better shape priors for labelling, resulting in more accurate segmentation. In addition, Laplacian regularization is introduced into the regression to stabilize the learning of the weights. The most distinctive feature of our algorithm is that it adaptively learns the contribution of each single method for every test sequence, and is thus able to capture the strengths of those methods while avoiding their weaknesses. In the experiments, our algorithm is built on 14 non-deep-learning segmentation methods that rely on handcrafted features and require no training data. Experimental results on popular benchmarks show that our algorithm achieves compelling performance, even in comparison with deep learning-based methods. Furthermore, thanks to the adaptive weight learning mechanism, our algorithm offers good flexibility and usability: it can be restricted to the most complementary single methods with only a small loss in performance.
|
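The alternation between pixel labelling and weight learning can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: each single method outputs a per-pixel foreground probability, the Laplacian term is built from a hypothetical pixel-adjacency graph, the weights are obtained by Laplacian-regularized least squares (minimizing ||Xw - y||^2 + lam * w^T X^T L X w + mu * ||w||^2), and the paper's labelling step with shape priors is replaced by a plain 0.5 threshold on the weighted ensemble score. All function names, the parameters `lam` and `mu`, and the toy data are hypothetical.

```python
# Minimal sketch of ensemble weight learning by alternating
# (1) Laplacian-regularized least squares for per-method weights and
# (2) re-labelling pixels from the weighted combination.
# Assumptions (not from the paper): soft per-pixel predictions in [0, 1],
# a 0.5 threshold as the labelling step, hypothetical names and parameters.


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_weights(x, y, edges, lam, mu):
    """Minimize ||Xw - y||^2 + lam * sum_{(p,q)} (f_p - f_q)^2 + mu * ||w||^2,
    where f = Xw; the edge sum equals w^T X^T L X w for the graph Laplacian L."""
    k = len(x[0])
    a = [[mu if i == j else 0.0 for j in range(k)] for i in range(k)]
    b = [0.0] * k
    for row, target in zip(x, y):  # data term: accumulate X^T X and X^T y
        for i in range(k):
            b[i] += row[i] * target
            for j in range(k):
                a[i][j] += row[i] * row[j]
    for p, q in edges:  # Laplacian term: sum of d d^T with d = x_p - x_q
        d = [x[p][i] - x[q][i] for i in range(k)]
        for i in range(k):
            for j in range(k):
                a[i][j] += lam * d[i] * d[j]
    return solve_linear(a, b)  # normal equations of the objective above


def label_pixels(x, w, threshold=0.5):
    """Stand-in labelling step: threshold the weighted ensemble score."""
    return [1 if sum(wi * xi for wi, xi in zip(w, row)) >= threshold else 0
            for row in x]


def latent_regression(x, edges, lam=0.1, mu=0.01, max_iters=10):
    k = len(x[0])
    w = [1.0 / k] * k            # start from uniform weights
    y = label_pixels(x, w)       # initial pseudo ground truth
    for _ in range(max_iters):
        w = fit_weights(x, y, edges, lam, mu)  # labels -> weights
        new_y = label_pixels(x, w)             # weights -> labels
        if new_y == y:           # stop once the labelling is stable
            break
        y = new_y
    return w, y


# Toy "sequence": 8 pixels on a 1-D chain, 3 methods (columns).
# Methods 0 and 1 roughly agree; method 2 is mostly inverted.
X = [
    [0.9, 0.8, 0.2], [0.8, 0.9, 0.1], [0.9, 0.8, 0.2], [0.7, 0.9, 0.1],
    [0.1, 0.2, 0.8], [0.2, 0.1, 0.9], [0.1, 0.3, 0.8], [0.2, 0.1, 0.9],
]
EDGES = [(i, i + 1) for i in range(len(X) - 1)]  # neighbouring pixels
weights, labels = latent_regression(X, EDGES)
print("weights:", [round(v, 3) for v in weights])
print("labels:", labels)
```

Because the weights are refit for each sequence, the method that disagrees with the consensus (the third column of the toy data) ends up with a lower weight than the two methods that agree. The sketch leaves the weights unconstrained; a nonnegativity or sum-to-one constraint would be another plausible design choice.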