Summary: | Stereo matching is a challenging problem with respect to weak texture, discontinuities, illumination difference and occlusions. Therefore, a deep learning framework is presented in this paper, which focuses on the first and last stage of typical stereo methods: the matching cost computation and the disparity refinement. For matching cost computation, two patch-based network architectures are exploited to allow the trade-off between speed and accuracy, both of which leverage multi-size and multi-layer pooling unit with no strides to learn cross-scale feature representations. For disparity refinement, unlike traditional handcrafted refinement algorithms, we incorporate the initial optimal and sub-optimal disparity maps before outlier detection. Furthermore, diverse base learners are encouraged to focus on specific replacement tasks, corresponding to the smooth regions and details. Experiments on different datasets demonstrate the effectiveness of our approach, which is able to obtain sub-pixel accuracy and restore occlusions to a great extent. Specifically, our accurate framework attains near-peak accuracy both in non-occluded and occluded region and our fast framework achieves competitive performance against the fast algorithms on Middlebury benchmark.
|