Multi-Dimensional Residual Dense Attention Network for Stereo Matching


Bibliographic Details
Main Authors: Guanghui Zhang, Dongchen Zhu, Wenjun Shi, Xiaoqing Ye, Jiamao Li, Xiaolin Zhang
Format: Article
Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/8694010/
Description
Summary: Very deep convolutional neural networks (CNNs) have recently achieved great success in stereo matching. However, learning robust feature maps for ill-posed regions, such as weakly textured regions, reflective surfaces, and repetitive patterns, remains highly desirable. In this paper, we therefore propose an end-to-end multi-dimensional residual dense attention network (MRDA-Net) that focuses on more comprehensive pixel-wise feature extraction. The proposed network consists of two parts: a 2D residual dense attention net for feature extraction and a 3D convolutional attention net for matching. The 2D residual dense attention net uses a dense network structure to fully exploit hierarchical features from preceding convolutional layers and a residual structure to fuse low-level structural information with high-level semantic information. Its 2D attention module adaptively recalibrates channel-wise features so that the network concentrates on informative features. The 3D convolutional attention net further extends the attention mechanism to matching: its stacked hourglass module extracts multi-scale context information as well as geometry information, and its novel 3D attention module aggregates hierarchical sub-cost volumes adaptively rather than manually, yielding a comprehensively recalibrated cost volume for more accurate disparity computation. Experiments demonstrate that our approach achieves state-of-the-art accuracy on the Scene Flow, KITTI 2012, and KITTI 2015 stereo datasets.
ISSN: 2169-3536
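
To make the two attention mechanisms in the abstract concrete, below is a minimal PyTorch sketch of the channel-wise recalibration described for the 2D attention module, assuming a squeeze-and-excitation style design. The class name `ChannelAttention2D` and the reduction ratio `r` are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class ChannelAttention2D(nn.Module):
    """Channel-wise recalibration (squeeze-and-excitation style sketch).

    Hypothetical illustration of the abstract's 2D attention module:
    informative channels are adaptively emphasised. The reduction
    ratio `r` is an assumption, not a value from the paper.
    """
    def __init__(self, channels: int, r: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)   # squeeze: global spatial average
        self.fc = nn.Sequential(              # excitation: bottleneck MLP
            nn.Linear(channels, channels // r),
            nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                           # per-channel rescaling
```

Likewise, here is a hypothetical sketch of the 3D attention module's adaptive aggregation of hierarchical sub-cost volumes, reusing the imports above and assuming each sub-cost volume has shape (B, C, D, H, W). Learned softmax weights replace a manual, fixed-weight sum of the hourglass outputs; the paper's actual module may differ.

```python
class SubCostVolumeFusion(nn.Module):
    """Adaptive fusion of K sub-cost volumes (hypothetical sketch).

    Learns one softmax weight per sub-cost volume from globally pooled
    descriptors, instead of summing the hourglass outputs manually
    with fixed weights.
    """
    def __init__(self, num_volumes: int, channels: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool3d(1)
        self.score = nn.Linear(num_volumes * channels, num_volumes)

    def forward(self, volumes):
        b = volumes[0].shape[0]
        # One global descriptor per volume, concatenated along channels.
        desc = torch.cat([self.pool(v).view(b, -1) for v in volumes], dim=1)
        w = torch.softmax(self.score(desc), dim=1)    # (B, K) fusion weights
        return sum(w[:, k].view(b, 1, 1, 1, 1) * v
                   for k, v in enumerate(volumes))
```

A quick shape check of both sketches, with arbitrary toy dimensions:

```python
att = ChannelAttention2D(channels=32)
feat = att(torch.randn(2, 32, 16, 32))        # shape preserved: (2, 32, 16, 32)

fuse = SubCostVolumeFusion(num_volumes=3, channels=8)
vols = [torch.randn(2, 8, 12, 16, 32) for _ in range(3)]
cost = fuse(vols)                             # fused volume: (2, 8, 12, 16, 32)
```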