Semantically guided self‐supervised monocular depth estimation
Main Authors: , , , ,
Format: Article
Language: English
Published: Wiley, 2022-04-01
Series: IET Image Processing
Online Access: https://doi.org/10.1049/ipr2.12409
Summary: Depth information plays an important role in the vision-related activities of robots and autonomous vehicles. An effective way to obtain 3D scene information is self-supervised monocular depth estimation, which trains on large, diverse monocular video datasets without requiring ground-truth depth. A novel multi-task learning strategy is proposed that uses semantic information to guide monocular depth estimation while maintaining self-supervision. An improved differentiable direct visual odometry (DDVO) module combined with Pose-Net is applied to achieve better pose prediction. Minimum reprojection loss with auto-masking and semantic masking is used to remove the effects of low-texture areas and moving dynamic-class objects within scenes. Concurrently, semantic masking is introduced into the DDVO pose predictor to filter moving objects and reduce the matching error between monocular sequence frames. In addition, PackNet is employed as the backbone of the multi-task network to further improve the accuracy of depth prediction. The proposed method produces state-of-the-art results for monocular depth estimation on the KITTI Eigen split benchmark, even outperforming supervised methods trained with ground-truth depth.
ISSN: 1751-9659, 1751-9667
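The minimum reprojection loss with auto-masking mentioned in the summary can be sketched as follows. This is an illustrative NumPy sketch, not the paper's implementation: the function names are hypothetical, and the photometric error is simplified to per-pixel L1 (the SSIM term commonly combined with it is omitted).

```python
import numpy as np

def photometric_error(pred, target):
    # Per-pixel L1 photometric error, averaged over colour channels.
    # (Real pipelines typically blend this with an SSIM term.)
    return np.abs(pred - target).mean(axis=-1)

def min_reprojection_loss_with_automask(target, warped_sources, raw_sources):
    """Per-pixel minimum reprojection loss with auto-masking.

    target:         (H, W, C) target frame
    warped_sources: source frames warped into the target view via the
                    predicted depth and pose
    raw_sources:    the same source frames, unwarped
    """
    warped_err = np.stack([photometric_error(w, target) for w in warped_sources])
    raw_err = np.stack([photometric_error(s, target) for s in raw_sources])

    # Take the per-pixel minimum over source frames: occlusions visible in
    # one source frame are often visible in another, so the minimum is robust.
    min_warped = warped_err.min(axis=0)
    min_raw = raw_err.min(axis=0)

    # Auto-mask: keep only pixels where warping beats the identity
    # reprojection; static or low-texture pixels (e.g. objects moving with
    # the camera) fail this test and are excluded from the loss.
    mask = (min_warped < min_raw).astype(np.float32)
    return (mask * min_warped).sum() / max(mask.sum(), 1.0)
```

A semantic mask for dynamic-class objects, as described in the summary, would enter as an additional multiplicative mask alongside the auto-mask.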