ViT VO - A Visual Odometry technique Using CNN-Transformer Hybrid Architecture


Bibliographic Details
Main Authors: B Jayaraj P., J Ebin, R Karthik, P N Pournami
Format: Article
Language: English
Published: EDP Sciences, 2023-01-01
Series: ITM Web of Conferences
Subjects: visual odometry; deep learning; optical flow; convolutional neural networks; generative adversarial networks; sequence-based models
Online Access: https://www.itm-conferences.org/articles/itmconf/pdf/2023/04/itmconf_I3cs2023_01004.pdf
Description:
Localization is one of the main tasks involved in the operation of autonomous agents (e.g., vehicles, robots). It allows them to track their paths and to properly detect and avoid obstacles. Visual Odometry (VO) is one of the techniques used for agent localization. VO estimates the motion of an agent from the images taken by cameras attached to it. Conventional VO algorithms require specific workarounds for challenges posed by the working environment and the captured sensor data. Deep learning approaches, on the other hand, have shown strong efficiency and accuracy in tasks that require a high degree of adaptability and scalability. In this work, a novel deep learning model is proposed to perform VO tasks for space robotic applications. The model consists of an optical flow estimation module, which abstracts away scene-specific details from the input video sequence and produces an intermediate representation. The CNN module that follows learns relative poses from the optical flow estimates. The final module is a state-of-the-art Vision Transformer, which learns the absolute pose from the relative poses produced by the CNN module. The model is trained on the KITTI dataset and has obtained a promising accuracy of approximately 2%. It has outperformed the baseline model, MagicVO, on a few sequences in the dataset.
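The description above says the model's final stage recovers absolute pose from the frame-to-frame relative poses learned by the CNN. The geometric core of that step, chaining relative SE(3) transforms into an absolute trajectory, can be sketched as follows. This is not code from the paper; it is a minimal numpy illustration, with a toy planar trajectory standing in for the network's pose estimates:

```python
import numpy as np

def se3(R, t):
    """Build a 4x4 homogeneous transform from rotation R (3x3) and translation t (3,)."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def rot_z(theta):
    """Rotation about the z-axis (yaw), the dominant motion in driving-style sequences."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def accumulate(relative_poses):
    """Chain frame-to-frame transforms into absolute camera poses, starting at the identity."""
    absolute = [np.eye(4)]
    for T in relative_poses:
        absolute.append(absolute[-1] @ T)
    return absolute

# Toy trajectory: four steps, each moving 1 m forward then turning 90 degrees left.
rels = [se3(rot_z(np.pi / 2), np.array([1.0, 0.0, 0.0])) for _ in range(4)]
poses = accumulate(rels)
positions = np.array([P[:3, 3] for P in poses])
print(np.round(positions, 6))  # traces a unit square and returns to the origin
```

Composing in this order (left-multiplying by the accumulated pose) expresses each relative motion in the previous camera frame, which is the standard convention for KITTI-style odometry evaluation.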
Collection: DOAJ (Directory of Open Access Journals)
ISSN: 2271-2097
DOI: 10.1051/itmconf/20235401004 (ITM Web of Conferences, vol. 54, article 01004)
Author Affiliations: National Institute of Technology Calicut (B Jayaraj P., J Ebin, P N Pournami); SED/ISG, Advanced Inertial Systems, ISRO Inertial Systems Unit (R Karthik)