BAX: A Bundle Adjustment Accelerator With Decoupled Access/Execute Architecture for Visual Odometry

As demand for embedded vision grows, solving large optimization problems in real time within energy and cost budgets is a challenge. We present BAX, a hardware accelerator for bundle adjustment (BA), which solves the least-squares problem of state estimation in visual odometry (VO). BAX consists of...

Bibliographic Details
Main Authors: Rongdi Sun, Peilin Liu, Jianwei Xue, Shiyu Yang, Jiuchao Qian, Rendong Ying
Format: Article
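Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Subjects: Hardware accelerator; decoupled architecture; FPGA; embedded system; bundle adjustment; visual odometry
Online Access: https://ieeexplore.ieee.org/document/9069904/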
collection DOAJ
description As demand for embedded vision grows, solving large optimization problems in real time within energy and cost budgets is a challenge. We present BAX, a hardware accelerator for bundle adjustment (BA), which solves the least-squares problem of state estimation in visual odometry (VO). BAX consists of a frontend for control and a backend for computation. The frontend generates instructions on the fly, and the backend executes them to perform the BA algorithm. The backend adopts a decoupled access/execute (DAE) architecture, which separates the memory access unit (MAU) from the pipeline so that the MAU can prefetch vectors and matrices ahead of computation. To further reduce the latency of data reorganization, three transpose-free dataflows are proposed for matrix multiplication on the vector processing unit (VPU). In addition, a unified architecture for both forward and backward substitution is designed for matrix decomposition in the linear solver. All data are stored in 442 kB of on-chip memory, and the local map is maintained efficiently by a hierarchical graph memory. Compared with the baseline architecture, these techniques reduce the processing time by 53.9%. BAX is implemented on an FPGA in 32-bit floating-point precision with data normalization. It completes a full BA in about 63.44 ms at 200 MHz while consuming 1.12 W. BAX is $1.73\times$ and $22.38\times$ faster than the desktop and embedded CPUs, respectively, and achieves 90% of the GPU's performance at much lower power consumption.
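The description mentions a unified design for forward and backward substitution in the linear solver. As a rough software illustration only (not the BAX hardware design), the sketch below shows how one loop body can serve both directions. It assumes a Cholesky factorization, which the abstract does not specify, and the names `cholesky`, `substitute`, and `solve_spd` are hypothetical. The backward pass reads `L[j][i]` directly, so no explicit transpose of the factor is formed.

```python
import math


def cholesky(a):
    """Return lower-triangular L with a = L * L^T.

    Assumption: a is symmetric positive definite (hypothetical helper,
    not taken from the paper).
    """
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                l[i][j] = math.sqrt(a[i][i] - s)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l


def substitute(l, b, backward=False):
    """Solve L y = b (forward) or L^T x = b (backward) with one loop body.

    Only the traversal order and the index pair used to read L differ
    between the two modes.
    """
    n = len(b)
    x = [0.0] * n
    order = range(n - 1, -1, -1) if backward else range(n)
    for i in order:
        # Unknowns already solved: below i (forward) or above i (backward).
        solved = range(i + 1, n) if backward else range(i)
        acc = b[i]
        for j in solved:
            # Backward mode reads L[j][i], i.e. (L^T)[i][j], without transposing.
            coeff = l[j][i] if backward else l[i][j]
            acc -= coeff * x[j]
        x[i] = acc / l[i][i]
    return x


def solve_spd(a, b):
    """Solve a x = b via L L^T x = b using the shared substitution routine."""
    l = cholesky(a)
    y = substitute(l, b)                      # forward:  L y = b
    return substitute(l, y, backward=True)    # backward: L^T x = y
```

For example, `solve_spd([[4.0, 2.0], [2.0, 3.0]], [10.0, 8.0])` solves a small symmetric positive definite system with a single substitution routine called twice.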
id doaj.art-ad64383c885e4236b668a575e29b189f
institution Directory Open Access Journal
issn 2169-3536
citation IEEE Access, vol. 8, pp. 75530-75542, 2020-01-01
doi 10.1109/ACCESS.2020.2988527
orcid Rongdi Sun https://orcid.org/0000-0002-8005-7458
affiliation School of Electronic and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China (all six authors)
title BAX: A Bundle Adjustment Accelerator With Decoupled Access/Execute Architecture for Visual Odometry
topic Hardware accelerator
decoupled architecture
FPGA
embedded system
bundle adjustment
visual odometry