Exploring Sparse Visual Odometry Acceleration on FPGA with High-level Synthesis.

Published in IEEE Access, 2023

Visual Odometry (VO) systems are widely used to determine the position and orientation of a robot or camera in an unknown environment. They are deployed on resource-constrained platforms, such as drones and Virtual Reality or Augmented Reality headsets. VO systems that harness modern Systems-on-Chip (SoCs) with integrated Field Programmable Gate Arrays (FPGAs) have the potential to improve overall performance. This paper explores the FPGA acceleration of sparse semi-direct VO kernels using High-Level Synthesis (HLS). The selected sparse semi-direct VO system was designed from its conception to execute efficiently on low-power processors. We show that both the computational overhead and the data transfer overhead between the processing cores and the accelerators on the reconfigurable fabric need to be optimized to obtain better end-to-end performance. The additional data movement incurred when using an FPGA accelerator is due to the sparse computational nature of the kernels together with their random memory access patterns. This paper shows that state-of-the-art HLS tools are not yet able to perform the required optimizations automatically. HLS tools usually target, and succeed on, application kernels with dense computational patterns and regular memory access. In this paper we propose three potentially general methods to reduce the data transfer between the processing cores and the customised hardware kernels on the FPGA: (a) approximation based on domain-specific knowledge, (b) lossless image compression, and (c) on-the-fly computation. We present a case study applying these methods to SVO, a state-of-the-art sparse VO system with a semi-direct front-end. We demonstrate that our proposed methods can reduce data transfer overhead to achieve better end-to-end performance, and that they can be applied not only with standard Xilinx tools but also with other state-of-the-art HLS tools, such as HeteroFlow.
Compared to the baseline performance of the original SVO software on Arm processors, our proposed methods enable the Xilinx SDSoC and HeteroFlow designs to achieve speedups of 2.4× and 2.14×, respectively, without noticeable accuracy loss. The Xilinx SDSoC and HeteroFlow designs also achieve 1.85× and 1.89× improvements in energy efficiency, respectively, on a Xilinx Zynq UltraScale+ SoC with Arm A53 cores and an integrated FPGA. Compared to the SVO software baseline running on an Intel Xeon system, our proposed methods enable the Xilinx SDSoC and HeteroFlow designs to achieve 8.2× and 8.3× improvements in energy efficiency, respectively.
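
To make the idea behind method (b) concrete, the following is a minimal sketch of how lossless compression can shrink the number of bytes moved between the processing cores and an accelerator. The abstract does not say which compression scheme the paper uses; run-length encoding is assumed here purely because it is the simplest lossless scheme to show. The function names `rle_encode` and `rle_decode` and the sample pixel row are hypothetical and are not taken from SVO or from the paper.

```python
from typing import List, Tuple


def rle_encode(row: bytes) -> List[Tuple[int, int]]:
    """Run-length encode 8-bit pixels as (value, run_length) pairs.

    Illustrative only: this is an assumed scheme, not the paper's.
    Runs are capped at 255 so each pair fits in two bytes.
    """
    runs: List[Tuple[int, int]] = []
    for px in row:
        if runs and runs[-1][0] == px and runs[-1][1] < 255:
            # Extend the current run of identical pixels.
            runs[-1] = (px, runs[-1][1] + 1)
        else:
            # Start a new run.
            runs.append((px, 1))
    return runs


def rle_decode(runs: List[Tuple[int, int]]) -> bytes:
    """Invert rle_encode exactly, so no pixel information is lost."""
    out = bytearray()
    for value, length in runs:
        out.extend([value] * length)
    return bytes(out)


# Hypothetical 16-pixel image row with long uniform regions.
row = bytes([10] * 6 + [200] * 3 + [10] * 7)
runs = rle_encode(row)

raw_size = len(row)           # bytes transferred without compression
encoded_size = 2 * len(runs)  # two bytes per (value, run_length) pair

print(raw_size, encoded_size)
```

The saving depends entirely on the image content: rows with few repeated pixels can grow rather than shrink under this scheme, which is one reason a real design would likely choose its compression method with the target images in mind.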
