Summary: | Autonomous mobile robots increasingly carry multiple sensors to enhance their perception ability. 3D LiDARs and cameras are the most commonly used sensors. A 3D LiDAR provides data in the form of unordered 3D coordinates of objects in the environment. A monocular camera provides data in the form of ordered color or intensity images. These two kinds of data complement each other, and when fused they can provide a much better understanding of the scene. The first step in fusing the data from a 3D LiDAR and a camera is to obtain accurate calibration parameters between the two. Calibration is of two types – intrinsic and extrinsic. While the intrinsic calibration of each sensor is always done separately, this thesis focuses on the extrinsic calibration between the two sensors.
Extrinsic calibration involves determining the rotation matrix and translation vector between the two sensors. Generally, extrinsic calibration can be divided into two types – offline and online. Offline methods provide good accuracy; however, they require a specific calibration target and human intervention. Online extrinsic calibration attempts to perform calibration in any unknown scene without specific targets, but its accuracy is lower than that of offline methods. Thus, online calibration is more desirable in terms of convenience, but its accuracy needs to be improved. This thesis reviews existing methods for offline and online extrinsic calibration between a LiDAR and a camera, and then proposes a new method for online extrinsic calibration between a 3D LiDAR and a monocular camera. The idea behind this new method is to leverage deep learning architectures to learn the extrinsic parameters from the image and point cloud data obtained from the two sensors. The network architecture takes the raw image and raw point cloud as input and gives the rotation and translation parameters as output. The utility of the proposed method is shown through extensive experiments using the KITTI360 dataset. The proposed solution has two design variations, both of which are described and extensively tested. For miscalibrations in the range of ±0.2 m translation per axis and ±10° rotation per axis, the first design variation achieves a mean rotation error of 0.56° and a mean translation error of 4.87 cm. For the same range of miscalibrations, the second design variation achieves a mean rotation error of 0.85° and a mean translation error of 3.97 cm. Finally, the proposed solution is adapted for a generalized use case, and its utility is shown through experiments on data collected with real sensors such as the Livox Horizon 3D LiDAR and the ZED camera.
|
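As a rough illustration of what the extrinsic parameters do, the sketch below applies a rotation matrix and translation vector to LiDAR points, projects them through a pinhole camera, and measures how far an estimate is from a reference calibration. It is not the thesis implementation: the function names, the intrinsic matrix `K`, the sample values, and the choice of error metrics (geodesic angle for rotation, Euclidean norm for translation) are all assumptions made for this example.

```python
import numpy as np


def rotation_z(deg):
    """Rotation matrix for a rotation of `deg` degrees about the z axis."""
    a = np.deg2rad(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def lidar_to_camera(points, R, t):
    """Map N x 3 LiDAR points into the camera frame: p_cam = R @ p_lidar + t."""
    return points @ R.T + t


def project(points_cam, K):
    """Pinhole projection of N x 3 camera-frame points to N x 2 pixel coordinates."""
    uvw = points_cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]


def rotation_error_deg(R_pred, R_true):
    """Angle in degrees of the residual rotation R_pred @ R_true.T (assumed metric)."""
    R_delta = R_pred @ R_true.T
    cos_angle = np.clip((np.trace(R_delta) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_angle)))


def translation_error(t_pred, t_true):
    """Euclidean distance between two translation vectors (assumed metric)."""
    return float(np.linalg.norm(np.asarray(t_pred) - np.asarray(t_true)))


# Hypothetical values, chosen only to exercise the functions.
K = np.array([[700.0, 0.0, 640.0],
              [0.0, 700.0, 360.0],
              [0.0, 0.0, 1.0]])
R_true = np.eye(3)
t_true = np.array([0.1, 0.0, 0.0])      # metres
points = np.array([[0.0, 0.0, 10.0]])   # one point 10 m in front of the camera

pixels = project(lidar_to_camera(points, R_true, t_true), K)

# A deliberately miscalibrated estimate: 5 degrees off in rotation, 3 cm in translation.
R_pred = rotation_z(5.0)
t_pred = np.array([0.1, 0.0, 0.03])
rot_err = rotation_error_deg(R_pred, R_true)
trans_err = translation_error(t_pred, t_true)
```

With an accurate calibration the projected LiDAR points land on the matching image pixels; a miscalibration shifts them, which is the cue an online method can exploit.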