Summary: | We present a low-cost monocular 3D position estimation method for perception in aquaculture monitoring. Video surveillance of aquaculture has many advantages, but the size of farms and the complexity of fish habitats make it infeasible for farmers to monitor fish health continuously. We formulate a novel end-to-end deep visual learning pipeline, Aqua3DNet, that estimates fish pose using a bottom-up approach to detect and assign key features in a single pass. In addition, a depth estimation model based on Salient Object Detection (SOD) masks tracks the 3D position of the fish over time; we use these tracks to create 3D density heat maps of the fish. In our evaluation, Aqua3DNet reaches a detection accuracy of 80.63% and an F1 score of 87.34% while running at 5.12 frames per second (fps). Aqua3DNet achieves performance comparable to other aquaculture computer vision and depth estimation models, with minimal loss of speed despite combining the two models.
|