Summary: | Intelligent unmanned vending machines (UVMs) based on machine vision have attracted great attention in the unmanned retail industry. However, because practical application scenarios and environments are complex, existing vision-based intelligent UVMs suffer from missed detections and misdetections of products, and they require costly physical components such as infrared radio frequency sensors to capture shopping behaviors. In this study, we propose BP-YOLO, a real-time model that integrates an optimized YOLOv7 with BlazePose for product detection and shopping behavior recognition. BP-YOLO can accurately detect the products purchased by consumers and recognize their shopping behaviors in complex scenarios. To address missed detections and misdetections, we introduce the 3D attention mechanism SimAM and deformable ConvNets v2 (DCNv2) to recombine and optimize the one-stage object detection model YOLOv7. This method reduces interference from invalid information in complex scenarios by adaptively weighting each channel and the 3D spatial features, focuses on feature information in a sparse space, and minimizes the loss of feature information during transmission through multi-scale feature extraction and fusion. To recognize and judge consumers' shopping behaviors, we track their hand and arm key points using the pose estimation model BlazePose. With mAP@[0.5:0.95] as the evaluation metric for product detection, experimental results on a customized product dataset show that BP-YOLO achieves an average accuracy of 96.17% across all product categories; the average success rate of consumer shopping recognition reaches 92%, 98%, and 94.7% under three levels of light and noise intensity, respectively. These results indicate that the BP-YOLO model for intelligent UVMs is effective for commercial deployment.
|
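
As an illustration of the attention step mentioned in the summary, the following is a minimal NumPy sketch of SimAM-style 3D attention weighting as it is usually formulated: each activation is scored by how far it deviates from its channel mean, and the feature map is rescaled by a sigmoid of that score. It is not the BP-YOLO implementation; the function name `simam`, the `(batch, channels, height, width)` layout, and the default `e_lambda=1e-4` are assumptions made for this example, and where the module sits inside YOLOv7 is not shown.

```python
import numpy as np


def simam(x: np.ndarray, e_lambda: float = 1e-4) -> np.ndarray:
    """Apply parameter-free SimAM-style attention to a feature map.

    x is assumed to have shape (batch, channels, height, width).
    e_lambda is a small regularization constant (assumed default).
    """
    _, _, h, w = x.shape
    n = h * w - 1  # number of "other" positions in each channel

    # Squared deviation of every activation from its channel mean.
    d = (x - x.mean(axis=(2, 3), keepdims=True)) ** 2

    # Per-channel variance estimate over the spatial positions.
    v = d.sum(axis=(2, 3), keepdims=True) / n

    # Inverse energy: larger for activations that stand out in their channel.
    e_inv = d / (4.0 * (v + e_lambda)) + 0.5

    # Sigmoid turns the score into a weight in (0, 1) for every element,
    # so each channel and each spatial position gets its own weight.
    return x * (1.0 / (1.0 + np.exp(-e_inv)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.normal(size=(1, 8, 16, 16))
    weighted = simam(features)
    print(weighted.shape)
```

Because the weights are computed from the feature map itself, this step adds no learnable parameters, which makes it straightforward to place inside an existing detector backbone.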