Summary: | The rapid outbreak of COVID-19 has caused serious harm and infected tens of millions of people worldwide. Since there is no specific treatment, wearing masks has become an effective method to prevent the transmission of COVID-19 and is required in most public areas, which has also led to a growing demand for automatic real-time mask detection services to replace manual reminding. However, few studies on face mask detection are being conducted. It is urgent to improve the performance of mask detectors. In this paper, we proposed the Properly Wearing Masked Face Detection Dataset (PWMFD), which included 9205 images of mask wearing samples with three categories. Moreover, we proposed Squeeze and Excitation (SE)-YOLOv3, a mask detector with relatively balanced effectiveness and efficiency. We integrated the attention mechanism by introducing the SE block into Darknet53 to obtain the relationships among channels so that the network can focus more on the important feature. We adopted GIoUloss, which can better describe the spatial difference between predicted and ground truth boxes to improve the stability of bounding box regression. Focal loss was utilized for solving the extreme foreground-background class imbalance. Besides, we performed corresponding image augmentation techniques to further improve the robustness of the model on the specific task. Experimental results showed that SE-YOLOv3 outperformed YOLOv3 and other state-of-the-art detectors on PWMFD and achieved a higher 8.6% mAP compared to YOLOv3 while having a comparable detection speed.
|