Summary: | Deep learning-based methods for visual forest fire monitoring have developed rapidly and are becoming mainstream. Existing methods, however, depend on very large amounts of data and suffer from weak feature extraction, poor recognition of small targets, and many missed and false detections in complex forest scenes. To address these problems, we propose a multi-task learning-based forest fire detection model (MTL-FFDet) in which three tasks (detection, segmentation and classification) share a single feature extraction module. In addition, to improve detection accuracy and reduce missed and false detections, we propose a joint multi-task non-maximum suppression (NMS) algorithm that exploits the strengths of each task. Furthermore, because the parts of a divided flame target are still flame targets, we propose a data augmentation strategy, a diagonal swap of random origin, to remedy the poor detection of small fire targets. Experiments showed that our model outperforms YOLOv5-s by 3.2% in $mAP$ (mean average precision), by 4.8% in $AP_S$ (average precision for small objects), by 4.0% in $AR_S$ (average recall for small objects), and by 1% to 2% in other metrics. 
Finally, the visualization analysis showed that our multi-task model can focus on the target region better than the single-task model during feature extraction, with superior extraction ability.
|
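The diagonal swap of random origin mentioned in the summary can be illustrated with a minimal sketch. The summary does not spell out the procedure, so the following is an assumption: a random origin splits the image into four quadrants, and each quadrant is exchanged with the one diagonally opposite it. The names `diagonal_swap`, `origin` and `rng` are hypothetical and not taken from the paper.

```python
import numpy as np


def diagonal_swap(image, origin=None, rng=None):
    """Swap the four quadrants around `origin` diagonally (illustrative only).

    image:  array of shape (H, W) or (H, W, C), with H >= 2 and W >= 2.
    origin: optional (row, col) split point; drawn at random if omitted.
    """
    h, w = image.shape[:2]
    if origin is None:
        if rng is None:
            rng = np.random.default_rng()
        # Keep the origin strictly inside the image so every quadrant is non-empty.
        cy = int(rng.integers(1, h))
        cx = int(rng.integers(1, w))
    else:
        cy, cx = origin

    top_left = image[:cy, :cx]
    top_right = image[:cy, cx:]
    bottom_left = image[cy:, :cx]
    bottom_right = image[cy:, cx:]

    # Each quadrant moves to the diagonally opposite corner.
    top = np.concatenate([bottom_right, bottom_left], axis=1)
    bottom = np.concatenate([top_right, top_left], axis=1)
    return np.concatenate([top, bottom], axis=0)


img = np.arange(12).reshape(3, 4)
out = diagonal_swap(img, origin=(1, 2))
```

Under this reading, a flame region that straddles the origin is cut into smaller pieces that end up in different corners, which is one way the augmentation could yield additional small targets. Bounding boxes and segmentation masks would need the same remapping; that step is omitted here.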