Summary: | Object detection under weakly supervised learning is a challenging problem. In a remote sensing ship detection task, weakly supervised learning requires that the training set carry only image-level class annotations. Without location information, it is difficult to locate ships and extract their features. Moreover, when the detector extracts more than one candidate region, image-level annotation makes it hard to determine whether the candidate regions correspond to multiple ships or include background mistakenly detected as ships. To address these issues, this article analyzes the interaction between class information and location information and proposes a weakly supervised detection method, PistonNet, based on data-efficient image transformers. PistonNet introduces an artificial point that is inserted into the feature map. The artificial point suppresses the background and enhances the object by intervening in the weight distribution between object and background during the self-attention computation. PistonNet also introduces a joint confidence probability to improve detection accuracy. Experiments on the GF1-LRSD and NWPU VHR-10 datasets show that the proposed method effectively improves detection performance. The main improvements of PistonNet, and the contributions of this article, are threefold. First, PistonNet is a weakly supervised object detection method designed for single-class detection, which provides an innovative approach to separating targets from background. Second, PistonNet matches the detection accuracy of advanced fully supervised detectors with fewer parameters. Finally, the object locations detected by PistonNet are obtained by segmenting regions on the heat map, and PistonNet's background suppression removes its dependence on the segmentation threshold.
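The abstract does not specify how the artificial point is implemented; the following is a minimal illustrative sketch, not the paper's code, assuming a DeiT-style sequence of patch tokens and a single learnable extra token. The class name ArtificialPointAttention, the token dimensions, and the insertion point are all assumptions made for illustration.

```python
# Sketch only: one way an extra learnable "artificial point" token could be
# appended to a DeiT-style token sequence so that it takes part in
# self-attention and shifts attention weight away from background tokens.
import torch
import torch.nn as nn


class ArtificialPointAttention(nn.Module):
    def __init__(self, dim: int = 192, num_heads: int = 3):
        super().__init__()
        # Learnable artificial point, shaped like one extra token (assumption).
        self.artificial_point = nn.Parameter(torch.zeros(1, 1, dim))
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_patches, dim) patch tokens from the transformer.
        b = tokens.shape[0]
        point = self.artificial_point.expand(b, -1, -1)
        x = torch.cat([point, tokens], dim=1)  # insert the artificial point
        x = self.norm(x)
        out, _ = self.attn(x, x, x, need_weights=True)
        # Drop the artificial point before passing features on; in this sketch
        # its only role is to absorb attention that background tokens would
        # otherwise receive.
        return tokens + out[:, 1:, :]


if __name__ == "__main__":
    feats = torch.randn(2, 196, 192)  # e.g., 14x14 patch tokens, DeiT-Tiny width
    print(ArtificialPointAttention()(feats).shape)  # torch.Size([2, 196, 192])
```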