Summary: | Advances in deep learning, in particular convolutional neural networks, have driven progress in computer vision, including metric learning tasks such as image retrieval. In image retrieval, the metric distance between images (i.e., a measure of their similarity) is computed and compared in order to return the images most similar to a query. Global descriptors are good at capturing holistic features of an image, such as the overall shape and silhouette of the main object. Local descriptors, on the other hand, capture fine-grained details that help the model group similar images together. This paper proposes a descriptor mixer that takes advantage of both local and global descriptors (each a group of features combined into one), as well as of different types of global descriptors, to obtain the effect of a lighter version of a model ensemble (i.e., fewer parameters and a smaller model size than an actual ensemble of networks). As a result, the model’s performance improved by about 1.36% (Recall@32) when a combination of descriptors was used. We found empirically that the combination of GeM and MAC achieved the highest performance.
|
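To make the GeM and MAC combination named in the summary concrete, the following is a minimal sketch of how two global descriptors could be pooled from the same feature maps and joined into one vector. It is an illustration under stated assumptions, not the paper's mixer: the function names (`gem_pool`, `mac_pool`, `mix_descriptors`), the exponent `p=3.0`, and the choice to L2-normalize and concatenate are all hypothetical. GeM (generalized-mean pooling) computes `(mean(x^p))^(1/p)` per channel, and MAC (maximum activation of convolutions) takes the per-channel maximum, which is the limit of GeM as `p` grows.

```python
import math


def gem_pool(feature_map, p=3.0, eps=1e-6):
    """Generalized-mean (GeM) pooling over one channel's spatial activations."""
    # Clamp to eps so that fractional powers of non-positive values are avoided.
    values = [max(v, eps) for row in feature_map for v in row]
    return (sum(v ** p for v in values) / len(values)) ** (1.0 / p)


def mac_pool(feature_map):
    """MAC pooling: the maximum activation in one channel."""
    return max(v for row in feature_map for v in row)


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]


def mix_descriptors(channels, p=3.0):
    """Assumed mixing scheme: normalize each descriptor, concatenate, renormalize."""
    gem = l2_normalize([gem_pool(c, p) for c in channels])
    mac = l2_normalize([mac_pool(c) for c in channels])
    return l2_normalize(gem + mac)


# Two toy 2x2 channels standing in for a CNN's final feature maps.
channels = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.5, 0.5], [0.5, 0.5]],
]
descriptor = mix_descriptors(channels)
```

Because both descriptors are computed from the same backbone output, only the pooling step is duplicated, which is the sense in which such a mixer is lighter than an ensemble of separate networks.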