SiCoDeF² Net: Siamese Convolution Deconvolution Feature Fusion Network for One-Shot Classification

Bibliographic Details
Main Authors: Swalpa Kumar Roy, Purbayan Kar, Mercedes E. Paoletti, Juan M. Haut, Rafael Pastor-Vargas, Antonio Robles-Gomez
Format: Article
Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/9521872/
Description
Summary: Nowadays, deep convolutional neural networks (CNNs) for face recognition achieve performance comparable to human ability when an appropriate amount of labelled training data is available. However, training CNNs remains an arduous task when training samples are scarce. To overcome this drawback, applications demand one-shot learning, which improves on the performance of traditional machine learning approaches by learning representative information about data categories from only a few training samples. In this context, the Siamese convolutional network (<monospace>SiConvNet</monospace>) provides an interesting deep architecture for tackling the data limitation. However, applying the convolution operation to real-world images with a trainable, correlative Gaussian kernel adds correlations to the output images, which hinder the recognition process because of the blurring effect introduced by the convolution kernel. As a result, pixel-wise and channel-wise correlations or redundancies can appear both within a single feature map and across the multiple feature maps produced by a hidden layer. Convolution-based models therefore fail to generalize the feature representation, because the strong correlations between neighboring pixels and the high channel-wise redundancy between the channels of the feature maps hamper effective training. The <italic>deconvolution</italic> operation helps to overcome these shortcomings of the conventional <monospace>SiConvNet</monospace>, successfully learning a correlation-free feature representation.
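The paragraph above argues that a Gaussian convolution kernel correlates neighbouring pixels and that a deconvolution step can remove those correlations. A minimal NumPy sketch of that idea follows, assuming that "deconvolution" means whitening im2col patches with the inverse square root of their covariance, in the spirit of network deconvolution (Ye et al., 2020). This is a swapped-in illustration, not necessarily the operator used in SDeNet, and all function names (`gaussian_kernel`, `blur`, `extract_patches`, `deconvolve_patches`, `neighbour_corr`) and parameter values are hypothetical. White noise is used as input so that any neighbour correlation comes from the kernel alone; only pixel-wise decorrelation is shown.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gaussian_kernel(size: int = 3, sigma: float = 1.0) -> np.ndarray:
    """Separable 2-D Gaussian kernel, normalised to sum to 1."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()


def blur(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Valid' 2-D cross-correlation (what CNN layers call convolution)."""
    windows = sliding_window_view(img, kernel.shape)  # shape (H', W', kh, kw)
    return np.einsum("ijkl,kl->ij", windows, kernel)


def extract_patches(img: np.ndarray, k: int = 3) -> np.ndarray:
    """im2col: every k x k patch flattened into one row, shape (N, k*k)."""
    return sliding_window_view(img, (k, k)).reshape(-1, k * k)


def deconvolve_patches(patches: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Hypothetical 'deconvolution': whiten patches with Cov^(-1/2) (ZCA)."""
    centred = patches - patches.mean(axis=0)
    cov = centred.T @ centred / len(centred)
    evals, evecs = np.linalg.eigh(cov)  # cov is symmetric, so eigh applies
    inv_sqrt = evecs @ np.diag(1.0 / np.sqrt(evals + eps)) @ evecs.T
    return centred @ inv_sqrt


def neighbour_corr(patches: np.ndarray, k: int = 3) -> float:
    """Correlation between each patch's centre pixel and its right neighbour."""
    centre = (k * k) // 2
    return float(np.corrcoef(patches[:, centre], patches[:, centre + 1])[0, 1])


rng = np.random.default_rng(0)
noise = rng.standard_normal((64, 64))      # white noise: neighbours uncorrelated
blurred = blur(noise, gaussian_kernel())   # Gaussian kernel correlates neighbours
patches = extract_patches(blurred)
whitened = deconvolve_patches(patches)

before = neighbour_corr(patches)
after = neighbour_corr(whitened)
print(f"neighbour correlation before: {before:.2f}, after: {after:.2f}")
```

The blurred patches show a strong neighbour correlation, while the whitened patches are close to uncorrelated. The small `eps` keeps the inverse square root stable when some covariance eigenvalues are tiny; adding channels to each patch row would extend the same whitening to channel-wise redundancy.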
In this paper, a simple but efficient Siamese convolution deconvolution feature fusion network (<monospace>SiCoDeF</monospace><sup>2</sup><monospace>Net</monospace>) is proposed. It learns invariant and discriminative complementary features generated by (i) a sub-convolution network (SCoNet) and (ii) a sub-deconvolution network (SDeNet), and fuses them with a concatenation operation, which significantly improves performance on the one-shot unconstrained face recognition task. Extensive experiments on several widely used benchmarks provide promising results, with the proposed <monospace>SiCoDeF</monospace><sup>2</sup><monospace>Net</monospace> model significantly outperforming the current state of the art in terms of classification accuracy, F1 score, precision and recall. The code will be available at: <uri>https://github.com/purbayankar/SiCoDeF2Net</uri>.
ISSN: 2169-3536
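The summary describes fusing SCoNet and SDeNet features by concatenation inside a Siamese network used for one-shot recognition. The NumPy sketch below illustrates only that data flow, under loudly stated assumptions: the two sub-networks are replaced by hypothetical fixed random ReLU projections (`sconet`, `sdenet`), and the similarity head is a weighted L1 distance followed by a sigmoid, one common Siamese choice (Koch et al., 2015) that the summary does not specify. Sizes, weights and the helper names `embed`, `similarity` and `one_shot_classify` are illustrative, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(42)
D_IN, D_CO, D_DE = 32, 16, 16  # hypothetical input / branch feature sizes

# Hypothetical stand-ins for the trained sub-networks (random, untrained).
# Both twins reuse these same weights, which is what makes the model Siamese.
W_CO = rng.standard_normal((D_CO, D_IN)) / np.sqrt(D_IN)
W_DE = rng.standard_normal((D_DE, D_IN)) / np.sqrt(D_IN)

# Hypothetical similarity head: fixed negative weights, so a larger L1
# distance gives a lower score (a trained model would learn these).
W_HEAD = np.full(D_CO + D_DE, -1.0)
B_HEAD = 5.0


def sconet(x: np.ndarray) -> np.ndarray:
    """Placeholder for the sub-convolution branch (SCoNet)."""
    return np.maximum(W_CO @ x, 0.0)


def sdenet(x: np.ndarray) -> np.ndarray:
    """Placeholder for the sub-deconvolution branch (SDeNet)."""
    return np.maximum(W_DE @ x, 0.0)


def embed(x: np.ndarray) -> np.ndarray:
    """Feature fusion: concatenate the two complementary feature vectors."""
    return np.concatenate([sconet(x), sdenet(x)])


def similarity(x1: np.ndarray, x2: np.ndarray) -> float:
    """Weighted L1 distance between fused embeddings, squashed by a sigmoid."""
    d = np.abs(embed(x1) - embed(x2))
    return float(1.0 / (1.0 + np.exp(-(W_HEAD @ d + B_HEAD))))


def one_shot_classify(query: np.ndarray, support: np.ndarray) -> int:
    """N-way one-shot: return the index of the most similar support example."""
    scores = [similarity(query, s) for s in support]
    return int(np.argmax(scores))


support = rng.standard_normal((5, D_IN))               # one example per class
query = support[3] + 0.05 * rng.standard_normal(D_IN)  # noisy copy of class 3
print("predicted class:", one_shot_classify(query, support))
```

Because both twins call the same `embed`, the score is symmetric in its two inputs, and one labelled example per class is enough to make a prediction. In a trained model the branch and head weights would be learned rather than fixed.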