Deep feature learning for image classification via countering over-fitting

Bibliographic Details
Main Author: Qing, Yuanyuan
Other Authors: Huang Guangbin
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2021
Subjects:
Online Access: https://hdl.handle.net/10356/151080
Description
Summary: The great success of deep neural networks in visual recognition has inspired numerous real-world applications. However, such superior performance is closely tied to model complexity and the amount of annotated data: over-deepened networks and a lack of data annotation degrade a model's generalization capability as over-fitting arises. This thesis focuses on extracting robust semantic features from image data by alleviating over-fitting under different learning frameworks.

The first work studies the over-fitting problem of the Extreme Learning Machine (ELM) classifier when it is combined with a convolutional neural network (CNN) for supervised learning. To remedy over-fitting while still exploiting the excellent feature extraction capability of deep networks, a novel deep and wide feature based ELM (DW-ELM) is proposed, which adopts the wide architecture design of residual networks (ResNets) for feature extraction. The empirical study demonstrates that, with an ELM serving as the classifier, using wide ResNets (WRNs) for feature extraction greatly narrows the generalization gap. Extensive experiments on five visual benchmark datasets show that DW-ELM substantially boosts and stabilizes the generalization capability of the original backbone CNN.
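To make the first work's pipeline concrete, the following is a minimal sketch of a regularized ELM classifier fitted on precomputed deep features (e.g., pooled wide-ResNet activations). The hidden-layer size, the sigmoid activation, and the ridge constant C are illustrative choices for a standard ELM, not the exact DW-ELM formulation from the thesis.

```python
import numpy as np

def train_elm(feats, labels, n_hidden=2048, C=10.0, seed=0):
    """Fit a ridge-regularized ELM on deep features.

    feats:  (N, d) array of deep features (e.g., pooled WRN activations).
    labels: (N,) integer class labels.
    Returns the random projection (W, b) and the output weights beta.
    """
    rng = np.random.default_rng(seed)
    d = feats.shape[1]
    n_classes = labels.max() + 1

    # Random, untrained hidden layer: project the features and squash them.
    W = rng.standard_normal((d, n_hidden))
    b = rng.standard_normal(n_hidden)
    H = 1.0 / (1.0 + np.exp(-(feats @ W + b)))            # (N, n_hidden)

    # One-hot targets; output weights from a closed-form least-squares solve.
    T = np.eye(n_classes)[labels]                          # (N, n_classes)
    beta = np.linalg.solve(H.T @ H + np.eye(n_hidden) / C, H.T @ T)
    return W, b, beta

def predict_elm(feats, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(feats @ W + b)))
    return (H @ beta).argmax(axis=1)
```

Because only the output weights are obtained, in closed form, the deep backbone acts purely as a fixed feature extractor, which is the property the DW-ELM design builds on.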
The second work studies the scarce-annotation problem in semi-supervised learning. Label propagation is commonly used as a transductive algorithm that lets information flow from labeled to unlabeled data for pseudo-labeling. Two limitations of previous algorithms, both of which lead to noisy and incomplete information flow, are addressed in this thesis. The first is that, because only labeled data are used for feature learning, the learned feature mapping is highly likely to be biased and can easily over-fit noise. The second is the loss of local geometry information in the feature space during label propagation. A novel algorithm is proposed that alleviates both issues by incorporating self-supervised learning into the feature learning phase and by using a reconstruction-based formulation to preserve local geometry. Extensive experiments on three visual benchmark datasets verify the effectiveness of the proposed algorithm, which consistently outperforms most state-of-the-art semi-supervised learning methods.

The third work focuses on learning novel visual categories, a clustering problem with certain prior knowledge. The task can also be viewed as a special type of semi-supervised learning in which the categories of the unlabeled data and the labeled data are disjoint. The main challenge is how to effectively transfer knowledge from labeled data to unlabeled data when the two sets are independent and do not share the same categories. Two issues are common in previous algorithms: 1) they consist of multiple training phases, which makes end-to-end training difficult; and 2) they depend strongly on the quality of pairwise similarity pseudo-labels, which are vulnerable to noise and bias. This thesis proposes an end-to-end novel visual category learning algorithm built on auxiliary self-supervision tasks, so that labeled and unlabeled data share the same set of surrogate labels and the overall supervisory signal provides strong regularization. Moreover, local structure information in the feature space is used to construct pairwise pseudo-labels, since local properties are more robust to noise (see the sketch at the end of this summary). Experiments on three visual benchmark datasets demonstrate the effectiveness of the proposed algorithm, which achieves new state-of-the-art performance.

Overall, this thesis addresses the over-fitting problem of deep learning-based feature learning for visual understanding from two perspectives: 1) over-fitting in supervised learning caused by network architecture, and 2) over-fitting in semi-supervised and unsupervised learning caused by the lack of data annotation.
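To illustrate the pairwise pseudo-labelling idea referenced in the third work, below is a minimal sketch that marks each sample and its k nearest neighbours in feature space as positive pairs and couples them with a pairwise binary cross-entropy on predicted cluster assignments. The cosine similarity, the value of k, and the inner-product pairing of cluster probabilities are assumptions made for illustration, not the thesis's exact construction.

```python
import torch
import torch.nn.functional as F

def pairwise_pseudo_labels(features, k=5):
    """Treat each sample and its k nearest neighbours (cosine similarity)
    as positive pairs; every other pair is treated as a negative."""
    feats = F.normalize(features, dim=1)
    sim = feats @ feats.t()
    # k + 1 because every sample is its own nearest neighbour
    nn_idx = sim.topk(k + 1, dim=1).indices
    labels = torch.zeros_like(sim)
    labels.scatter_(1, nn_idx, 1.0)
    return ((labels + labels.t()) > 0).float()   # symmetrise the pair matrix

def pairwise_bce_loss(cluster_logits, pseudo_labels):
    """Binary cross-entropy between the probability that two samples fall
    in the same cluster (inner product of softmax outputs) and the pseudo-labels."""
    p = F.softmax(cluster_logits, dim=1)
    pair_prob = (p @ p.t()).clamp(1e-7, 1 - 1e-7)
    return F.binary_cross_entropy(pair_prob, pseudo_labels)
```

In an end-to-end setting, such pseudo-labels can be recomputed on the fly from the current feature space and optimized jointly with the shared self-supervision objective described above.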