Image classification with limited data information

Image classification is a fundamental problem in image processing and computer vision. Recent algorithms have achieved significantly better results by learning deep features from large-scale datasets, such as ImageNet. However, in practice, challenges persist, especially with (I) low-quality image d...


Bibliographic Details
Main Author: Cheng, Hao
Other Authors: Wen Bihan
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2024
Subjects: Engineering; Image classification; Few-shot learning
Online Access: https://hdl.handle.net/10356/174167
_version_ 1811688191805095936
author Cheng, Hao
author2 Wen Bihan
author_facet Wen Bihan
Cheng, Hao
author_sort Cheng, Hao
collection NTU
description Image classification is a fundamental problem in image processing and computer vision. Recent algorithms have achieved significantly better results by learning deep features from large-scale datasets such as ImageNet. In practice, however, challenges persist, especially with (I) low-quality image data, such as noisy data or image data with variations in object appearance, as encountered in image-set classification, and (II) limited availability of image data, including scarce samples, as in weakly supervised classification, or restricted availability of labeled data, as in few-shot image classification. These tasks require models that are generic and highly flexible, yet able to avoid over-fitting and to generalize when only a few samples are available. This thesis presents three works that tackle image classification with limited data information in weakly supervised and few-shot learning. We begin with small-scale visual classification tasks. From a traditional model-based perspective, we introduce a novel method called Joint Statistical and Spatial Sparse (J3S) representation, which reconciles local spatial patch structures and a global statistical Gaussian distribution under joint sparsity. Integrating global Gaussian statistical information and local spatial patch information through J3S with two joint dictionaries yields more accurate and robust results than using either type of information alone. Moving beyond general small-scale classification, we then turn to a specific task, few-shot image classification, and propose two deep learning-based methods to tackle the challenges arising from limited data information. First, we focus on Graph Neural Networks (GNN) and investigate the limitations of existing GNN methods for few-shot learning. To address the over-fitting and over-smoothing issues observed in recent GNN approaches, we propose the Attentive GNN (AGNN) framework. AGNN incorporates a triple-attention mechanism that covers graph initialization, graph update, and correlation across graph layers. We provide both theoretical analysis and practical illustrations showing how the proposed modules enhance GNN scalability for few-shot tasks and thereby improve few-shot performance. Next, we explore more general and challenging few-shot scenarios, including few-shot domain generalization. To address feature distraction caused by class-irrelevant excursive features such as style, domain, and background in image data, we propose a novel Disentangled Feature Representation (DFR) framework. DFR removes information that is irrelevant to classification, improving performance through class-domain disentanglement. Furthermore, we construct a new dataset, FS-DomainNet, by reorganizing DomainNet specifically for benchmarking few-shot domain generalization tasks. The main contributions of this thesis are threefold. Firstly, we conduct a comprehensive study of image classification with limited data from both model-based and deep learning-based perspectives. Secondly, we propose three novel approaches that address, from different angles, various challenges caused by limited data information. We also introduce the FS-DomainNet dataset, designed for evaluating few-shot methods in more general and challenging real-life scenarios. Lastly, we validate the effectiveness of the proposed methods through extensive experiments on multiple benchmarks. Both qualitative and quantitative results demonstrate the improved performance of the proposed approaches for image classification with limited data. The contributions of this study advance the understanding and practical capabilities of image classification with limited data information and provide groundwork for future research in this domain.
first_indexed 2024-10-01T05:28:17Z
format Thesis-Doctor of Philosophy
id ntu-10356/174167
institution Nanyang Technological University
language English
last_indexed 2024-10-01T05:28:17Z
publishDate 2024
publisher Nanyang Technological University
record_format dspace
spelling ntu-10356/174167 2024-04-09T03:58:58Z Image classification with limited data information Cheng, Hao Wen Bihan School of Electrical and Electronic Engineering bihan.wen@ntu.edu.sg Engineering Image classification Few-shot learning Doctor of Philosophy 2024-03-18T10:45:01Z 2024-03-18T10:45:01Z 2023 Thesis-Doctor of Philosophy Cheng, H. (2023). Image classification with limited data information. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/174167 https://hdl.handle.net/10356/174167 10.32657/10356/174167 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University
spellingShingle Engineering
Image classification
Few-shot learning
Cheng, Hao
Image classification with limited data information
title Image classification with limited data information
title_full Image classification with limited data information
title_fullStr Image classification with limited data information
title_full_unstemmed Image classification with limited data information
title_short Image classification with limited data information
title_sort image classification with limited data information
topic Engineering
Image classification
Few-shot learning
url https://hdl.handle.net/10356/174167
work_keys_str_mv AT chenghao imageclassificationwithlimiteddatainformation