Zero-Shot Image Classification with Rectified Embedding Vectors Using a Caption Generator

Bibliographic Details
Main Authors: Chan Hur, Hyeyoung Park (School of Computer Science and Engineering, Kyungpook National University, Daegu 41566, Republic of Korea)
Format: Article
Language: English
Published: MDPI AG, 2023-06-01
Series: Applied Sciences, vol. 13, no. 12, article 7071
DOI: 10.3390/app13127071
ISSN: 2076-3417
Collection: DOAJ (Directory of Open Access Journals)
Subjects: zero-shot learning; image captioning; joint-embedding; visual feature enhancement; textual feature generation
Online Access: https://www.mdpi.com/2076-3417/13/12/7071

Description:
Although image recognition technologies are developing rapidly with deep learning, conventional recognition models trained by supervised learning with class labels do not work well when they are given test inputs from untrained classes. For example, a recognizer trained to classify Asian bird species cannot recognize the kiwi, because the class label “kiwi” and its image samples were never seen during training. To overcome this limitation, zero-shot classification has been studied recently, and the joint-embedding-based approach has been suggested as one of the promising solutions. In this approach, image features and text descriptions belonging to the same class are trained to lie close together in a common joint-embedding space. Once we obtain an embedding function that captures the semantic relationship of the image–text pairs in the training data, test images and text descriptions (prototypes) of unseen classes can also be mapped to the joint-embedding space for classification. The main challenge of this approach is mapping inputs of two different modalities into a common space, and previous works suffer from an inconsistency between the distributions of the two feature sets that the heterogeneous inputs produce in the joint-embedding space. To address this problem, we propose a novel method that employs additional textual information to rectify the visual representation of input images. Since the conceptual information of test classes is generally given as text, we expect the additional descriptions produced by a caption generator to adjust the visual features so that they match the representations of the test classes more closely. We also propose using the generated textual descriptions to augment the training samples for learning the joint-embedding space. In experiments on two benchmark datasets, the proposed method shows significant performance improvements of 1.4% on the CUB dataset and 5.5% on the flower dataset in comparison to existing models.
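
To make the pipeline concrete, below is a minimal PyTorch sketch of the ideas the description outlines: a joint-embedding space for image and text features, rectification of the visual embedding with the embedding of a generated caption, caption-based augmentation of the training text samples, and nearest-prototype classification of unseen classes. The module names, dimensions, gated fusion, and contrastive loss here are illustrative assumptions, not the authors' exact architecture; the feature extractors and the caption generator itself are assumed to be given.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbedding(nn.Module):
    """Projects image and text features into a shared joint-embedding space."""

    def __init__(self, img_dim=2048, txt_dim=768, joint_dim=512):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, joint_dim)  # visual branch
        self.txt_proj = nn.Linear(txt_dim, joint_dim)  # textual branch
        # Hypothetical rectification gate: fuses the generated-caption
        # embedding into the visual embedding, nudging it toward the
        # textual side of the joint space.
        self.gate = nn.Linear(2 * joint_dim, joint_dim)

    def embed_text(self, txt_feat):
        return F.normalize(self.txt_proj(txt_feat), dim=-1)

    def embed_image(self, img_feat, caption_feat):
        v = self.img_proj(img_feat)
        c = self.txt_proj(caption_feat)  # embedding of the generated caption
        v = v + torch.tanh(self.gate(torch.cat([v, c], dim=-1)))  # rectify
        return F.normalize(v, dim=-1)

def training_loss(model, img_feat, caption_feat, class_protos, labels, tau=0.07):
    """Contrastive loss pulling each image toward its class text prototype.

    The generated caption also serves as an extra text view of the class,
    mirroring the caption-based augmentation described above.
    """
    img_emb = model.embed_image(img_feat, caption_feat)   # (B, D)
    proto_emb = model.embed_text(class_protos)            # (C, D)
    loss = F.cross_entropy(img_emb @ proto_emb.t() / tau, labels)
    aug_emb = model.embed_text(caption_feat)              # captions as augmented text samples
    loss = loss + F.cross_entropy(aug_emb @ proto_emb.t() / tau, labels)
    return loss

def zero_shot_classify(model, img_feat, caption_feat, unseen_protos):
    """Predict the unseen class whose text prototype is nearest in joint space."""
    img_emb = model.embed_image(img_feat, caption_feat)   # (B, D)
    proto_emb = model.embed_text(unseen_protos)           # (C_unseen, D)
    return (img_emb @ proto_emb.t()).argmax(dim=-1)       # cosine similarity
```

The gated residual fusion is just one plausible way to "rectify" the visual vector; any fusion that shifts image embeddings toward the distribution of the text embeddings would fit the description above.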