Zero-Shot Image Classification Based on a Learnable Deep Metric
Supervised deep learning models have achieved strong results in image classification when trained on large numbers of labeled samples. However, there are many categories without or only with a few labeled training samples in practice, and some categories even have no tra...
Main Authors: | Jingyi Liu, Caijuan Shi, Dongjing Tu, Ze Shi, Yazhi Liu |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2021-05-01 |
Series: | Sensors |
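Subjects: | zero-shot learning; deep metric; common space embedding; relation network; image classification; deep learning |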
Online Access: | https://www.mdpi.com/1424-8220/21/9/3241 |
_version_ | 1797534996503724032 |
---|---|
author | Jingyi Liu; Caijuan Shi; Dongjing Tu; Ze Shi; Yazhi Liu |
author_facet | Jingyi Liu; Caijuan Shi; Dongjing Tu; Ze Shi; Yazhi Liu |
author_sort | Jingyi Liu |
collection | DOAJ |
description | Supervised deep learning models have achieved strong results in image classification when trained on large numbers of labeled samples. In practice, however, many categories have only a few labeled training samples, and some have none at all. Zero-shot learning greatly reduces the dependence of image classification models on labeled training samples. Nevertheless, learning the similarity between visual features and semantic features with a predefined fixed metric (e.g., Euclidean distance) has limitations, and the mapping process suffers from a semantic gap. To address these problems, this paper proposes a new zero-shot image classification method based on an end-to-end learnable deep metric. First, a common space embedding maps the visual features and semantic features into a common space. Second, an end-to-end learnable deep metric, namely a relation network, learns the similarity between visual features and semantic features. Finally, images of unseen classes are classified according to the similarity score. Experiments on four datasets indicate the effectiveness of the proposed method. |
first_indexed | 2024-03-10T11:38:20Z |
format | Article |
id | doaj.art-2eccfe51db3440bd98148b109285bfc5 |
institution | Directory Open Access Journal |
issn | 1424-8220 |
language | English |
last_indexed | 2024-03-10T11:38:20Z |
publishDate | 2021-05-01 |
publisher | MDPI AG |
record_format | Article |
series | Sensors |
spelling | doaj.art-2eccfe51db3440bd98148b109285bfc5; 2023-11-21T18:41:44Z; eng; MDPI AG; Sensors; 1424-8220; 2021-05-01; 21; 9; 3241; 10.3390/s21093241; Zero-Shot Image Classification Based on a Learnable Deep Metric; Jingyi Liu (0), Caijuan Shi (1), Dongjing Tu (2), Ze Shi (3), Yazhi Liu (4); affiliations 0 to 4: College of Information Engineering, North China University of Science and Technology, Tangshan 063210, China; https://www.mdpi.com/1424-8220/21/9/3241; zero-shot learning; deep metric; common space embedding; relation network; image classification; deep learning |
spellingShingle | Jingyi Liu Caijuan Shi Dongjing Tu Ze Shi Yazhi Liu Zero-Shot Image Classification Based on a Learnable Deep Metric Sensors zero-shot learning deep metric common space embedding relation network image classification deep learning |
title | Zero-Shot Image Classification Based on a Learnable Deep Metric |
title_full | Zero-Shot Image Classification Based on a Learnable Deep Metric |
title_fullStr | Zero-Shot Image Classification Based on a Learnable Deep Metric |
title_full_unstemmed | Zero-Shot Image Classification Based on a Learnable Deep Metric |
title_short | Zero-Shot Image Classification Based on a Learnable Deep Metric |
title_sort | zero shot image classification based on a learnable deep metric |
topic | zero-shot learning; deep metric; common space embedding; relation network; image classification; deep learning |
url | https://www.mdpi.com/1424-8220/21/9/3241 |
work_keys_str_mv | AT jingyiliu zeroshotimageclassificationbasedonalearnabledeepmetric AT caijuanshi zeroshotimageclassificationbasedonalearnabledeepmetric AT dongjingtu zeroshotimageclassificationbasedonalearnabledeepmetric AT zeshi zeroshotimageclassificationbasedonalearnabledeepmetric AT yazhiliu zeroshotimageclassificationbasedonalearnabledeepmetric |
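
The three steps named in the description (common space embedding, a learnable relation score, classification by highest score) can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `RelationScorer`, the layer sizes, the ReLU and sigmoid activations, the use of concatenation to pair the two embeddings, and the random untrained weights are all assumptions made for illustration. In the actual method these weights would be learned end to end.

```python
import math
import random


def linear(weights, bias, x):
    """Affine map: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def init(rows, cols, rng):
    """Small random weights and zero bias (hypothetical stand-in for trained parameters)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return weights, [0.0] * rows


class RelationScorer:
    """Hypothetical sketch: embed both modalities, then score the pair with a small network."""

    def __init__(self, visual_dim, semantic_dim, common_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.vis = init(common_dim, visual_dim, rng)      # visual -> common space
        self.sem = init(common_dim, semantic_dim, rng)    # semantic -> common space
        self.hid = init(hidden_dim, 2 * common_dim, rng)  # relation module, hidden layer
        self.out = init(1, hidden_dim, rng)               # relation module, output layer

    def score(self, visual, semantic):
        """Similarity score in (0, 1) for one image feature and one class vector."""
        v = relu(linear(*self.vis, visual))
        s = relu(linear(*self.sem, semantic))
        h = relu(linear(*self.hid, v + s))  # v + s concatenates the two lists
        return sigmoid(linear(*self.out, h)[0])

    def classify(self, visual, class_semantics):
        """Return the class with the highest similarity score, plus all scores."""
        scores = {name: self.score(visual, vec) for name, vec in class_semantics.items()}
        return max(scores, key=scores.get), scores


# Toy inputs: a 4-dimensional image feature and 3-dimensional class attribute vectors.
scorer = RelationScorer(visual_dim=4, semantic_dim=3, common_dim=5, hidden_dim=6)
image_feature = [0.2, 0.7, 0.1, 0.9]
unseen_classes = {"zebra": [1.0, 0.0, 1.0], "whale": [0.0, 1.0, 0.0]}
label, scores = scorer.classify(image_feature, unseen_classes)
print(label, scores)
```

Because the weights are random, the predicted label here carries no meaning; the sketch only shows how a learned score replaces a fixed distance such as the Euclidean one when comparing an image with each candidate class.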