End-to-End: A Simple Template for the Long-Tailed-Recognition of Transmission Line Clamps via a Vision-Language Model
Raw image classification datasets in the real world generally follow a long-tailed distribution. Standard classification algorithms face a substantial issue because many categories are represented by only a few samples, and under the influence of their loss functions, models tend to favor the dominant labels. Existing systems typically use two stages to improve performance: pretraining on the initial imbalanced dataset, then fine-tuning on a balanced dataset via re-sampling or logit adjustment. These approaches have achieved promising results. However, their limited self-supervised information makes it challenging to transfer such systems to other vision tasks, such as detection and segmentation. Using large-scale contrastive visual-language pretraining, the OpenAI team discovered a novel visual recognition method. We provide a simple one-stage model, the text-to-image network (TIN), for long-tailed recognition (LTR) based on the similarities between textual and visual features. TIN has the following advantages over existing techniques: (1) the model incorporates both textual and visual semantic information; (2) the end-to-end strategy achieves good results with fewer image samples and no secondary training; and (3) seesaw loss further reduces the loss gap between the head and tail categories. These adjustments encourage large relative magnitudes between the logits of rare and dominant labels. We conducted extensive comparative experiments against a large number of advanced models on ImageNet-LT, the largest long-tailed public dataset, where TIN achieved state-of-the-art single-stage performance with 72.8% Top-1 accuracy.
Main Authors: | Fei Yan, Hui Zhang, Yaogen Li, Yongjia Yang, Yinping Liu |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-03-01 |
Series: | Applied Sciences |
Subjects: | unmanned aerial vehicle; state grid; transmission line clamps; image classification; multimodule fusion; neural network |
Online Access: | https://www.mdpi.com/2076-3417/13/5/3287 |
author | Fei Yan; Hui Zhang; Yaogen Li; Yongjia Yang; Yinping Liu |
collection | DOAJ |
description | Raw image classification datasets in the real world generally follow a long-tailed distribution. Standard classification algorithms face a substantial issue because many categories are represented by only a few samples, and under the influence of their loss functions, models tend to favor the dominant labels. Existing systems typically use two stages to improve performance: pretraining on the initial imbalanced dataset, then fine-tuning on a balanced dataset via re-sampling or logit adjustment. These approaches have achieved promising results. However, their limited self-supervised information makes it challenging to transfer such systems to other vision tasks, such as detection and segmentation. Using large-scale contrastive visual-language pretraining, the OpenAI team discovered a novel visual recognition method. We provide a simple one-stage model, the text-to-image network (TIN), for long-tailed recognition (LTR) based on the similarities between textual and visual features. TIN has the following advantages over existing techniques: (1) the model incorporates both textual and visual semantic information; (2) the end-to-end strategy achieves good results with fewer image samples and no secondary training; and (3) seesaw loss further reduces the loss gap between the head and tail categories. These adjustments encourage large relative magnitudes between the logits of rare and dominant labels. We conducted extensive comparative experiments against a large number of advanced models on ImageNet-LT, the largest long-tailed public dataset, where TIN achieved state-of-the-art single-stage performance with 72.8% Top-1 accuracy. |
id | doaj.art-0feb2911ac2440688b4135e8380d1cde |
institution | Directory Open Access Journal |
issn | 2076-3417 |
doi | 10.3390/app13053287 |
citation | Applied Sciences, vol. 13, no. 5, art. 3287, 2023-03-01 |
author_affiliation | College of Automation, Nanjing University of Information Science & Technology, Nanjing 210044, China (all five authors) |
title | End-to-End: A Simple Template for the Long-Tailed-Recognition of Transmission Line Clamps via a Vision-Language Model |
topic | unmanned aerial vehicle; state grid; transmission line clamps; image classification; multimodule fusion; neural network |
url | https://www.mdpi.com/2076-3417/13/5/3287 |