Mixup Feature: A Pretext Task Self-Supervised Learning Method for Enhanced Visual Feature Learning
Self-supervised learning has emerged as an increasingly popular research topic within the field of computer vision. In this study, we propose a novel self-supervised learning approach based on Mixup features as pretext tasks. The proposed method aims to learn visual representations by predicting the...
Main Authors: | Jiashu Xu, Sergii Stirenko |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2023-01-01 |
Series: | IEEE Access |
Subjects: | Computer vision; mixup feature; self-supervised learning; masked autoencoder |
Online Access: | https://ieeexplore.ieee.org/document/10207028/ |
_version_ | 1797742946622111744 |
---|---|
author | Jiashu Xu; Sergii Stirenko |
author_facet | Jiashu Xu; Sergii Stirenko |
author_sort | Jiashu Xu |
collection | DOAJ |
description | Self-supervised learning has emerged as an increasingly popular research topic within the field of computer vision. In this study, we propose a novel self-supervised learning approach based on Mixup features as pretext tasks. The proposed method aims to learn visual representations by predicting the Mixup-Feature of a masked image, which serves as a proxy for higher-level semantic information. Specifically, we investigate the efficacy of Mixup features as the prediction target for self-supervised learning. By setting the hyperparameter $\lambda$ through Mixup operations, pairwise combinations of Sobel edge feature maps, HOG feature maps, and LBP feature maps are created. We employ the vision transformer as the backbone network, drawing inspiration from masked autoencoders (MAE). We evaluate the proposed method on three benchmark datasets, namely Cifar-10, Cifar-100, and STL-10, and compare it with other state-of-the-art self-supervised learning approaches. The experiments demonstrate that mixed HOG-Sobel feature maps after Mixup achieve the best results in fine-tuning experiments on Cifar-10 and STL-10. Furthermore, compared to contrastive learning-based self-supervised learning methods, our approach proves to be more efficient, with shorter training durations and no reliance on data augmentation. When compared to generative self-supervised learning approaches based on MAE, the average performance improvement is 0.4%. Overall, the proposed self-supervised learning method based on Mixup features offers a promising direction for future research in the computer vision domain and has the potential to enhance performance across various downstream tasks. Our code will be published on GitHub. |
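The description above outlines the core pretext-task construction: hand-crafted feature maps (Sobel, HOG, LBP) are mixed pairwise with a Mixup coefficient $\lambda$ and used as the regression target for a masked image model. The snippet below is a minimal sketch of that target construction for the HOG-Sobel pair, assuming scikit-image's `hog` and `sobel` implementations, per-map min-max normalization, and an arbitrary $\lambda$; it is illustrative only and is not the authors' released code (which the abstract says will be published on GitHub).

```python
# Illustrative sketch (not the authors' code): build a Mixup-Feature target by
# lambda-mixing a HOG feature map with a Sobel edge map, as described above.
# The lambda value and normalization are assumptions for demonstration only.
import numpy as np
from skimage import color, data, filters
from skimage.feature import hog


def mixup_feature_target(image_gray: np.ndarray, lam: float = 0.5) -> np.ndarray:
    """Return a lambda-weighted mix of per-pixel HOG and Sobel feature maps."""
    # Sobel edge-magnitude map, same spatial size as the input.
    sobel_map = filters.sobel(image_gray)

    # HOG visualization map (per-pixel rendering of the HOG descriptor),
    # also the same spatial size as the input.
    _, hog_map = hog(
        image_gray,
        orientations=9,
        pixels_per_cell=(8, 8),
        cells_per_block=(2, 2),
        visualize=True,
    )

    # Normalize each map to [0, 1] so neither term dominates by scale.
    def norm01(x: np.ndarray) -> np.ndarray:
        rng = x.max() - x.min()
        return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)

    # Mixup: target = lam * HOG + (1 - lam) * Sobel.
    return lam * norm01(hog_map) + (1.0 - lam) * norm01(sobel_map)


if __name__ == "__main__":
    img = color.rgb2gray(data.astronaut())       # sample image from scikit-image
    target = mixup_feature_target(img, lam=0.5)  # mixed HOG-Sobel target map
    print(target.shape, target.min(), target.max())
```

In a masked-image setup such as the one described, this mixed map would be computed for the full image and the model trained to reconstruct it at the masked patch locations.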
first_indexed | 2024-03-12T14:47:31Z |
format | Article |
id | doaj.art-166c67681533442c98cde0e3510691c5 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-03-12T14:47:31Z |
publishDate | 2023-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-166c67681533442c98cde0e3510691c5; 2023-08-15T23:00:52Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2023-01-01; vol. 11, pp. 82400-82409; DOI 10.1109/ACCESS.2023.3301561; article no. 10207028; Mixup Feature: A Pretext Task Self-Supervised Learning Method for Enhanced Visual Feature Learning; Jiashu Xu (https://orcid.org/0000-0001-6300-3629); Sergii Stirenko; Computer Engineering Department, National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Kyiv, Ukraine (both authors); https://ieeexplore.ieee.org/document/10207028/; Computer vision; mixup feature; self-supervised learning; masked autoencoder |
spellingShingle | Jiashu Xu Sergii Stirenko Mixup Feature: A Pretext Task Self-Supervised Learning Method for Enhanced Visual Feature Learning IEEE Access Computer vision mixup feature self-supervised learning masked autoencoder |
title | Mixup Feature: A Pretext Task Self-Supervised Learning Method for Enhanced Visual Feature Learning |
title_full | Mixup Feature: A Pretext Task Self-Supervised Learning Method for Enhanced Visual Feature Learning |
title_fullStr | Mixup Feature: A Pretext Task Self-Supervised Learning Method for Enhanced Visual Feature Learning |
title_full_unstemmed | Mixup Feature: A Pretext Task Self-Supervised Learning Method for Enhanced Visual Feature Learning |
title_short | Mixup Feature: A Pretext Task Self-Supervised Learning Method for Enhanced Visual Feature Learning |
title_sort | mixup feature a pretext task self supervised learning method for enhanced visual feature learning |
topic | Computer vision; mixup feature; self-supervised learning; masked autoencoder |
url | https://ieeexplore.ieee.org/document/10207028/ |
work_keys_str_mv | AT jiashuxu mixupfeatureapretexttaskselfsupervisedlearningmethodforenhancedvisualfeaturelearning AT sergiistirenko mixupfeatureapretexttaskselfsupervisedlearningmethodforenhancedvisualfeaturelearning |