Heuristic Attention Representation Learning for Self-Supervised Pretraining

Recently, self-supervised learning methods have been shown to be very powerful and efficient for yielding robust representation learning by maximizing the similarity across different augmented views in embedding vector space. However, the main challenge is generating different views with random crop...

Bibliographic Details
Main Authors:	Van Nhiem Tran, Shen-Hsuan Liu, Yung-Hui Li, Jia-Ching Wang
Format:	Article
Language:	English
Published:	MDPI AG 2022-07-01
Series:	Sensors
Subjects:	heuristic attention; perceptual grouping; self-supervised learning; visual representation learning; deep learning; computer vision
Online Access:	https://www.mdpi.com/1424-8220/22/14/5169

_version_	1827604334267858944
author	Van Nhiem Tran; Shen-Hsuan Liu; Yung-Hui Li; Jia-Ching Wang
collection	DOAJ
description	Recently, self-supervised learning methods have been shown to be very powerful and efficient for yielding robust representation learning by maximizing the similarity across different augmented views in embedding vector space. However, the main challenge is generating different views with random cropping; the semantic feature might exist differently across different views leading to inappropriately maximizing similarity objective. We tackle this problem by introducing Heuristic Attention Representation Learning (HARL). This self-supervised framework relies on the joint embedding architecture in which the two neural networks are trained to produce similar embedding for different augmented views of the same image. HARL framework adopts prior visual object-level attention by generating a heuristic mask proposal for each training image and maximizes the abstract object-level embedding on vector space instead of whole image representation from previous works. As a result, HARL extracts the quality semantic representation from each training sample and outperforms existing self-supervised baselines on several downstream tasks. In addition, we provide efficient techniques based on conventional computer vision and deep learning methods for generating heuristic mask proposals on natural image datasets. Our HARL achieves +1.3% advancement in the ImageNet semi-supervised learning benchmark and +0.9% improvement in AP50 of the COCO object detection task over the previous state-of-the-art method BYOL. Our code implementation is available for both TensorFlow and PyTorch frameworks.
first_indexed	2024-03-09T05:57:33Z
format	Article
id	doaj.art-bb733979520644e39e0579b66343bbaf
institution	Directory Open Access Journal
issn	1424-8220
language	English
last_indexed	2024-03-09T05:57:33Z
publishDate	2022-07-01
publisher	MDPI AG
record_format	Article
series	Sensors
spelling	doaj.art-bb733979520644e39e0579b66343bbaf | 2023-12-03T12:12:21Z | eng | MDPI AG | Sensors | 1424-8220 | 2022-07-01 | 22 | 14 | 5169 | 10.3390/s22145169 | Heuristic Attention Representation Learning for Self-Supervised Pretraining | Van Nhiem Tran [0]; Shen-Hsuan Liu [1]; Yung-Hui Li [2]; Jia-Ching Wang [3] | [0] Department of Computer Science and Information Engineering, National Central University, Taoyuan 3200, Taiwan; [1] Department of Computer Science and Information Engineering, National Central University, Taoyuan 3200, Taiwan; [2] AI Research Center, Hon Hai Research Institute, Taipei 114699, Taiwan; [3] Department of Computer Science and Information Engineering, National Central University, Taoyuan 3200, Taiwan
title	Heuristic Attention Representation Learning for Self-Supervised Pretraining
topic	heuristic attention; perceptual grouping; self-supervised learning; visual representation learning; deep learning; computer vision
url	https://www.mdpi.com/1424-8220/22/14/5169

Similar Items