Imitation Learning through Image Augmentation Using Enhanced Swin Transformer Model in Remote Sensing

In unmanned systems, remote sensing is an approach that collects and analyzes data such as visual images, infrared thermal images, and LiDAR sensor data from a distance using a system that operates without human intervention. Recent advancements in deep learning enable the direct mapping of input im...

Bibliographic Details
Main Authors:	Yoojin Park, Yunsick Sung
Format:	Article
Language:	English
Published:	MDPI AG 2023-08-01
Series:	Remote Sensing
Subjects:	data augmentation; deep learning; image processing; imitation learning; Swin Transformer; action classification
Online Access:	https://www.mdpi.com/2072-4292/15/17/4147

_version_	1797581886120263680
author	Yoojin Park; Yunsick Sung
author_facet	Yoojin Park; Yunsick Sung
author_sort	Yoojin Park
collection	DOAJ
description	In unmanned systems, remote sensing is an approach that collects and analyzes data such as visual images, infrared thermal images, and LiDAR sensor data from a distance using a system that operates without human intervention. Recent advancements in deep learning enable the direct mapping of input images in remote sensing to desired outputs, which allows unmanned systems to learn through imitation learning by collecting and analyzing those images. In autonomous cars, for example, raw high-dimensional sensor data are mapped to steering and throttle values by a deep learning network trained through imitation learning. Through imitation learning, unmanned systems observe expert demonstrations and learn expert policies, even in complex environments. However, collecting and analyzing a large number of images from the game environment is costly and time-consuming, and training with a limited dataset leads to a poor understanding of the environment. Some existing augmentation approaches are limited in how much they can enlarge the dataset because they consider only object locations that were visited and estimated. Overcoming this limitation requires also considering diverse object locations that were not visited. This paper proposes an enhanced model, comprising a Preprocessor, an enhanced Swin Transformer model, and an Action model, to augment the number of training images. Using the original network structure of the Swin Transformer model for image augmentation in imitation learning is challenging. Therefore, the internal structure of the Swin Transformer model is enhanced, and the Preprocessor and Action model are combined with it to augment training images. The proposed method was verified experimentally by learning from expert demonstrations and augmented images, which reduced the total loss from 1.24068 to 0.41616. Compared to expert demonstrations, the accuracy was approximately 86.4%, and in tests of generalization the proposed method scored 920 points and 1200 points more than the comparison model.
first_indexed	2024-03-10T23:13:59Z
format	Article
id	doaj.art-00ee1b27f2bc43fab4206ff6b46bcade
institution	Directory of Open Access Journals
issn	2072-4292
language	English
last_indexed	2024-03-10T23:13:59Z
publishDate	2023-08-01
publisher	MDPI AG
record_format	Article
series	Remote Sensing
spelling	doaj.art-00ee1b27f2bc43fab4206ff6b46bcade | 2023-11-19T08:45:05Z | eng | MDPI AG | Remote Sensing | 2072-4292 | 2023-08-01 | 15 | 17 | 4147 | 10.3390/rs15174147 | Imitation Learning through Image Augmentation Using Enhanced Swin Transformer Model in Remote Sensing | Yoojin Park (0); Yunsick Sung (1) | (0) Department of Autonomous Things Intelligence Graduate School, Dongguk University-Seoul, Seoul 04620, Republic of Korea | (1) Division of AI Software Convergence, Dongguk University-Seoul, Seoul 04620, Republic of Korea | https://www.mdpi.com/2072-4292/15/17/4147 | data augmentation; deep learning; image processing; imitation learning; Swin Transformer; action classification
title	Imitation Learning through Image Augmentation Using Enhanced Swin Transformer Model in Remote Sensing
topic	data augmentation; deep learning; image processing; imitation learning; Swin Transformer; action classification
url	https://www.mdpi.com/2072-4292/15/17/4147


Similar Items