Pose-Aware Disentangled Multiscale Transformer for Pose Guided Person Image Generation

Bibliographic Details
Main Authors: Kei Shibasaki, Masaaki Ikehara
Format: Article
Language: English
Published: IEEE, 2023-01-01
Series: IEEE Access
Subjects: Computer vision, machine learning, deep learning, transformer, pose guided person image generation
Online Access: https://ieeexplore.ieee.org/document/10373854/
_version_ 1797370525657333760
author Kei Shibasaki
Masaaki Ikehara
author_facet Kei Shibasaki
Masaaki Ikehara
author_sort Kei Shibasaki
collection DOAJ
description Pose Guided Person Image Generation (PGPIG) is the task of generating an image of a person in a target pose, given an image of that person in a source pose together with the source and target pose information. Many existing PGPIG techniques require extra pose-related data or auxiliary tasks, which limits their applicability. In addition, they use CNNs as feature extractors, which lack long-range dependencies: CNNs can only extract features from neighboring pixels and cannot account for global image consistency. This paper introduces a PGPIG network that addresses these challenges by incorporating modules built on Axial Transformers, which have wide receptive fields. The proposed approach disentangles the PGPIG task into two subtasks: “rough pose transformation” and “detailed texture generation.” In “rough pose transformation,” lower-resolution feature maps are processed by Axial Transformer-based blocks. These blocks employ an Encoder-Decoder structure, which allows the network to exploit the pose information effectively and improves the stability and performance of training. The latter subtask employs a CNN with Adaptive Instance Normalization. Experimental results show that the proposed method performs competitively with existing methods: it achieves the lowest LPIPS on the DeepFashion dataset and the lowest FID on the Market-1501 dataset. Remarkably, despite these strong results, the proposed network has significantly fewer parameters than existing methods.
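To make the two-stage design described in the abstract concrete, the sketch below illustrates axial attention (attention restricted to one spatial axis at a time, which is what gives the rough-pose stage a wide receptive field at low cost) and Adaptive Instance Normalization (used in the texture stage). This is a minimal PyTorch sketch with toy tensor shapes and illustrative module names chosen here; it is not the authors' released implementation.

    # Minimal sketch of axial attention and AdaIN (illustrative, not the paper's code).
    import torch
    import torch.nn as nn

    class AxialAttention(nn.Module):
        """Self-attention applied along one spatial axis (height or width),
        so each column/row attends only within itself."""
        def __init__(self, dim, heads=4, axis="h"):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.axis = axis

        def forward(self, x):                       # x: (B, C, H, W)
            b, c, h, w = x.shape
            if self.axis == "h":                    # attend along the height axis
                seq = x.permute(0, 3, 2, 1).reshape(b * w, h, c)
            else:                                   # attend along the width axis
                seq = x.permute(0, 2, 3, 1).reshape(b * h, w, c)
            out, _ = self.attn(seq, seq, seq)
            if self.axis == "h":
                out = out.reshape(b, w, h, c).permute(0, 3, 2, 1)
            else:
                out = out.reshape(b, h, w, c).permute(0, 3, 1, 2)
            return x + out                          # residual connection

    def adain(content, style, eps=1e-5):
        """Adaptive Instance Normalization: re-normalize content features with
        the per-channel mean/std of style features."""
        b, c = content.shape[:2]
        c_flat = content.reshape(b, c, -1)
        s_flat = style.reshape(b, c, -1)
        c_mean, c_std = c_flat.mean(-1, keepdim=True), c_flat.std(-1, keepdim=True) + eps
        s_mean, s_std = s_flat.mean(-1, keepdim=True), s_flat.std(-1, keepdim=True) + eps
        return (s_std * (c_flat - c_mean) / c_std + s_mean).reshape_as(content)

    # Toy example: rough pose transformation on low-resolution features,
    # followed by AdaIN-style texture injection at a higher resolution.
    feat = torch.randn(1, 64, 16, 16)           # low-resolution source features
    pose_feat = torch.randn(1, 64, 16, 16)      # encoded target-pose features
    rough = AxialAttention(64, axis="w")(AxialAttention(64, axis="h")(feat + pose_feat))
    texture_feat = torch.randn(1, 64, 64, 64)   # higher-resolution texture features
    stylized = adain(texture_feat, rough)       # texture stage conditioned on the rough result

Attending over rows and columns separately, as above, covers the whole feature map with two passes while keeping the cost linear in each spatial dimension, which is why it is practical on the lower-resolution maps the abstract mentions.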
first_indexed 2024-03-08T18:03:43Z
format Article
id doaj.art-cbb2e04fef4a4d9ba832e1569c25d013
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-03-08T18:03:43Z
publishDate 2023-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-cbb2e04fef4a4d9ba832e1569c25d013 (2024-01-02T00:02:08Z)
Title: Pose-Aware Disentangled Multiscale Transformer for Pose Guided Person Image Generation
Authors: Kei Shibasaki (https://orcid.org/0000-0002-7748-7246); Masaaki Ikehara (https://orcid.org/0000-0003-3461-1507), both with the Department of Electrical and Information Engineering, Faculty of Science and Technology, Keio University, Yokohama, Kanagawa, Japan
Published: IEEE, IEEE Access (ISSN 2169-3536), 2023-01-01, vol. 11, pp. 146054-146064
DOI: 10.1109/ACCESS.2023.3346940 (IEEE Xplore document 10373854)
Language: English
Abstract: as given in the description field above
Online Access: https://ieeexplore.ieee.org/document/10373854/
Subjects: Computer vision; machine learning; deep learning; transformer; pose guided person image generation
spellingShingle Kei Shibasaki
Masaaki Ikehara
Pose-Aware Disentangled Multiscale Transformer for Pose Guided Person Image Generation
IEEE Access
Computer vision
machine learning
deep learning
transformer
pose guided person image generation
title Pose-Aware Disentangled Multiscale Transformer for Pose Guided Person Image Generation
title_full Pose-Aware Disentangled Multiscale Transformer for Pose Guided Person Image Generation
title_fullStr Pose-Aware Disentangled Multiscale Transformer for Pose Guided Person Image Generation
title_full_unstemmed Pose-Aware Disentangled Multiscale Transformer for Pose Guided Person Image Generation
title_short Pose-Aware Disentangled Multiscale Transformer for Pose Guided Person Image Generation
title_sort pose aware disentangled multiscale transformer for pose guided person image generation
topic Computer vision
machine learning
deep learning
transformer
pose guided person image generation
url https://ieeexplore.ieee.org/document/10373854/
work_keys_str_mv AT keishibasaki poseawaredisentangledmultiscaletransformerforposeguidedpersonimagegeneration
AT masaakiikehara poseawaredisentangledmultiscaletransformerforposeguidedpersonimagegeneration