Exploration of Semantic Label Decomposition and Dataset Size in Semantic Indoor Scenes Synthesis via Optimized Residual Generative Adversarial Networks


Bibliographic Details
Main Authors: Hatem Ibrahem, Ahmed Salem, Hyun-Soo Kang
Format: Article
Language: English
Published: MDPI AG, 2022-10-01
Series: Sensors
Subjects: generative adversarial networks; convolutional neural networks; image-to-image translation; semantic image synthesis
Online Access: https://www.mdpi.com/1424-8220/22/21/8306
Collection: Directory of Open Access Journals (DOAJ)
Description: In this paper, we revisit paired image-to-image translation using the conditional generative adversarial network known as Pix2Pix and propose efficient optimizations of the architecture and the training method to maximize its performance and boost the realism of the generated images. We propose a generative adversarial network-based technique to create new artificial indoor scenes from a user-defined semantic segmentation map that defines the location, shape, and category of each object in the scene, exactly as in Pix2Pix. We train several residual-connection-based generator and discriminator architectures on the NYU Depth-v2 dataset and a selected indoor subset of the ADE20K dataset, showing that the proposed models have fewer parameters and lower computational complexity, and generate higher-quality images than state-of-the-art methods that follow the same technique for generating realistic indoor images. We also show that using extra specific labels and more training samples increases the quality of the generated images; however, in comparison to Pix2Pix, the proposed residual-connection-based models learn better from small datasets (i.e., NYU Depth-v2) and improve the realism of the generated images when trained on larger datasets (i.e., the ADE20K indoor subset). The proposed method achieves an LPIPS value of 0.505 and an FID value of 81.067, generating better-quality images than those produced by Pix2Pix and other recent paired image-to-image translation methods and outperforming them in terms of LPIPS and FID.
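The description centers on the Pix2Pix paired-translation objective, in which the generator is trained with an adversarial term plus an L1 reconstruction term against the paired ground-truth image. As a rough, self-contained sketch of that combined loss (illustrative only, not the authors' code; the toy inputs are assumptions, and the default weight of 100 follows the original Pix2Pix paper):

```python
import math

def pix2pix_generator_loss(d_fake, fake, target, lam=100.0):
    """Toy version of the Pix2Pix generator objective:
    non-saturating adversarial term + lambda-weighted L1 term.

    d_fake : discriminator's estimated probability that the
             generated image is real (a float in (0, 1]).
    fake, target : flat lists of pixel values standing in for the
             generated image and its paired ground-truth image.
    """
    adv = -math.log(max(d_fake, 1e-12))  # -log D(x, G(x))
    l1 = sum(abs(f - t) for f, t in zip(fake, target)) / len(target)
    return adv + lam * l1

# A perfect generator (fools D and reproduces the target) incurs zero loss:
print(pix2pix_generator_loss(1.0, [0.2, 0.8], [0.2, 0.8]))  # 0.0
```

The large L1 weight reflects the paired setting: because each input map has a known target image, reconstruction error dominates the objective while the adversarial term pushes for realistic texture.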
DOAJ Record ID: doaj.art-bcaf7f2ded614fa597fbcf99039d0850
DOI: 10.3390/s22218306
ISSN: 1424-8220
Author Affiliations: Department of Information and Communication Engineering, School of Electrical and Computer Engineering, Chungbuk National University, Cheongju-si 28644, Korea (all three authors)