HQ‐I2IT: Redesign the optimization scheme to improve image quality in CycleGAN‐based image translation systems


Bibliographic Details
Main Authors: Yipeng Zhang, Bingliang Hu, Yingying Huang, Chi Gao, Jianfu Yin, Quang Wang
Format: Article
Language: English
Published: Wiley 2024-02-01
Series: IET Image Processing
Online Access: https://doi.org/10.1049/ipr2.12965
Description
Summary: The image‐to‐image translation (I2IT) task aims to transform images from a source domain into a specified target domain. State‐of‐the‐art CycleGAN‐based translation algorithms typically use a cycle consistency loss and a latent regression loss to constrain translation. In this work, it is demonstrated that constraining the model parameters with the cycle consistency loss and the latent regression loss is equivalent to optimizing toward the medians of the data distribution and the generative distribution. In addition, there is a style bias in the translation. This bias interacts between the generator and the style encoder and manifests visually as translation errors, e.g. the style of the generated image does not match the style of the reference image. To address these issues, a new I2IT model termed high‐quality‐I2IT (HQ‐I2IT) is proposed. The optimization scheme is redesigned to prevent the model from optimizing the median of the data distribution. In addition, by separating the optimization of the generator and the latent code estimator, the redesigned model avoids error interactions and gradually corrects errors during training, thereby avoiding learning the median of the generative distribution. The experimental results demonstrate that the visual quality of the images produced by HQ‐I2IT is significantly improved without changing the generator structure, especially when guided by reference images. Specifically, the Fréchet inception distance is reduced from 19.8 to 10.2 on the AFHQ dataset and from 23.8 to 17.0 on the CelebA‐HQ dataset.
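The link between these losses and the median can be illustrated with a small numerical sketch. It assumes the losses are of the absolute-error (L1) type, which is common in CycleGAN-style models but is not stated in the summary itself. It relies only on the standard fact that the constant minimizing mean absolute error is the sample median, whereas mean squared error is minimized by the mean. The function names and sample values are hypothetical and are not taken from the article.

```python
import statistics


def l1_loss(c, samples):
    """Mean absolute error of predicting the constant c for every sample."""
    return sum(abs(x - c) for x in samples) / len(samples)


def l2_loss(c, samples):
    """Mean squared error of predicting the constant c for every sample."""
    return sum((x - c) ** 2 for x in samples) / len(samples)


def argmin_on_grid(loss, samples, lo, hi, steps):
    """Brute-force search for the constant in [lo, hi] with the lowest loss."""
    best_c, best_val = lo, loss(lo, samples)
    for i in range(1, steps):
        c = lo + (hi - lo) * i / (steps - 1)
        val = loss(c, samples)
        if val < best_val:
            best_c, best_val = c, val
    return best_c


# Hypothetical, skewed one-dimensional "data distribution".
samples = [0.0, 1.0, 2.0, 3.0, 14.0]

# Grid step is 0.01, so both the median and the mean lie on the grid.
best_l1 = argmin_on_grid(l1_loss, samples, 0.0, 20.0, 2001)
best_l2 = argmin_on_grid(l2_loss, samples, 0.0, 20.0, 2001)

print("L1 minimizer:", best_l1, "median:", statistics.median(samples))
print("L2 minimizer:", best_l2, "mean:", statistics.mean(samples))
```

In this toy setting the L1 minimizer ignores how far the outlier lies from the rest of the samples, which is one way to read the summary's remark that the constrained model is drawn toward a median rather than toward individual samples.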
ISSN: 1751-9659, 1751-9667