GACnet-Text-to-Image Synthesis With Generative Models Using Attention Mechanisms With Contrastive Learning

Bibliographic Details
Main Authors: Md. Ahsan Habib, Md. Anwar Hussen Wadud, Lubna Yeasmin Pinky, Mehedi Hasan Talukder, Mohammad Motiur Rahman, M. F. Mridha, Yuichi Okuyama, Jungpil Shin
Format: Article
Language: English
Published: IEEE, 2024-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10360129/
Description
Summary: The generation of high-quality images from textual descriptions is a challenging task in computer vision and natural language processing. Text-to-image synthesis, an active area of research, aims to produce high-quality images from written descriptions. This study proposes a hybrid approach, evaluated on a dataset of diverse text-image pairs, that efficiently combines conditional generative adversarial networks (C-GAN), attention mechanisms, and contrastive learning (C-GAN+ATT+CL). We propose a two-step method for improving image quality: generative adversarial networks (GANs) with attention mechanisms first create low-resolution images, and contrastive learning then refines them. The contrastive learning modules are trained on a separate dataset of high-resolution images, while the GANs are trained on datasets of low-resolution text-image pairs. Among the compared methods, the conditional GAN with attention mechanism and contrastive learning achieves state-of-the-art performance in image quality, diversity, and visual realism. The results demonstrate that the proposed approach outperforms all other evaluated methods, achieving an Inception Score (IS) of 35.23, a Fréchet Inception Distance (FID) of 18.2, and an R-Precision of 89.14. Our findings show that the “C-GAN+ATT+CL” approach significantly improves image quality and diversity and opens promising directions for further study.
ISSN: 2169-3536
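
The two-stage design described in the summary can be illustrated with a minimal PyTorch sketch: a conditional generator that attends over word-level text features to produce low-resolution images, followed by an InfoNCE-style contrastive loss that pulls matching image and text embeddings together. All module names, dimensions, and the specific loss formulation below are assumptions made for illustration, not the authors' implementation.

# Illustrative sketch, not the paper's code: stage 1 is a conditional
# generator with cross-attention over text features; stage 2 is an
# InfoNCE-style contrastive loss over image/text embeddings.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttnConditionalGenerator(nn.Module):
    """Stage 1 (assumed): noise + attended text features -> 64x64 image."""

    def __init__(self, noise_dim=100, text_dim=256, img_channels=3):
        super().__init__()
        self.query_proj = nn.Linear(noise_dim, text_dim)
        # Attention lets the generator weight word-level text features.
        self.attn = nn.MultiheadAttention(text_dim, num_heads=4, batch_first=True)
        self.fc = nn.Linear(noise_dim + text_dim, 128 * 8 * 8)
        self.up = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),            # 8x8  -> 16x16
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),             # 16x16 -> 32x32
            nn.ConvTranspose2d(32, img_channels, 4, 2, 1), nn.Tanh(),   # 32x32 -> 64x64
        )

    def forward(self, noise, word_feats):
        # noise: (B, noise_dim); word_feats: (B, T, text_dim) token-level text features.
        query = self.query_proj(noise).unsqueeze(1)            # (B, 1, text_dim)
        context, _ = self.attn(query, word_feats, word_feats)  # attended text context
        h = torch.cat([noise, context.squeeze(1)], dim=1)
        h = self.fc(h).view(-1, 128, 8, 8)
        return self.up(h)


def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Stage 2 (assumed): InfoNCE loss aligning matching image/text embeddings."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature               # (B, B) similarity matrix
    targets = torch.arange(img_emb.size(0), device=img_emb.device)
    return F.cross_entropy(logits, targets)


if __name__ == "__main__":
    gen = AttnConditionalGenerator()
    noise = torch.randn(4, 100)
    word_feats = torch.randn(4, 16, 256)        # stand-in for encoded caption tokens
    fake_imgs = gen(noise, word_feats)          # (4, 3, 64, 64) low-resolution images
    # Stand-in embeddings; a real pipeline would use trained image/text encoders here.
    loss = contrastive_loss(torch.randn(4, 256), torch.randn(4, 256))
    print(fake_imgs.shape, loss.item())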