AI assisted visual communication through generative models

Detailed Bibliography
Author: Ghosh, A
Other Authors: Torr, PHS
Material Type: Thesis
Language: English
Published: 2021
Institution: University of Oxford
Subjects: Image processing; Human-machine systems; Computer vision

Full description

Expressing the ideas in our minds, which are inevitably visual, in words has long been a necessity, and the lack of tools for creative visual self-expression has been a bottleneck. With the advent of AI, and with generative models now being applied to graphics tasks, this elusive Elysian field seems a step closer. This thesis tackles the challenge with a three-pronged approach: GANs, VAEs, and continuous normalizing flows.

We first propose MAD-GAN, an intuitive generalization of Generative Adversarial Networks (GANs) and their conditional variants that addresses the well-known problem of mode collapse. First, MAD-GAN is a multi-agent GAN architecture incorporating multiple generators and one discriminator. Second, to enforce that different generators capture diverse high-probability modes, the discriminator of MAD-GAN is designed so that, along with separating real samples from fake ones, it must also identify the generator that produced a given fake sample. Intuitively, to succeed at this task the discriminator must learn to push different generators towards different identifiable modes. We perform extensive experiments on synthetic and real datasets, comparing MAD-GAN with several GAN variants, and show high-quality, diverse samples on challenging tasks such as image-to-image translation and face generation. We also show that MAD-GAN can disentangle different modalities when trained on a highly challenging diverse-class dataset (for example, one containing images of forests, icebergs, and bedrooms), and we demonstrate its efficacy on unsupervised feature representation. Finally, we introduce a similarity-based competing objective (MAD-GAN-Sim), which encourages different generators to produce diverse samples under a user-defined similarity metric, and evaluate it on image-to-image translation and on the unsupervised feature representation task.
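The discriminator's extra identification task is the heart of MAD-GAN. Below is a minimal, illustrative PyTorch sketch of that training signal, not the thesis implementation: the discriminator is a (k+1)-way classifier (class 0 for real data, class i+1 for "fake, produced by generator i"), while every generator tries to have its output classified as real. All sizes and names here are assumptions for a toy example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

k, noise_dim, data_dim = 3, 8, 2  # toy sizes (assumed)

# k generators mapping noise to data space.
generators = nn.ModuleList(
    nn.Sequential(nn.Linear(noise_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
    for _ in range(k)
)
# One discriminator with k+1 outputs: class 0 = real, class i+1 = generator i.
discriminator = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, k + 1))

def discriminator_loss(real, noises):
    # Real samples should be classified as class 0 ...
    loss = F.cross_entropy(discriminator(real),
                           torch.zeros(len(real), dtype=torch.long))
    # ... and each fake sample should be traced back to its generator.
    for i, g in enumerate(generators):
        fake = g(noises[i]).detach()
        loss = loss + F.cross_entropy(discriminator(fake),
                                      torch.full((len(fake),), i + 1, dtype=torch.long))
    return loss

def generator_loss(noises):
    # Every generator tries to have its samples classified as real (class 0).
    loss = 0.0
    for i, g in enumerate(generators):
        fake = g(noises[i])
        loss = loss + F.cross_entropy(discriminator(fake),
                                      torch.zeros(len(fake), dtype=torch.long))
    return loss

noises = [torch.randn(16, noise_dim) for _ in range(k)]
real = torch.randn(16, data_dim)
print(discriminator_loss(real, noises).item(), generator_loss(noises).item())
```

To name the generator behind each fake sample, the discriminator must find features that separate the generators' outputs, and that pressure is what pushes the generators towards different modes.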
We then propose an interactive GAN-based sketch-to-image translation method that helps novice users create images of simple objects. As the user starts to draw a sketch of a desired object type, the network interactively recommends plausible completions and shows a corresponding synthesized image. This enables a feedback loop in which users can edit their sketch based on the network's recommendations, visualizing both the completed shape and the final rendered image as they draw. To use a single trained model across a wide array of object classes, we introduce a gating-based approach to class conditioning, which lets a single generator network produce distinct classes without feature mixing (sketched after the abstract).

In Appendix C we propose a semi-supervised strategy for deep generative modelling of the human body, an emerging problem with many interesting applications. However, the latent space learned by such approaches is typically not interpretable, which limits flexibility. We therefore present deep generative models of the human body in which pose and visual appearance are disentangled (sketched after the abstract). This disentanglement allows independent manipulation of pose and appearance, enabling applications such as pose transfer without task-specific training. Our two models, the Conditional-DGPose and the Semi-DGPose, have different characteristics. In the first, body pose labels are taken as conditioners from a fully supervised training set. In the second, our structured semi-supervised approach allows pose estimation to be performed by the model itself, relaxing the need for labelled data; the Semi-DGPose thus aims for the joint understanding and generation of people in images. It is capable not only of mapping images to interpretable latent representations but also of mapping these representations back to image space. We compare our models with relevant baselines, the ClothNet-Body and Pose Guided Person Generation networks, demonstrating their merits on the Human3.6M, ChictopiaPlus, and DeepFashion benchmarks.

Finally, training Neural Ordinary Differential Equations (ODEs) is often computationally expensive: the forward pass involves solving an ODE that can become arbitrarily complex during training. Recent work has shown that regularizing the dynamics of the ODE can partially alleviate this. We propose a new regularization technique: randomly sampling the end time of the ODE during training (sketched after the abstract). This regularization is simple to implement, has negligible overhead, and is effective across a wide variety of tasks. Moreover, it is orthogonal to several other methods proposed to regularize ODE dynamics and can be used in conjunction with them. Experiments on normalizing flows, time-series models, and image recognition show that it can significantly decrease training time and even improve performance over baseline models.
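Returning to the interactive sketch-to-image system: the gating-based class conditioning can be illustrated with a small, hypothetical module. The idea, as a sketch rather than the thesis architecture, is that a one-hot class label is mapped to per-channel gates that multiply a layer's feature maps, so each class routes through its own subset of features inside one shared generator.

```python
import torch
import torch.nn as nn

class GatedBlock(nn.Module):
    """Illustrative gated conv block; sizes and names are assumptions."""
    def __init__(self, channels, num_classes):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        # Small network mapping the class label to one gate per channel.
        self.gate = nn.Sequential(nn.Linear(num_classes, channels), nn.Sigmoid())

    def forward(self, x, class_onehot):
        h = torch.relu(self.conv(x))
        g = self.gate(class_onehot)               # (batch, channels), in [0, 1]
        return h * g.unsqueeze(-1).unsqueeze(-1)  # gate each feature map

# The same block, conditioned on different labels, routes different features,
# which is what lets a single generator serve many classes without mixing.
x = torch.randn(4, 16, 32, 32)
labels = torch.eye(10)[torch.tensor([0, 3, 3, 7])]  # one-hot class labels
out = GatedBlock(16, 10)(x, labels)
print(out.shape)  # torch.Size([4, 16, 32, 32])
```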
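The pose/appearance disentanglement of the DGPose models can likewise be sketched as a small conditional VAE, here in the spirit of the Conditional-DGPose: pose enters as a conditioner while appearance lives in the latent code, so swapping the pose at decode time performs pose transfer. Flat vectors stand in for images and pose labels; all names and sizes are illustrative, not the thesis models.

```python
import torch
import torch.nn as nn

img_dim, pose_dim, z_dim = 64, 10, 8  # toy sizes (assumed)

encoder = nn.Sequential(nn.Linear(img_dim + pose_dim, 64), nn.ReLU(),
                        nn.Linear(64, 2 * z_dim))       # -> (mu, logvar)
decoder = nn.Sequential(nn.Linear(z_dim + pose_dim, 64), nn.ReLU(),
                        nn.Linear(64, img_dim))

def forward(image, pose):
    # Appearance latent z is inferred from the image given the pose.
    mu, logvar = encoder(torch.cat([image, pose], -1)).chunk(2, -1)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
    recon = decoder(torch.cat([z, pose], -1))
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
    return recon, kl, z

image = torch.randn(1, img_dim)
pose_a, pose_b = torch.randn(1, pose_dim), torch.randn(1, pose_dim)
recon, kl, z = forward(image, pose_a)
# Pose transfer: keep the appearance code z, decode under a different pose.
transferred = decoder(torch.cat([z, pose_b], -1))
```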
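Finally, the end-time regularizer for Neural ODEs is simple enough to show in a few lines. A hedged sketch, assuming the open-source `torchdiffeq` solver and an illustrative dynamics network: rather than always integrating to t = 1, each training step samples the end time uniformly from an interval around it.

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint  # open-source Neural ODE solver

class Dynamics(nn.Module):
    """Toy dynamics net defining dy/dt = f(t, y); sizes are illustrative."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 32), nn.Tanh(), nn.Linear(32, dim))

    def forward(self, t, y):
        return self.net(y)

f, y0 = Dynamics(4), torch.randn(8, 4)
b = 0.5  # width of the end-time interval (an assumed hyperparameter)

for step in range(3):                       # stand-in for the training loop
    f.zero_grad()
    T = 1.0 + (2 * torch.rand(()) - 1) * b  # T ~ Uniform(1 - b, 1 + b)
    t = torch.tensor([0.0, float(T)])
    yT = odeint(f, y0, t)[-1]               # state at the sampled end time
    loss = yT.pow(2).mean()                 # placeholder task loss
    loss.backward()
```

Because the integration horizon changes every step, the learned dynamics cannot rely on a fixed end time, which is one intuition for why the sampled horizon acts as a regularizer at negligible extra cost.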