AI assisted visual communication through generative models
Author: Ghosh, A
Other Authors: Torr, PHS
Material Type: Thesis
Language: English
Published: 2021
Subjects: Image processing; Human-machine systems; Computer vision
Institution: University of Oxford
Description:

Translating the ideas in our minds, which are inherently visual, into words has long been a necessity, and the lack of tools for expressing oneself creatively through visual communication has been a bottleneck. With the advent of AI, and with generative models beginning to be applied to graphics tasks, these elusive Elysian fields seem a step closer. This thesis tackles the challenge with a three-pronged approach: GANs, VAEs, and Continuous Normalizing Flows.
We first propose MAD-GAN, an intuitive generalization of Generative Adversarial Networks (GANs) and their conditional variants that addresses the well-known problem of mode collapse. First, MAD-GAN is a multi-agent GAN architecture incorporating multiple generators and one discriminator. Second, to enforce that different generators capture diverse high-probability modes, the discriminator of MAD-GAN is designed so that, in addition to separating real samples from fake ones, it must also identify which generator produced a given fake sample. Intuitively, to succeed at this task the discriminator must learn to push different generators towards different identifiable modes. We perform extensive experiments on synthetic and real datasets, comparing MAD-GAN with several GAN variants, and show high-quality, diverse generations for challenging tasks such as image-to-image translation and face generation. We also show that MAD-GAN can disentangle different modalities when trained on a highly challenging diverse-class dataset (for example, one containing images of forests, icebergs, and bedrooms), and we demonstrate its efficacy on unsupervised feature representation. Finally, we introduce a similarity-based competing objective (MAD-GAN-Sim), which encourages different generators to produce diverse samples according to a user-defined similarity metric, and evaluate it on image-to-image translation and on the unsupervised feature representation task.
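The core of this design can be written compactly as a (K+1)-way classification problem for the discriminator. Below is a minimal PyTorch sketch of that idea; the network shapes, names, and training details are illustrative assumptions, not the thesis' actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

K, latent_dim, data_dim, batch = 3, 64, 784, 32

# K generators, one discriminator with K+1 outputs:
# classes 0..K-1 mean "fake, produced by generator k"; class K means "real".
generators = nn.ModuleList(
    [nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, data_dim))
     for _ in range(K)]
)
discriminator = nn.Sequential(nn.Linear(data_dim, 256), nn.ReLU(), nn.Linear(256, K + 1))

def d_loss(real):
    # Real samples should be classified as class K ...
    labels_real = torch.full((real.size(0),), K, dtype=torch.long)
    loss = F.cross_entropy(discriminator(real), labels_real)
    # ... and each fake sample attributed to the generator that made it,
    # which pushes different generators towards different modes.
    for k, G in enumerate(generators):
        fake = G(torch.randn(real.size(0), latent_dim)).detach()
        labels_fake = torch.full((fake.size(0),), k, dtype=torch.long)
        loss = loss + F.cross_entropy(discriminator(fake), labels_fake)
    return loss

def g_loss():
    # Every generator tries to have its samples classified as real (class K).
    loss = torch.zeros(())
    for G in generators:
        logits = discriminator(G(torch.randn(batch, latent_dim)))
        loss = loss + F.cross_entropy(logits, torch.full((batch,), K, dtype=torch.long))
    return loss
```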
We then propose an interactive GAN-based sketch-to-image translation method that helps novice users create images of simple objects. As the user starts to draw a sketch of a desired object type, the network interactively recommends plausible completions and shows a corresponding synthesized image to the user. This enables a feedback loop in which the user can edit their sketch based on the network's recommendations, visualizing both the completed shape and the final rendered image as they draw. To use a single trained model across a wide array of object classes, we introduce a gating-based approach for class conditioning, which allows us to generate distinct classes, without feature mixing, from a single generator network.
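As a concrete illustration of gating-based class conditioning, the sketch below multiplies feature maps by a learned per-class gate so that different classes can occupy different channel subsets of one shared generator. The layer form and names are my assumptions; the thesis' exact gating design may differ.

```python
import torch
import torch.nn as nn

class ClassGate(nn.Module):
    """Per-class soft gates over feature channels.

    Each class learns a gate vector in [0, 1] that multiplies the feature
    maps, letting classes use different channel subsets of a single shared
    generator without mixing features."""
    def __init__(self, num_classes: int, num_channels: int):
        super().__init__()
        self.gate_logits = nn.Embedding(num_classes, num_channels)

    def forward(self, features: torch.Tensor, class_ids: torch.Tensor) -> torch.Tensor:
        gates = torch.sigmoid(self.gate_logits(class_ids))   # (B, C)
        return features * gates.unsqueeze(-1).unsqueeze(-1)  # broadcast over H, W

# Usage: applied after a conv block inside the generator.
# gate = ClassGate(num_classes=10, num_channels=128)
# h = gate(h, labels)
```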
In Appendix C we propose a semi-supervised strategy for deep generative modelling for human body analysis, an emerging problem with many interesting applications. The latent space learned by typical approaches is not interpretable, which limits flexibility. We present deep generative models for human body analysis in which body pose and visual appearance are disentangled. Such disentanglement allows independent manipulation of pose and appearance, and hence enables applications such as pose transfer without task-specific training. Our proposed models, the Conditional-DGPose and the Semi-DGPose, have different characteristics. In the first, body pose labels are taken as conditioners, from a fully supervised training set. In the second, our structured semi-supervised approach allows pose estimation to be performed by the model itself, relaxing the need for labelled data. The Semi-DGPose thus aims for the joint understanding and generation of people in images: it not only maps images to interpretable latent representations but also maps these representations back to image space. We compare our models with relevant baselines, the ClothNet-Body and Pose Guided Person Generation networks, demonstrating their merits on the Human3.6M, ChictopiaPlus and DeepFashion benchmarks.
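To make the disentanglement concrete, here is a minimal conditional-VAE sketch in the spirit of the Conditional-DGPose: the decoder is conditioned on pose, leaving the latent code free to capture appearance, so swapping the pose input at decode time performs pose transfer. The architecture, dimensions, and names are illustrative assumptions; the thesis models are considerably more elaborate.

```python
import torch
import torch.nn as nn

class CondPoseVAE(nn.Module):
    def __init__(self, img_dim=784, pose_dim=34, z_dim=32):
        super().__init__()
        # pose_dim: e.g. 17 flattened 2-D keypoints (an assumption here).
        self.enc = nn.Sequential(nn.Linear(img_dim, 256), nn.ReLU())
        self.mu, self.logvar = nn.Linear(256, z_dim), nn.Linear(256, z_dim)
        # The decoder conditions on pose, so z only needs to capture appearance.
        self.dec = nn.Sequential(nn.Linear(z_dim + pose_dim, 256), nn.ReLU(),
                                 nn.Linear(256, img_dim), nn.Sigmoid())

    def forward(self, img, pose):
        h = self.enc(img)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return self.dec(torch.cat([z, pose], dim=-1)), mu, logvar

# Pose transfer: encode appearance from one image, decode with another pose.
# recon, mu, logvar = model(img_a, pose_b)
```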
Finally, training Neural Ordinary Differential Equations (ODEs) is often computationally expensive: computing the forward pass of such models involves solving an ODE that can become arbitrarily complex during training. Recent works have shown that regularizing the dynamics of the ODE can partially alleviate this. We propose a new regularization technique: randomly sampling the end time of the ODE during training. The proposed regularization is simple to implement, has negligible overhead, and is effective across a wide variety of tasks. Further, the technique is orthogonal to several other methods proposed to regularize the dynamics of ODEs, and as such can be used in conjunction with them. Through experiments on normalizing flows, time-series models, and image recognition, we show that the proposed regularization can significantly decrease training time and even improve performance over baseline models.
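A minimal sketch of the end-time regularization follows: each training step integrates to a randomly perturbed end time rather than a fixed one. It assumes a torchdiffeq-style odeint; the sampling distribution and helper names below are illustrative.

```python
import torch
from torchdiffeq import odeint  # pip install torchdiffeq

def forward_random_end_time(func, y0, t1=1.0, b=0.5):
    """One Neural ODE forward pass with a randomly sampled end time.

    Samples T ~ Uniform(t1 - b, t1 + b); choose b < t1 so T stays positive.
    Varying the integration horizon discourages needlessly complex dynamics.
    """
    T = t1 + (2 * torch.rand(()) - 1) * b
    t = torch.stack([torch.zeros(()), T])
    return odeint(func, y0, t)[-1]  # state at the sampled end time

# At test time, integrate to the nominal end time as usual:
# y_T = odeint(func, y0, torch.tensor([0.0, 1.0]))[-1]
```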