Unsupervised learning with diffusion models

In computer vision, a key goal is to obtain visual representations that faithfully capture the underlying structure and semantics of the data, encompassing object identities, positions, textures, and lighting conditions. However, existing methods for un-/self-supervised learning (SSL) are restricted...

Bibliographic details
Main author: Wang, Jiankun
Other authors: Weichen Liu
Format: Thesis-Master by Research
Language: English
Published: Nanyang Technological University 2023
Subjects: Engineering::Computer science and engineering
Online access: https://hdl.handle.net/10356/171953
author Wang, Jiankun
author2 Weichen Liu
collection NTU
description In computer vision, a key goal is to obtain visual representations that faithfully capture the underlying structure and semantics of the data, encompassing object identities, positions, textures, and lighting conditions. However, existing methods for un-/self-supervised learning (SSL) are restricted to untangling basic augmentation attributes such as rotation and color modification, which constrains their capacity to efficiently modularize the underlying semantics. In this thesis, we propose DiffSiam, a novel SSL framework that incorporates a disentangled representation learning algorithm based on diffusion models. By introducing additional Gaussian noise during the diffusion forward process, DiffSiam collapses samples with similar attributes, intensifying the attribute loss. To compensate, we learn an expanding set of modular features over time, adhering to the reconstruction of the diffusion model. These training dynamics bias the learned features towards disentangling diverse semantics, from fine-grained to coarse-grained attributes. Experimental results demonstrate the superior performance of DiffSiam on various classification benchmarks and generative tasks, validating its effectiveness in generating disentangled representations.
first_indexed 2024-10-01T04:20:50Z
format Thesis-Master by Research
id ntu-10356/171953
institution Nanyang Technological University
language English
last_indexed 2024-10-01T04:20:50Z
publishDate 2023
publisher Nanyang Technological University
record_format dspace
datestamp 2023-12-01T01:52:37Z
school School of Computer Science and Engineering
contact liu@ntu.edu.sg
degree Master of Engineering
date 2023-11-17T04:16:22Z
citation Wang, J. (2023). Unsupervised learning with diffusion models. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/171953
doi 10.32657/10356/171953
rights This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
mimetype application/pdf
title Unsupervised learning with diffusion models
topic Engineering::Computer science and engineering
url https://hdl.handle.net/10356/171953