Towards advanced machine generalization via data interplay

Bibliographic Details
Main Author: Wang, Tan
Other Authors: Hanwang Zhang
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2024
Subjects: Computer and Information Science, Computer vision, Machine learning
Online Access: https://hdl.handle.net/10356/180357
_version_ 1826116620805734400
author Wang, Tan
author2 Hanwang Zhang
author_facet Hanwang Zhang
Wang, Tan
author_sort Wang, Tan
collection NTU
description Traditional machine learning methodologies presuppose that training and testing datasets are Independent and Identically Distributed (IID), i.e., assuming the samples of both training and testing are drawn from a consistent distribution. However, this IID assumption often fails to hold in real-world applications, leading to substantial performance drops. This issue is known as the Out-of-Distribution (OOD) generalization problem, characterized by the test data differing significantly from the training data in unforeseen ways. Additionally, suboptimal design choices at any stage of developing a deep learning model can further impair its generalization capability across a broad spectrum of tasks. In this thesis, we study the OOD generalization problem in the context of four specific application areas, ranging from conventional computer vision tasks to the emerging field of diffusion generative models. To tackle the issue of robustness, we introduce the key innovation termed “data interplay”, aimed at more efficient utilization of training data. Specifically, we categorize data interplay into three distinct types, focusing on different levels of data interaction: interplay between data points, data groups, and data modalities.
first_indexed 2025-03-09T11:41:59Z
format Thesis-Doctor of Philosophy
id ntu-10356/180357
institution Nanyang Technological University
language English
last_indexed 2025-03-09T11:41:59Z
publishDate 2024
publisher Nanyang Technological University
record_format dspace
spelling ntu-10356/180357
2024-11-01T08:23:04Z
Towards advanced machine generalization via data interplay
Wang, Tan
Hanwang Zhang
College of Computing and Data Science
hanwangzhang@ntu.edu.sg
Computer and Information Science
Computer vision
Machine learning
Doctor of Philosophy
2024-10-03T05:12:53Z
2024
Thesis-Doctor of Philosophy
Wang, T. (2024). Towards advanced machine generalization via data interplay. Doctoral thesis, Nanyang Technological University, Singapore.
https://hdl.handle.net/10356/180357
10.32657/10356/180357
en
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
application/pdf
Nanyang Technological University
spellingShingle Computer and Information Science
Computer vision
Machine learning
Wang, Tan
Towards advanced machine generalization via data interplay
title Towards advanced machine generalization via data interplay
title_full Towards advanced machine generalization via data interplay
title_fullStr Towards advanced machine generalization via data interplay
title_full_unstemmed Towards advanced machine generalization via data interplay
title_short Towards advanced machine generalization via data interplay
title_sort towards advanced machine generalization via data interplay
topic Computer and Information Science
Computer vision
Machine learning
url https://hdl.handle.net/10356/180357
work_keys_str_mv AT wangtan towardsadvancedmachinegeneralizationviadatainterplay