Towards advanced machine generalization via data interplay
Main Author: | Wang, Tan |
---|---|
Other Authors: | Hanwang Zhang |
Format: | Thesis-Doctor of Philosophy |
Language: | English |
Published: | Nanyang Technological University, 2024 |
Subjects: | Computer and Information Science; Computer vision; Machine learning |
Online Access: | https://hdl.handle.net/10356/180357 |
_version_ | 1826116620805734400 |
---|---|
author | Wang, Tan |
author2 | Hanwang Zhang |
collection | NTU |
description | Traditional machine learning methodologies presuppose that training and testing datasets are Independent and Identically Distributed (IID), i.e., assuming the samples of both training and testing are drawn from a consistent distribution. However, this IID assumption often fails to hold in real-world applications, leading to substantial performance drops. This issue is known as the Out-of-Distribution (OOD) generalization problem, characterized by the test data differing significantly from the training data in unforeseen ways. Additionally, suboptimal design choices at any stage of developing a deep learning model can further impair its generalization capability across a broad spectrum of tasks. In this thesis, we study the OOD generalization problem in the context of four specific application areas, ranging from conventional computer vision tasks to the emerging field of diffusion generative models. To tackle the issue of robustness, we introduce the key innovation termed “data interplay”, aimed at more efficient utilization of training data. Specifically, we categorize data interplay into three distinct types, focusing on different levels of data interaction: interplay between data points, data groups, and data modalities. |
first_indexed | 2025-03-09T11:41:59Z |
format | Thesis-Doctor of Philosophy |
id | ntu-10356/180357 |
institution | Nanyang Technological University |
language | English |
last_indexed | 2025-03-09T11:41:59Z |
publishDate | 2024 |
publisher | Nanyang Technological University |
record_format | dspace |
spelling | ntu-10356/180357; 2024-11-01T08:23:04Z; Towards advanced machine generalization via data interplay; Wang, Tan; Hanwang Zhang; College of Computing and Data Science; hanwangzhang@ntu.edu.sg; Computer and Information Science; Computer vision; Machine learning; Doctor of Philosophy; 2024-10-03T05:12:53Z; 2024; Thesis-Doctor of Philosophy; Wang, T. (2024). Towards advanced machine generalization via data interplay. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/180357; 10.32657/10356/180357; en; This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).; application/pdf; Nanyang Technological University |
title | Towards advanced machine generalization via data interplay |
topic | Computer and Information Science Computer vision Machine learning |
url | https://hdl.handle.net/10356/180357 |