Exploring the application of contrastive learning on two-sample hypothesis test and tabular data


Bibliographic Details
Main Author: Wan, Bingbing
Other Authors: Lihui Chen
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University, 2024
Online Access:https://hdl.handle.net/10356/179367
Description
Summary: The purpose of this study is to investigate the application of contrastive learning to two-sample tests for image data and to feature enhancement for tabular data. The research is motivated by the potential of contrastive learning to improve the performance and accuracy of statistical tests and data classification. The main research problem addressed in this thesis is whether contrastive learning can enhance the performance of two-sample tests for images and improve feature quality for tabular data.

To address this problem, we first verified that pairing contrastive learning with the Maximum Mean Discrepancy (MMD) [1] two-sample test yields strong test power. We then introduced a novel method called the contrastive two-sample test. Additionally, we enhanced features for tabular data using contrastive learning techniques. Experiments and comparisons were conducted on various datasets to evaluate the effectiveness of these approaches.

The results of our experiments demonstrated that the contrastive learning approach significantly improved the performance of two-sample tests on images and slightly improved classification accuracies on tabular data. Specifically, the accuracy of image-based tests increased, indicating a more robust method for statistical testing in visual contexts. For tabular data, the enhancements produced more refined features that marginally boosted classification performance, showcasing the versatility of contrastive learning.

These findings suggest that contrastive learning can be a valuable tool for improving the reliability of two-sample tests on image data and for enhancing features on tabular data. This dual applicability highlights its potential across a variety of data types, making it a promising area for further research. Future work could explore its application to other types of data, such as text and voice, potentially broadening the scope and impact of contrastive learning methodologies.
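As context for the MMD two-sample test referenced in the summary, a minimal kernel MMD permutation test might look like the sketch below. This is the standard RBF-kernel MMD with a permutation-based p-value, not the thesis's contrastive variant: in the thesis's setting, the raw images would first be mapped to embeddings by a contrastively trained encoder, and the test would run on those embeddings. All function names and the bandwidth parameter here are illustrative.

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    # Pairwise RBF kernel values between rows of X and rows of Y.
    d2 = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2 * X @ Y.T
    return np.exp(-d2 / (2 * sigma**2))

def mmd2(X, Y, sigma=1.0):
    # Biased estimate of the squared MMD between the two samples.
    Kxx = rbf_kernel(X, X, sigma)
    Kyy = rbf_kernel(Y, Y, sigma)
    Kxy = rbf_kernel(X, Y, sigma)
    return Kxx.mean() + Kyy.mean() - 2 * Kxy.mean()

def mmd_permutation_test(X, Y, sigma=1.0, n_perm=200, seed=0):
    # p-value from permuting the pooled sample: under the null
    # (same distribution), relabelings give comparable statistics.
    rng = np.random.default_rng(seed)
    observed = mmd2(X, Y, sigma)
    Z = np.vstack([X, Y])
    n = len(X)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(Z))
        count += mmd2(Z[perm[:n]], Z[perm[n:]], sigma) >= observed
    # +1 correction keeps the p-value strictly positive.
    return (count + 1) / (n_perm + 1)
```

A small p-value rejects the null hypothesis that the two samples come from the same distribution; the thesis's premise is that running such a test on contrastively learned features, rather than raw pixels, increases its power.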