Robust Data Sampling in Machine Learning: A Game-Theoretic Framework for Training and Validation Data Selection

Bibliographic Details
Main Authors: Zhaobin Mo, Xuan Di, Rongye Shi
Format: Article
Language: English
Published: MDPI AG 2023-01-01
Series: Games
Subjects: two-player game; Monte Carlo tree search; reinforcement learning; car-following modeling
Online Access: https://www.mdpi.com/2073-4336/14/1/13
author Zhaobin Mo
Xuan Di
Rongye Shi
collection DOAJ
description How to sample training and validation data is an important question for machine learning models, especially when the dataset is heterogeneous and skewed. In this paper, we propose a data sampling method that robustly selects training and validation data. We formulate the sampling process as a two-player game: a trainer samples training data so as to minimize the test error, while a validator adversarially samples validation data that can increase the test error. Robust sampling is achieved at the game equilibrium. To accelerate the search, we adopt reinforcement learning-aided Monte Carlo tree search (MCTS). We apply our method to a car-following modeling problem, a complicated scenario with heterogeneous and random human driving behavior. Real-world data from the Next Generation SIMulation (NGSIM) dataset are used to validate the method, and experimental results demonstrate the robustness of the sampling and, as a result, the model's out-of-sample performance.
first_indexed 2024-03-11T08:48:22Z
format Article
id doaj.art-ed72d7f32ba54d9cb1642353b1b0af4b
institution Directory of Open Access Journals
issn 2073-4336
language English
last_indexed 2024-03-11T08:48:22Z
publishDate 2023-01-01
publisher MDPI AG
record_format Article
series Games
spelling doaj.art-ed72d7f32ba54d9cb1642353b1b0af4b | 2023-11-16T20:38:44Z | eng | MDPI AG | Games | 2073-4336 | 2023-01-01 | 14 | 1 | 13 | 10.3390/g14010013 | Robust Data Sampling in Machine Learning: A Game-Theoretic Framework for Training and Validation Data Selection | Zhaobin Mo (0), Xuan Di (1), Rongye Shi (2) | Department of Civil Engineering and Engineering Mechanics, Columbia University, New York, NY 10027, USA (all three authors) | https://www.mdpi.com/2073-4336/14/1/13 | two-player game; Monte Carlo tree search; reinforcement learning; car-following modeling
title Robust Data Sampling in Machine Learning: A Game-Theoretic Framework for Training and Validation Data Selection
topic two-player game
Monte Carlo tree search
reinforcement learning
car-following modeling
url https://www.mdpi.com/2073-4336/14/1/13
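
The trainer/validator game described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a toy model that predicts the mean of its training targets, and it replaces the reinforcement learning-aided MCTS with exhaustive enumeration, which is only feasible for tiny datasets. The names `fit_mean`, `mse`, and `robust_split` are hypothetical. The code returns the trainer's pure-strategy minimax choice, which need not coincide with the equilibrium used in the paper.

```python
from itertools import combinations
from statistics import mean


def fit_mean(train_targets):
    """Toy 'model' (an assumption): predict the mean of the training targets."""
    return mean(train_targets)


def mse(prediction, targets):
    """Mean squared error of a constant prediction on the given targets."""
    return mean([(y - prediction) ** 2 for y in targets])


def robust_split(targets, n_train, n_val):
    """Exhaustive minimax search over (training subset, validation subset).

    The trainer picks n_train indices; the validator then picks n_val of the
    remaining indices to maximize the error. Returns the training subset whose
    worst-case validation error is smallest, as (train_idx, val_idx, error).
    """
    indices = range(len(targets))
    best = None
    for train in combinations(indices, n_train):
        prediction = fit_mean([targets[i] for i in train])
        rest = [i for i in indices if i not in train]
        # Validator's best response: the validation subset with the largest error.
        worst_val, worst_err = max(
            (
                (val, mse(prediction, [targets[i] for i in val]))
                for val in combinations(rest, n_val)
            ),
            key=lambda pair: pair[1],
        )
        # Trainer keeps the split with the smallest worst-case error.
        if best is None or worst_err < best[2]:
            best = (train, worst_val, worst_err)
    return best


# Skewed toy data: a cluster near 0, a cluster near 5, and one outlier.
targets = [0.0, 0.1, 0.2, 5.0, 5.1, 9.9]
train_idx, val_idx, err = robust_split(targets, n_train=3, n_val=2)
```

In this sketch the number of candidate splits grows combinatorially with the dataset size, which is the kind of search cost that motivates a guided tree search in place of enumeration.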