Robust Data Sampling in Machine Learning: A Game-Theoretic Framework for Training and Validation Data Selection

Bibliographic Details
Main Authors:	Zhaobin Mo, Xuan Di, Rongye Shi
Format:	Article
Language:	English
Published:	MDPI AG 2023-01-01
Series:	Games
Subjects:	two-player game; Monte Carlo tree search; reinforcement learning; car-following modeling
Online Access:	https://www.mdpi.com/2073-4336/14/1/13
DOI:	10.3390/g14010013
ISSN:	2073-4336
Citation:	Games, vol. 14, no. 1, article 13 (2023)
Affiliation:	Department of Civil Engineering and Engineering Mechanics, Columbia University, New York, NY 10027, USA (all authors)
Collection:	DOAJ (Directory of Open Access Journals)

Description:	How to sample training/validation data is an important question for machine learning models, especially when the dataset is heterogeneous and skewed. In this paper, we propose a data sampling method that robustly selects training/validation data. We formulate the training/validation data sampling process as a two-player game: a trainer aims to sample training data so as to minimize the test error, while a validator adversarially samples validation data that can increase the test error. Robust sampling is achieved at the game equilibrium. To accelerate the search process, we adopt reinforcement-learning-aided Monte Carlo tree search (MCTS). We apply our method to a car-following modeling problem, a complicated scenario with heterogeneous and random human driving behavior. Real-world data from the Next Generation SIMulation (NGSIM) dataset are used to validate this method, and the experimental results demonstrate the robustness of the sampling and, thereby, the model's out-of-sample performance.
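
The abstract describes the trainer/validator game only at a high level. The following is a toy sketch of that min-max idea, not the authors' implementation. It assumes a small hypothetical data pool (XS, YS), a least-squares line as the model, fixed sample sizes (N_TRAIN, N_VAL), and validation error as a stand-in for test error; all names are illustrative. It also replaces the reinforcement-learning-aided MCTS with exhaustive enumeration, which is only feasible for tiny pools. The trainer moves first and the validator best-responds from the remaining points, so in this sequential zero-sum setting the trainer's minimax choice is the equilibrium outcome.

```python
from itertools import combinations

# Hypothetical toy pool (not NGSIM): the slope changes after x = 3,
# which makes the data heterogeneous.
XS = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
YS = [0.0, 1.0, 2.0, 3.0, 6.0, 8.0]
N_TRAIN = 3  # trainer's sample size (assumed)
N_VAL = 2    # validator's sample size (assumed)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def mse(model, idx):
    """Mean squared error of the fitted line on the points indexed by idx."""
    slope, intercept = model
    return sum((YS[i] - (slope * XS[i] + intercept)) ** 2 for i in idx) / len(idx)


def worst_case_error(train):
    """Validator's best response: the held-out subset with the largest error."""
    model = fit_line([XS[i] for i in train], [YS[i] for i in train])
    rest = [i for i in range(len(XS)) if i not in train]
    return max(
        ((mse(model, val), val) for val in combinations(rest, N_VAL)),
        key=lambda t: t[0],
    )


def robust_sample():
    """Trainer's minimax choice: the training subset with the smallest worst case."""
    best = None
    for train in combinations(range(len(XS)), N_TRAIN):
        err, val = worst_case_error(train)
        if best is None or err < best[0]:
            best = (err, train, val)
    return best


err, train, val = robust_sample()
print("training indices:", train, "adversarial validation indices:", val, "error:", round(err, 3))
```

Because the number of candidate subsets grows combinatorially with the pool size, enumeration quickly becomes impractical; this is the search cost the abstract cites as the reason for using MCTS.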

Similar Items