Empirical Comparisons for Combining Balancing and Feature Selection Strategies for Characterizing Football Players Using FIFA Video Game System

Bibliographic Details
Main Authors: Mustafa A. Al-Asadi, Sakir Tasdemir
Format: Article
Language: English
Published: IEEE 2021-01-01
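Series: IEEE Access
Subjects: Class imbalance; data mining; data sampling; feature selection; FIFA video game; player characterizing
Online Access: https://ieeexplore.ieee.org/document/9598823/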
author Mustafa A. Al-Asadi
Sakir Tasdemir
collection DOAJ
description The process of modelling individual player performance using machine learning is a mature task in sports analytics. The most significant challenges in machine learning include class imbalance and high dimensionality. We conducted a comprehensive literature review and observed that both issues have been studied independently. We found that feature selection addresses the dimensionality reduction problem by determining a subset of relevant features, while data sampling seeks to make the data more balanced by adding or removing instances. We also found that efforts have been made to study the effect of the joint use of feature selection and balancing techniques. However, deciding how to prioritize feature selection and sampling is still difficult, and the relationship between them remains unclear. This paper presents a large-scale comparison of characterizing football players into nine positions using FIFA video game data, whereas most previous studies in this field have focused on characterizing players into only three classes according to their positions. The proposed methodology consists of three main steps. In the first step, the sampling technique is applied to deal with class imbalance, while the second step encompasses the feature selection technique, which deals with the high dimensionality problem. The third step combines feature selection and data sampling to deal with both issues. We made the comparisons based on nine feature selection algorithms and three balancing techniques, and then we evaluated their performance using the random forest classifier.
We found that 1) feature selection techniques did not improve the accuracy of the baseline model, 2) balancing techniques improved the accuracy compared to the baseline, and 3) the proposed methodology, involving the joint application of resampling and feature selection with data balanced by the random oversampling (ROS) method and the synthetic minority oversampling technique (SMOTE), was superior both to the use of a single technique and to the original imbalanced training set. Overall, the proposed methodology improved prediction accuracy compared to the baseline model. Moreover, the methodology provided a significant decrease in the number of features, from 29 to 10 features on average.
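
The combined step described in the abstract (balance the training data, then reduce the feature set) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the nine feature selection algorithms, so the ANOVA-style filter score below is a stand-in, the order "balance first, then select" is an assumption, and the function names (`random_oversample`, `f_scores`, `select_top_k`) and the toy data are hypothetical. Only the Python standard library is used, and the random forest evaluation step is omitted.

```python
import random
from collections import defaultdict
from statistics import fmean


def random_oversample(X, y, seed=0):
    """Random oversampling (ROS): duplicate randomly chosen rows of the
    smaller classes until every class matches the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    target = max(len(rows) for rows in by_class.values())
    X_bal, y_bal = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_bal.append(row)
            y_bal.append(label)
    return X_bal, y_bal


def f_scores(X, y):
    """Score each feature by between-class over within-class variation.
    This simple filter is an assumed stand-in, not the paper's method."""
    labels = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        grand_mean = fmean(column)
        between = within = 0.0
        for label in labels:
            values = [v for v, lab in zip(column, y) if lab == label]
            mean = fmean(values)
            between += len(values) * (mean - grand_mean) ** 2
            within += sum((v - mean) ** 2 for v in values)
        if within > 0:
            scores.append(between / within)
        else:
            # No spread inside any class: perfect separator or constant feature.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def select_top_k(X, y, k):
    """Return the indices of the k highest-scoring features, sorted ascending."""
    scores = f_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Hypothetical toy training set: 3 features, 2 imbalanced classes.
X_train = [
    [80.0, 10.0, 5.0], [82.0, 12.0, 5.5], [78.0, 11.0, 4.0], [85.0, 9.0, 6.0],
    [20.0, 11.0, 5.0], [22.0, 10.0, 4.5],
]
y_train = ["striker", "striker", "striker", "striker", "keeper", "keeper"]

# Balance the training split only, then keep the most discriminative feature.
X_bal, y_bal = random_oversample(X_train, y_train, seed=42)
kept = select_top_k(X_bal, y_bal, k=1)
X_reduced = [[row[j] for j in kept] for row in X_bal]
```

In a full experiment the reduced, balanced training set would be passed to a classifier (the paper uses a random forest) and scored on an untouched test split; resampling the test data would distort the reported accuracy.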
first_indexed 2024-12-19T08:27:29Z
format Article
id doaj.art-f2064bc0164941bd959f378dfaff65c1
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-12-19T08:27:29Z
publishDate 2021-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-f2064bc0164941bd959f378dfaff65c1 2022-12-21T20:29:16Z eng IEEE IEEE Access 2169-3536 2021-01-01 vol. 9, pp. 149266-149286 doi:10.1109/ACCESS.2021.3124931 IEEE Xplore document 9598823
Mustafa A. Al-Asadi (https://orcid.org/0000-0002-8218-3458), Department of Computer Engineering, Faculty of Technology, Selçuk University, Konya, Turkey
Sakir Tasdemir, Department of Computer Engineering, Faculty of Technology, Selçuk University, Konya, Turkey
title Empirical Comparisons for Combining Balancing and Feature Selection Strategies for Characterizing Football Players Using FIFA Video Game System
topic Class imbalance
data mining
data sampling
feature selection
FIFA video game
player characterizing
url https://ieeexplore.ieee.org/document/9598823/