Comparison of Resampling Techniques for Imbalanced Datasets in Machine Learning: Application to Epileptogenic Zone Localization From Interictal Intracranial EEG Recordings in Patients With Focal Epilepsy
Main Authors: | Giulia Varotto, Gianluca Susi, Laura Tassi, Francesca Gozzo, Silvana Franceschetti, Ferruccio Panzica |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2021-11-01 |
Series: | Frontiers in Neuroinformatics |
Subjects: | |
Online Access: | https://www.frontiersin.org/articles/10.3389/fninf.2021.715421/full |
author | Giulia Varotto, Gianluca Susi, Laura Tassi, Francesca Gozzo, Silvana Franceschetti, Ferruccio Panzica |
collection | DOAJ |
description | Aim: In neuroscience research, data are often characterized by an imbalanced distribution between the majority and minority classes, an issue that can limit or even worsen the prediction performance of machine learning methods. Different resampling procedures have been developed to address this problem, and much work has been done comparing their effectiveness in different scenarios. Notably, the robustness of these techniques has been tested across a wide variety of datasets, without considering performance on each specific dataset. In this study, we compare the performance of different resampling procedures for the imbalanced domain in stereo-electroencephalography (SEEG) recordings of patients with focal epilepsy who underwent surgery. Methods: We considered data obtained by network analysis of interictal SEEG recorded from 10 patients with drug-resistant focal epilepsy, for a supervised classification problem aimed at distinguishing between epileptogenic and non-epileptogenic brain regions in interictal conditions. We investigated the effectiveness of five oversampling and five undersampling procedures, using 10 different machine learning classifiers. Moreover, six ensemble methods specific to the imbalanced domain were also tested. To compare performance, the area under the ROC curve (AUC), F-measure, geometric mean, and balanced accuracy were considered. Results: Both resampling approaches improved performance with respect to the original dataset. Oversampling was more sensitive to the type of classification method employed, with Adaptive Synthetic Sampling (ADASYN) exhibiting the best performance. All the undersampling approaches were more robust than oversampling across the different classifiers, with Random Undersampling (RUS) exhibiting the best performance despite being the simplest and most basic resampling method. Conclusions: Applying machine learning techniques that take class balance into account through resampling is beneficial and leads to more accurate localization of the epileptogenic zone from interictal periods. In addition, our results highlight the importance of choosing the classification method used together with resampling to maximize the benefit to the outcome. |
id | doaj.art-1b9ae50c9eed4277bc3395713a3db303 |
institution | Directory of Open Access Journals |
issn | 1662-5196 |
volume | 15 |
doi | 10.3389/fninf.2021.715421 |
article number | 715421 |
affiliations | Giulia Varotto: Epilepsy Unit, Bioengineering Group, Fondazione IRCCS Istituto Neurologico Carlo Besta, Milan, Italy; Neurophysiopathology Unit, Fondazione IRCCS Istituto Neurologico Carlo Besta, Milan, Italy. Gianluca Susi: Universidad Complutense de Madrid-Universidad Politécnica de Madrid (UPM-UCM) Laboratory of Cognitive and Computational Neuroscience, Center of Biomedical Technology, Technical University of Madrid, Madrid, Spain; Department of Experimental Psychology, Cognitive Processes and Logopedy, Complutense University of Madrid, Madrid, Spain. Laura Tassi: “Claudio Munari” Epilepsy Surgery Centre, Niguarda Hospital, Milan, Italy. Francesca Gozzo: “Claudio Munari” Epilepsy Surgery Centre, Niguarda Hospital, Milan, Italy. Silvana Franceschetti: Neurophysiopathology Unit, Fondazione IRCCS Istituto Neurologico Carlo Besta, Milan, Italy. Ferruccio Panzica: Clinical Engineering, Fondazione IRCCS Istituto Neurologico Carlo Besta, Milan, Italy. |
topic | imbalanced dataset classification; re-sampling techniques; oversampling and undersampling; ensemble methods; network analysis; epilepsy surgery |
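
As a hedged illustration of two ideas in the abstract, the Python sketch below implements random undersampling (RUS), which discards randomly chosen majority-class samples until both classes are the same size, and computes three of the imbalance-aware metrics named there (balanced accuracy, geometric mean, and F-measure) from a binary confusion matrix. It is not the authors' pipeline: the function names `random_undersample` and `imbalance_metrics`, the label convention (1 = minority/positive class, 0 = majority class), and the toy data are hypothetical, and ADASYN, the 10 classifiers, the ensemble methods, and AUC are not reproduced.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def random_undersample(
    X: Sequence[Sequence[float]], y: Sequence[int], seed: int = 0
) -> Tuple[List[Sequence[float]], List[int]]:
    """Hypothetical helper (not from the paper): randomly drop samples from
    larger classes until every class is as small as the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    n_min = min(counts.values())
    # Group sample indices by class label.
    by_class = {lab: [i for i, v in enumerate(y) if v == lab] for lab in counts}
    keep: List[int] = []
    for idx in by_class.values():
        # Sample without replacement; the smallest class is kept whole.
        keep.extend(rng.sample(idx, n_min))
    keep.sort()  # retain the original order of the surviving samples
    return [X[i] for i in keep], [y[i] for i in keep]


def imbalance_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Balanced accuracy, geometric mean and F-measure (F1) for a binary task.
    Assumption: `positive` labels the minority (e.g. epileptogenic) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall, minority class
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall, majority class
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "g_mean": math.sqrt(sensitivity * specificity),
        "f_measure": f1,
    }


if __name__ == "__main__":
    # Hypothetical toy data: 8 majority-class (0) and 2 minority-class (1) samples.
    X = [[float(i)] for i in range(10)]
    y = [0] * 8 + [1] * 2
    X_rus, y_rus = random_undersample(X, y, seed=42)
    print(Counter(y_rus))  # both classes now have 2 samples

    # A classifier that always predicts the majority class is 80% accurate here,
    # but balanced accuracy (0.5) and G-mean (0.0) expose the failure.
    print(imbalance_metrics(y, [0] * 10))
```

The toy run shows why metrics such as balanced accuracy and geometric mean are preferred over plain accuracy on imbalanced data. In practice, resampling is usually applied to the training data only, so that evaluation data keep their original class distribution.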