Empirical Evaluation on the Impact of Class Overlap for EEG-Based Early Epileptic Seizure Detection

Bibliographic Details
Main Authors: Yubin Qu, Xiang Chen, Fang Li, Fan Yang, Junxia Ji, Long Li
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/9210622/
Description
Summary: Electroencephalography (EEG) carries important physiological information that reflects the activity of the human brain. Although EEG is a complex signal, it can be used for epileptic seizure detection and epilepsy diagnosis via machine learning. Because classification performance depends heavily on data quality, constructing high-quality training datasets requires considerable effort, including raw signal preprocessing and data preprocessing for machine learning. Feature extraction has been widely used in EEG-based early epileptic seizure detection. Due to the complexity of data collection and labeling, some training instances are inevitably mislabeled, which means that some similar instances have different labels. This is the class overlap issue: it leads to a poor class boundary for classification models and makes constructing a high-quality classification model more difficult. However, previous studies investigating the impact of class overlap on EEG data are quite limited. Our goal is to investigate the impact of class overlap on EEG-based early epileptic seizure detection. We propose a special neighborhood cleaning rule (SNCR) to alleviate the class overlap issue. We conduct large-scale experiments on two widely used EEG datasets and compare our proposed SNCR strategy with a state-of-the-art data cleaning strategy, i.e., the improved k-means clustering cleaning approach (IKMCCA). The experimental results show that the classification model achieves significantly better performance in terms of AUC, recall, and F1 when using our proposed SNCR strategy. Therefore, for EEG-based early epileptic seizure detection, we recommend that researchers apply the SNCR strategy as a data preprocessing step to mitigate the class overlap issue in future related studies.
ISSN: 2169-3536
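
Illustrative note: the summary does not describe how SNCR itself works, so the sketch below is not the authors' method. It only illustrates the general idea behind neighborhood-based cleaning of class overlap, using a plain edited-nearest-neighbours style rule: an instance is dropped when most of its nearest neighbours carry a different label. The function name `neighborhood_clean`, the choice of k = 3, and the Euclidean distance are all assumptions made for this example.

```python
import numpy as np


def neighborhood_clean(X, y, k=3):
    """Generic neighbourhood cleaning sketch (NOT the paper's SNCR).

    Drops every instance whose label disagrees with more than half of
    its k nearest neighbours, measured by Euclidean distance.
    Returns the cleaned features, cleaned labels, and the boolean keep mask.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    # Pairwise squared Euclidean distances between all instances.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)

    # An instance must not count as its own neighbour.
    np.fill_diagonal(dist, np.inf)

    # Indices and labels of the k nearest neighbours of each instance.
    nn_idx = np.argsort(dist, axis=1)[:, :k]
    nn_labels = y[nn_idx]

    # Number of neighbours that share the instance's own label.
    agree = (nn_labels == y[:, None]).sum(axis=1)

    # Keep an instance when at least half of its neighbours agree with it.
    keep = agree * 2 >= k
    return X[keep], y[keep], keep
```

The pairwise distance matrix keeps the sketch short but needs memory quadratic in the number of instances, so a tree-based or approximate neighbour search would be a more practical choice for large EEG feature sets.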