Generation of synthetic EEG data for training algorithms supporting the diagnosis of major depressive disorder

Introduction: Major depressive disorder (MDD) is the most common mental disorder worldwide, leading to impairment in quality and independence of life. Electroencephalography (EEG) biomarkers processed with machine learning (ML) algorithms have been explored for objective diagnoses with promising resul...


Bibliographic Details
Main Authors:	Friedrich Philipp Carrle, Yasmin Hollenbenders, Alexandra Reichenbach
Format:	Article
Language:	English
Published:	Frontiers Media S.A. 2023-10-01
Series:	Frontiers in Neuroscience
Subjects:	major depressive disorder, electroencephalography, generative adversarial network, deep learning, data augmentation, synthetic data
Online Access:	https://www.frontiersin.org/articles/10.3389/fnins.2023.1219133/full

_version_	1797262493082451968
author	Friedrich Philipp Carrle, Yasmin Hollenbenders, Alexandra Reichenbach
author_facet	Friedrich Philipp Carrle, Yasmin Hollenbenders, Alexandra Reichenbach
author_sort	Friedrich Philipp Carrle
collection	DOAJ
description	Introduction: Major depressive disorder (MDD) is the most common mental disorder worldwide, leading to impairment in quality and independence of life. Electroencephalography (EEG) biomarkers processed with machine learning (ML) algorithms have been explored for objective diagnoses with promising results. However, the generalizability of those models, a prerequisite for clinical application, is restricted by small datasets. One approach to train ML models with good generalizability is complementing the original with synthetic data produced by generative algorithms. Another advantage of synthetic data is the possibility of publishing the data for other researchers without risking patient data privacy. Synthetic EEG time-series have not yet been generated for two clinical populations like MDD patients and healthy controls. Methods: We first reviewed 27 studies presenting EEG data augmentation with generative algorithms for classification tasks, like diagnosis, for the possibilities and shortcomings of recent methods. The subsequent empirical study generated EEG time-series based on two public datasets with 30/28 and 24/29 subjects (MDD/controls). To obtain baseline diagnostic accuracies, convolutional neural networks (CNN) were trained with time-series from each dataset. The data were synthesized with generative adversarial networks (GAN) consisting of CNNs. We evaluated the synthetic data qualitatively and quantitatively and finally used it for re-training the diagnostic model. Results: The reviewed studies improved their classification accuracies by between 1 and 40% with the synthetic data. Our own diagnostic accuracy improved up to 10% for one dataset but not significantly for the other. We found a rich repertoire of generative models in the reviewed literature, solving various technical issues. A major shortcoming in the field is the lack of meaningful evaluation metrics for synthetic data. The few studies analyzing the data in the frequency domain, including our own, show that only some features can be produced truthfully. Discussion: The systematic review combined with our own investigation provides an overview of the available methods for generating EEG data for a classification task, their possibilities, and shortcomings. The approach is promising and the technical basis is set. For a broad application of these techniques in neuroscience research or clinical application, the methods need fine-tuning facilitated by domain expertise in (clinical) EEG research.
first_indexed	2024-04-24T23:57:59Z
format	Article
id	doaj.art-2723d3e9898a42dbae274f2c212a2140
institution	Directory Open Access Journal
issn	1662-453X
language	English
last_indexed	2024-04-24T23:57:59Z
publishDate	2023-10-01
publisher	Frontiers Media S.A.
record_format	Article
series	Frontiers in Neuroscience
spelling	doaj.art-2723d3e9898a42dbae274f2c212a2140; 2024-03-14T10:51:14Z; eng; Frontiers Media S.A.; Frontiers in Neuroscience; ISSN 1662-453X; 2023-10-01; vol. 17; DOI 10.3389/fnins.2023.1219133; article no. 1219133; Generation of synthetic EEG data for training algorithms supporting the diagnosis of major depressive disorder; Friedrich Philipp Carrle, Yasmin Hollenbenders, Alexandra Reichenbach (each listed with two affiliations: Center for Machine Learning, Heilbronn University, Heilbronn, Germany; Medical Faculty Heidelberg, University of Heidelberg, Heidelberg, Germany); https://www.frontiersin.org/articles/10.3389/fnins.2023.1219133/full
title	Generation of synthetic EEG data for training algorithms supporting the diagnosis of major depressive disorder
topic	major depressive disorder, electroencephalography, generative adversarial network, deep learning, data augmentation, synthetic data
url	https://www.frontiersin.org/articles/10.3389/fnins.2023.1219133/full


Similar Items