Augmentation of Transcriptomic Data for Improved Classification of Patients with Respiratory Diseases of Viral Origin

To better understand the molecular basis of respiratory diseases of viral origin, high-throughput gene-expression data are frequently taken by means of DNA microarray or RNA-seq technology. Such data can also be useful to classify infected individuals by molecular signatures in the form of machine-l...

Bibliographic Details
Main Authors: Magdalena Kircher, Elisa Chludzinski, Jessica Krepel, Babak Saremi, Andreas Beineke, Klaus Jung
Format: Article
Language: English
Published: MDPI AG 2022-02-01
Series: International Journal of Molecular Sciences
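Subjects: data augmentation, deep learning, generative adversarial networks, transcriptomic data, high-dimensional data, viral acute respiratory illness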
Online Access: https://www.mdpi.com/1422-0067/23/5/2481
_version_ 1797474997963325440
author Magdalena Kircher
Elisa Chludzinski
Jessica Krepel
Babak Saremi
Andreas Beineke
Klaus Jung
author_facet Magdalena Kircher
Elisa Chludzinski
Jessica Krepel
Babak Saremi
Andreas Beineke
Klaus Jung
author_sort Magdalena Kircher
collection DOAJ
description To better understand the molecular basis of respiratory diseases of viral origin, high-throughput gene-expression data are frequently taken by means of DNA microarray or RNA-seq technology. Such data can also be useful to classify infected individuals by molecular signatures in the form of machine-learning models with genes as predictor variables. Early diagnosis of patients by molecular signatures could also contribute to better treatments. An approach that has rarely been considered for machine-learning models in the context of transcriptomics is data augmentation. For other data types it has been shown that augmentation can improve classification accuracy and prevent overfitting. Here, we compare three strategies for data augmentation of DNA microarray and RNA-seq data from two selected studies on respiratory diseases of viral origin. The first study involves samples of patients with either viral or bacterial origin of the respiratory disease, the second study involves patients with either SARS-CoV-2 or another respiratory virus as disease origin. Specifically, we reanalyze these public datasets to study whether patient classification by transcriptomic signatures can be improved when adding artificial data for training of the machine-learning models. Our comparison reveals that augmentation of transcriptomic data can improve the classification accuracy and that fewer genes are necessary as explanatory variables in the final models. We also report genes from our signatures that overlap with signatures presented in the original publications of our example data. Due to strict selection criteria, the molecular role of these genes in the context of respiratory infectious diseases is underlined.
first_indexed 2024-03-09T20:38:54Z
format Article
id doaj.art-6a71f870d6a74b468ab4ef8b64fe00d9
institution Directory Open Access Journal
issn 1661-6596
1422-0067
language English
last_indexed 2024-03-09T20:38:54Z
publishDate 2022-02-01
publisher MDPI AG
record_format Article
series International Journal of Molecular Sciences
spelling doaj.art-6a71f870d6a74b468ab4ef8b64fe00d9
2023-11-23T23:04:44Z
eng
MDPI AG
International Journal of Molecular Sciences
1661-6596
1422-0067
2022-02-01
23 (5), 2481
10.3390/ijms23052481
Augmentation of Transcriptomic Data for Improved Classification of Patients with Respiratory Diseases of Viral Origin
Magdalena Kircher (Institute for Animal Breeding and Genetics, University of Veterinary Medicine Hannover, Buenteweg 17p, 30559 Hannover, Germany)
Elisa Chludzinski (Department of Pathology, University of Veterinary Medicine Hannover, Buenteweg 17, 30559 Hannover, Germany)
Jessica Krepel (Institute for Animal Breeding and Genetics, University of Veterinary Medicine Hannover, Buenteweg 17p, 30559 Hannover, Germany)
Babak Saremi (Institute for Animal Breeding and Genetics, University of Veterinary Medicine Hannover, Buenteweg 17p, 30559 Hannover, Germany)
Andreas Beineke (Department of Pathology, University of Veterinary Medicine Hannover, Buenteweg 17, 30559 Hannover, Germany)
Klaus Jung (Institute for Animal Breeding and Genetics, University of Veterinary Medicine Hannover, Buenteweg 17p, 30559 Hannover, Germany)
To better understand the molecular basis of respiratory diseases of viral origin, high-throughput gene-expression data are frequently taken by means of DNA microarray or RNA-seq technology. Such data can also be useful to classify infected individuals by molecular signatures in the form of machine-learning models with genes as predictor variables. Early diagnosis of patients by molecular signatures could also contribute to better treatments. An approach that has rarely been considered for machine-learning models in the context of transcriptomics is data augmentation. For other data types it has been shown that augmentation can improve classification accuracy and prevent overfitting. Here, we compare three strategies for data augmentation of DNA microarray and RNA-seq data from two selected studies on respiratory diseases of viral origin. The first study involves samples of patients with either viral or bacterial origin of the respiratory disease, the second study involves patients with either SARS-CoV-2 or another respiratory virus as disease origin. Specifically, we reanalyze these public datasets to study whether patient classification by transcriptomic signatures can be improved when adding artificial data for training of the machine-learning models. Our comparison reveals that augmentation of transcriptomic data can improve the classification accuracy and that fewer genes are necessary as explanatory variables in the final models. We also report genes from our signatures that overlap with signatures presented in the original publications of our example data. Due to strict selection criteria, the molecular role of these genes in the context of respiratory infectious diseases is underlined.
https://www.mdpi.com/1422-0067/23/5/2481
data augmentation
deep learning
generative adversarial networks
transcriptomic data
high-dimensional data
viral acute respiratory illness
spellingShingle Magdalena Kircher
Elisa Chludzinski
Jessica Krepel
Babak Saremi
Andreas Beineke
Klaus Jung
Augmentation of Transcriptomic Data for Improved Classification of Patients with Respiratory Diseases of Viral Origin
International Journal of Molecular Sciences
data augmentation
deep learning
generative adversarial networks
transcriptomic data
high-dimensional data
viral acute respiratory illness
title Augmentation of Transcriptomic Data for Improved Classification of Patients with Respiratory Diseases of Viral Origin
title_full Augmentation of Transcriptomic Data for Improved Classification of Patients with Respiratory Diseases of Viral Origin
title_fullStr Augmentation of Transcriptomic Data for Improved Classification of Patients with Respiratory Diseases of Viral Origin
title_full_unstemmed Augmentation of Transcriptomic Data for Improved Classification of Patients with Respiratory Diseases of Viral Origin
title_short Augmentation of Transcriptomic Data for Improved Classification of Patients with Respiratory Diseases of Viral Origin
title_sort augmentation of transcriptomic data for improved classification of patients with respiratory diseases of viral origin
topic data augmentation
deep learning
generative adversarial networks
transcriptomic data
high-dimensional data
viral acute respiratory illness
url https://www.mdpi.com/1422-0067/23/5/2481
work_keys_str_mv AT magdalenakircher augmentationoftranscriptomicdataforimprovedclassificationofpatientswithrespiratorydiseasesofviralorigin
AT elisachludzinski augmentationoftranscriptomicdataforimprovedclassificationofpatientswithrespiratorydiseasesofviralorigin
AT jessicakrepel augmentationoftranscriptomicdataforimprovedclassificationofpatientswithrespiratorydiseasesofviralorigin
AT babaksaremi augmentationoftranscriptomicdataforimprovedclassificationofpatientswithrespiratorydiseasesofviralorigin
AT andreasbeineke augmentationoftranscriptomicdataforimprovedclassificationofpatientswithrespiratorydiseasesofviralorigin
AT klausjung augmentationoftranscriptomicdataforimprovedclassificationofpatientswithrespiratorydiseasesofviralorigin