Deep Learning Methods for Omics Data Imputation

One common problem in omics data analysis is missing values, which can arise due to various reasons, such as poor tissue quality and insufficient sample volumes. Instead of discarding missing values and related data, imputation approaches offer an alternative means of handling missing data. However,...


Bibliographic Details
Main Authors: Lei Huang, Meng Song, Hui Shen, Huixiao Hong, Ping Gong, Hong-Wen Deng, Chaoyang Zhang
Format: Article
Language: English
Published: MDPI AG 2023-10-01
Series: Biology
Subjects: omics imputation; deep learning; multi-omics imputation
Online Access: https://www.mdpi.com/2079-7737/12/10/1313
_version_ 1797574642553061376
author Lei Huang
Meng Song
Hui Shen
Huixiao Hong
Ping Gong
Hong-Wen Deng
Chaoyang Zhang
author_facet Lei Huang
Meng Song
Hui Shen
Huixiao Hong
Ping Gong
Hong-Wen Deng
Chaoyang Zhang
author_sort Lei Huang
collection DOAJ
description One common problem in omics data analysis is missing values, which can arise for various reasons, such as poor tissue quality and insufficient sample volumes. Instead of discarding missing values and related data, imputation approaches offer an alternative means of handling missing data. However, the imputation of missing omics data is a non-trivial task. Difficulties mainly come from high dimensionality, non-linear or non-monotonic relationships among features, technical variations introduced by sampling methods, sample heterogeneity, and non-random missingness mechanisms. Several advanced imputation methods, including deep learning-based methods, have been proposed to address these challenges. Because deep learning can model complex patterns and relationships in large, high-dimensional datasets, many researchers have adopted deep learning models to impute missing omics data. This review provides a comprehensive overview of the currently available deep learning-based methods for omics imputation from the perspective of deep generative model architectures such as autoencoders, variational autoencoders, generative adversarial networks, and Transformers, with an emphasis on multi-omics data imputation. This review also discusses the opportunities that deep learning brings and the challenges that it might face in this field.
first_indexed 2024-03-10T21:25:12Z
format Article
id doaj.art-208961ba39054211a6827c8ebe0f54d8
institution Directory of Open Access Journals
issn 2079-7737
language English
last_indexed 2024-03-10T21:25:12Z
publishDate 2023-10-01
publisher MDPI AG
record_format Article
series Biology
spelling doaj.art-208961ba39054211a6827c8ebe0f54d8
2023-11-19T15:43:32Z
eng
MDPI AG
Biology
2079-7737
2023-10-01
Volume 12, Issue 10, Article 1313
10.3390/biology12101313
Deep Learning Methods for Omics Data Imputation
Lei Huang (0): School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS 39406, USA
Meng Song (1): School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS 39406, USA
Hui Shen (2): Center for Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA 70112, USA
Huixiao Hong (3): Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
Ping Gong (4): Environmental Laboratory, U.S. Army Engineer Research and Development Center, Vicksburg, MS 39180, USA
Hong-Wen Deng (5): Center for Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA 70112, USA
Chaoyang Zhang (6): School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS 39406, USA
https://www.mdpi.com/2079-7737/12/10/1313
omics imputation
deep learning
multi-omics imputation
spellingShingle Lei Huang
Meng Song
Hui Shen
Huixiao Hong
Ping Gong
Hong-Wen Deng
Chaoyang Zhang
Deep Learning Methods for Omics Data Imputation
Biology
omics imputation
deep learning
multi-omics imputation
title Deep Learning Methods for Omics Data Imputation
title_full Deep Learning Methods for Omics Data Imputation
title_fullStr Deep Learning Methods for Omics Data Imputation
title_full_unstemmed Deep Learning Methods for Omics Data Imputation
title_short Deep Learning Methods for Omics Data Imputation
title_sort deep learning methods for omics data imputation
topic omics imputation
deep learning
multi-omics imputation
url https://www.mdpi.com/2079-7737/12/10/1313
work_keys_str_mv AT leihuang deeplearningmethodsforomicsdataimputation
AT mengsong deeplearningmethodsforomicsdataimputation
AT huishen deeplearningmethodsforomicsdataimputation
AT huixiaohong deeplearningmethodsforomicsdataimputation
AT pinggong deeplearningmethodsforomicsdataimputation
AT hongwendeng deeplearningmethodsforomicsdataimputation
AT chaoyangzhang deeplearningmethodsforomicsdataimputation