Clustering column-mean quantile median: a new methodology for imputing missing data

Abstract DNA microarray data sets have been widely explored and used to analyze data without any previous biological background. However, analyzing them becomes challenging if data are missing. Thus, machine learning techniques are applied because microarray technology is promising in genomics, espe...


Bibliographic Details
Main Authors:	Nourhan Yehia, Manal Abdel Wahed, Mai Said Mabrouk
Format:	Article
Language:	English
Published:	SpringerOpen 2022-12-01
Series:	Journal of Engineering and Applied Science
Subjects:	Microarray, Missing data, Imputation, Machine learning
Online Access:	https://doi.org/10.1186/s44147-022-00148-7

collection	DOAJ
description	Abstract DNA microarray data sets have been widely explored and used to analyze data without any previous biological background. However, analyzing them becomes challenging if data are missing. Thus, machine learning techniques are applied because microarray technology is promising in genomics, especially in the analysis of gene expression data. Furthermore, gene expression data can describe the transcription and translation of genetic information in detail. In this study, a new system was proposed to impute more realistic values for missing data in a microarray dataset. The system was validated and evaluated on 42 samples of rectal cancer. Several evaluation tests were also conducted to confirm its effectiveness and to compare it with well-known imputation algorithms. The proposed clustering column-mean quantile median technique could predict highly informative missing genes, thereby reducing the difference between the original and imputed datasets and demonstrating its efficiency.
id	doaj.art-cac848eeba9e483eb81767d85be8b4b5
institution	Directory of Open Access Journals
issn	1110-1903 2536-9512
spelling	Journal of Engineering and Applied Science, vol. 69, no. 1, pp. 1-15, 2022-12-01, doi:10.1186/s44147-022-00148-7. Nourhan Yehia (Biomedical Engineering Department, Faculty of Engineering, Misr University for Science and Technology); Manal Abdel Wahed (Systems and Biomedical Engineering Department, Faculty of Engineering, Cairo University); Mai Said Mabrouk (Biomedical Engineering Department, Faculty of Engineering, Misr University for Science and Technology).

Similar Items