Clustering column-mean quantile median: a new methodology for imputing missing data

Abstract DNA microarray data sets have been widely explored and used to analyze data without any previous biological background. However, analyzing them becomes challenging if data are missing. Thus, machine learning techniques are applied because microarray technology is promising in genomics, espe...

Bibliographic Details
Main Authors: Nourhan Yehia, Manal Abdel Wahed, Mai Said Mabrouk
Format: Article
Language: English
Published: SpringerOpen 2022-12-01
Series: Journal of Engineering and Applied Science
Subjects: Microarray, Missing data, Imputation, Machine learning
Online Access: https://doi.org/10.1186/s44147-022-00148-7
author Nourhan Yehia
Manal Abdel Wahed
Mai Said Mabrouk
author_sort Nourhan Yehia
collection DOAJ
description Abstract DNA microarray data sets have been widely explored and used to analyze data without any previous biological background. However, analyzing them becomes challenging if data are missing. Thus, machine learning techniques are applied because microarray technology is promising in genomics, especially in the analysis of gene expression data. Furthermore, gene expression data can describe the transcription and translation of genetic information in detail. In this study, a new system was proposed to impute more realizable values for missing data in a microarray dataset. This system was validated and evaluated on 42 samples of rectal cancer. Several evaluation tests were also conducted to confirm the effectiveness of the new system and to compare it with well-known imputation algorithms. The proposed clustering column-mean quantile median technique could predict highly informative missing genes, thereby reducing the difference between the original and imputed datasets and demonstrating its efficiency.
first_indexed 2024-04-11T05:50:19Z
format Article
id doaj.art-cac848eeba9e483eb81767d85be8b4b5
institution Directory Open Access Journal
issn 1110-1903
2536-9512
language English
last_indexed 2024-04-11T05:50:19Z
publishDate 2022-12-01
publisher SpringerOpen
record_format Article
series Journal of Engineering and Applied Science
spelling doaj.art-cac848eeba9e483eb81767d85be8b4b5
2022-12-22T04:42:05Z
eng
SpringerOpen
Journal of Engineering and Applied Science
1110-1903
2536-9512
2022-12-01
691115
10.1186/s44147-022-00148-7
Clustering column-mean quantile median: a new methodology for imputing missing data
Nourhan Yehia [0]
Manal Abdel Wahed [1]
Mai Said Mabrouk [2]
[0] Biomedical Engineering Department, Faculty of Engineering, Misr University for Science and Technology University
[1] Systems and Biomedical Engineering Department, Faculty of Engineering, Cairo University
[2] Biomedical Engineering Department, Faculty of Engineering, Misr University for Science and Technology University
https://doi.org/10.1186/s44147-022-00148-7
Microarray
Missing data
Imputation
Machine learning
title Clustering column-mean quantile median: a new methodology for imputing missing data
title_sort clustering column mean quantile median a new methodology for imputing missing data
topic Microarray
Missing data
Imputation
Machine learning
url https://doi.org/10.1186/s44147-022-00148-7
work_keys_str_mv AT nourhanyehia clusteringcolumnmeanquantilemediananewmethodologyforimputingmissingdata
AT manalabdelwahed clusteringcolumnmeanquantilemediananewmethodologyforimputingmissingdata
AT maisaidmabrouk clusteringcolumnmeanquantilemediananewmethodologyforimputingmissingdata