Clustering column-mean quantile median: a new methodology for imputing missing data
Abstract: DNA microarray data sets have been widely explored and used to analyze data without any prior biological background. However, analyzing them becomes challenging when data are missing. Machine learning techniques are therefore applied, because microarray technology is promising in genomics, espe...
Main Authors: | Nourhan Yehia, Manal Abdel Wahed, Mai Said Mabrouk |
---|---|
Format: | Article |
Language: | English |
Published: | SpringerOpen, 2022-12-01 |
Series: | Journal of Engineering and Applied Science |
Subjects: | Microarray, Missing data, Imputation, Machine learning |
Online Access: | https://doi.org/10.1186/s44147-022-00148-7 |
_version_ | 1797980180567818240 |
---|---|
author | Nourhan Yehia, Manal Abdel Wahed, Mai Said Mabrouk |
collection | DOAJ |
description | Abstract: DNA microarray data sets have been widely explored and used to analyze data without any prior biological background. However, analyzing them becomes challenging when data are missing. Machine learning techniques are therefore applied, because microarray technology is promising in genomics, especially in the analysis of gene expression data. Furthermore, gene expression data can describe the transcription and translation of genetic information in detail. In this study, a new system was proposed to impute more realistic values for missing data in a microarray dataset. This system was validated and evaluated on 42 samples of rectal cancer. Several evaluation tests were also conducted to confirm the effectiveness of the new system and to compare it with well-known imputation algorithms. The proposed clustering column-mean quantile median technique could predict highly informative missing genes, thereby reducing the difference between the original and imputed datasets and demonstrating its efficiency. |
first_indexed | 2024-04-11T05:50:19Z |
format | Article |
id | doaj.art-cac848eeba9e483eb81767d85be8b4b5 |
institution | Directory Open Access Journal |
issn | 1110-1903 2536-9512 |
language | English |
last_indexed | 2024-04-11T05:50:19Z |
publishDate | 2022-12-01 |
publisher | SpringerOpen |
record_format | Article |
series | Journal of Engineering and Applied Science |
spelling | doaj.art-cac848eeba9e483eb81767d85be8b4b5; 2022-12-22T04:42:05Z; eng; SpringerOpen; Journal of Engineering and Applied Science; ISSN 1110-1903, 2536-9512; 2022-12-01; vol. 69, no. 1, pp. 1-15; 10.1186/s44147-022-00148-7; Clustering column-mean quantile median: a new methodology for imputing missing data; Nourhan Yehia (Biomedical Engineering Department, Faculty of Engineering, Misr University for Science and Technology); Manal Abdel Wahed (Systems and Biomedical Engineering Department, Faculty of Engineering, Cairo University); Mai Said Mabrouk (Biomedical Engineering Department, Faculty of Engineering, Misr University for Science and Technology); https://doi.org/10.1186/s44147-022-00148-7; Microarray; Missing data; Imputation; Machine learning |
title | Clustering column-mean quantile median: a new methodology for imputing missing data |
topic | Microarray, Missing data, Imputation, Machine learning |
url | https://doi.org/10.1186/s44147-022-00148-7 |