Handling Big Microarray Data: A Novel Approach to Design Accurate Fuzzy-Based Medical Expert System
The gene data produced by microarray experiments is complex in terms of dimensions and samples. It consumes a lot of computation power and time when it is processed for disease analysis while working with an expert system. At the same time, data can help doctors identify a patient’s he...
Main Authors: | Ganeshkumar Pugalendhi, M. Mazhar Rathore, Dhirendra Shukla, Anand Paul |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2023-01-01 |
Series: | IEEE Access |
Subjects: | f-information; fuzzy expert system; microarray data; particle swarm optimization |
Online Access: | https://ieeexplore.ieee.org/document/10072401/ |
_version_ | 1797847342538293248 |
---|---|
author | Ganeshkumar Pugalendhi; M. Mazhar Rathore; Dhirendra Shukla; Anand Paul |
author_facet | Ganeshkumar Pugalendhi; M. Mazhar Rathore; Dhirendra Shukla; Anand Paul |
author_sort | Ganeshkumar Pugalendhi |
collection | DOAJ |
description | The gene data produced by microarray experiments is complex in terms of dimensions and samples. It consumes a lot of computation power and time when it is processed for disease analysis while working with an expert system. At the same time, data can help doctors identify a patient’s health condition if it is presented in a meaningful way and processed on time. Several methods have been proposed to reduce the dimensions of medical microarray data and optimize its search space with minimal accuracy loss. However, the discretization of continuous gene values during dimension reduction fails to preserve the inherent meaning of genes. Also, ensuring high accuracy and interpretability in the reduction process may result in extra processing time, which is unfavorable for time-critical applications. To overcome these issues, in this paper, we propose a dimension reduction method in conjunction with a fuzzy expert system (FES) optimization approach, while keeping an accuracy-interpretability-speed tradeoff in mind. To accomplish this, we use a fuzzy rough set on $f$-information to identify meaningful genes without changing their original values. We propose a conditionally guided particle swarm optimization for faster knowledge acquisition, where the velocity is adjusted based on a predefined update probability, resulting in a faster search. A big data processing architecture is designed using the Hadoop ecosystem along with a MapReduce-equivalent algorithm of the proposed method for speedy processing, enabling parallel processing on microarray data to reduce dimensions and perform classification through knowledge extraction. The proposed method is thoroughly tested on eleven microarray datasets by considering the accuracy-interpretability-speed tradeoff. The results show that the proposed method is effective in identifying disease-causing genes while also understanding the patient’s genetic profile with only a few operations and a small amount of CPU time. Statistical tests are also run to validate the proposed method’s efficacy in comparison to other methods. |
first_indexed | 2024-04-09T18:09:47Z |
format | Article |
id | doaj.art-2065c73aabb64e94ae338f272c2ad18d |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-04-09T18:09:47Z |
publishDate | 2023-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-2065c73aabb64e94ae338f272c2ad18d; 2023-04-13T23:01:04Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2023-01-01; vol. 11, pp. 35182-35196; DOI 10.1109/ACCESS.2023.3257875; document 10072401; Handling Big Microarray Data: A Novel Approach to Design Accurate Fuzzy-Based Medical Expert System; authors: Ganeshkumar Pugalendhi (https://orcid.org/0000-0001-8681-8169), M. Mazhar Rathore, Dhirendra Shukla (https://orcid.org/0000-0002-0036-714X), Anand Paul (https://orcid.org/0009-0001-2119-5148); affiliations: Department of Information Technology, Anna University Regional Campus, Coimbatore, India; Dr. J. Herbert Smith Centre, University of New Brunswick, Fredericton, Canada; Dr. J. Herbert Smith Centre, University of New Brunswick, Fredericton, Canada; School of Computer Science and Engineering, Kyungpook National University, Daegu, South Korea; https://ieeexplore.ieee.org/document/10072401/; keywords: f-information, fuzzy expert system, microarray data, particle swarm optimization |
spellingShingle | Ganeshkumar Pugalendhi; M. Mazhar Rathore; Dhirendra Shukla; Anand Paul; Handling Big Microarray Data: A Novel Approach to Design Accurate Fuzzy-Based Medical Expert System; IEEE Access; f-information; fuzzy expert system; microarray data; particle swarm optimization |
title | Handling Big Microarray Data: A Novel Approach to Design Accurate Fuzzy-Based Medical Expert System |
title_sort | handling big microarray data a novel approach to design accurate fuzzy based medical expert system |
topic | f-information; fuzzy expert system; microarray data; particle swarm optimization |
url | https://ieeexplore.ieee.org/document/10072401/ |
work_keys_str_mv | AT ganeshkumarpugalendhi handlingbigmicroarraydataanovelapproachtodesignaccuratefuzzybasedmedicalexpertsystem AT mmazharrathore handlingbigmicroarraydataanovelapproachtodesignaccuratefuzzybasedmedicalexpertsystem AT dhirendrashukla handlingbigmicroarraydataanovelapproachtodesignaccuratefuzzybasedmedicalexpertsystem AT anandpaul handlingbigmicroarraydataanovelapproachtodesignaccuratefuzzybasedmedicalexpertsystem |
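
The description mentions a particle swarm optimization in which the velocity is adjusted based on a predefined update probability. The record gives no further detail, so the following is only a minimal sketch of one way such a rule could look, not the authors' algorithm. The function name `guided_pso`, the parameter `update_prob`, the inertia and acceleration constants, and the `sphere` test objective are all assumptions introduced for illustration.

```python
import random


def guided_pso(objective, dim, n_particles=20, iters=100,
               update_prob=0.5, w=0.7, c1=1.5, c2=1.5,
               bounds=(-5.0, 5.0), seed=0):
    """Minimize `objective` with a PSO variant whose velocity is recomputed
    only with probability `update_prob` (an assumed, illustrative rule)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iters):
        for i in range(n_particles):
            # Assumption: refresh the velocity only when a random draw falls
            # below update_prob; otherwise the previous velocity is reused.
            if rng.random() < update_prob:
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    vel[i][d] = (w * vel[i][d]
                                 + c1 * r1 * (pbest[i][d] - pos[i][d])
                                 + c2 * r2 * (gbest[d] - pos[i][d]))
            # Move the particle and clamp it to the search bounds.
            for d in range(dim):
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Toy objective (sum of squares), standing in for a real fitness."""
    return sum(v * v for v in x)


best, best_val = guided_pso(sphere, dim=3)
```

In a fuzzy expert system setting, the objective would presumably score a candidate rule base on accuracy and interpretability rather than a toy function; that mapping is not described in this record and is left out here. Skipping some velocity recomputations reduces the arithmetic per iteration, which is one plausible reading of the "faster search" claim.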