Handling Big Microarray Data: A Novel Approach to Design Accurate Fuzzy-Based Medical Expert System

The gene data produced by microarray experiments is complex in terms of dimensions and samples. Processing it for disease analysis with an expert system consumes considerable computation power and time. At the same time, data can help doctors identify a patient’s he...


Bibliographic Details
Main Authors: Ganeshkumar Pugalendhi, M. Mazhar Rathore, Dhirendra Shukla, Anand Paul
Format: Article
Language: English
Published: IEEE 2023-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10072401/
Collection: DOAJ
Description: The gene data produced by microarray experiments is complex in terms of dimensions and samples. Processing it for disease analysis with an expert system consumes considerable computation power and time. At the same time, the data can help doctors identify a patient’s health condition if it is presented in a meaningful way and processed on time. Several methods have been proposed to reduce the dimensions of medical microarray data and optimize its search space with minimal accuracy loss. However, discretizing continuous gene values during dimension reduction fails to preserve the inherent meaning of genes. Also, ensuring high accuracy and interpretability in the reduction process may require extra processing time, which is unfavorable for time-critical applications. To overcome these issues, we propose a dimension reduction method in conjunction with a fuzzy expert system (FES) optimization approach, keeping the accuracy-interpretability-speed tradeoff in mind. To accomplish this, we use a fuzzy rough set on $f$-information to identify meaningful genes without changing their original values. We propose a conditionally guided particle swarm optimization for faster knowledge acquisition, where the velocity is adjusted based on a predefined update probability, resulting in a faster search. A big data processing architecture is designed using the Hadoop ecosystem, along with a MapReduce-equivalent algorithm of the proposed method, enabling parallel processing of microarray data to reduce dimensions and perform classification through knowledge extraction. The proposed method is thoroughly tested on eleven microarray datasets with respect to the accuracy-interpretability-speed tradeoff. The results show that the proposed method is effective in identifying disease-causing genes while also understanding the patient’s genetic profile with only a few operations and a small amount of CPU time. Statistical tests are also run to validate the proposed method’s efficacy in comparison to other methods.
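Illustrative note: the description mentions a particle swarm optimization in which the velocity is adjusted according to a predefined update probability, but this record does not give the actual rule. The Python sketch below shows one possible reading only, not the authors' algorithm. The function names, the toy objective `sphere`, the parameter values, and the rule "recompute a velocity component with probability `update_prob`, otherwise reuse the previous one" are all assumptions made for illustration.

```python
import random


def sphere(x):
    """Toy objective (assumed for illustration): sum of squares, minimum at 0."""
    return sum(v * v for v in x)


def pso_with_update_probability(objective, dim=5, n_particles=20, iters=200,
                                update_prob=0.7, w=0.7, c1=1.5, c2=1.5,
                                bounds=(-5.0, 5.0), seed=0):
    """Minimise `objective` with a PSO variant whose velocity components are
    recomputed only with probability `update_prob` (a hypothetical rule)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                # Assumed rule: with probability update_prob apply the standard
                # PSO velocity update; otherwise keep the previous velocity.
                if rng.random() < update_prob:
                    r1, r2 = rng.random(), rng.random()
                    vel[i][d] = (w * vel[i][d]
                                 + c1 * r1 * (pbest[i][d] - pos[i][d])
                                 + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clamp it to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

Calling `pso_with_update_probability(sphere)` returns the best position found and its objective value. Setting `update_prob=1.0` reduces the sketch to a standard global-best PSO, which makes it easy to compare how skipping velocity updates changes the amount of computation per iteration.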
First indexed: 2024-04-09T18:09:47Z
Last indexed: 2024-04-09T18:09:47Z
Record ID: doaj.art-2065c73aabb64e94ae338f272c2ad18d
Institution: Directory of Open Access Journals
ISSN: 2169-3536
Citation: IEEE Access, vol. 11, pp. 35182-35196, 2023
DOI: 10.1109/ACCESS.2023.3257875
IEEE Xplore document number: 10072401
Author details:
Ganeshkumar Pugalendhi (https://orcid.org/0000-0001-8681-8169), Department of Information Technology, Anna University Regional Campus, Coimbatore, India
M. Mazhar Rathore, Dr. J. Herbert Smith Centre, University of New Brunswick, Fredericton, Canada
Dhirendra Shukla (https://orcid.org/0000-0002-0036-714X), Dr. J. Herbert Smith Centre, University of New Brunswick, Fredericton, Canada
Anand Paul (https://orcid.org/0009-0001-2119-5148), School of Computer Science and Engineering, Kyungpook National University, Daegu, South Korea
Subjects: f-information, fuzzy expert system, microarray data, particle swarm optimization