Identification of cancer related genes using feature selection and association rule mining

High throughput sequencing generates large volumes of high dimensional data. Identifying informative features from the generated big data is always a challenge. Feature selection reduces complex data into a smaller number of variables while preserving the information as much as possible. In this stu...


Bibliographic Details
Main Authors: Consolata Gakii, Richard Rimiru
Format: Article
Language: English
Published: Elsevier 2021-01-01
Series: Informatics in Medicine Unlocked
Subjects: Feature selection; Discretization; Association rule mining; Coexpression network
Online Access: http://www.sciencedirect.com/science/article/pii/S235291482100085X
author Consolata Gakii
Richard Rimiru
collection DOAJ
description High-throughput sequencing generates large volumes of high-dimensional data, and identifying informative features in such data is a persistent challenge. Feature selection reduces complex data to a smaller number of variables while preserving as much information as possible. In this study, we used DaMiRseq, DESeq2, edgeR and Limma + voom to identify differentially expressed genes in 79 small cell lung cancer (SCLC) samples and 7 normal controls. A gene network was used to identify coexpressed genes, and association rule mining was used to identify associations between connected genes in the network. Limma + voom identified the highest number of differentially expressed genes; 81 genes were common to all four differential gene expression analysis methods. After filtering out all nodes with a degree less than 5, the final network had 43 nodes and 63 edges. Association rule mining on the coexpressed genes generated 263 rules. Genes that were common in the rules were SLC34A2, CAV2, EPAS1, CTSH, AQP1 and LRRK2. These genes have been associated with various types of cancer. Therefore, feature selection using differential gene expression analysis, co-expression networks and association rule mining could help infer relationships among genes and their possibility of having a shared biological function.
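The three steps named in the description (intersecting the gene lists from several methods, dropping low-degree network nodes, and mining association rules) can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the single-pass degree filter, the restriction to one-to-one rules, and the representation of each sample as a set of discretized labels such as "G1_high" are all assumptions made for illustration; the gene labels are placeholders, not study data.

```python
from itertools import combinations


def common_genes(*gene_sets):
    """Return the genes reported by every method (set intersection)."""
    return set.intersection(*map(set, gene_sets))


def filter_by_degree(edges, min_degree):
    """Keep only edges whose endpoints both have degree >= min_degree.

    Assumption: degrees are computed once on the input network
    (single pass), not recomputed after nodes are removed.
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    keep = {node for node, d in degree.items() if d >= min_degree}
    return [(u, v) for u, v in edges if u in keep and v in keep]


def pair_rules(transactions, min_support, min_confidence):
    """Mine one-to-one rules lhs -> rhs from a list of item sets.

    Assumption: each transaction is one sample, and each item is a
    hypothetical discretized expression label such as "G1_high".
    Returns tuples (lhs, rhs, support, confidence).
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    count = {i: sum(i in t for t in transactions) for i in items}
    rules = []
    for a, b in combinations(items, 2):
        both = sum(a in t and b in t for t in transactions)
        support = both / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = both / count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules
```

In this sketch, support is the fraction of samples containing both items and confidence is that joint count divided by the count of the left-hand item. A full Apriori implementation would also consider rules with more than one item on either side.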
first_indexed 2024-12-21T16:05:11Z
format Article
id doaj.art-70b768d6a9d149fbbdad26ce846c1f34
institution Directory Open Access Journal
issn 2352-9148
language English
last_indexed 2024-12-21T16:05:11Z
publishDate 2021-01-01
publisher Elsevier
record_format Article
series Informatics in Medicine Unlocked
spelling doaj.art-70b768d6a9d149fbbdad26ce846c1f34; 2022-12-21T18:57:54Z; eng; Elsevier; Informatics in Medicine Unlocked; 2352-9148; 2021-01-01; 24; 100595
Consolata Gakii (0); Richard Rimiru (1)
(0) School of Computing and Information Technology, Jomo Kenyatta University of Agriculture and Technology, P.O Box 62000, Nairobi, Kenya; Department of Mathematics, Computing and Information Technology, University of Embu, Embu, Kenya. Corresponding author, at the Jomo Kenyatta University of Agriculture and Technology address.
(1) School of Computing and Information Technology, Jomo Kenyatta University of Agriculture and Technology, P.O Box 62000, Nairobi, Kenya
title Identification of cancer related genes using feature selection and association rule mining
topic Feature selection
Discretization
Association rule mining
Coexpression network
url http://www.sciencedirect.com/science/article/pii/S235291482100085X