Identification of cancer related genes using feature selection and association rule mining
High-throughput sequencing generates large volumes of high-dimensional data. Identifying informative features in such data is always a challenge. Feature selection reduces complex data to a smaller number of variables while preserving as much information as possible. In this stu...
Main Authors: | Consolata Gakii, Richard Rimiru |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier 2021-01-01 |
Series: | Informatics in Medicine Unlocked |
Subjects: | Feature selection; Discretization; Association rule mining; Coexpression network |
Online Access: | http://www.sciencedirect.com/science/article/pii/S235291482100085X |
_version_ | 1819066615632756736 |
---|---|
author | Consolata Gakii; Richard Rimiru |
author_facet | Consolata Gakii; Richard Rimiru |
author_sort | Consolata Gakii |
collection | DOAJ |
description | High-throughput sequencing generates large volumes of high-dimensional data. Identifying informative features in such data is always a challenge. Feature selection reduces complex data to a smaller number of variables while preserving as much information as possible. In this study, we used DaMiRseq, DESeq2, edgeR and Limma + voom to identify differentially expressed genes in 79 small cell lung cancer (SCLC) samples and 7 normal controls. A gene network was used to identify coexpressed genes. Association rule mining was used to identify associations between connected genes in the network. Limma + voom identified the highest number of differentially expressed genes; however, 81 genes were common to all four differential gene expression analysis methods. After filtering out all nodes with a degree less than 5, the final network had 43 nodes and 63 edges. Association rule mining on the coexpressed genes generated 263 rules. The genes that recurred across the rules were SLC34A2, CAV2, EPAS1, CTSH, AQP1 and LRRK2. These genes have been associated with various types of cancer. Therefore, feature selection using differential gene expression analysis, co-expression networks and association rule mining could help infer relationships among genes and whether they may share a biological function. |
first_indexed | 2024-12-21T16:05:11Z |
format | Article |
id | doaj.art-70b768d6a9d149fbbdad26ce846c1f34 |
institution | Directory Open Access Journal |
issn | 2352-9148 |
language | English |
last_indexed | 2024-12-21T16:05:11Z |
publishDate | 2021-01-01 |
publisher | Elsevier |
record_format | Article |
series | Informatics in Medicine Unlocked |
spelling | doaj.art-70b768d6a9d149fbbdad26ce846c1f34; 2022-12-21T18:57:54Z; eng; Elsevier; Informatics in Medicine Unlocked; 2352-9148; 2021-01-01; 24; 100595; Identification of cancer related genes using feature selection and association rule mining; Consolata Gakii (0), Richard Rimiru (1); (0) School of Computing and Information Technology, Jomo Kenyatta University of Agriculture and Technology, P.O Box 62000, Nairobi, Kenya; Department of Mathematics, Computing and Information Technology, University of Embu, Embu, Kenya; Corresponding author: School of Computing and Information Technology, Jomo Kenyatta University of Agriculture and Technology, P.O Box 62000, Nairobi, Kenya; (1) School of Computing and Information Technology, Jomo Kenyatta University of Agriculture and Technology, P.O Box 62000, Nairobi, Kenya; http://www.sciencedirect.com/science/article/pii/S235291482100085X; Feature selection; Discretization; Association rule mining; Coexpression network |
title | Identification of cancer related genes using feature selection and association rule mining |
topic | Feature selection; Discretization; Association rule mining; Coexpression network |
url | http://www.sciencedirect.com/science/article/pii/S235291482100085X |
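
The description above outlines three steps: keeping genes reported by every differential expression method, dropping low-degree nodes from the coexpression network, and mining association rules among the remaining genes. The following is a minimal Python sketch of that general idea, not the authors' implementation. The function names, the single-pass degree filter, the restriction to one-gene-to-one-gene rules, and the representation of each sample as the set of genes discretized as "high" are all assumptions made for illustration; the record does not state thresholds or software for these steps.

```python
from itertools import combinations


def common_genes(*gene_sets):
    """Return the genes reported by every method (set intersection)."""
    return set.intersection(*(set(g) for g in gene_sets))


def filter_by_degree(edges, min_degree):
    """Keep only edges whose endpoints both have degree >= min_degree.

    Assumption: degrees are computed once on the input network
    (a single pass, not an iterative pruning).
    """
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    keep = {node for node, d in degree.items() if d >= min_degree}
    return [(a, b) for a, b in edges if a in keep and b in keep]


def pair_rules(transactions, min_support, min_confidence):
    """Mine rules of the form X -> Y between single genes.

    Each transaction is the set of genes flagged (e.g. discretized as
    "high") in one sample. Returns (X, Y, support, confidence) tuples.
    """
    n = len(transactions)
    item_count = {}
    pair_count = {}
    for t in transactions:
        items = sorted(set(t))
        for item in items:
            item_count[item] = item_count.get(item, 0) + 1
        for pair in combinations(items, 2):
            pair_count[pair] = pair_count.get(pair, 0) + 1

    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue
        # Test the rule in both directions: a -> b and b -> a.
        for x, y in ((a, b), (b, a)):
            confidence = count / item_count[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return rules
```

With real data, the inputs would be the gene lists from each method, the edge list of the coexpression network, and one discretized gene set per sample; the support and confidence cut-offs are free parameters that would need to be chosen for the dataset.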