Identification of cancer related genes using feature selection and association rule mining

High-throughput sequencing generates large volumes of high-dimensional data. Identifying informative features in such data remains a challenge. Feature selection reduces complex data to a smaller number of variables while preserving as much information as possible. In this stu...

Bibliographic Details
Main Authors:	Consolata Gakii, Richard Rimiru
Format:	Article
Language:	English
Published:	Elsevier 2021-01-01
Series:	Informatics in Medicine Unlocked
Subjects:	Feature selection Discretization Association rule mining Coexpression network
Online Access:	http://www.sciencedirect.com/science/article/pii/S235291482100085X

_version_	1819066615632756736
author	Consolata Gakii Richard Rimiru
collection	DOAJ
description	High-throughput sequencing generates large volumes of high-dimensional data. Identifying informative features in such data remains a challenge. Feature selection reduces complex data to a smaller number of variables while preserving as much information as possible. In this study, we used DaMiRseq, DESeq2, edgeR and Limma + voom to identify differentially expressed genes in 79 small cell lung cancer (SCLC) samples and 7 normal controls. A gene network was used to identify coexpressed genes. Association rule mining was used to identify associations between connected genes in the network. Limma + voom identified the highest number of differentially expressed genes. However, 81 genes were common to all four differential gene expression analysis methods. After filtering out all nodes with a degree less than 5, the final network had 43 nodes and 63 edges. Association rule mining on the coexpressed genes generated 263 rules. The genes common to the rules were SLC34A2, CAV2, EPAS1, CTSH, AQP1 and LRRK2. These genes have been associated with various types of cancer. Therefore, feature selection using differential gene expression analysis, co-expression networks and association rule mining could help infer relationships among genes and whether they are likely to share a biological function.
first_indexed	2024-12-21T16:05:11Z
format	Article
id	doaj.art-70b768d6a9d149fbbdad26ce846c1f34
institution	Directory of Open Access Journals
issn	2352-9148
language	English
last_indexed	2024-12-21T16:05:11Z
publishDate	2021-01-01
publisher	Elsevier
record_format	Article
series	Informatics in Medicine Unlocked
spelling	doaj.art-70b768d6a9d149fbbdad26ce846c1f34 | 2022-12-21T18:57:54Z | eng | Elsevier | Informatics in Medicine Unlocked | 2352-9148 | 2021-01-01 | 24 | 100595 | Identification of cancer related genes using feature selection and association rule mining | Consolata Gakii (0) | Richard Rimiru (1) | (0) School of Computing and Information Technology, Jomo Kenyatta University of Agriculture and Technology, P.O Box 62000, Nairobi, Kenya; Department of Mathematics, Computing and Information Technology, University of Embu, Embu, Kenya; Corresponding author. School of Computing and Information Technology, Jomo Kenyatta University of Agriculture and Technology, P.O Box 62000, Nairobi, Kenya. | (1) School of Computing and Information Technology, Jomo Kenyatta University of Agriculture and Technology, P.O Box 62000, Nairobi, Kenya
title	Identification of cancer related genes using feature selection and association rule mining
topic	Feature selection Discretization Association rule mining Coexpression network
url	http://www.sciencedirect.com/science/article/pii/S235291482100085X
work_keys_str_mv	AT consolatagakii identificationofcancerrelatedgenesusingfeatureselectionandassociationrulemining AT richardrimiru identificationofcancerrelatedgenesusingfeatureselectionandassociationrulemining
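
The workflow summarized in the description field (intersecting the gene lists from several differential expression methods, pruning low-degree nodes from a coexpression network, then mining rules over the remaining genes) can be illustrated with a minimal Python sketch. This is not the authors' code: the study used R packages, and the function names, the single-pass degree filter, the restriction to one-antecedent rules, and the representation of each sample as a set of discretized gene states are all assumptions made for illustration.

```python
from itertools import combinations


def common_genes(results):
    """Return the genes reported by every method (set intersection).

    `results` maps a method name to an iterable of gene identifiers.
    """
    return set.intersection(*(set(genes) for genes in results.values()))


def filter_by_degree(edges, min_degree):
    """Keep only edges whose two endpoints both have degree >= min_degree.

    Degrees are computed once on the input network (assumption: a single
    pass, not repeated pruning until the network stabilizes).
    """
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    keep = {node for node, d in degree.items() if d >= min_degree}
    return [(a, b) for a, b in edges if a in keep and b in keep]


def pair_rules(transactions, min_support, min_confidence):
    """Mine one-antecedent rules lhs -> rhs from a list of item sets.

    support    = fraction of transactions containing both items
    confidence = support count of both items / support count of lhs
    Returns a list of (lhs, rhs, support, confidence) tuples.
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        for lhs, rhs in ((a, b), (b, a)):
            both = sum(1 for t in transactions if lhs in t and rhs in t)
            lhs_count = sum(1 for t in transactions if lhs in t)
            if lhs_count == 0:
                continue
            support = both / n
            confidence = both / lhs_count
            if support >= min_support and confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules
```

In this sketch, each transaction would correspond to one sample and each item to a discretized gene state (for example a hypothetical label such as `geneA=high`), which is why discretization appears among the record's subjects. The thresholds `min_degree`, `min_support` and `min_confidence` are parameters of the sketch; apart from the degree cutoff of 5 stated in the abstract, the record does not give the values used in the study.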

Similar Items