A Markov chain-based feature extraction method for classification and identification of cancerous DNA sequences

Introduction: In recent decades, the growing rate of cancer incidence has become a major concern for most societies. Because cancer has genetic origins, its underlying genetic structure must be examined to study the disease. Methods: In this research, cancer data are analyzed based on DNA sequences. The...

Bibliographic Details
Main Authors: Amin Khodaei, Mohammad-Reza Feizi-Derakhshi, Behzad Mozaffari-Tazehkand
Format: Article
Language: English
Published: Tabriz University of Medical Sciences 2021-03-01
Series: BioImpacts
Subjects: dna sequence, cancer, classification, markov chain, support vector machine
Online Access: https://bi.tbzmed.ac.ir/PDF/bi-11-87.pdf
_version_ 1818926019481960448
author Amin Khodaei
Mohammad-Reza Feizi-Derakhshi
Behzad Mozaffari-Tazehkand
author_facet Amin Khodaei
Mohammad-Reza Feizi-Derakhshi
Behzad Mozaffari-Tazehkand
author_sort Amin Khodaei
collection DOAJ
description Introduction: In recent decades, the growing rate of cancer incidence has become a major concern for most societies. Because cancer has genetic origins, its underlying genetic structure must be examined to study the disease. Methods: In this research, cancer data are analyzed based on DNA sequences. The transition probability between pairs of nucleotides occurring in DNA sequences has the Markov property. This property motivates reducing the feature dimension of DNA sequences to overcome the high computational overhead of gene analysis. The resulting mapping decreases the feature dimension while conserving the basic properties needed to discriminate cancerous from non-cancerous genes. Results: A non-linear support vector machine (SVM) classifier with RBF and polynomial kernel functions was able to discriminate the selected cancerous samples from non-cancerous ones. Experimental results based on 10-fold cross-validation and accuracy metrics verified that the proposed method has low computational overhead and high accuracy. Conclusion: The proposed algorithm was successfully tested on case studies from related research. In general, the combination of the proposed Markov-based feature reduction and a non-linear SVM classifier can be considered one of the best methods for discriminating cancerous from non-cancerous genes.
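The abstract does not spell out the exact mapping, so the following is only a minimal sketch of one plausible reading: a first-order Markov model whose 4 x 4 nucleotide transition probabilities are flattened into a 16-element feature vector, regardless of sequence length. The function name `transition_features`, the handling of ambiguous symbols, and the toy sequence are assumptions made for illustration, not the authors' implementation.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def transition_features(sequence):
    """Return 16 first-order transition probabilities for a DNA sequence.

    Each value estimates P(next = j | current = i); values are flattened
    row by row in the order AA, AC, AG, AT, CA, CC, ... (assumed layout).
    """
    seq = sequence.upper()
    counts = {pair: 0 for pair in product(NUCLEOTIDES, repeat=2)}

    # Count adjacent nucleotide pairs; pairs containing any other symbol
    # (for example N) are skipped. This is an assumption of the sketch.
    for current, nxt in zip(seq, seq[1:]):
        if (current, nxt) in counts:
            counts[(current, nxt)] += 1

    features = []
    for current in NUCLEOTIDES:
        row_total = sum(counts[(current, nxt)] for nxt in NUCLEOTIDES)
        for nxt in NUCLEOTIDES:
            # A row with no observed transitions is left as zeros.
            prob = counts[(current, nxt)] / row_total if row_total else 0.0
            features.append(prob)
    return features


if __name__ == "__main__":
    toy_sequence = "ACGTACGGTA"  # hypothetical input, not from the paper's data
    vector = transition_features(toy_sequence)
    print(len(toy_sequence), "nucleotides ->", len(vector), "features")
```

A fixed-length vector of this kind could then be passed to a non-linear SVM (RBF or polynomial kernel) and scored with 10-fold cross-validation, as the abstract describes; that step is omitted here to keep the sketch dependency-free.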
first_indexed 2024-12-20T02:50:28Z
format Article
id doaj.art-47ac149be6434ba79cc72e4714e59ead
institution Directory Open Access Journal
issn 2228-5660
2228-5652
language English
last_indexed 2024-12-20T02:50:28Z
publishDate 2021-03-01
publisher Tabriz University of Medical Sciences
record_format Article
series BioImpacts
spelling doaj.art-47ac149be6434ba79cc72e4714e59ead
2022-12-21T19:56:03Z
eng
Tabriz University of Medical Sciences
BioImpacts
2228-5660
2228-5652
2021-03-01
11
2
87
99
10.34172/bi.2021.16
bi-21850
A Markov chain-based feature extraction method for classification and identification of cancerous DNA sequences
Amin Khodaei (Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran)
Mohammad-Reza Feizi-Derakhshi (Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran)
Behzad Mozaffari-Tazehkand (Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran)
https://bi.tbzmed.ac.ir/PDF/bi-11-87.pdf
dna sequence
cancer
classification
markov chain
support vector machine
title A Markov chain-based feature extraction method for classification and identification of cancerous DNA sequences
title_sort markov chain based feature extraction method for classification and identification of cancerous dna sequences
topic dna sequence
cancer
classification
markov chain
support vector machine
url https://bi.tbzmed.ac.ir/PDF/bi-11-87.pdf