A novel hierarchical clustering algorithm for gene sequences


Bibliographic Details
Main Authors: Wei Dan, Jiang Qingshan, Wei Yanjie, Wang Shengrui
Format: Article
Language: English
Published: BMC 2012-07-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/13/174
collection DOAJ
description Abstract. Background: Clustering DNA sequences into functional groups is an important problem in bioinformatics. We propose a new alignment-free algorithm, mBKM, based on a new distance measure, DMk, for clustering gene sequences. This method transforms DNA sequences into the feature vectors which contain the occurrence, location and order relation of k-tuples in DNA sequence. Afterwards, a hierarchical procedure is applied to clustering DNA sequences based on the feature vectors. Results: The proposed distance measure and clustering method are evaluated by clustering functionally related genes and by phylogenetic analysis. This method is also compared with BlastClust, CD-HIT-EST and some others. The experimental results show our method is effective in classifying DNA sequences with similar biological characteristics and in discovering the underlying relationship among the sequences. Conclusions: We introduced a novel clustering algorithm which is based on a new sequence similarity measure. It is effective in classifying DNA sequences with similar biological characteristics and in discovering the relationship among the sequences.
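The general idea in the abstract (alignment-free feature vectors built from k-tuples, followed by a hierarchical split) can be illustrated with a minimal Python sketch. This is not the paper's method: the record does not define DMk or mBKM, so the sketch assumes plain normalized k-tuple frequencies (occurrence only, no location or order information), Euclidean distance, and an ordinary bisecting 2-means procedure as stand-ins. All function names here are hypothetical.

```python
from itertools import product
from math import dist


def ktuple_vector(seq, k=2):
    """Normalized k-tuple frequency vector over the alphabet ACGT (assumed feature)."""
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0.0] * len(index)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])  # tuples with ambiguous bases (e.g. N) are skipped
        if j is not None:
            counts[j] += 1.0
    total = sum(counts) or 1.0
    return [c / total for c in counts]


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def two_means(vectors, members, iters=20):
    """Split one cluster (a list of indices) in two with deterministic 2-means."""
    a = vectors[members[0]]
    b = vectors[max(members, key=lambda m: dist(vectors[m], a))]  # farthest point from a
    left, right = list(members), []
    for _ in range(iters):
        left = [m for m in members if dist(vectors[m], a) <= dist(vectors[m], b)]
        left_set = set(left)
        right = [m for m in members if m not in left_set]
        if not left or not right:
            break
        a = centroid([vectors[m] for m in left])
        b = centroid([vectors[m] for m in right])
    return left, right


def sse(vectors, members):
    """Within-cluster sum of squared Euclidean distances to the centroid."""
    c = centroid([vectors[m] for m in members])
    return sum(dist(vectors[m], c) ** 2 for m in members)


def bisecting_kmeans(vectors, n_clusters):
    """Top-down hierarchy: repeatedly bisect the cluster with the largest SSE."""
    clusters = [list(range(len(vectors)))]
    while len(clusters) < n_clusters:
        target = max(clusters, key=lambda c: sse(vectors, c))
        left, right = two_means(vectors, target)
        if not left or not right:
            break  # the chosen cluster cannot be split (identical vectors)
        clusters.remove(target)
        clusters += [left, right]
    return clusters
```

Calling `bisecting_kmeans([ktuple_vector(s) for s in sequences], n_clusters)` returns lists of sequence indices, one list per cluster. Replacing `dist` with a measure that also encodes k-tuple location and order would bring the sketch closer to what the abstract describes.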
id doaj.art-51778ac00ffd4d6bb0fb4c31178c2f86
institution Directory of Open Access Journals
issn 1471-2105
doi 10.1186/1471-2105-13-174