Incremental genetic K-means algorithm and its application in gene expression data analysis

Bibliographic Details
Main Authors:	Deng Youping, Fotouhi Farshad, Lu Shiyong, Lu Yi, Brown Susan J
Format:	Article
Language:	English
Published:	BMC 2004-10-01
Series:	BMC Bioinformatics, vol. 5, art. 172
ISSN:	1471-2105
DOI:	10.1186/1471-2105-5-172
Online Access:	http://www.biomedcentral.com/1471-2105/5/172

Abstract

Background
In recent years, clustering algorithms have been applied effectively in molecular biology for gene expression data analysis. With the help of clustering algorithms such as K-means, hierarchical clustering, and self-organizing maps (SOM), genes are partitioned into groups based on the similarity between their expression profiles. In this way, functionally related genes are identified. As the amount of laboratory data in molecular biology grows exponentially each year due to advanced technologies such as microarrays, new efficient and effective clustering methods must be developed to process this growing amount of biological data.

Results
In this paper, we propose a new clustering algorithm, the Incremental Genetic K-means Algorithm (IGKA). IGKA is an extension of our previously proposed clustering algorithm, the Fast Genetic K-means Algorithm (FGKA). IGKA outperforms FGKA when the mutation probability is small. The main idea of IGKA is to calculate the objective value, the Total Within-Cluster Variation (TWCV), and the cluster centroids incrementally whenever the mutation probability is small. IGKA inherits FGKA's salient feature of always converging to the global optimum. A C implementation is freely available at http://database.cs.wayne.edu/proj/FGKA/index.htm.

Conclusions
Our experiments indicate that, while IGKA has a convergence pattern similar to that of FGKA, it has better time performance once the mutation probability decreases below a certain point. Finally, we used IGKA to cluster a yeast dataset and found that it increased the enrichment of genes of similar function within the cluster.
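The Results paragraph names IGKA's core idea: update the objective value (TWCV) and the cluster centroids incrementally instead of recomputing them from scratch. The sketch below illustrates that idea under a stated assumption: it uses the standard sum-of-squares identity TWCV = sum_n ||x_n||^2 - sum_k ||SF_k||^2 / Z_k, where SF_k is the vector of per-feature sums over the patterns in cluster k and Z_k is the cluster size. Keeping SF_k and Z_k up to date means reassigning one pattern costs O(D) work rather than the O(N*D) of a full recomputation. The names `IncrementalTWCV` and `twcv_from_scratch` are hypothetical; this is not the authors' C program, and it leaves out the genetic operators (selection, mutation, K-means operator) entirely.

```python
from typing import List, Sequence


class IncrementalTWCV:
    """Hypothetical helper (not the authors' code): tracks per-cluster feature
    sums SF_k and sizes Z_k so that TWCV and centroids can be refreshed after a
    single pattern changes cluster, touching only the two affected clusters."""

    def __init__(self, data: Sequence[Sequence[float]], labels: List[int], k: int):
        self.data = [[float(v) for v in row] for row in data]
        self.labels = list(labels)
        self.k = k
        dim = len(self.data[0])
        self.sf = [[0.0] * dim for _ in range(k)]  # SF[k][d]: sum of feature d in cluster k
        self.z = [0] * k                           # Z[k]: number of patterns in cluster k
        for row, lab in zip(self.data, self.labels):
            self.z[lab] += 1
            for d, v in enumerate(row):
                self.sf[lab][d] += v
        # Constant term of the identity: sum of squared features over all patterns
        self.total_sq = sum(v * v for row in self.data for v in row)
        # Cached per-cluster term ||SF_k||^2 / Z_k (assumed 0 for an empty cluster)
        self.term = [self._cluster_term(c) for c in range(k)]

    def _cluster_term(self, c: int) -> float:
        if self.z[c] == 0:
            return 0.0
        return sum(s * s for s in self.sf[c]) / self.z[c]

    def twcv(self) -> float:
        return self.total_sq - sum(self.term)

    def centroid(self, c: int) -> List[float]:
        # Centroid of a non-empty cluster is SF_k / Z_k
        return [s / self.z[c] for s in self.sf[c]]

    def move(self, n: int, new_label: int) -> None:
        """Reassign pattern n (e.g. after a mutation); O(D) update."""
        old = self.labels[n]
        if old == new_label:
            return
        for d, v in enumerate(self.data[n]):
            self.sf[old][d] -= v
            self.sf[new_label][d] += v
        self.z[old] -= 1
        self.z[new_label] += 1
        self.labels[n] = new_label
        # Only the two clusters whose membership changed need new terms
        self.term[old] = self._cluster_term(old)
        self.term[new_label] = self._cluster_term(new_label)


def twcv_from_scratch(data, labels, k) -> float:
    """Direct O(N*D) TWCV: sum of squared distances to cluster centroids."""
    total = 0.0
    for c in range(k):
        members = [row for row, lab in zip(data, labels) if lab == c]
        if not members:
            continue
        dim = len(members[0])
        cen = [sum(r[d] for r in members) / len(members) for d in range(dim)]
        total += sum((r[d] - cen[d]) ** 2 for r in members for d in range(dim))
    return total


if __name__ == "__main__":
    tracker = IncrementalTWCV([[1.0, 2.0], [2.0, 2.0], [8.0, 9.0], [9.0, 8.0]], [0, 0, 1, 1], k=2)
    print(tracker.twcv())            # 1.5
    tracker.move(1, 1)               # pattern 1 "mutates" into cluster 1
    print(round(tracker.twcv(), 6))  # 57.333333
```

When the mutation probability is small, only a few patterns change cluster per generation, so refreshing two cached terms per move is far cheaper than a full pass over the data; when many patterns move, a full recomputation can be just as cheap. This is consistent with, but does not reproduce, the abstract's observation that IGKA's advantage depends on the mutation probability. Because SF_k is updated by repeated floating-point addition and subtraction, a long run may want an occasional full recomputation to limit rounding drift.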
