<it>WordCluster</it>: detecting clusters of DNA words and genomic elements

<p>Abstract</p> <p>Background</p> <p>Many <it>k-</it>mers (or DNA words) and genomic elements are known to be spatially clustered in the genome. Well established examples are the genes, TFBSs, CpG dinucleotides, microRNA genes and ultra-conserved non-coding...


Bibliographic Details
Main Authors: Oliver José L, Alganza Ángel M, Bernaola-Galván Pedro, Barturen Guillermo, Carpena Pedro, Hackenberg Michael
Format: Article
Language: English
Published: BMC 2011-01-01
Series: Algorithms for Molecular Biology
Online Access: http://www.almob.org/content/6/1/2
collection DOAJ
description <p>Abstract</p> <p>Background</p> <p>Many <it>k-</it>mers (or DNA words) and genomic elements are known to be spatially clustered in the genome. Well-established examples are genes, TFBSs, CpG dinucleotides, microRNA genes and ultra-conserved non-coding regions. Currently, no algorithm exists to find these clusters in a statistically comprehensible way. The detection of clustering often relies on densities and sliding-window approaches or on arbitrarily chosen distance thresholds.</p> <p>Results</p> <p>We introduce here an algorithm to detect clusters of DNA words (<it>k-</it>mers), or of any other genomic element, based on the distance between consecutive copies and an assigned statistical significance. We implemented the method in a web server connected to a MySQL backend, which also determines the co-localization with gene annotations. We demonstrate the usefulness of this approach by detecting the clusters of CAG/CTG (cytosine contexts that can be methylated in undifferentiated cells), showing that the degree of methylation varies drastically between the inside and the outside of the clusters. As another example, we used <it>WordCluster </it>to search for statistically significant clusters of olfactory receptor (OR) genes in the human genome.</p> <p>Conclusions</p> <p><it>WordCluster </it>seems to predict biologically meaningful clusters of DNA words (<it>k-</it>mers) and genomic entities. The implementation of the method as a web server is available at <url>http://bioinfo2.ugr.es/wordCluster/wordCluster.php</url> and includes additional features such as the detection of co-localization with gene regions and an annotation enrichment tool for the functional analysis of overlapped genes.</p>
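The abstract describes clustering "based on the distance between consecutive copies and an assigned statistical significance" without giving details. The sketch below illustrates that general idea only and should not be read as the published WordCluster procedure: the function names, the fixed `max_gap` threshold, and the choice of a negative-binomial tail under an independent-site model are all assumptions made for illustration.

```python
from math import comb


def chain_clusters(positions, max_gap, min_copies=2):
    """Group sorted positions into runs whose consecutive gaps are <= max_gap.

    `max_gap` and `min_copies` are hypothetical parameters for this sketch.
    """
    clusters = []
    run = [positions[0]] if positions else []
    for prev, cur in zip(positions, positions[1:]):
        if cur - prev <= max_gap:
            run.append(cur)          # still within the distance threshold
        else:
            if len(run) >= min_copies:
                clusters.append(run)
            run = [cur]              # start a new candidate cluster
    if len(run) >= min_copies:
        clusters.append(run)
    return clusters


def cluster_p_value(n_copies, span, p_site):
    """Chance of packing n_copies into at most `span` positions.

    Assumed null model: every site carries the element independently with
    probability p_site. With the first copy fixed, the remaining
    n_copies - 1 copies must all appear within the next span - 1 sites,
    which is a negative-binomial cumulative probability.
    """
    if n_copies < 2 or span < n_copies:
        raise ValueError("need at least 2 copies and span >= n_copies")
    successes = n_copies - 1
    max_failures = span - n_copies
    return sum(
        comb(successes - 1 + j, j) * p_site**successes * (1 - p_site) ** j
        for j in range(max_failures + 1)
    )


# Toy input: start coordinates of one element along a sequence.
positions = [10, 12, 15, 100, 300, 302, 305, 307]
clusters = chain_clusters(positions, max_gap=5)
```

With this toy input, `chain_clusters` keeps the two dense runs and drops the isolated copy at position 100. A candidate cluster could then be scored with `cluster_p_value(len(c), c[-1] - c[0] + 1, p_site)`, where `p_site` might be estimated as the number of copies divided by the sequence length; that estimate is likewise an assumption of this sketch.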
first_indexed 2024-12-12T05:11:20Z
format Article
id doaj.art-f60c5491426c4bf2b5d70fedadae2138
institution Directory Open Access Journal
issn 1748-7188
language English
last_indexed 2024-12-12T05:11:20Z
publishDate 2011-01-01
publisher BMC
record_format Article
series Algorithms for Molecular Biology
title <it>WordCluster</it>: detecting clusters of DNA words and genomic elements
url http://www.almob.org/content/6/1/2