Graph-based clustering and characterization of repetitive sequences in next-generation sequencing data

Abstract. Background: The investigation of plant genome structure and evolution requires comprehensive characterization of repetitive sequences that make up the majority of higher plant nuclear DNA. Since genome-wide characterization of repetitive element...

Bibliographic Details
Main Authors:	Macas Jiří, Neumann Pavel, Novák Petr
Format:	Article
Language:	English
Published:	BMC 2010-07-01
Series:	BMC Bioinformatics
Online Access:	http://www.biomedcentral.com/1471-2105/11/378

_version_	1818118852862541824
author	Macas Jiří Neumann Pavel Novák Petr
collection	DOAJ
description	Abstract. Background: The investigation of plant genome structure and evolution requires comprehensive characterization of repetitive sequences that make up the majority of higher plant nuclear DNA. Since genome-wide characterization of repetitive elements is complicated by their high abundance and diversity, novel approaches based on massively-parallel sequencing are being adapted to facilitate the analysis. It has recently been demonstrated that the low-pass genome sequencing provided by a single 454 sequencing reaction is sufficient to capture information about all major repeat families, thus providing the opportunity for efficient repeat investigation in a wide range of species. However, the development of appropriate data mining tools is required in order to fully utilize this sequencing data for repeat characterization. Results: We adapted a graph-based approach for similarity-based partitioning of whole genome 454 sequence reads in order to build clusters made of the reads derived from individual repeat families. The information about cluster sizes was utilized for assessing the proportion and composition of repeats in the genomes of two model species, Pisum sativum and Glycine max, differing in genome size and 454 sequencing coverage. Moreover, statistical analysis and visual inspection of the topology of the cluster graphs using a newly developed program tool, SeqGrapheR, were shown to be helpful in distinguishing basic types of repeats and investigating sequence variability within repeat families. Conclusions: Repetitive regions of plant genomes can be efficiently characterized by the presented graph-based analysis and the graph representation of repeats can be further used to assess the variability and evolutionary divergence of repeat families, discover and characterize novel elements, and aid in subsequent assembly of their consensus sequences.
first_indexed	2024-12-11T05:00:54Z
format	Article
id	doaj.art-f7f91fe758264f32bec78960b230465a
institution	Directory of Open Access Journals
issn	1471-2105
language	English
last_indexed	2024-12-11T05:00:54Z
publishDate	2010-07-01
publisher	BMC
record_format	Article
series	BMC Bioinformatics
spelling	doaj.art-f7f91fe758264f32bec78960b230465a; 2022-12-22T01:20:10Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2010-07-01; 11; 1; 378; 10.1186/1471-2105-11-378; Graph-based clustering and characterization of repetitive sequences in next-generation sequencing data; Macas Jiří; Neumann Pavel; Novák Petr; http://www.biomedcentral.com/1471-2105/11/378
title	Graph-based clustering and characterization of repetitive sequences in next-generation sequencing data
url	http://www.biomedcentral.com/1471-2105/11/378
