Re-annotation of the physical map of <it>Glycine max</it> for polyploid-like regions by BAC end sequence driven whole genome shotgun read assembly

Bibliographic Details
Main Authors: Shultz Jeffry, Saini Navinder, Lightfoot David A
Format: Article
Language: English
Published: BMC 2008-07-01
Series: BMC Genomics
Online Access:http://www.biomedcentral.com/1471-2164/9/323
collection DOAJ
description <p>Abstract</p> <p>Background</p> <p>Many of the world's most important food crops have either polyploid genomes or homeologous regions derived from segmental shuffling following polyploid formation. The soybean (<it>Glycine max</it>) genome has been shown to be composed of approximately four thousand short interspersed homeologous regions with 1, 2 or 4 copies per haploid genome by RFLP analysis, by microsatellite anchors to BACs, and by contigs formed from BAC fingerprints. Despite these similar regions, the genome has been sequenced by whole genome shotgun sequencing (WGS). Here the aim was to use BAC end sequences (BES) derived from three minimum tile paths (MTP) to examine the extent and homogeneity of polyploid-like regions within contigs, and the extent of correlation between the polyploid-like regions inferred from fingerprinting and the polyploid-like sequences inferred from WGS matches.</p> <p>Results</p> <p>Results show that when sequence divergence was 1–10%, the copy number of homeologous regions could be identified from sequence variation in WGS reads overlapping BES. Homeolog sequence variants (HSVs) were single nucleotide polymorphisms (SNPs; 89%) and single nucleotide indels (SNIs; 10%). Larger indels were rare but present (1%). Simulations had predicted that fingerprints of homeologous regions could be separated when divergence exceeded 2%; these predictions were shown to be false. We show that a 5–10% sequence divergence is necessary to separate homeologs by fingerprinting. Comparison of BES with WGS traces showed that polyploid-like regions with less than 1% sequence divergence exist at 2.3% of the locations assayed.</p> <p>Conclusion</p> <p>The use of HSVs such as SNPs and SNIs to characterize BACs will improve contig building methods. 
For the bioinformatic and functional annotation of polyploid and paleopolyploid genomes, these results imply that a combined approach of BAC fingerprint-based physical maps, WGS sequence, and HSV-based partitioning of BAC clones from homeologous regions into separate contigs will allow reliable de-convolution and positioning of sequence scaffolds (see the BES_scaffolds section of SoyGD). This approach will assist genome annotation for paleopolyploid and true polyploid genomes such as soybean and many important cereal and fruit crops.</p>
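The abstract groups homeolog sequence variants into SNPs, single nucleotide indels (SNIs) and larger indels, and states its thresholds in percent sequence divergence. As an illustration only, the Python sketch below shows one way to tally those categories between a BES and an overlapping WGS read. It assumes the pair is already aligned to equal length with '-' for gaps; the function names `classify_hsvs` and `percent_divergence`, and the divergence definition (differing alignment columns over total columns), are hypothetical and are not the authors' pipeline.

```python
# Hypothetical sketch, not the authors' pipeline: tally homeolog sequence
# variants (HSVs) between a BES and an overlapping WGS read that have
# already been aligned to equal length, with '-' marking gap columns.
from collections import Counter


def _gap_side(a: str, b: str):
    """Return which sequence carries a gap in this column, or None."""
    if a == "-" and b == "-":
        raise ValueError("column is gapped in both sequences")
    if a == "-":
        return "ref"
    if b == "-":
        return "read"
    return None


def classify_hsvs(ref: str, read: str) -> Counter:
    """Count SNPs, single nucleotide indels (SNIs) and larger indels.

    A run of consecutive gap columns on the same side is one indel event:
    length 1 is an SNI, anything longer is a larger indel. Any other
    column where the bases differ (case-insensitive) is a SNP.
    """
    if len(ref) != len(read):
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter()
    i, n = 0, len(ref)
    while i < n:
        side = _gap_side(ref[i], read[i])
        if side is None:
            if ref[i].upper() != read[i].upper():
                counts["SNP"] += 1
            i += 1
            continue
        # Extend the gap run while the gap stays in the same sequence.
        j = i
        while j < n and _gap_side(ref[j], read[j]) == side:
            j += 1
        counts["SNI" if j - i == 1 else "larger_indel"] += 1
        i = j
    return counts


def percent_divergence(ref: str, read: str) -> float:
    """Percent of alignment columns that differ (mismatch or gap).

    This definition is an assumption made for illustration.
    """
    if len(ref) != len(read) or not ref:
        raise ValueError("need two non-empty sequences of equal length")
    differing = sum(1 for a, b in zip(ref, read) if a.upper() != b.upper())
    return 100.0 * differing / len(ref)


if __name__ == "__main__":
    bes = "ACGTACGTAC-GTACGTACG"  # toy sequences, not data from the study
    wgs = "ACGAACGTACTGTAC--ACG"
    # One SNP, one SNI and one 2-base indel; 4 of 20 columns differ (20.0%).
    print(classify_hsvs(bes, wgs), percent_divergence(bes, wgs))
```

Treating a run of gap columns as a single event mirrors the abstract's distinction between single nucleotide indels and larger indels; a real analysis would also need to handle ambiguous bases and sequencing errors, which this sketch ignores.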
first_indexed 2024-04-13T05:18:10Z
id doaj.art-1d634bf793c8448fa2282022f8260d54
institution Directory of Open Access Journals
issn 1471-2164
doi 10.1186/1471-2164-9-323