Large scale single nucleotide polymorphism discovery in unsequenced genomes using second generation high throughput sequencing technology: applied to turkey
Abstract. Background: The development of second generation sequencing methods has enabled large scale DNA variation studies at moderate cost. For the high throughput discovery of single nucleotide polymorphisms (SNPs) in species lacking a sequenced refere...
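Main Authors: | den Dunnen Johan T, Chin-A-Woeng Thomas FC, Dibbits Bert W, Veenendaal Albertine, Crooijmans Richard PMA, Kerstens Hindrik HD, Groenen Martien AM |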
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2009-10-01 |
Series: | BMC Genomics |
Online Access: | http://www.biomedcentral.com/1471-2164/10/479 |
_version_ | 1818111705317638144 |
---|---|
author | den Dunnen Johan T; Chin-A-Woeng Thomas FC; Dibbits Bert W; Veenendaal Albertine; Crooijmans Richard PMA; Kerstens Hindrik HD; Groenen Martien AM |
author_sort | den Dunnen Johan T |
collection | DOAJ |
description | Background: The development of second generation sequencing methods has enabled large scale DNA variation studies at moderate cost. For the high throughput discovery of single nucleotide polymorphisms (SNPs) in species lacking a sequenced reference genome, we set up an analysis pipeline based on a short read de novo sequence assembler and a program designed to identify variation within short reads. To illustrate the potential of this technique, we present the results obtained with a randomly sheared, enzymatically generated, 2-3 kbp genome fraction of six pooled *Meleagris gallopavo* (turkey) individuals. Results: A total of 100 million 36 bp reads were generated, representing approximately 5-6% (~62 Mbp) of the turkey genome, with an estimated sequence depth of 58. Reads consisting of bases called with less than 1% error probability were selected and assembled into contigs. Subsequently, high throughput discovery of nucleotide variation was performed using sequences with more than 90% reliability, with the assembled contigs of 50 bp or longer serving as the reference sequence. We identified more than 7,500 SNPs with a high probability of representing true nucleotide variation in turkeys. Extending the reference with publicly available turkey BAC-end sequences increased the number of SNPs to over 11,000. A comparison with the sequenced chicken genome indicated that the assembled turkey contigs were distributed uniformly across the turkey genome. Genotyping of a representative sample of 340 SNPs resulted in a SNP conversion rate of 95%. The correlation of the minor allele count (MAC) and observed minor allele frequency (MAF) for the validated SNPs was 0.69. Conclusion: We provide an efficient and cost-effective approach for the identification of thousands of high quality SNPs in species currently lacking a sequenced genome and applied this to turkey. The methodology addresses a random fraction of the genome, resulting in an even distribution of SNPs across the targeted genome. |
first_indexed | 2024-12-11T03:07:17Z |
format | Article |
id | doaj.art-a232bf8900454b42b1c85788fcd0e99a |
institution | Directory Open Access Journal |
issn | 1471-2164 |
language | English |
last_indexed | 2024-12-11T03:07:17Z |
publishDate | 2009-10-01 |
publisher | BMC |
record_format | Article |
series | BMC Genomics |
spelling | doaj.art-a232bf8900454b42b1c85788fcd0e99a; 2022-12-22T01:22:56Z; eng; BMC; BMC Genomics; 1471-2164; 2009-10-01; 10; 1; 479; 10.1186/1471-2164-10-479; http://www.biomedcentral.com/1471-2164/10/479 |
title | Large scale single nucleotide polymorphism discovery in unsequenced genomes using second generation high throughput sequencing technology: applied to turkey |
url | http://www.biomedcentral.com/1471-2164/10/479 |
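
The read counts and thresholds quoted in the abstract can be checked with a little arithmetic. The following is a minimal sketch, not the authors' pipeline: the function names are hypothetical, and it assumes that base-call error probabilities are expressed on the standard Phred scale (Q = -10 log10 p), which the record itself does not state.

```python
import math


def sequence_depth(n_reads: int, read_length_bp: int, target_size_bp: float) -> float:
    """Average depth = total sequenced bases / size of the targeted fraction."""
    return n_reads * read_length_bp / target_size_bp


def phred_from_error(p_error: float) -> float:
    """Phred quality score for a base-call error probability (assumed scale)."""
    return -10.0 * math.log10(p_error)


def read_passes(base_qualities: list[float], max_error: float = 0.01) -> bool:
    """Hypothetical filter: keep a read only if every base has an error
    probability below max_error (the abstract's 'less than 1%')."""
    return all(10 ** (-q / 10.0) < max_error for q in base_qualities)


# Figures taken from the abstract: 100 million reads of 36 bp over ~62 Mbp.
depth = sequence_depth(100_000_000, 36, 62e6)
print(f"estimated depth: {depth:.1f}x")
print(f"1% error corresponds to Phred Q{phred_from_error(0.01):.0f}")
```

With the abstract's numbers, 3.6 Gbp of sequence spread over roughly 62 Mbp gives a depth of about 58, matching the reported estimate.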