Exploring Pandora's box: potential and pitfalls of low coverage genome surveys for evolutionary biology.
High throughput sequencing technologies are revolutionizing genetic research. With this "rise of the machines", genomic sequences can be obtained even for unknown genomes within a short time and for reasonable costs. This has enabled evolutionary biologists studying genetically unexplored...
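Main Authors: | Florian Leese, Philipp Brand, Andrey Rozenberg, Christoph Mayer, Shobhit Agrawal, Johannes Dambach, Lars Dietz, Jana S Doemel, William P Goodall-Copstake, Christoph Held, Jennifer A Jackson, Kathrin P Lampert, Katrin Linse, Jan N Macher, Jennifer Nolzen, Michael J Raupach, Nicole T Rivera, Christoph D Schubart, Sebastian Striewski, Ralph Tollrian, Chester J Sands |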
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS) 2012-01-01 |
Series: | PLoS ONE |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/23185309/?tool=EBI |
_version_ | 1818405368429019136 |
---|---|
author | Florian Leese Philipp Brand Andrey Rozenberg Christoph Mayer Shobhit Agrawal Johannes Dambach Lars Dietz Jana S Doemel William P Goodall-Copstake Christoph Held Jennifer A Jackson Kathrin P Lampert Katrin Linse Jan N Macher Jennifer Nolzen Michael J Raupach Nicole T Rivera Christoph D Schubart Sebastian Striewski Ralph Tollrian Chester J Sands |
author_sort | Florian Leese |
collection | DOAJ |
description | High throughput sequencing technologies are revolutionizing genetic research. With this "rise of the machines", genomic sequences can be obtained even for unknown genomes within a short time and for reasonable costs. This has enabled evolutionary biologists studying genetically unexplored species to identify molecular markers or genomic regions of interest (e.g. micro- and minisatellites, mitochondrial and nuclear genes) by sequencing only a fraction of the genome. However, when using such datasets from non-model species, it is possible that DNA from non-target contaminant species such as bacteria, viruses, fungi, or other eukaryotic organisms may complicate the interpretation of the results. In this study we analysed 14 genomic pyrosequencing libraries of aquatic non-model taxa from four major evolutionary lineages. We quantified the amount of suitable micro- and minisatellites, mitochondrial genomes, known nuclear genes and transposable elements and searched for contamination from various sources using bioinformatic approaches. Our results show that in all sequence libraries with estimated coverage of about 0.02-25%, many appropriate micro- and minisatellites, mitochondrial gene sequences and nuclear genes from different KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways could be identified and characterized. These can serve as markers for phylogenetic and population genetic analyses. A central finding of our study is that several genomic libraries suffered from different biases owing to non-target DNA or mobile elements. In particular, viruses, bacteria or eukaryote endosymbionts contributed significantly (up to 10%) to some of the libraries analysed. If not identified as such, genetic markers developed from high-throughput sequencing data for non-model organisms may bias evolutionary studies or fail completely in experimental tests. In conclusion, our study demonstrates the enormous potential of low-coverage genome survey sequences and suggests bioinformatic analysis workflows. The results also advise a more sophisticated filtering for problematic sequences and non-target genome sequences prior to developing markers. |
first_indexed | 2024-12-14T08:54:56Z |
format | Article |
id | doaj.art-e4fbd7d257b44d0395e95b9e28f54023 |
institution | Directory Open Access Journal |
issn | 1932-6203 |
language | English |
last_indexed | 2024-12-14T08:54:56Z |
publishDate | 2012-01-01 |
publisher | Public Library of Science (PLoS) |
record_format | Article |
series | PLoS ONE |
spelling | doaj.art-e4fbd7d257b44d0395e95b9e28f54023 2022-12-21T23:08:57Z eng Public Library of Science (PLoS) PLoS ONE 1932-6203 2012-01-01 7 11 e49202 10.1371/journal.pone.0049202 Exploring Pandora's box: potential and pitfalls of low coverage genome surveys for evolutionary biology. Florian Leese Philipp Brand Andrey Rozenberg Christoph Mayer Shobhit Agrawal Johannes Dambach Lars Dietz Jana S Doemel William P Goodall-Copstake Christoph Held Jennifer A Jackson Kathrin P Lampert Katrin Linse Jan N Macher Jennifer Nolzen Michael J Raupach Nicole T Rivera Christoph D Schubart Sebastian Striewski Ralph Tollrian Chester J Sands High throughput sequencing technologies are revolutionizing genetic research. With this "rise of the machines", genomic sequences can be obtained even for unknown genomes within a short time and for reasonable costs. This has enabled evolutionary biologists studying genetically unexplored species to identify molecular markers or genomic regions of interest (e.g. micro- and minisatellites, mitochondrial and nuclear genes) by sequencing only a fraction of the genome. However, when using such datasets from non-model species, it is possible that DNA from non-target contaminant species such as bacteria, viruses, fungi, or other eukaryotic organisms may complicate the interpretation of the results. In this study we analysed 14 genomic pyrosequencing libraries of aquatic non-model taxa from four major evolutionary lineages. We quantified the amount of suitable micro- and minisatellites, mitochondrial genomes, known nuclear genes and transposable elements and searched for contamination from various sources using bioinformatic approaches. Our results show that in all sequence libraries with estimated coverage of about 0.02-25%, many appropriate micro- and minisatellites, mitochondrial gene sequences and nuclear genes from different KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways could be identified and characterized. These can serve as markers for phylogenetic and population genetic analyses. A central finding of our study is that several genomic libraries suffered from different biases owing to non-target DNA or mobile elements. In particular, viruses, bacteria or eukaryote endosymbionts contributed significantly (up to 10%) to some of the libraries analysed. If not identified as such, genetic markers developed from high-throughput sequencing data for non-model organisms may bias evolutionary studies or fail completely in experimental tests. In conclusion, our study demonstrates the enormous potential of low-coverage genome survey sequences and suggests bioinformatic analysis workflows. The results also advise a more sophisticated filtering for problematic sequences and non-target genome sequences prior to developing markers. https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/23185309/?tool=EBI |
title | Exploring Pandora's box: potential and pitfalls of low coverage genome surveys for evolutionary biology. |
title_sort | exploring pandora s box potential and pitfalls of low coverage genome surveys for evolutionary biology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/23185309/?tool=EBI |