Whole Genome Sequences of 23 Species from the Drosophila montium Species Group (Diptera: Drosophilidae): A Resource for Testing Evolutionary Hypotheses
Main Authors: | Michael J. Bronski, Ciera C. Martinez, Holli A. Weld, Michael B. Eisen |
---|---|
Format: | Article |
Language: | English |
Published: | Oxford University Press, 2020-05-01 |
Series: | G3: Genes, Genomes, Genetics |
Subjects: | drosophila montium genome assembly |
Online Access: | http://g3journal.org/lookup/doi/10.1534/g3.119.400959 |
_version_ | 1818618650164199424 |
---|---|
author | Michael J. Bronski; Ciera C. Martinez; Holli A. Weld; Michael B. Eisen |
author_sort | Michael J. Bronski |
collection | DOAJ |
description | Large groups of species with well-defined phylogenies are excellent systems for testing evolutionary hypotheses. In this paper, we describe the creation of a comparative genomic resource consisting of 23 genomes from the species-rich Drosophila montium species group, 22 of which are presented here for the first time. The montium group is well-positioned for clade genomics. Within the montium clade, evolutionary distances are such that large numbers of sequences can be accurately aligned while also recovering strong signals of divergence; and the distance between the montium group and D. melanogaster is short enough so that orthologous sequence can be readily identified. All genomes were assembled from a single, small-insert library using MaSuRCA, before going through an extensive post-assembly pipeline. Estimated genome sizes within the montium group range from 155 Mb to 223 Mb (mean = 196 Mb). The absence of long-distance information during the assembly process resulted in fragmented assemblies, with the scaffold NG50s varying widely based on repeat content and sample heterozygosity (min = 18 kb, max = 390 kb, mean = 74 kb). The total scaffold length for most assemblies is also shorter than the estimated genome size, typically by 5–15%. However, subsequent analysis showed that our assemblies are highly complete. Despite large differences in contiguity, all assemblies contain at least 96% of known single-copy Dipteran genes (BUSCOs, n = 2,799). Similarly, by aligning our assemblies to the D. melanogaster genome and remapping coordinates for a large set of transcriptional enhancers (n = 3,457), we showed that each montium assembly contains orthologs for at least 91% of D. melanogaster enhancers. Importantly, the genic and enhancer contents of our assemblies are comparable to that of far more contiguous Drosophila assemblies. The alignment of our own D. serrata assembly to a previously published PacBio D. serrata assembly also showed that our longest scaffolds (up to 1 Mb) are free of large-scale misassemblies. Our genome assemblies are a valuable resource that can be used to further resolve the montium group phylogeny; study the evolution of protein-coding genes and cis-regulatory sequences; and determine the genetic basis of ecological and behavioral adaptations. |
first_indexed | 2024-12-16T17:24:57Z |
format | Article |
id | doaj.art-eee85273c3dd42209b1c563c89b33a04 |
institution | Directory Open Access Journal |
issn | 2160-1836 |
language | English |
last_indexed | 2024-12-16T17:24:57Z |
publishDate | 2020-05-01 |
publisher | Oxford University Press |
record_format | Article |
series | G3: Genes, Genomes, Genetics |
spelling | doaj.art-eee85273c3dd42209b1c563c89b33a04; 2022-12-21T22:23:05Z; eng; Oxford University Press; G3: Genes, Genomes, Genetics; ISSN 2160-1836; 2020-05-01; vol. 10, no. 5, pp. 1443-1455; doi:10.1534/g3.119.400959 |
title | Whole Genome Sequences of 23 Species from the Drosophila montium Species Group (Diptera: Drosophilidae): A Resource for Testing Evolutionary Hypotheses |
topic | drosophila montium genome assembly |
url | http://g3journal.org/lookup/doi/10.1534/g3.119.400959 |