SAUTE: sequence assembly using target enrichment
Abstract. Background: Illumina is the dominant sequencing technology at this time. Short length, short insert size, some systematic biases, and low-level carryover contamination in Illumina reads continue to make assembly of repeated regions a challenging problem. Some applications also require findin...
Main Authors: | Alexandre Souvorov, Richa Agarwala |
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2021-07-01 |
Series: | BMC Bioinformatics |
Subjects: | Illumina reads; De-novo assembly; de Bruijn graphs; Antimicrobial resistance; RNA-seq |
Online Access: | https://doi.org/10.1186/s12859-021-04174-9 |
_version_ | 1818457115416592384 |
---|---|
author | Alexandre Souvorov; Richa Agarwala |
author_sort | Alexandre Souvorov |
collection | DOAJ |
description | Abstract. Background: Illumina is the dominant sequencing technology at this time. Short length, short insert size, some systematic biases, and low-level carryover contamination in Illumina reads continue to make assembly of repeated regions a challenging problem. Some applications also require finding multiple well-supported variants for assembled regions. Results: To facilitate assembly of repeat regions and to report multiple well-supported variants when a user can provide target sequences to assist the assembly, we propose the SAUTE and SAUTE_PROT assemblers. Both assemblers use a de Bruijn graph built on reads. Targets can be transcripts or proteins for RNA-seq reads, and transcripts, proteins, or genomic regions for genomic reads. Target sequences are nucleotide sequences for SAUTE and protein sequences for SAUTE_PROT. Conclusions: For RNA-seq, comparisons with Trinity, rnaSPAdes, SPAligner, and SPAdes assembly of reads aligned to target proteins by DIAMOND show that SAUTE_PROT finds more coding sequences that translate to benchmark proteins. Using AMRFinderPlus calls, we find that SAUTE has higher sensitivity and precision than SPAdes, plasmidSPAdes, SPAligner, and SPAdes assembly of reads aligned to target regions by HISAT2. SAUTE also has better sensitivity than SKESA but worse precision. |
first_indexed | 2024-12-14T22:37:26Z |
format | Article |
id | doaj.art-d4fc765b390d448e86692a7e07a65688 |
institution | Directory Open Access Journal |
issn | 1471-2105 |
language | English |
last_indexed | 2024-12-14T22:37:26Z |
publishDate | 2021-07-01 |
publisher | BMC |
record_format | Article |
series | BMC Bioinformatics |
spelling | doaj.art-d4fc765b390d448e86692a7e07a65688; 2022-12-21T22:45:05Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2021-07-01; 22112; 10.1186/s12859-021-04174-9; SAUTE: sequence assembly using target enrichment; Alexandre Souvorov 0; Richa Agarwala 1; NCBI/NLM/NIH/DHHS; NCBI/NLM/NIH/DHHS; Abstract Background Illumina is the dominant sequencing technology at this time. Short length, short insert size, some systematic biases, and low-level carryover contamination in Illumina reads continue to make assembly of repeated regions a challenging problem. Some applications also require finding multiple well supported variants for assembled regions. Results To facilitate assembly of repeat regions and to report multiple well supported variants when a user can provide target sequences to assist the assembly, we propose SAUTE and SAUTE_PROT assemblers. Both assemblers use de Bruijn graph on reads. Targets can be transcripts or proteins for RNA-seq reads and transcripts, proteins, or genomic regions for genomic reads. Target sequences are nucleotide and protein sequences for SAUTE and SAUTE_PROT, respectively. Conclusions For RNA-seq, comparisons with Trinity, rnaSPAdes, SPAligner, and SPAdes assembly of reads aligned to target proteins by DIAMOND show that SAUTE_PROT finds more coding sequences that translate to benchmark proteins. Using AMRFinderPlus calls, we find SAUTE has higher sensitivity and precision than SPAdes, plasmidSPAdes, SPAligner, and SPAdes assembly of reads aligned to target regions by HISAT2. It also has better sensitivity than SKESA but worse precision.; https://doi.org/10.1186/s12859-021-04174-9; Illumina reads; De-novo assembly; de Bruijn graphs; Antimicrobial resistance; RNA-seq |
title | SAUTE: sequence assembly using target enrichment |
topic | Illumina reads; De-novo assembly; de Bruijn graphs; Antimicrobial resistance; RNA-seq |
url | https://doi.org/10.1186/s12859-021-04174-9 |