SAUTE: sequence assembly using target enrichment

Abstract. Background: Illumina is the dominant sequencing technology at this time. Short length, short insert size, some systematic biases, and low-level carryover contamination in Illumina reads continue to make assembly of repeated regions a challenging problem. Some applications also require findin...

Bibliographic Details
Main Authors: Alexandre Souvorov, Richa Agarwala
Format: Article
Language: English
Published: BMC 2021-07-01
Series: BMC Bioinformatics
Subjects: Illumina reads, De-novo assembly, de Bruijn graphs, Antimicrobial resistance, RNA-seq
Online Access: https://doi.org/10.1186/s12859-021-04174-9
author Alexandre Souvorov
Richa Agarwala
collection DOAJ
description Abstract
Background: Illumina is the dominant sequencing technology at this time. Short length, short insert size, some systematic biases, and low-level carryover contamination in Illumina reads continue to make assembly of repeated regions a challenging problem. Some applications also require finding multiple well supported variants for assembled regions.
Results: To facilitate assembly of repeat regions and to report multiple well supported variants when a user can provide target sequences to assist the assembly, we propose SAUTE and SAUTE_PROT assemblers. Both assemblers use de Bruijn graph on reads. Targets can be transcripts or proteins for RNA-seq reads and transcripts, proteins, or genomic regions for genomic reads. Target sequences are nucleotide and protein sequences for SAUTE and SAUTE_PROT, respectively.
Conclusions: For RNA-seq, comparisons with Trinity, rnaSPAdes, SPAligner, and SPAdes assembly of reads aligned to target proteins by DIAMOND show that SAUTE_PROT finds more coding sequences that translate to benchmark proteins. Using AMRFinderPlus calls, we find SAUTE has higher sensitivity and precision than SPAdes, plasmidSPAdes, SPAligner, and SPAdes assembly of reads aligned to target regions by HISAT2. It also has better sensitivity than SKESA but worse precision.
first_indexed 2024-12-14T22:37:26Z
format Article
id doaj.art-d4fc765b390d448e86692a7e07a65688
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-12-14T22:37:26Z
publishDate 2021-07-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-d4fc765b390d448e86692a7e07a65688; 2022-12-21T22:45:05Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2021-07-01; 221122; 10.1186/s12859-021-04174-9; SAUTE: sequence assembly using target enrichment; Alexandre Souvorov (NCBI/NLM/NIH/DHHS); Richa Agarwala (NCBI/NLM/NIH/DHHS); https://doi.org/10.1186/s12859-021-04174-9; Illumina reads; De-novo assembly; de Bruijn graphs; Antimicrobial resistance; RNA-seq
title SAUTE: sequence assembly using target enrichment
topic Illumina reads
De-novo assembly
de Bruijn graphs
Antimicrobial resistance
RNA-seq
url https://doi.org/10.1186/s12859-021-04174-9