Easy and accurate reconstruction of whole HIV genomes from short-read sequence data with shiver

Bibliographic details
Main authors: Wymant, C, Blanquart, F, Golubchik, T, Gall, A, Bakker, M, Bezemer, D, Croucher, N, Hall, M, Hillebregt, M, Ong, S, Ratmann, O, Albert, J, Bannert, N, Fellay, J, Fransen, K, Gourlay, A, Grabowski, M, Gunsenheimer-Bartmeyer, B, Günthard, H, Kivelä, P, Kouyos, R, Laeyendecker, O, Liitsola, K, Meyer, L, Porter, K, Ristola, M, van Sighem, A, Berkhout, B, Cornelissen, M, Kellam, P, Reiss, P, Fraser, C
Format: Journal article
Language: English
Published: Oxford University Press 2018
collection OXFORD
description Studying the evolution of viruses and their molecular epidemiology relies on accurate viral sequence data, so that small differences between similar viruses can be meaningfully interpreted. Despite its higher throughput and more detailed minority variant data, next-generation sequencing has yet to be widely adopted for HIV. The difficulty of accurately reconstructing the consensus sequence of a quasispecies from reads (short fragments of DNA) in the presence of large between- and within-host diversity, including frequent indels, may have presented a barrier. In particular, mapping (aligning) reads to a reference sequence leads to biased loss of information; this bias can distort epidemiological and evolutionary conclusions. De novo assembly avoids this bias by aligning the reads to themselves, producing a set of sequences called contigs. However, contigs provide only a partial summary of the reads, misassembly may result in their having an incorrect structure, and no information is available at parts of the genome where contigs could not be assembled. To address these problems we developed the tool shiver to pre-process reads for quality and contamination, then map them to a reference tailored to the sample using corrected contigs supplemented with the user's choice of existing reference sequences. Run with two commands per sample, it can easily be used for large heterogeneous data sets. We used shiver to reconstruct the consensus sequence and minority variant information from paired-end short-read whole-genome data produced with the Illumina platform, for sixty-five existing publicly available samples and fifty new samples. We show the systematic superiority of mapping to shiver's constructed reference compared with mapping the same reads to the closest of 3,249 real references: median values of 13 bases called differently and more accurately, 0 bases called differently and less accurately, and 205 bases of missing sequence recovered. We also successfully applied shiver to whole-genome samples of Hepatitis C Virus and Respiratory Syncytial Virus. shiver is publicly available from https://github.com/ChrisHIV/shiver.
first_indexed 2024-03-07T04:23:50Z
format Journal article
id oxford-uuid:cbf24d6c-77c6-4565-b818-3ffd0732452d
institution University of Oxford
language English
last_indexed 2024-03-07T04:23:50Z
publishDate 2018
publisher Oxford University Press
record_format dspace
title Easy and accurate reconstruction of whole HIV genomes from short-read sequence data with shiver