Evaluating the performance of tools used to call minority variants from whole genome short-read data [version 2; referees: 2 approved]

Bibliographic Details
Main Authors:	Khadija Said Mohammed, Nelson Kibinge, Pjotr Prins, Charles N. Agoti, Matthew Cotten, D.J. Nokes, Samuel Brand, George Githinji
Format:	Article
Language:	English
Published:	Wellcome 2018-09-01
Series:	Wellcome Open Research
Online Access:	https://wellcomeopenresearch.org/articles/3-21/v2

collection	DOAJ
description	Background: High-throughput whole genome sequencing facilitates investigation of minority virus sub-populations in virus-positive samples. Minority variants are useful for understanding within- and between-host diversity and population dynamics, and can potentially help elucidate person-to-person transmission pathways. Several minority variant callers have been developed to describe low-frequency sub-populations from whole genome sequence data. These callers differ in the bioinformatics and statistical methods they use to discriminate sequencing errors from low-frequency variants.
Methods: We evaluated the diagnostic performance of, and concordance between, published minority variant callers used to identify minority variants from whole genome sequence data of virus samples. We used the ART-Illumina read simulation tool to generate three artificial short-read datasets with varying coverage and error profiles from an RSV reference genome. The datasets were spiked with nucleotide variants at predetermined positions and frequencies. Variants were called using FreeBayes, LoFreq, VarDict and VarScan2. The callers' agreement in identifying the known variants was quantified using two measures: concordance accuracy and inter-caller concordance.
Results: The variant callers differed in the minority variants they identified from the datasets. Concordance accuracy and inter-caller concordance were positively correlated with sample coverage. FreeBayes identified the majority of variants, but it showed variable sensitivity and precision, and a false positive rate that was high relative to the other minority variant callers and varied with sample coverage. LoFreq was the most conservative caller.
Conclusions: We conducted a performance and concordance evaluation of four minority variant calling tools used to identify and quantify low-frequency variants. Inconsistent quality of sequenced samples affects the sensitivity and accuracy of minority variant callers. Our study suggests that combining at least three tools is useful for filtering errors when calling low-frequency variants.
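The abstract names two agreement measures but does not define them. As a rough illustration only, the sketch below scores each caller's call set against the spiked-in variants (sensitivity and precision), measures pairwise agreement between callers with a Jaccard index, and applies the "at least three tools" suggestion as a simple vote threshold. The variant encoding, the use of a Jaccard index as a stand-in for inter-caller concordance, the function names and the toy data are all hypothetical assumptions, not the study's definitions or results; real input would typically be parsed from each caller's VCF output.

```python
from collections import Counter
from itertools import combinations

# Hypothetical encoding (assumption): a variant is (genome position, alternate allele).
Variant = tuple[int, str]


def sensitivity_precision(calls: set[Variant], truth: set[Variant]) -> tuple[float, float]:
    """Score one caller against the set of spiked-in (known) variants."""
    tp = len(calls & truth)  # spiked variants the caller recovered
    fp = len(calls - truth)  # calls that are not spiked variants
    fn = len(truth - calls)  # spiked variants the caller missed
    sensitivity = tp / (tp + fn) if truth else 0.0
    precision = tp / (tp + fp) if calls else 0.0
    return sensitivity, precision


def jaccard(a: set[Variant], b: set[Variant]) -> float:
    """Agreement between two call sets (assumed stand-in measure: Jaccard index)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_concordance(callsets: dict[str, set[Variant]]) -> dict[tuple[str, str], float]:
    """Jaccard agreement for every unordered pair of callers."""
    return {
        (x, y): jaccard(callsets[x], callsets[y])
        for x, y in combinations(sorted(callsets), 2)
    }


def consensus(callsets: dict[str, set[Variant]], min_callers: int = 3) -> set[Variant]:
    """Keep only variants reported by at least `min_callers` tools (simple vote)."""
    votes = Counter(v for calls in callsets.values() for v in calls)
    return {v for v, n in votes.items() if n >= min_callers}


# Made-up toy data for illustration only (not results from the study).
truth = {(100, "A"), (250, "G"), (400, "T"), (800, "C")}
callsets = {
    "caller_a": {(100, "A"), (250, "G"), (400, "T"), (555, "G")},
    "caller_b": {(100, "A"), (250, "G")},
    "caller_c": {(100, "A"), (250, "G"), (400, "T"), (555, "G")},
    "caller_d": {(100, "A"), (400, "T"), (800, "C")},
}

if __name__ == "__main__":
    for name, calls in callsets.items():
        print(name, sensitivity_precision(calls, truth))
    print(pairwise_concordance(callsets))
    print("consensus", sensitivity_precision(consensus(callsets), truth))  # (0.75, 1.0)
```

In this toy example the three-caller vote removes the false call at position 555, which two callers share, but it also drops the true variant at position 800 that only one caller reported. This shows the sensitivity cost that a strict vote threshold can carry.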
first_indexed	2024-12-19T16:35:41Z
id	doaj.art-732f0cff20de4c108cb9d08e266e1439
institution	Directory of Open Access Journals
issn	2398-502X
last_indexed	2024-12-19T16:35:41Z
doi	10.12688/wellcomeopenres.13538.2
volume	3
affiliations	Pwani University, Kilifi, Kenya; KEMRI-Wellcome Trust Research Programme, KEMRI Centre for Geographic Medicine Research – Coast, Kilifi, Kenya; Virosciences Department, Erasmus Medical Centre, Rotterdam, The Netherlands; School of Life Sciences and Zeeman Institute (SBIDER), University of Warwick, Coventry, UK
