Using variant databases for variant prioritization and to detect erroneous genotype-phenotype associations

Abstract Background In the search for novel causal mutations, public and/or private variant databases are nearly always used to facilitate the search as they result in a massive reduction of putative variants in one step. Practically, variant filtering is often done by either using all variants from...

Bibliographic Details
Main Authors:	Bart J. G. Broeckx, Luc Peelman, Jimmy H. Saunders, Dieter Deforce, Lieven Clement
Format:	Article
Language:	English
Published:	BMC 2017-12-01
Series:	BMC Bioinformatics
Subjects:	1000 Genomes project variant database; Allele frequency; dbSNP; HapMap; Variant filtering; Variant database
Online Access:	http://link.springer.com/article/10.1186/s12859-017-1951-y

_version_	1819145866841161728
author	Bart J. G. Broeckx, Luc Peelman, Jimmy H. Saunders, Dieter Deforce, Lieven Clement
collection	DOAJ
description	Abstract. Background: In the search for novel causal mutations, public and/or private variant databases are nearly always used because they massively reduce the number of putative variants in one step. In practice, variant filtering is often done either by using all variants from the variant database (the absence-approach, which assumes that disease-causing variants do not reside in variant databases) or by using the subset of variants with an allelic frequency > 1% (the 1%-approach). We investigate the validity of these two approaches in terms of false negatives (the true disease-causing variant does not pass all filters) and false positives (a harmless mutation passes all filters and is erroneously retained in the list of putative disease-causing variants) and compare them with a novel approach, which we named the quantile-based approach. This approach applies variable instead of static frequency thresholds, and these thresholds are calculated from prior knowledge of disease prevalence, inheritance models, database size and database characteristics. Results: Based on real-life data, we demonstrate that the quantile-based approach outperforms the absence-approach in terms of false negatives. At the same time, the quantile-based approach deals more appropriately than the 1%-approach with the variable allele frequencies of disease-causing alleles in variant databases, and as such allows better control of the number of false positives. We also introduce an alternative application for variant database usage and the quantile-based approach. When disease-causing variants in variant databases deviated substantially from the theoretical expectations calculated with the quantile-based approach, the association between genotype and phenotype had to be reconsidered in 12 out of 13 cases.
Conclusions: We developed a novel method and demonstrated that this so-called quantile-based approach is a highly suitable method for variant filtering. In addition, the quantile-based approach can also be used for variant flagging. For user friendliness, lookup tables and easy-to-use R calculators are provided.
first_indexed	2024-12-22T13:04:50Z
format	Article
id	doaj.art-8a643cf1f73545c8a803ecfe88746328
institution	Directory of Open Access Journals
issn	1471-2105
language	English
last_indexed	2024-12-22T13:04:50Z
publishDate	2017-12-01
publisher	BMC
record_format	Article
series	BMC Bioinformatics
spelling	doaj.art-8a643cf1f73545c8a803ecfe88746328; 2022-12-21T18:24:54Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2017-12-01; 18; 1; 1; 10; 10.1186/s12859-017-1951-y; Using variant databases for variant prioritization and to detect erroneous genotype-phenotype associations; Bart J. G. Broeckx (Laboratory of Animal Genetics, Faculty of Veterinary Medicine, Ghent University); Luc Peelman (Laboratory of Animal Genetics, Faculty of Veterinary Medicine, Ghent University); Jimmy H. Saunders (Department of Medical Imaging and Orthopedics, Faculty of Veterinary Medicine, Ghent University); Dieter Deforce (Laboratory of Pharmaceutical Biotechnology, Faculty of Pharmaceutical Sciences, Ghent University); Lieven Clement (Department of Applied Mathematics, Computer Science and Statistics, Faculty of Sciences, Ghent University)
title	Using variant databases for variant prioritization and to detect erroneous genotype-phenotype associations
topic	1000 Genomes project variant database; Allele frequency; dbSNP; HapMap; Variant filtering; Variant database
url	http://link.springer.com/article/10.1186/s12859-017-1951-y

Similar Items