High density marker panels, SNPs prioritizing and accuracy of genomic selection

Abstract Background The availability of high-density (HD) marker panels, genome wide variants and sequence data creates an unprecedented opportunity to dissect the genetic basis of complex traits, enhance genomic selection (GS) and identify causal variants of disease. The disproportional increase in...

Bibliographic Details
Main Authors: Ling-Yun Chang, Sajjad Toghiani, Ashley Ling, Sammy E. Aggrey, Romdhane Rekaya
Format: Article
Language: English
Published: BMC 2018-01-01
Series: BMC Genetics
Subjects: SNP prioritizing; Genomic selection; High density
Online Access: http://link.springer.com/article/10.1186/s12863-017-0595-2
author Ling-Yun Chang
Sajjad Toghiani
Ashley Ling
Sammy E. Aggrey
Romdhane Rekaya
author_sort Ling-Yun Chang
collection DOAJ
description Abstract
Background: The availability of high-density (HD) marker panels, genome-wide variants and sequence data creates an unprecedented opportunity to dissect the genetic basis of complex traits, enhance genomic selection (GS) and identify causal variants of disease. The disproportional increase in the number of parameters in the genetic association model compared to the number of phenotypes has led to further deterioration in statistical power and an increase in co-linearity and false positive rates. At best, HD panels do not significantly improve GS accuracy and, at worst, reduce accuracy. This is true for both regression and variance component approaches. To remedy this situation, some form of single nucleotide polymorphism (SNP) filtering or external information is needed. Current methods for prioritizing SNP markers (e.g. BayesB, BayesCπ) are sensitive to the increased co-linearity in HD panels, which could limit their performance.
Results: In this study, the usefulness of FST, a measure of allele frequency variation among populations, as an external source of information in GS was evaluated. A simulation was carried out for a trait with heritability of 0.4. Data were divided into three subpopulations based on phenotype distribution (bottom 5%, middle 90%, top 5%). Marker data were simulated to mimic a 770 K and 1.5 million SNP marker panel. A ten-chromosome genome with 200 K and 400 K SNPs was simulated. Several scenarios with varying distributions for the quantitative trait loci (QTL) effects were simulated. Using all 200 K markers and no filtering, the accuracy of genomic prediction was 0.77. When marker effects were simulated from a gamma distribution, SNPs pre-selected based on the 99.5, 99.0 and 97.5% quantile of the FST score distribution resulted in an accuracy of 0.725, 0.797, and 0.853, respectively. Similar results were observed under other simulation scenarios. Clearly, the accuracy obtained using all SNPs can be easily achieved using only 0.5 to 1% of all markers.
Conclusions: These results indicate that SNP filtering using already available external information could increase the accuracy of GS. This is especially important as next-generation sequencing technology becomes more affordable and accessible to human, animal and plant applications.
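The filtering step described in the abstract (split individuals into bottom 5%, middle 90% and top 5% by phenotype, score each SNP by FST, keep SNPs above a quantile of the score distribution) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code. The FST estimator used here, (H_T - H_S) / H_T with heterozygosities computed from allele frequencies, is an assumption, since the abstract does not say which estimator was used. The function names `fst_scores`, `select_snps` and `simulate`, and all simulation settings other than the heritability of 0.4 and the gamma-distributed effects, are hypothetical.

```python
import numpy as np


def fst_scores(genotypes, phenotypes, tail=0.05):
    """Per-SNP FST across three phenotype-defined groups (bottom, middle, top).

    Assumed estimator: (H_T - H_S) / H_T, where H = 2p(1 - p).
    genotypes: (n_individuals, n_snps) array coded 0/1/2.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    phenotypes = np.asarray(phenotypes, dtype=float)
    lo, hi = np.quantile(phenotypes, [tail, 1.0 - tail])
    groups = [
        phenotypes <= lo,
        (phenotypes > lo) & (phenotypes < hi),
        phenotypes >= hi,
    ]
    n_total = genotypes.shape[0]
    p_total = genotypes.mean(axis=0) / 2.0
    h_total = 2.0 * p_total * (1.0 - p_total)
    # Size-weighted mean of the within-group expected heterozygosities.
    h_within = np.zeros(genotypes.shape[1])
    for mask in groups:
        p_k = genotypes[mask].mean(axis=0) / 2.0
        h_within += mask.sum() / n_total * 2.0 * p_k * (1.0 - p_k)
    # Monomorphic SNPs (H_T == 0) get a score of 0 instead of dividing by zero.
    return np.divide(
        h_total - h_within,
        h_total,
        out=np.zeros_like(h_total),
        where=h_total > 0,
    )


def select_snps(fst, quantile=0.99):
    """Indices of SNPs whose FST is at or above the given quantile."""
    threshold = np.quantile(fst, quantile)
    return np.flatnonzero(fst >= threshold)


def simulate(n=1000, m=2000, n_qtl=20, h2=0.4, seed=0):
    """Toy data: independent SNPs, gamma-distributed QTL effects (hypothetical)."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.1, 0.9, size=m)
    genotypes = rng.binomial(2, freqs, size=(n, m))
    qtl = rng.choice(m, size=n_qtl, replace=False)
    signs = rng.choice([-1.0, 1.0], size=n_qtl)
    effects = rng.gamma(shape=0.4, scale=1.0, size=n_qtl) * signs
    genetic = genotypes[:, qtl] @ effects
    # Scale the residual variance so that heritability equals h2.
    noise_sd = np.sqrt(genetic.var() * (1.0 - h2) / h2)
    phenotypes = genetic + rng.normal(0.0, noise_sd, size=n)
    return genotypes, phenotypes, qtl


if __name__ == "__main__":
    genotypes, phenotypes, qtl = simulate()
    fst = fst_scores(genotypes, phenotypes)
    for q in (0.995, 0.99, 0.975):
        kept = select_snps(fst, q)
        hits = np.intersect1d(kept, qtl).size
        print(f"quantile {q}: kept {kept.size} SNPs, {hits} of them simulated QTL")
```

The retained SNP indices would then be passed to whatever genomic prediction model is in use. That step is omitted here because the abstract does not specify the model.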
first_indexed 2024-12-10T08:08:13Z
format Article
id doaj.art-e20d18b2480b46d38b6c489166dbecb8
institution Directory Open Access Journal
issn 1471-2156
language English
last_indexed 2024-12-10T08:08:13Z
publishDate 2018-01-01
publisher BMC
record_format Article
series BMC Genetics
spelling doaj.art-e20d18b2480b46d38b6c489166dbecb8; 2022-12-22T01:56:37Z; eng; BMC; BMC Genetics; 1471-2156; 2018-01-01; 19; 1; 1; 10; 10.1186/s12863-017-0595-2
High density marker panels, SNPs prioritizing and accuracy of genomic selection
Ling-Yun Chang (Department of Animal and Dairy Science, University of Georgia)
Sajjad Toghiani (Department of Animal and Dairy Science, University of Georgia)
Ashley Ling (Department of Animal and Dairy Science, University of Georgia)
Sammy E. Aggrey (Department of Poultry Science, University of Georgia)
Romdhane Rekaya (Department of Animal and Dairy Science, University of Georgia)
title High density marker panels, SNPs prioritizing and accuracy of genomic selection
topic SNP prioritizing
Genomic selection
High density
url http://link.springer.com/article/10.1186/s12863-017-0595-2