GVCHAP: A Computing Pipeline for Genomic Prediction and Variance Component Estimation Using Haplotypes and SNP Markers
Haplotype prediction models open many possibilities to improve the accuracy of genomic selection but require more data processing and computing time than single-SNP prediction models. To facilitate haplotype analysis for genomic prediction and estimation using structural and functional genomic infor...
Main Authors: | Dzianis Prakapenka, Chunkao Wang, Zuoxiang Liang, Cheng Bian, Cheng Tan, Yang Da |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A. 2020-04-01 |
Series: | Frontiers in Genetics |
Subjects: | genomic selection, haplotype, SNP, heritability, prediction accuracy |
Online Access: | https://www.frontiersin.org/article/10.3389/fgene.2020.00282/full |
_version_ | 1818854990991589376 |
---|---|
author | Dzianis Prakapenka Chunkao Wang Zuoxiang Liang Cheng Bian Cheng Bian Cheng Tan Cheng Tan Yang Da |
author_sort | Dzianis Prakapenka |
collection | DOAJ |
description | Haplotype prediction models open many possibilities to improve the accuracy of genomic selection but require more data processing and computing time than single-SNP prediction models. To facilitate haplotype analysis for genomic prediction and estimation using structural and functional genomic information, we developed a computing pipeline to implement haplotype analysis with capabilities for preparation of input data for haplotype analysis, genomic prediction and estimation using GVCHAP, and analysis of GVCHAP results. Data preparation includes utility programs for haplotype imputing; defining haplotype blocks by a fixed number of SNPs, a fixed distance in base pairs per block, or user defined block lengths based on structural or functional genomic information or a mixture of both types of information; and defining haplotype genotypes within each haplotype block. GVCHAP is the main program for genomic prediction and estimation, calculates GREML (genomic restricted maximum likelihood) estimates of variance components and heritabilities, and calculates GBLUP (genomic best linear unbiased prediction) for additive and dominance values of single SNPs as well as additive values of haplotypes with reliability estimates for training and validation populations. A two-step strategy and a method of multi-node processing are implemented to remove the computing bottleneck due to the creation of genomic relationship matrices for large samples. The analysis of GVCHAP results includes calculation of observed prediction accuracies from validation studies and preparation of input files for graphical visualization of heritability estimates of haplotype blocks as well as estimates of SNP effects and heritabilities. The entire pipeline provides an efficient and versatile computing tool for identifying the most accurate haplotype model among many candidate haplotype models utilizing structural and functional genomic information for genomic selection. |
first_indexed | 2024-12-19T08:01:30Z |
format | Article |
id | doaj.art-e2af11008bc942ccae5f5f1e58fac7b2 |
institution | Directory Open Access Journal |
issn | 1664-8021 |
language | English |
last_indexed | 2024-12-19T08:01:30Z |
publishDate | 2020-04-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Genetics |
spelling | doaj.art-e2af11008bc942ccae5f5f1e58fac7b2; 2022-12-21T20:29:51Z; eng; Frontiers Media S.A.; Frontiers in Genetics; 1664-8021; 2020-04-01; 11; 10.3389/fgene.2020.00282; 515392; GVCHAP: A Computing Pipeline for Genomic Prediction and Variance Component Estimation Using Haplotypes and SNP Markers; Dzianis Prakapenka (0), Chunkao Wang (1), Zuoxiang Liang (2), Cheng Bian (3), Cheng Bian (4), Cheng Tan (5), Cheng Tan (6), Yang Da (7); (0) Department of Animal Science, University of Minnesota, Saint Paul, MN, United States; (1) Department of Animal Science, University of Minnesota, Saint Paul, MN, United States; (2) Department of Animal Science, University of Minnesota, Saint Paul, MN, United States; (3) Department of Animal Science, University of Minnesota, Saint Paul, MN, United States; (4) State Key Laboratory for Agrobiotechnology, China Agricultural University, Beijing, China; (5) Department of Animal Science, University of Minnesota, Saint Paul, MN, United States; (6) National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangzhou, China; (7) Department of Animal Science, University of Minnesota, Saint Paul, MN, United States; Haplotype prediction models open many possibilities to improve the accuracy of genomic selection but require more data processing and computing time than single-SNP prediction models. To facilitate haplotype analysis for genomic prediction and estimation using structural and functional genomic information, we developed a computing pipeline to implement haplotype analysis with capabilities for preparation of input data for haplotype analysis, genomic prediction and estimation using GVCHAP, and analysis of GVCHAP results. Data preparation includes utility programs for haplotype imputing; defining haplotype blocks by a fixed number of SNPs, a fixed distance in base pairs per block, or user defined block lengths based on structural or functional genomic information or a mixture of both types of information; and defining haplotype genotypes within each haplotype block. GVCHAP is the main program for genomic prediction and estimation, calculates GREML (genomic restricted maximum likelihood) estimates of variance components and heritabilities, and calculates GBLUP (genomic best linear unbiased prediction) for additive and dominance values of single SNPs as well as additive values of haplotypes with reliability estimates for training and validation populations. A two-step strategy and a method of multi-node processing are implemented to remove the computing bottleneck due to the creation of genomic relationship matrices for large samples. The analysis of GVCHAP results includes calculation of observed prediction accuracies from validation studies and preparation of input files for graphical visualization of heritability estimates of haplotype blocks as well as estimates of SNP effects and heritabilities. The entire pipeline provides an efficient and versatile computing tool for identifying the most accurate haplotype model among many candidate haplotype models utilizing structural and functional genomic information for genomic selection.; https://www.frontiersin.org/article/10.3389/fgene.2020.00282/full; genomic selection, haplotype, SNP, heritability, prediction accuracy |
spellingShingle | Dzianis Prakapenka Chunkao Wang Zuoxiang Liang Cheng Bian Cheng Bian Cheng Tan Cheng Tan Yang Da GVCHAP: A Computing Pipeline for Genomic Prediction and Variance Component Estimation Using Haplotypes and SNP Markers Frontiers in Genetics genomic selection haplotype SNP heritability prediction accuracy |
title | GVCHAP: A Computing Pipeline for Genomic Prediction and Variance Component Estimation Using Haplotypes and SNP Markers |
title_sort | gvchap a computing pipeline for genomic prediction and variance component estimation using haplotypes and snp markers |
topic | genomic selection haplotype SNP heritability prediction accuracy |
url | https://www.frontiersin.org/article/10.3389/fgene.2020.00282/full |
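
The description above mentions defining haplotype blocks by a fixed number of SNPs or by a fixed distance in base pairs per block. The following is a minimal Python sketch of those two block definitions, not the GVCHAP utility programs themselves. The function names, the half-open `(start, end)` index convention, and the rule that a distance-based block begins at its first SNP are all assumptions made for illustration.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (start, end) SNP indices, end exclusive (assumed convention)


def blocks_by_snp_count(n_snps: int, snps_per_block: int) -> List[Block]:
    """Split n_snps consecutive SNPs into blocks of a fixed SNP count.

    The last block may hold fewer SNPs when n_snps is not a multiple
    of snps_per_block.
    """
    if snps_per_block <= 0:
        raise ValueError("snps_per_block must be positive")
    return [
        (start, min(start + snps_per_block, n_snps))
        for start in range(0, n_snps, snps_per_block)
    ]


def blocks_by_distance(positions: List[int], block_bp: int) -> List[Block]:
    """Group SNPs on one chromosome into blocks spanning less than block_bp.

    positions must be sorted in ascending order. A new block starts at the
    first SNP lying block_bp or more base pairs past the current block's
    first SNP (an illustrative rule, assumed here).
    """
    if block_bp <= 0:
        raise ValueError("block_bp must be positive")
    blocks: List[Block] = []
    start = 0
    for i, pos in enumerate(positions):
        if pos - positions[start] >= block_bp:
            blocks.append((start, i))
            start = i
    if positions:
        blocks.append((start, len(positions)))
    return blocks


if __name__ == "__main__":
    # Hypothetical SNP positions in base pairs on a single chromosome.
    example_positions = [100, 200, 1200, 1300, 5000]
    print(blocks_by_snp_count(len(example_positions), 2))
    print(blocks_by_distance(example_positions, 1000))
```

User-defined block lengths based on structural or functional genomic information, the third option named in the description, would replace these rules with block boundaries read from an annotation file, which is not sketched here.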