GVCHAP: A Computing Pipeline for Genomic Prediction and Variance Component Estimation Using Haplotypes and SNP Markers

Bibliographic Details
Main Authors: Dzianis Prakapenka, Chunkao Wang, Zuoxiang Liang, Cheng Bian, Cheng Tan, Yang Da
Affiliations: Department of Animal Science, University of Minnesota, Saint Paul, MN, United States (all authors); State Key Laboratory for Agrobiotechnology, China Agricultural University, Beijing, China (Cheng Bian); National Engineering Research Center for Breeding Swine Industry, South China Agricultural University, Guangzhou, China (Cheng Tan)
Format: Article
Language: English
Published: Frontiers Media S.A., 2020-04-01
Series: Frontiers in Genetics, Volume 11
ISSN: 1664-8021
DOI: 10.3389/fgene.2020.00282
Subjects: genomic selection; haplotype; SNP; heritability; prediction accuracy
Collection: DOAJ (Directory of Open Access Journals)
Online Access: https://www.frontiersin.org/article/10.3389/fgene.2020.00282/full

Description
Haplotype prediction models open many possibilities to improve the accuracy of genomic selection but require more data processing and computing time than single-SNP prediction models. To facilitate haplotype analysis for genomic prediction and estimation using structural and functional genomic information, we developed a computing pipeline for haplotype analysis with three components: preparation of input data, genomic prediction and estimation using GVCHAP, and analysis of GVCHAP results. Data preparation includes utility programs for haplotype imputation; for defining haplotype blocks by a fixed number of SNPs per block, a fixed distance in base pairs per block, or user-defined block lengths based on structural or functional genomic information or a mixture of both; and for defining haplotype genotypes within each haplotype block. GVCHAP, the main program for genomic prediction and estimation, calculates GREML (genomic restricted maximum likelihood) estimates of variance components and heritabilities, and GBLUP (genomic best linear unbiased prediction) of additive and dominance values of single SNPs and of additive values of haplotypes, with reliability estimates for training and validation populations. A two-step strategy and a method of multi-node processing are implemented to remove the computing bottleneck caused by creating genomic relationship matrices for large samples. The analysis of GVCHAP results includes calculation of observed prediction accuracies from validation studies and preparation of input files for graphical visualization of heritability estimates of haplotype blocks, as well as of SNP effect and heritability estimates. The entire pipeline provides an efficient and versatile computing tool for identifying the most accurate haplotype model among many candidate haplotype models that utilize structural and functional genomic information for genomic selection.
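
The three ways of defining haplotype blocks named in the description (a fixed number of SNPs, a fixed base-pair distance, or user-defined boundaries) can be sketched as follows. This is a minimal Python illustration, assuming the SNPs of one chromosome are already sorted by base-pair position; the function names blocks_by_snp_count, blocks_by_distance and blocks_by_intervals are hypothetical and are not the interfaces of the GVCHAP utility programs.

```python
from typing import List, Optional, Sequence, Tuple


def blocks_by_snp_count(n_snps: int, snps_per_block: int) -> List[List[int]]:
    """Consecutive SNP indices in blocks of a fixed size; the last block may be shorter."""
    if snps_per_block < 1:
        raise ValueError("snps_per_block must be at least 1")
    return [list(range(start, min(start + snps_per_block, n_snps)))
            for start in range(0, n_snps, snps_per_block)]


def blocks_by_distance(positions: Sequence[int], window_bp: int) -> List[List[int]]:
    """Group SNPs (sorted by bp position on one chromosome) into fixed-width windows
    anchored at bp 0; a window that contains no SNP produces no block."""
    if window_bp < 1:
        raise ValueError("window_bp must be at least 1")
    blocks: List[List[int]] = []
    current_window: Optional[int] = None
    for idx, pos in enumerate(positions):
        window = pos // window_bp
        if window != current_window:  # first SNP of a new window starts a new block
            blocks.append([])
            current_window = window
        blocks[-1].append(idx)
    return blocks


def blocks_by_intervals(positions: Sequence[int],
                        intervals: Sequence[Tuple[int, int]]) -> List[List[int]]:
    """User-defined blocks: SNPs inside each inclusive [start, end] bp interval
    (for example, gene boundaries); intervals containing no SNP are skipped."""
    blocks = []
    for start, end in intervals:
        members = [i for i, pos in enumerate(positions) if start <= pos <= end]
        if members:
            blocks.append(members)
    return blocks


positions = [100, 250, 900, 1200, 1250, 3100]  # toy bp positions, sorted
print(blocks_by_snp_count(len(positions), 4))                       # [[0, 1, 2, 3], [4, 5]]
print(blocks_by_distance(positions, 1000))                          # [[0, 1, 2], [3, 4], [5]]
print(blocks_by_intervals(positions, [(200, 1000), (1100, 1300)]))  # [[1, 2], [3, 4]]
```

In this sketch, SNPs outside every user-defined interval are simply left unassigned; how GVCHAP's own utilities treat such SNPs is not stated in this record.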
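
Defining haplotype genotypes within a block amounts to reading each individual's two phased haplotypes over the block's SNPs and counting the copies of each distinct haplotype allele. The sketch below assumes haplotypes have already been phased and imputed and are stored as strings of allele codes, one pair per individual; this input layout and the function name haplotype_genotypes are hypothetical, not GVCHAP's file format.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def haplotype_genotypes(phased: Sequence[Tuple[str, str]],
                        block: Sequence[int]) -> Tuple[List[str], List[List[int]]]:
    """Return the distinct haplotype alleles observed in one block (sorted) and,
    for each individual, the number of copies (0, 1 or 2) of each allele."""
    # Slice both phased haplotypes of every individual down to the block's SNPs.
    pairs = [("".join(h1[i] for i in block), "".join(h2[i] for i in block))
             for h1, h2 in phased]
    alleles = sorted({h for pair in pairs for h in pair})
    counts = []
    for pair in pairs:
        copies = Counter(pair)  # how many of the two haplotypes carry each allele
        counts.append([copies[allele] for allele in alleles])
    return alleles, counts


# Toy data: three individuals, four SNPs, block made of the first two SNPs.
phased = [("0101", "0011"), ("0101", "0101"), ("1100", "0011")]
alleles, counts = haplotype_genotypes(phased, [0, 1])
print(alleles)  # ['00', '01', '11']
print(counts)   # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
```

Each row sums to 2, so the columns can be used as haplotype-allele covariates in the same way that 0/1/2 genotype codes are used in single-SNP models.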
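
The computing bottleneck mentioned in the description comes from forming genomic relationship matrices for many individuals. One generic way to spread that work across nodes, shown here as an assumption rather than as GVCHAP's two-step strategy, is to split the SNPs into chunks: each node computes a partial cross-product of centred genotype codes for its chunk, and the partial results are summed. The scaling follows VanRaden's (2008) first method, G = ZZ'/(2 Σ p(1 − p)); GVCHAP's own relationship definitions may differ, and the function names partial_grm and combine_partials are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def partial_grm(genotypes: Sequence[Sequence[int]],
                snp_cols: Sequence[int]) -> Tuple[Matrix, float]:
    """Work unit for one node: the cross-product Z_k Z_k' of centred genotypes and
    the scaling term sum(2p(1-p)) over the SNP columns in this chunk."""
    n = len(genotypes)
    cross = [[0.0] * n for _ in range(n)]
    scale = 0.0
    for j in snp_cols:
        column = [row[j] for row in genotypes]  # 0/1/2 allele counts
        p = sum(column) / (2 * n)               # frequency of the counted allele
        z = [x - 2 * p for x in column]         # centre by twice the frequency
        scale += 2 * p * (1 - p)
        for a in range(n):
            for b in range(n):
                cross[a][b] += z[a] * z[b]
    return cross, scale


def combine_partials(parts: Sequence[Tuple[Matrix, float]]) -> Matrix:
    """Sum the per-chunk cross-products and scaling terms, then divide."""
    n = len(parts[0][0])
    total_scale = sum(scale for _, scale in parts)
    return [[sum(cross[a][b] for cross, _ in parts) / total_scale for b in range(n)]
            for a in range(n)]


# Toy genotypes: 3 individuals x 4 SNPs.
genotypes = [[0, 1, 2, 1],
             [1, 1, 0, 2],
             [2, 0, 1, 1]]
g_full = combine_partials([partial_grm(genotypes, range(4))])
g_split = combine_partials([partial_grm(genotypes, [0, 1]),   # "node 1"
                            partial_grm(genotypes, [2, 3])])  # "node 2"
for row in g_split:
    print([round(x, 4) for x in row])
```

Because both the numerator and the scaling term are sums over SNPs, the chunked result matches the single-pass matrix up to floating-point rounding, and each chunk needs only its own SNP columns for all individuals.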
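
Given variance components (for example, GREML estimates), GBLUP of additive genomic values and an observed prediction accuracy can be sketched as below. The standard single-trait additive model y = 1μ + g + e, with Var(g) = Gσ²g and Var(e) = Iσ²e, is assumed here; GVCHAP also fits dominance and haplotype effects and reports reliabilities, none of which are shown. Observed accuracy is taken as the Pearson correlation between GEBVs and phenotypes of validation individuals, which is one common definition. The relationship matrix, phenotypes and variance components are toy values, and the function names solve, gblup and pearson are hypothetical.

```python
from typing import List, Sequence


def solve(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented working copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def gblup(G: Sequence[Sequence[float]], train: Sequence[int], y: Sequence[float],
          var_g: float, var_e: float) -> List[float]:
    """GEBVs for every individual in G from phenotypes y of the `train` individuals,
    assuming y = 1*mu + g + e, Var(g) = G*var_g, Var(e) = I*var_e."""
    V = [[G[i][j] * var_g + (var_e if a == b else 0.0) for b, j in enumerate(train)]
         for a, i in enumerate(train)]
    v_inv_1 = solve(V, [1.0] * len(train))
    v_inv_y = solve(V, y)
    mu = sum(v_inv_y) / sum(v_inv_1)  # GLS mean: (1'V^-1 1)^-1 1'V^-1 y
    alpha = [vy - mu * v1 for vy, v1 in zip(v_inv_y, v_inv_1)]  # V^-1 (y - 1 mu)
    # BLUP of g_i = Cov(g_i, y') V^-1 (y - 1 mu), with Cov(g_i, y_j) = G[i][j] * var_g.
    return [var_g * sum(G[i][j] * al for j, al in zip(train, alpha))
            for i in range(len(G))]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Toy inputs (illustrative only): 3 training and 3 validation individuals.
G = [[1.0, 0.2, 0.1, 0.3, 0.0, 0.1],
     [0.2, 1.0, 0.1, 0.0, 0.3, 0.1],
     [0.1, 0.1, 1.0, 0.1, 0.1, 0.3],
     [0.3, 0.0, 0.1, 1.0, 0.1, 0.0],
     [0.0, 0.3, 0.1, 0.1, 1.0, 0.1],
     [0.1, 0.1, 0.3, 0.0, 0.1, 1.0]]
train, valid = [0, 1, 2], [3, 4, 5]
y_train, y_valid = [10.2, 11.5, 9.8], [10.4, 11.0, 9.9]
var_g, var_e = 0.4, 0.6

gebv = gblup(G, train, y_train, var_g, var_e)
h2 = var_g / (var_g + var_e)
accuracy = pearson([gebv[i] for i in valid], y_valid)
print(f"h2 = {h2:.2f}, observed accuracy = {accuracy:.3f}")
```

In this formulation a validation individual receives a nonzero GEBV only through nonzero relationships (entries of G) with training individuals, which is why the accuracy of genomic prediction depends on how well the training population represents the validation population.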