Inferring haplotypes and parental genotypes in larger full sib-ships and other pedigrees with missing or erroneous genotype data

Abstract. Background: In many contexts, pedigrees for individuals are known even though not all individuals have been fully genotyped. In one extreme case, the genotypes for a set of full siblings are known, with no knowledge of parental genotypes. We pro...

Bibliographic Details
Main Author: Nettelblad Carl
Format: Article
Language: English
Published: BMC 2012-10-01
Series: BMC Genetics
Subjects: Haplotyping; Phasing; Genotype inference; Nuclear family data; Hidden Markov models
Online Access: http://www.biomedcentral.com/1471-2156/13/85
author Nettelblad Carl
collection DOAJ
description Abstract
Background: In many contexts, pedigrees for individuals are known even though not all individuals have been fully genotyped. In one extreme case, the genotypes for a set of full siblings are known, with no knowledge of parental genotypes. We propose a method for inferring phased haplotypes and genotypes for all individuals, even those with missing data, in such pedigrees, allowing a multitude of classic and recent methods for linkage and genome analysis to be used more efficiently.
Results: By artificially removing the founder generation genotype data from a well-studied simulated dataset, the quality of reconstructed genotypes in that generation can be verified. For the full structure of repeated matings with 15 offspring per mating, 10 dams per sire, 99.89% of all founder markers were phased correctly, given only the unphased genotypes for offspring. The accuracy was reduced only slightly, to 99.51%, when introducing a 2% error rate in offspring genotypes. When reduced to only 5 full-sib offspring in a single sire-dam mating, the corresponding percentage is 92.62%, which compares favorably with 89.28% from the leading Merlin package. Furthermore, Merlin is unable to handle more than approximately 10 sibs, as the number of states tracked rises exponentially with family size, while our approach has no such limit and handles 150 half-sibs with ease in our experiments.
Conclusions: Our method is able to reconstruct genotypes for parents when genotype data is only available for offspring individuals, as well as haplotypes for all individuals. Compared to the Merlin package, we can handle larger pedigrees and produce superior results, mainly due to the fact that Merlin uses the Viterbi algorithm on the state space to infer the genotype sequence. Tracking of haplotype and allele origin can be used in any application where the marker set does not directly influence genotype variation influencing traits. Inference of genotypes can also reduce the effects of genotyping errors and missing data. The cnF2freq codebase implementing our approach is available under a BSD-style license.
first_indexed 2024-12-10T08:11:36Z
format Article
id doaj.art-d7606c5f4f4e485c830e43f10ff35a75
institution Directory Open Access Journal
issn 1471-2156
language English
last_indexed 2024-12-10T08:11:36Z
publishDate 2012-10-01
publisher BMC
record_format Article
series BMC Genetics
spelling doaj.art-d7606c5f4f4e485c830e43f10ff35a75 | 2022-12-22T01:56:34Z | eng | BMC | BMC Genetics | 1471-2156 | 2012-10-01 | 13 | 1 | 85 | 10.1186/1471-2156-13-85 | Inferring haplotypes and parental genotypes in larger full sib-ships and other pedigrees with missing or erroneous genotype data | Nettelblad Carl | http://www.biomedcentral.com/1471-2156/13/85 | Haplotyping | Phasing | Genotype inference | Nuclear family data | Hidden Markov models
title Inferring haplotypes and parental genotypes in larger full sib-ships and other pedigrees with missing or erroneous genotype data
topic Haplotyping
Phasing
Genotype inference
Nuclear family data
Hidden Markov models
url http://www.biomedcentral.com/1471-2156/13/85