FitTetra 2.0 – improved genotype calling for tetraploids with multiple population and parental data support

Abstract Background Genetic studies in tetraploids are lagging behind in comparison with studies of diploids as the complex genetics of tetraploids require much more elaborated computational methodologies. Recent advancements in development of molecular techniques and computational tools facilitate...


Bibliographic Details
Main Authors: Konrad Zych, Gerrit Gort, Chris A. Maliepaard, Ritsert C. Jansen, Roeland E. Voorrips
Format: Article
Language: English
Published: BMC 2019-03-01
Series: BMC Bioinformatics
Subjects: Genomics, Genotyping, Genotype calling, Polyploids, Autotetraploids, fitPoly
Online Access: http://link.springer.com/article/10.1186/s12859-019-2703-y
author Konrad Zych
Gerrit Gort
Chris A. Maliepaard
Ritsert C. Jansen
Roeland E. Voorrips
collection DOAJ
description Abstract
Background: Genetic studies in tetraploids lag behind studies of diploids because the complex genetics of tetraploids require much more elaborate computational methodologies. Recent advances in molecular techniques and computational tools facilitate new methods for automated, high-throughput genotype calling in tetraploid species. We report on the upgrade of the widely used fitTetra software, aiming to improve its accuracy, which to date is hampered by technical artefacts in the data.
Results: Our upgrade of the fitTetra package is designed for more accurate modelling of complex collections of samples. The package fits a mixture model in which some parameters are estimated separately for each sub-collection. When a full-sib family is analyzed, we use parental genotypes to predict the expected segregation in terms of allele dosages in the offspring. More accurate modelling and the use of parental data increase the accuracy of dosage calling. We tested the package on data obtained with an Affymetrix Axiom 60k array and compared its performance with the original version and the recently published ClusterCall tool, showing that at least 20% more SNPs could be called with our updated software.
Conclusion: Our updated software package shows clearly improved performance in genotype calling accuracy. Estimation of the mixing proportions of the underlying dosage distributions is separated for full-sib families (where mixture proportions can be estimated from the parental dosages and the inheritance model) and unstructured populations (where they are based on the assumption of Hardy-Weinberg equilibrium). Additionally, because the distributions of signal ratios of the dosage classes can be assumed to be the same for all populations, including parental data for some subpopulations helps to improve the fit for other populations as well.
The R package fitTetra 2.0 is freely available under the GNU Public License as an Additional file with this article.
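The two ways of obtaining mixing proportions mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the fitTetra 2.0 code or API: the function names are hypothetical, and the inheritance model is an assumption (tetrasomic inheritance with no double reduction, so each parent passes on two of its four chromosomes drawn at random). Under those assumptions, the expected dosage proportions in an unstructured population follow a binomial distribution in the allele frequency, and those in a full-sib family follow from combining the gamete distributions of the two parents.

```python
from math import comb

PLOIDY = 4  # tetraploid: allele dosage ranges from 0 to 4


def hwe_dosage_probs(p, ploidy=PLOIDY):
    """Expected dosage proportions under Hardy-Weinberg equilibrium.

    With allele frequency p, the dosage is Binomial(ploidy, p).
    """
    return [comb(ploidy, d) * p**d * (1 - p) ** (ploidy - d)
            for d in range(ploidy + 1)]


def gamete_probs(parent_dosage, ploidy=PLOIDY):
    """Distribution of allele copies in one gamete.

    Assumption: ploidy/2 chromosomes are drawn without replacement
    (hypergeometric), i.e. no double reduction.
    """
    half = ploidy // 2
    total = comb(ploidy, half)
    return [comb(parent_dosage, k) * comb(ploidy - parent_dosage, half - k) / total
            for k in range(half + 1)]


def f1_dosage_probs(dosage_p1, dosage_p2, ploidy=PLOIDY):
    """Expected dosage proportions in a full-sib family from parental dosages."""
    g1 = gamete_probs(dosage_p1, ploidy)
    g2 = gamete_probs(dosage_p2, ploidy)
    probs = [0.0] * (ploidy + 1)
    # Offspring dosage is the sum of the two gamete dosages.
    for i, a in enumerate(g1):
        for j, b in enumerate(g2):
            probs[i + j] += a * b
    return probs


if __name__ == "__main__":
    print("HWE, p = 0.5:      ", hwe_dosage_probs(0.5))
    print("F1, parents 2 x 2: ", f1_dosage_probs(2, 2))
    print("F1, parents 1 x 0: ", f1_dosage_probs(1, 0))
```

In this sketch a cross between two parents of dosage 2 yields the 1:8:18:8:1 segregation over dosages 0 to 4, whereas a cross between parents of dosage 1 and 0 yields only dosages 0 and 1 in equal proportions. Proportions of this kind can serve as the prior weights of the mixture components for a family, in place of the Hardy-Weinberg proportions used for an unstructured population.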
first_indexed 2024-12-10T15:32:43Z
format Article
id doaj.art-49eed4b9976341b2a12a464f14d00727
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-12-10T15:32:43Z
publishDate 2019-03-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-49eed4b9976341b2a12a464f14d00727 2022-12-22T01:43:20Z eng BMC BMC Bioinformatics 1471-2105 2019-03-01 20118 10.1186/s12859-019-2703-y
FitTetra 2.0 – improved genotype calling for tetraploids with multiple population and parental data support
Konrad Zych (Groningen Bioinformatics Centre, University of Groningen)
Gerrit Gort (Wageningen University and Research – Biometris)
Chris A. Maliepaard (Wageningen University and Research - Plant Breeding)
Ritsert C. Jansen (Groningen Bioinformatics Centre, University of Groningen)
Roeland E. Voorrips (Wageningen University and Research - Plant Breeding)
title FitTetra 2.0 – improved genotype calling for tetraploids with multiple population and parental data support
topic Genomics
Genotyping
Genotype calling
Polyploids
Autotetraploids
fitPoly
url http://link.springer.com/article/10.1186/s12859-019-2703-y