LinkImputeR: user-guided genotype calling and imputation for non-model organisms

Abstract. Background: Genomic studies such as genome-wide association and genomic selection require genome-wide genotype data. All existing technologies used to create these data result in missing genotypes, which are often then inferred using genotype imputation software. However, existing imputation...

Bibliographic Details
Main Authors:	Daniel Money, Zoë Migicovsky, Kyle Gardner, Sean Myles
Format:	Article
Language:	English
Published:	BMC 2017-07-01
Series:	BMC Genomics
Subjects:	Imputation; GBS; SNP; Read count
Online Access:	http://link.springer.com/article/10.1186/s12864-017-3873-5

_version_	1818953465488998400
author	Daniel Money, Zoë Migicovsky, Kyle Gardner, Sean Myles
author_sort	Daniel Money
collection	DOAJ
description	Abstract
Background: Genomic studies such as genome-wide association and genomic selection require genome-wide genotype data. All existing technologies used to create these data result in missing genotypes, which are often then inferred using genotype imputation software. However, existing imputation methods most often make use only of genotypes that are successfully inferred after having passed a certain read depth threshold. Because of this, any read information for genotypes that did not pass the threshold, and were thus set to missing, is ignored. Most genomic studies also choose read depth thresholds and quality filters without investigating their effects on the size and quality of the resulting genotype data. Moreover, almost all genotype imputation methods require ordered markers and are therefore of limited utility in non-model organisms.
Results: Here we introduce LinkImputeR, a software program that exploits the read count information that is normally ignored, and makes use of all available DNA sequence information for the purposes of genotype calling and imputation. It is specifically designed for non-model organisms since it requires neither ordered markers nor a reference panel of genotypes. Using next-generation DNA sequence (NGS) data from apple, cannabis and grape, we quantify the effect of varying read count and missingness thresholds on the quantity and quality of genotypes generated from LinkImputeR. We demonstrate that LinkImputeR can increase the number of genotype calls by more than an order of magnitude, can improve genotyping accuracy by several percent and can thus improve the power of downstream analyses. Moreover, we show that the effects of quality and read depth filters can differ substantially between data sets and should therefore be investigated on a per-study basis.
Conclusions: By exploiting DNA sequence data that is normally ignored during genotype calling and imputation, LinkImputeR can significantly improve both the quantity and quality of genotype data generated from NGS technologies. It enables the user to quickly and easily examine the effects of varying thresholds and filters on the number and quality of the resulting genotype calls. In this manner, users can decide on thresholds that are most suitable for their purposes. We show that LinkImputeR can significantly augment the value and utility of NGS data sets, especially in non-model organisms with poor genomic resources.
first_indexed	2024-12-20T10:06:42Z
format	Article
id	doaj.art-6abb05f18b194aeba858554a1b53037e
institution	Directory of Open Access Journals
issn	1471-2164
language	English
last_indexed	2024-12-20T10:06:42Z
publishDate	2017-07-01
publisher	BMC
record_format	Article
series	BMC Genomics
spelling	doaj.art-6abb05f18b194aeba858554a1b53037e; 2022-12-21T19:44:13Z; eng; BMC; BMC Genomics; 1471-2164; 2017-07-01; 18; 1; 1; 12; 10.1186/s12864-017-3873-5; LinkImputeR: user-guided genotype calling and imputation for non-model organisms; Daniel Money (0), Zoë Migicovsky (1), Kyle Gardner (2), Sean Myles (3); Department of Plant and Animal Sciences, Faculty of Agriculture, Dalhousie University (all four authors); http://link.springer.com/article/10.1186/s12864-017-3873-5; Imputation; GBS; SNP; Read count
title	LinkImputeR: user-guided genotype calling and imputation for non-model organisms
topic	Imputation; GBS; SNP; Read count
url	http://link.springer.com/article/10.1186/s12864-017-3873-5

Similar Items