Genotype Calling from Population-Genomic Sequencing Data

Genotype calling plays important roles in population-genomic studies, which have been greatly accelerated by sequencing technologies. To take full advantage of the resultant information, we have developed maximum-likelihood (ML) methods for calling genotypes from high-throughput sequencing data. As...


Bibliographic Details
Main Authors: Takahiro Maruki, Michael Lynch
Format: Article
Language: English
Published: Oxford University Press 2017-05-01
Series: G3: Genes, Genomes, Genetics
Subjects: genotype call, polymorphism, population genomics
Online Access: http://g3journal.org/lookup/doi/10.1534/g3.117.039008
author Takahiro Maruki
Michael Lynch
collection DOAJ
description Genotype calling plays important roles in population-genomic studies, which have been greatly accelerated by sequencing technologies. To take full advantage of the resultant information, we have developed maximum-likelihood (ML) methods for calling genotypes from high-throughput sequencing data. As the statistical uncertainties associated with sequencing data depend on depths of coverage, we have developed two types of genotype callers. One approach is appropriate for low-coverage sequencing data, and incorporates population-level information on genotype frequencies and error rates pre-estimated by an ML method. Performance evaluation using computer simulations and human data shows that the proposed framework yields less biased estimates of allele frequencies and more accurate genotype calls than current widely used methods. Another type of genotype caller applies to high-coverage sequencing data, requires no prior genotype-frequency estimates, and makes no assumption on the number of alleles at a polymorphic site. Using computer simulations, we determine the depth of coverage necessary to accurately characterize polymorphisms using this second method. We applied the proposed method to high-coverage (mean 18×) sequencing data of 83 clones from a population of Daphnia pulex. The results show that the proposed method enables conservative and reasonably powerful detection of polymorphisms with arbitrary numbers of alleles. We have extended the proposed method to the analysis of genomic data for polyploid organisms, showing that calling accurate polyploid genotypes requires much higher coverage than diploid genotypes.
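The low-coverage approach summarized in the description above combines per-site read data with pre-estimated population-level genotype frequencies and an error rate. The sketch below illustrates that general idea with a generic textbook model, not the authors' actual method: it assumes a biallelic diploid site, a symmetric per-read error rate, and a simple binomial read model. The function name `call_genotype`, its parameters, and the genotype labels "RR", "RA", and "AA" are hypothetical names chosen for illustration.

```python
import math


def call_genotype(n_ref, n_alt, genotype_freqs, error_rate):
    """Call a diploid genotype at a biallelic site (illustrative sketch only).

    n_ref, n_alt   : counts of reads showing the reference / alternate allele
    genotype_freqs : prior frequencies for "RR", "RA", "AA" (assumed pre-estimated)
    error_rate     : assumed symmetric per-read error probability, 0 < e < 1
    """
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error_rate must lie strictly between 0 and 1")

    # Probability that a single read shows the alternate allele, given the genotype.
    # For a heterozygote, symmetric errors cancel, leaving 0.5.
    p_alt = {"RR": error_rate, "RA": 0.5, "AA": 1.0 - error_rate}

    # Log of (prior x binomial likelihood); the binomial coefficient is the
    # same for every genotype, so it is omitted.
    log_score = {}
    for genotype, prior in genotype_freqs.items():
        if prior <= 0.0:
            continue  # a genotype with zero prior frequency cannot be called
        p = p_alt[genotype]
        log_score[genotype] = (
            math.log(prior) + n_alt * math.log(p) + n_ref * math.log(1.0 - p)
        )

    # Normalize in a numerically stable way to obtain posterior probabilities.
    top = max(log_score.values())
    weights = {g: math.exp(s - top) for g, s in log_score.items()}
    total = sum(weights.values())
    posteriors = {g: w / total for g, w in weights.items()}

    best = max(posteriors, key=posteriors.get)
    return best, posteriors


if __name__ == "__main__":
    priors = {"RR": 0.25, "RA": 0.5, "AA": 0.25}  # made-up example frequencies
    best, posteriors = call_genotype(3, 3, priors, error_rate=0.01)
    print(best, posteriors)
```

With only a handful of reads per individual, the prior genotype frequencies carry much of the weight in a model of this kind, which is one way to see why a low-coverage caller would draw on population-level information.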
first_indexed 2024-12-21T23:37:03Z
format Article
id doaj.art-404c354a81e14cc0b53cbb212987e44b
institution Directory Open Access Journal
issn 2160-1836
language English
last_indexed 2024-12-21T23:37:03Z
publishDate 2017-05-01
publisher Oxford University Press
record_format Article
series G3: Genes, Genomes, Genetics
spelling doaj.art-404c354a81e14cc0b53cbb212987e44b; 2022-12-21T18:46:19Z; eng; Oxford University Press; G3: Genes, Genomes, Genetics; ISSN 2160-1836; 2017-05-01; vol. 7, no. 5, pp. 1393-1404; doi:10.1534/g3.117.039008; Genotype Calling from Population-Genomic Sequencing Data; Takahiro Maruki; Michael Lynch; http://g3journal.org/lookup/doi/10.1534/g3.117.039008; genotype call; polymorphism; population genomics
title Genotype Calling from Population-Genomic Sequencing Data
topic genotype call
polymorphism
population genomics
url http://g3journal.org/lookup/doi/10.1534/g3.117.039008