A Hidden Markov Model Approach for Simultaneously Estimating Local Ancestry and Admixture Time Using Next Generation Sequence Data in Samples of Arbitrary Ploidy.

Bibliographic Details
Main Authors: Russell Corbett-Detig, Rasmus Nielsen
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2017-01-01
Series: PLoS Genetics
Online Access: http://europepmc.org/articles/PMC5242547?pdf=render
author Russell Corbett-Detig
Rasmus Nielsen
collection DOAJ
description Admixture, the mixing of genomes from divergent populations, is increasingly appreciated as a central process in evolution. To characterize and quantify patterns of admixture across the genome, a number of methods have been developed for local ancestry inference. However, existing approaches have a number of shortcomings. First, all local ancestry inference methods require some prior assumption about the expected ancestry tract lengths. Second, existing methods generally require genotypes, which are not feasible to obtain for many next-generation sequencing projects. Third, many methods assume samples are diploid; however, a wide variety of sequencing applications will fail to meet this assumption. To address these issues, we introduce a novel hidden Markov model for estimating local ancestry that models the read pileup data rather than genotypes, is generalized to arbitrary ploidy, and can estimate the time since admixture during local ancestry inference. We demonstrate that our method can simultaneously estimate the time since admixture and local ancestry with good accuracy, and that it performs well on samples of high ploidy, i.e., 100 or more chromosomes. As this method is very general, we expect it will be useful for local ancestry inference in a wider variety of populations than has previously been possible. We then applied our method to pooled sequencing data derived from populations of Drosophila melanogaster on an ancestry cline on the east coast of North America. We find that local recombination rates are negatively correlated with the proportion of African ancestry, suggesting that selection against foreign ancestry is the least efficient in low recombination regions. Finally, we show that clinal outlier loci are enriched for genes associated with gene regulatory functions, consistent with a role of regulatory evolution in ecological adaptation of admixed D. melanogaster populations. Our results illustrate the potential of local ancestry inference for elucidating fundamental evolutionary processes.
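The kind of model the description outlines can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes that each chromosome in the sample switches ancestry independently, that read counts are binomial around the expected sample allele frequency, and that the admixture time is chosen by a simple grid search over the forward-algorithm likelihood. All function names, the error rate, and the toy data are hypothetical.

```python
import math

def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def transition_matrix(ploidy, m, t, r):
    """P(i -> j), where a state is the number of ancestry-0 chromosomes.

    Assumption: chromosomes switch independently. A recombination event
    occurs with probability 1 - exp(-r * t) (r in Morgans, t in generations),
    after which the new ancestry is 0 with probability m.
    """
    rec = 1.0 - math.exp(-r * t)
    stay0 = (1 - rec) + rec * m   # ancestry 0 remains ancestry 0
    to0 = rec * m                 # ancestry 1 becomes ancestry 0
    T = [[0.0] * (ploidy + 1) for _ in range(ploidy + 1)]
    for i in range(ploidy + 1):
        for a in range(i + 1):                # a of the i ancestry-0 copies stay
            for b in range(ploidy - i + 1):   # b of the other copies switch to 0
                T[i][a + b] += binom_pmf(a, i, stay0) * binom_pmf(b, ploidy - i, to0)
    return T

def emission(k, depth, i, ploidy, f0, f1, err=0.01):
    """P(k of depth reads carry allele A | i ancestry-0 chromosomes).

    Assumption: reads are drawn from the expected allele frequency of the
    sample, with a symmetric sequencing error rate err.
    """
    p = (i * f0 + (ploidy - i) * f1) / ploidy
    p = p * (1 - err) + (1 - p) * err
    return binom_pmf(k, depth, p)

def log_likelihood(sites, ploidy, m, t):
    """Scaled forward algorithm over sites given as (gap, k, depth, f0, f1)."""
    states = range(ploidy + 1)
    alpha, loglik = None, 0.0
    for r, k, depth, f0, f1 in sites:
        if alpha is None:
            # Initial distribution: binomial in the ancestry proportion m.
            pred = [binom_pmf(j, ploidy, m) for j in states]
        else:
            T = transition_matrix(ploidy, m, t, r)
            pred = [sum(alpha[i] * T[i][j] for i in states) for j in states]
        alpha = [pred[j] * emission(k, depth, j, ploidy, f0, f1) for j in states]
        scale = sum(alpha)
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik

def estimate_time(sites, ploidy, m, grid):
    """Return the candidate admixture time with the highest likelihood."""
    return max(grid, key=lambda t: log_likelihood(sites, ploidy, m, t))

# Toy pileup: (distance to previous site in Morgans, reads of allele A,
# read depth, allele-A frequency in population 0, in population 1).
sites = [(0.0, 9, 10, 0.9, 0.1), (0.001, 8, 10, 0.95, 0.05),
         (0.001, 1, 10, 0.9, 0.2), (0.001, 0, 8, 0.85, 0.1)]
best_t = estimate_time(sites, ploidy=2, m=0.5, grid=[1, 10, 100, 1000])
```

Because the hidden state is a count of chromosomes rather than a diploid genotype, the same functions apply unchanged to pooled samples by raising `ploidy`, although the triple loop in `transition_matrix` would need a faster formulation for pools of 100 or more chromosomes.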
first_indexed 2024-12-11T08:15:09Z
format Article
id doaj.art-dbe55781e620406592f6ff8e83affd36
institution Directory Open Access Journal
issn 1553-7390
1553-7404
language English
last_indexed 2024-12-11T08:15:09Z
publishDate 2017-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS Genetics
spelling doaj.art-dbe55781e620406592f6ff8e83affd36; 2022-12-22T01:14:47Z; eng; Public Library of Science (PLoS); PLoS Genetics; 1553-7390; 1553-7404; 2017-01-01; 13; 1; e1006529; 10.1371/journal.pgen.1006529; http://europepmc.org/articles/PMC5242547?pdf=render
title A Hidden Markov Model Approach for Simultaneously Estimating Local Ancestry and Admixture Time Using Next Generation Sequence Data in Samples of Arbitrary Ploidy.
url http://europepmc.org/articles/PMC5242547?pdf=render