Robust inference of population size histories from genomic sequencing data.

Bibliographic Details
Main Authors: Gautam Upadhya, Matthias Steinrücken
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2022-09-01
Series: PLoS Computational Biology
Online Access: https://doi.org/10.1371/journal.pcbi.1010419
collection DOAJ
description Unraveling the complex demographic histories of natural populations is a central problem in population genetics. Understanding past demographic events is of general anthropological interest, and it is also an important step in establishing accurate null models when identifying adaptive or disease-associated genetic variation. Coalescent Hidden Markov Models (CHMMs) are an important class of tools for inferring past population size changes from genomic sequence data. These models make efficient use of the linkage information in population genomic datasets by using the local genealogies relating sampled individuals as latent states that evolve along the chromosome in an HMM framework. Extending these models to large sample sizes is challenging, since the number of possible latent states increases rapidly. Here, we present our method CHIMP (CHMM History-Inference Maximum-Likelihood Procedure), a novel CHMM method for inferring the size history of a population. It can be applied to large samples (hundreds of haplotypes) and only requires unphased genomes as input. The two implementations of CHIMP that we present here use either the height of the genealogical tree (TMRCA) or the total branch length, respectively, as the latent variable at each position in the genome. The requisite transition and emission probabilities are obtained by numerically solving certain systems of differential equations derived from the ancestral process with recombination. The parameters of the population size history are subsequently inferred using an Expectation-Maximization algorithm. In addition, we implement a composite likelihood scheme to allow the method to scale to large sample sizes. We demonstrate the efficiency and accuracy of our method in a variety of benchmark tests using simulated data and present comparisons to other state-of-the-art methods. Specifically, our implementation using TMRCA as the latent variable performs comparably to these methods and provides accurate estimates of effective population sizes in intermediate and ancient times. Our method is agnostic to the phasing of the data, which makes it a promising alternative in scenarios where high-quality data are not available, and it has potential applications for pseudo-haploid data.
id doaj.art-e4d7ce38aa664bdfad4c37b027f6d151
institution Directory Open Access Journal
issn 1553-734X
1553-7358
spelling PLoS Computational Biology, vol. 18, no. 9, e1010419 (2022-09-01). https://doi.org/10.1371/journal.pcbi.1010419