Robust design for coalescent model inference

The coalescent process describes how changes in the size or structure of a population influence the genealogical patterns of sequences sampled from that population. The estimation of (effective) population size changes from genealogies that are reconstructed from these sampled sequences is an import...


Bibliographic details
Main authors: Parag, K, Pybus, O
Format: Journal article
Language: English
Published / Created: Oxford University Press 2019
_version_ 1826298199100358656
author Parag, K
Pybus, O
author_facet Parag, K
Pybus, O
author_sort Parag, K
collection OXFORD
description The coalescent process describes how changes in the size or structure of a population influence the genealogical patterns of sequences sampled from that population. The estimation of (effective) population size changes from genealogies that are reconstructed from these sampled sequences is an important problem in many biological fields. Often, population size is characterised by a piecewise-constant function, with each piece serving as a population size parameter to be estimated. Estimation quality depends on both the statistical coalescent inference method employed, and on the experimental protocol, which controls variables such as the sampling of sequences through time and space, or the transformation of model parameters. While there is an extensive literature on coalescent inference methodology, there is comparatively little work on experimental design. The research that does exist is largely simulation-based, precluding the development of provable or general design theorems. We examine three key design problems: temporal sampling of sequences under the skyline demographic coalescent model, spatio-temporal sampling under the structured coalescent model, and time discretisation for sequentially Markovian coalescent models. In all cases we prove that (i) working in the logarithm of the parameters to be inferred (e.g. population size), and (ii) distributing informative coalescent events uniformly among these log-parameters, is uniquely robust. 'Robust' means that the total and maximum uncertainty of our parameter estimates are minimised, and made insensitive to their unknown (true) values. This robust design theorem provides rigorous justification for several existing coalescent experimental design decisions, and leads to usable guidelines for future empirical or simulation-based investigations. Given its persistence among models, this theorem may form the basis of an experimental design paradigm for coalescent inference.
first_indexed 2024-03-07T04:43:13Z
format Journal article
id oxford-uuid:d25b7da8-9ffd-41e8-b9cd-f47179b22143
institution University of Oxford
language English
last_indexed 2024-03-07T04:43:13Z
publishDate 2019
publisher Oxford University Press
record_format dspace
spelling oxford-uuid:d25b7da8-9ffd-41e8-b9cd-f47179b22143; 2022-03-27T08:03:22Z; Robust design for coalescent model inference; Journal article; http://purl.org/coar/resource_type/c_dcae04bc; uuid:d25b7da8-9ffd-41e8-b9cd-f47179b22143; English; Symplectic Elements at Oxford; Oxford University Press; 2019; Parag, K; Pybus, O
spellingShingle Parag, K
Pybus, O
Robust design for coalescent model inference
title Robust design for coalescent model inference
title_full Robust design for coalescent model inference
title_fullStr Robust design for coalescent model inference
title_full_unstemmed Robust design for coalescent model inference
title_short Robust design for coalescent model inference
title_sort robust design for coalescent model inference
work_keys_str_mv AT paragk robustdesignforcoalescentmodelinference
AT pybuso robustdesignforcoalescentmodelinference
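
The design claim in the abstract (work in log-parameters, spread coalescent events uniformly) can be illustrated with a small sketch. It is not taken from the article. It assumes the standard Kingman coalescent log-likelihood for one piecewise-constant segment, -m log N - A/N, where m is the number of coalescent events in the segment and A is the lineage-pair-weighted time (with expectation mN). Under that assumption the Fisher information is m/N^2 for N but simply m for log N. The function names below are hypothetical, and asymptotic variance is approximated as the reciprocal of the Fisher information.

```python
from itertools import product


def fisher_info_natural(m, n):
    """Fisher information about N from m coalescent events (assumed
    single-segment Kingman likelihood). Depends on the unknown true size n."""
    return m / n ** 2


def fisher_info_log(m):
    """Fisher information about log N from m coalescent events.
    Under the same assumption it does not depend on the true size."""
    return float(m)


def allocation_costs(events):
    """Total and maximum approximate variance (1 / information) of the
    log-parameters, given the number of events assigned to each segment."""
    variances = [1.0 / m for m in events]
    return sum(variances), max(variances)


def best_allocation(total_events, segments):
    """Brute-force search over allocations with at least one event per
    segment, ranked by (total variance, maximum variance)."""
    best = None
    for alloc in product(range(1, total_events + 1), repeat=segments):
        if sum(alloc) != total_events:
            continue
        cost = allocation_costs(alloc)
        if best is None or cost < best[1]:
            best = (alloc, cost)
    return best


if __name__ == "__main__":
    # Information about N changes with the unknown truth; about log N it does not.
    for true_size in (10.0, 100.0, 1000.0):
        print(true_size, fisher_info_natural(10, true_size), fisher_info_log(10))

    # Searching all ways to split 12 events over 3 segments.
    print(best_allocation(12, 3))
```

In this toy setting the search selects the uniform split, which is the pattern the abstract describes; the article's theorems cover the general skyline, structured and sequentially Markovian cases that this sketch does not attempt.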