A Fast Method for the Selection of Samples in Populations with Available Genealogical Data

Optimal selection of samples in populations should provide the best coverage of sample variations for the available sampling resources. In populations with known genealogical connections, or pedigrees, this amounts to finding the set of samples with the largest sum of mutual distances in a genealogi...
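
As an illustration of the objective described above, the following is a minimal brute-force sketch and not the authors' optimal or sub-optimal algorithm. It assumes the pedigree is a single rooted tree stored as a child-to-parent mapping, and it measures distance as the number of edges between two individuals. The function names, the toy pedigree, and the candidate list are hypothetical. Because it enumerates every subset of size K, it is only usable on very small examples.

```python
from itertools import combinations


def depth_table(parent):
    """Return the depth of every node; the root has parent None and depth 0."""
    depth = {}
    for node in parent:
        path = []
        cur = node
        # Climb until we reach the root or a node whose depth is already known.
        while cur is not None and cur not in depth:
            path.append(cur)
            cur = parent[cur]
        base = depth[cur] if cur is not None else -1
        for offset, n in enumerate(reversed(path), start=1):
            depth[n] = base + offset
    return depth


def tree_distance(a, b, parent, depth):
    """Number of edges on the path between a and b in the tree."""
    dist = 0
    while depth[a] > depth[b]:
        a = parent[a]
        dist += 1
    while depth[b] > depth[a]:
        b = parent[b]
        dist += 1
    # Both nodes are now at the same depth; climb together to the common ancestor.
    while a != b:
        a = parent[a]
        b = parent[b]
        dist += 2
    return dist


def sum_of_mutual_distances(samples, parent, depth):
    """Objective value: sum of distances over all pairs of selected samples."""
    return sum(tree_distance(a, b, parent, depth)
               for a, b in combinations(samples, 2))


def brute_force_selection(candidates, k, parent):
    """Try every k-subset of candidates and keep the one with the largest sum."""
    depth = depth_table(parent)
    return max(combinations(candidates, k),
               key=lambda s: sum_of_mutual_distances(s, parent, depth))


# Hypothetical toy pedigree: founder F with two lineages, A and B.
toy_parent = {"F": None, "A": "F", "B": "F",
              "A1": "A", "A2": "A", "B1": "B", "B2": "B1"}
toy_candidates = ["A1", "A2", "B1", "B2"]

best = brute_force_selection(toy_candidates, 3, toy_parent)
print(best, sum_of_mutual_distances(best, toy_parent, depth_table(toy_parent)))
```

In this toy pedigree the selected triple draws on both lineages rather than concentrating on close relatives, which is the behaviour the objective is meant to reward.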


Bibliographic Details
Main Authors: Dalibor Hršak, Ivan Katanić, Strahil Ristov
Format: Article
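Language: English
Published: MDPI AG 2022-02-01
Series: Diversity
Subjects: genealogical tree; sampling plan; optimal population coverage; pedigree sampling; mitochondrial DNA; Y chromosome
Online Access: https://www.mdpi.com/1424-2818/14/2/150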
_version_ 1797481032782446592
author Dalibor Hršak
Ivan Katanić
Strahil Ristov
author_facet Dalibor Hršak
Ivan Katanić
Strahil Ristov
author_sort Dalibor Hršak
collection DOAJ
description Optimal selection of samples in populations should provide the best coverage of sample variations for the available sampling resources. In populations with known genealogical connections, or pedigrees, this amounts to finding the set of samples with the largest sum of mutual distances in a genealogical tree. We present an optimal, and a faster sub-optimal, method for the selection of $K$ samples from a population of $N$ individuals. The optimal method works in time proportional to $NK^2$, and the sub-optimal in time proportional to $NK$, which is more practical for large populations. The sub-optimal algorithm can process pedigrees of millions of individuals in a matter of minutes. With the real-life pedigrees, the difference in the quality of the output of the two algorithms is negligible. We provide the Python3 source codes for the two methods.
first_indexed 2024-03-09T22:09:37Z
format Article
id doaj.art-a465e81df98b4c18a639000cae692583
institution Directory Open Access Journal
issn 1424-2818
language English
last_indexed 2024-03-09T22:09:37Z
publishDate 2022-02-01
publisher MDPI AG
record_format Article
series Diversity
spelling doaj.art-a465e81df98b4c18a639000cae692583
2023-11-23T19:35:40Z
eng
MDPI AG
Diversity
1424-2818
2022-02-01
14
2
150
10.3390/d14020150
A Fast Method for the Selection of Samples in Populations with Available Genealogical Data
Dalibor Hršak
Ivan Katanić
Strahil Ristov
Ruđer Bošković Institute, 10000 Zagreb, Croatia
Faculty of Electrical Engineering and Computing, University of Zagreb, 10000 Zagreb, Croatia
Ruđer Bošković Institute, 10000 Zagreb, Croatia
Optimal selection of samples in populations should provide the best coverage of sample variations for the available sampling resources. In populations with known genealogical connections, or pedigrees, this amounts to finding the set of samples with the largest sum of mutual distances in a genealogical tree. We present an optimal, and a faster sub-optimal, method for the selection of $K$ samples from a population of $N$ individuals. The optimal method works in time proportional to $NK^2$, and the sub-optimal in time proportional to $NK$, which is more practical for large populations. The sub-optimal algorithm can process pedigrees of millions of individuals in a matter of minutes. With the real-life pedigrees, the difference in the quality of the output of the two algorithms is negligible. We provide the Python3 source codes for the two methods.
https://www.mdpi.com/1424-2818/14/2/150
genealogical tree
sampling plan
optimal population coverage
pedigree sampling
mitochondrial DNA
Y chromosome
spellingShingle Dalibor Hršak
Ivan Katanić
Strahil Ristov
A Fast Method for the Selection of Samples in Populations with Available Genealogical Data
Diversity
genealogical tree
sampling plan
optimal population coverage
pedigree sampling
mitochondrial DNA
Y chromosome
title A Fast Method for the Selection of Samples in Populations with Available Genealogical Data
title_full A Fast Method for the Selection of Samples in Populations with Available Genealogical Data
title_fullStr A Fast Method for the Selection of Samples in Populations with Available Genealogical Data
title_full_unstemmed A Fast Method for the Selection of Samples in Populations with Available Genealogical Data
title_short A Fast Method for the Selection of Samples in Populations with Available Genealogical Data
title_sort fast method for the selection of samples in populations with available genealogical data
topic genealogical tree
sampling plan
optimal population coverage
pedigree sampling
mitochondrial DNA
Y chromosome
url https://www.mdpi.com/1424-2818/14/2/150
work_keys_str_mv AT daliborhrsak afastmethodfortheselectionofsamplesinpopulationswithavailablegenealogicaldata
AT ivankatanic afastmethodfortheselectionofsamplesinpopulationswithavailablegenealogicaldata
AT strahilristov afastmethodfortheselectionofsamplesinpopulationswithavailablegenealogicaldata
AT daliborhrsak fastmethodfortheselectionofsamplesinpopulationswithavailablegenealogicaldata
AT ivankatanic fastmethodfortheselectionofsamplesinpopulationswithavailablegenealogicaldata
AT strahilristov fastmethodfortheselectionofsamplesinpopulationswithavailablegenealogicaldata