A Fast Method for the Selection of Samples in Populations with Available Genealogical Data
Optimal selection of samples in populations should provide the best coverage of sample variations for the available sampling resources. In populations with known genealogical connections, or pedigrees, this amounts to finding the set of samples with the largest sum of mutual distances in a genealogi...
Main Authors: | Dalibor Hršak, Ivan Katanić, Strahil Ristov |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-02-01 |
Series: | Diversity |
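Subjects: | genealogical tree; sampling plan; optimal population coverage; pedigree sampling; mitochondrial DNA; Y chromosome |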
Online Access: | https://www.mdpi.com/1424-2818/14/2/150 |
_version_ | 1797481032782446592 |
---|---|
author | Dalibor Hršak; Ivan Katanić; Strahil Ristov |
collection | DOAJ |
description | Optimal selection of samples in populations should provide the best coverage of sample variations for the available sampling resources. In populations with known genealogical connections, or pedigrees, this amounts to finding the set of samples with the largest sum of mutual distances in a genealogical tree. We present an optimal, and a faster sub-optimal, method for the selection of K samples from a population of N individuals. The optimal method works in time proportional to $NK^2$, and the sub-optimal in time proportional to $NK$, which is more practical for large populations. The sub-optimal algorithm can process pedigrees of millions of individuals in a matter of minutes. With the real-life pedigrees, the difference in the quality of the output of the two algorithms is negligible. We provide the Python3 source codes for the two methods. |
first_indexed | 2024-03-09T22:09:37Z |
format | Article |
id | doaj.art-a465e81df98b4c18a639000cae692583 |
institution | Directory Open Access Journal |
issn | 1424-2818 |
language | English |
last_indexed | 2024-03-09T22:09:37Z |
publishDate | 2022-02-01 |
publisher | MDPI AG |
record_format | Article |
series | Diversity |
spelling | doaj.art-a465e81df98b4c18a639000cae692583; 2023-11-23T19:35:40Z; eng; MDPI AG; Diversity; ISSN 1424-2818; 2022-02-01; volume 14, issue 2, article 150; DOI 10.3390/d14020150; Dalibor Hršak (Ruđer Bošković Institute, 10000 Zagreb, Croatia); Ivan Katanić (Faculty of Electrical Engineering and Computing, University of Zagreb, 10000 Zagreb, Croatia); Strahil Ristov (Ruđer Bošković Institute, 10000 Zagreb, Croatia) |
title | A Fast Method for the Selection of Samples in Populations with Available Genealogical Data |
topic | genealogical tree; sampling plan; optimal population coverage; pedigree sampling; mitochondrial DNA; Y chromosome |
url | https://www.mdpi.com/1424-2818/14/2/150 |
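The objective named in the description, the largest sum of mutual distances among K samples in a genealogical tree, can be illustrated with a small sketch. This is not the authors' optimal or sub-optimal algorithm (their Python3 code accompanies the article); it is a naive greedy heuristic written only to make the objective concrete. It assumes a single connected tree given as a hypothetical child-to-parent map, with distance measured as the number of edges between two individuals. All function names and the toy pedigree are invented for illustration.

```python
from collections import deque
from itertools import combinations


def build_adjacency(parent):
    """Turn a child -> parent map into an undirected adjacency list."""
    adj = {}
    for child, par in parent.items():
        adj.setdefault(child, [])
        if par is not None:
            adj.setdefault(par, [])
            adj[child].append(par)
            adj[par].append(child)
    return adj


def distances_from(adj, source):
    """Breadth-first search: edge-count distance from source to every node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def sum_of_mutual_distances(adj, chosen):
    """Objective: sum of tree distances over all pairs of chosen samples."""
    return sum(distances_from(adj, a)[b] for a, b in combinations(chosen, 2))


def greedy_select(adj, candidates, k):
    """Illustrative greedy heuristic (an assumption, not the paper's method).

    Seed with the candidate farthest from an arbitrary candidate, then
    repeatedly add the candidate with the largest total distance to the
    samples chosen so far. Assumes the tree is connected.
    """
    pool = sorted(candidates)
    if k <= 0 or not pool:
        return []
    start = distances_from(adj, pool[0])
    chosen = [max(pool, key=lambda c: start[c])]
    gain = {c: 0 for c in pool}  # distance sum to already chosen samples
    while len(chosen) < min(k, len(pool)):
        dist = distances_from(adj, chosen[-1])
        for c in pool:
            gain[c] += dist[c]
        remaining = [c for c in pool if c not in chosen]
        chosen.append(max(remaining, key=lambda c: gain[c]))
    return chosen


# Hypothetical toy pedigree: one founder, two lineages, three sampled leaves.
parent = {"root": None, "a": "root", "b": "root",
          "a1": "a", "a2": "a", "b1": "b"}
adj = build_adjacency(parent)
leaves = ["a1", "a2", "b1"]

selected = greedy_select(adj, leaves, 2)
print(selected, sum_of_mutual_distances(adj, selected))  # ['b1', 'a1'] 4
```

On this toy tree the two selected leaves come from different lineages, which is the pair with the largest possible distance. A greedy choice like this carries no optimality guarantee in general.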