Fast and accurate detection of Identical-By-Descent segment sharing in large genomic datasets



Bibliographic details
Author: Saada, J
Other authors: Palamara, P
Format: Thesis
Language: English
Published: 2019
Subjects:
Description
Abstract: <p>Identical-By-Descent (IBD) segments are a fundamental measure of genetic relatedness, and IBD detection is of great interest in many applications, including demographic inference and haplotype-based association. Algorithms that scale to large datasets (more than 10,000 diploid individuals), such as GERMLINE and RaPID, sacrifice modeling accuracy in favor of computational speed. Efficient model-based algorithms for IBD detection, such as RefinedIBD, can detect shorter IBD segments but do not scale as well to large datasets. In this thesis, we present two new approaches, called PASMC and GASMC, that aim to bridge the gap between these two paradigms. We introduce the basics of coalescent theory needed to understand IBD segment detection. We then provide an overview of state-of-the-art methods for detecting IBD segments, comparing the most commonly used and the most recent existing methods. Finally, we present PASMC and GASMC. Both methods use efficient string matching to detect candidate IBD segments. They then apply the recently developed ASMC algorithm, which is based on a coalescent hidden Markov model (HMM), to filter out candidate segments that are unlikely to be IBD.</p>
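The two-stage design described above uses fast string matching to propose candidates and then applies a model-based filter. The following minimal Python sketch illustrates that design under stated assumptions. It assumes phased haplotypes encoded as equal-length strings of 0/1 alleles and uses a simplified GERMLINE-style seeding step:

- each haplotype is cut into non-overlapping words of `word_len` sites;
- haplotypes carrying an identical word are grouped by hashing;
- runs of at least `min_words` consecutive shared words become candidate segments.

The names `candidate_segments`, `filter_segments`, `word_len` and `min_words` are hypothetical illustrations and are not taken from the thesis. The filter is a generic score-and-threshold hook that stands in for ASMC; ASMC's coalescent HMM is not reproduced here.

```python
from collections import defaultdict
from itertools import combinations
from typing import Callable, Dict, List, Tuple

# Hypothetical candidate format: (hap_a, hap_b, start_word, end_word), end exclusive.
Segment = Tuple[int, int, int, int]


def candidate_segments(haplotypes: List[str], word_len: int, min_words: int) -> List[Segment]:
    """Sketch of string-matching candidate IBD detection (simplified GERMLINE-style seeding)."""
    n_words = len(haplotypes[0]) // word_len  # trailing sites that do not fill a word are ignored
    # shared[(a, b)] lists the word indices where haplotypes a and b are identical.
    shared: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for w in range(n_words):
        # Hash each haplotype's w-th word; haplotypes in the same bucket match exactly there.
        buckets: Dict[str, List[int]] = defaultdict(list)
        for h, hap in enumerate(haplotypes):
            buckets[hap[w * word_len:(w + 1) * word_len]].append(h)
        for members in buckets.values():
            for a, b in combinations(members, 2):  # members are in increasing order, so a < b
                shared[(a, b)].append(w)

    segments: List[Segment] = []
    for (a, b), words in shared.items():
        start = prev = words[0]
        for w in words[1:]:
            if w == prev + 1:
                prev = w  # extend the current run of consecutive shared words
                continue
            if prev + 1 - start >= min_words:
                segments.append((a, b, start, prev + 1))
            start = prev = w  # a gap breaks the run, so start a new one
        if prev + 1 - start >= min_words:  # close the final run
            segments.append((a, b, start, prev + 1))
    return sorted(segments)


def filter_segments(segments: List[Segment], score: Callable[[Segment], float],
                    threshold: float) -> List[Segment]:
    """Hypothetical filter hook: keep candidates whose model score passes a threshold.

    In PASMC/GASMC this role is played by ASMC. Here `score` is any user-supplied function.
    """
    return [s for s in segments if score(s) >= threshold]
```

Consider `candidate_segments(["0011001100", "0011001111", "1100110000"], word_len=2, min_words=3)`. It returns `[(0, 1, 0, 4)]`, because haplotypes 0 and 1 share their first four words. Real tools differ in important ways. For example, GERMLINE tolerates mismatches when extending seeds, and RaPID builds on the positional Burrows-Wheeler transform. This sketch therefore only conveys the shape of the pipeline.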