An effective filter for IBD detection in large data sets.

Identity by descent (IBD) inference is the task of computationally detecting genomic segments that are shared between individuals by means of common familial descent. Accurate IBD detection plays an important role in various genomic studies, ranging from mapping disease genes to exploring ancient po...


Bibliographic Details
Main Authors:	Lin Huang, Sivan Bercovici, Jesse M Rodriguez, Serafim Batzoglou
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2014-01-01
Series:	PLoS ONE
Online Access:	http://europepmc.org/articles/PMC3965454?pdf=render

author	Lin Huang Sivan Bercovici Jesse M Rodriguez Serafim Batzoglou
collection	DOAJ
description	Identity by descent (IBD) inference is the task of computationally detecting genomic segments that are shared between individuals by means of common familial descent. Accurate IBD detection plays an important role in various genomic studies, ranging from mapping disease genes to exploring ancient population histories. The majority of recent work in the field has focused on improving the accuracy of inference, targeting shorter genomic segments that originate from a more ancient common ancestor. The accuracy of these methods, however, is achieved at the expense of high computational cost, resulting in a prohibitively long running time when applied to large cohorts. To enable the study of large cohorts, we introduce SpeeDB, a method that facilitates fast IBD detection in large unphased genotype data sets. Given a target individual and a database of individuals that potentially share IBD segments with the target, SpeeDB applies an efficient opposite-homozygous filter, which excludes chromosomal segments from the database that are highly unlikely to be IBD with the corresponding segments from the target individual. The remaining segments can then be evaluated by any IBD detection method of choice. When examining simulated individuals sharing 4 cM IBD regions, SpeeDB filtered out 99.5% of genomic regions from consideration while retaining 99% of the true IBD segments. Applying the SpeeDB filter prior to detecting IBD in simulated fourth cousins resulted in an overall running time that was 10,000x faster than inferring IBD without the filter and retained 99% of the true IBD segments in the output.
first_indexed	2024-12-12T23:55:57Z
format	Article
id	doaj.art-1a63a283442f4dd8b107a7d8642680a0
institution	Directory of Open Access Journals
issn	1932-6203
language	English
last_indexed	2024-12-12T23:55:57Z
publishDate	2014-01-01
publisher	Public Library of Science (PLoS)
record_format	Article
series	PLoS ONE
spelling	doaj.art-1a63a283442f4dd8b107a7d8642680a0 | 2022-12-22T00:06:35Z | eng | Public Library of Science (PLoS) | PLoS ONE | ISSN 1932-6203 | 2014-01-01 | vol. 9, no. 3, e92713 | doi:10.1371/journal.pone.0092713 | http://europepmc.org/articles/PMC3965454?pdf=render
title	An effective filter for IBD detection in large data sets.
url	http://europepmc.org/articles/PMC3965454?pdf=render
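
Illustration (not part of the record): the description mentions an opposite-homozygous filter that discards database segments unlikely to be IBD with the target. The Python sketch below shows the general idea only and is not SpeeDB's implementation. It assumes unphased genotypes coded as 0, 1, or 2 (copies of the alternate allele), no missing data, fixed-size windows of markers, and a simple count threshold; the function name, `window`, and `max_opposite` are hypothetical. Two individuals who share a haplotype across a window should show no opposite-homozygous sites (0 versus 2) there, apart from genotyping errors, so windows with too many such sites can be skipped.

```python
from typing import List, Sequence, Tuple


def surviving_windows(
    target: Sequence[int],
    candidate: Sequence[int],
    window: int = 4,
    max_opposite: int = 0,
) -> List[Tuple[int, int]]:
    """Return (start, end) marker windows that pass the filter.

    Genotypes are assumed to be coded 0/1/2 (alternate-allele count).
    A site is opposite-homozygous when one genotype is 0 and the other is 2.
    """
    if len(target) != len(candidate):
        raise ValueError("genotype vectors must have the same length")
    if window <= 0:
        raise ValueError("window must be positive")

    kept = []
    for start in range(0, len(target), window):
        end = min(start + window, len(target))
        # Count sites where the two individuals are homozygous for
        # different alleles (0 vs 2, i.e. absolute difference of 2).
        opposite = sum(
            1
            for a, b in zip(target[start:end], candidate[start:end])
            if abs(a - b) == 2
        )
        if opposite <= max_opposite:
            kept.append((start, end))
    return kept


if __name__ == "__main__":
    target = [0, 1, 2, 2, 0, 0, 1, 2]
    candidate = [0, 2, 1, 2, 2, 0, 1, 0]
    # Only the first window has no opposite-homozygous sites.
    print(surviving_windows(target, candidate))
```

Only the surviving windows would then be passed to a full IBD detection method. Allowing `max_opposite` above zero is one way to tolerate genotyping error, at the cost of filtering out fewer regions.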


Similar Items