RAFFI: Accurate and fast familial relationship inference in large scale biobank studies using RaPID.


Bibliographic Details
Main Authors: Ardalan Naseri, Junjie Shi, Xihong Lin, Shaojie Zhang, Degui Zhi
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2021-01-01
Series: PLoS Genetics
Online Access: https://doi.org/10.1371/journal.pgen.1009315
collection DOAJ
description Inference of relationships from whole-genome genetic data of a cohort is a crucial prerequisite for genome-wide association studies. Typically, relationships are inferred by computing the kinship coefficients (ϕ) and the genome-wide probability of zero IBD sharing (π0) among all pairs of individuals. Current leading methods are based on pairwise comparisons, which may not scale up to very large cohorts (e.g., sample size >1 million). Here, we propose an efficient relationship inference method, RAFFI. RAFFI leverages the efficient RaPID method to call IBD segments first, then estimates ϕ and π0 from the detected IBD segments. This inference is achieved by a data-driven approach that adjusts the estimation based on phasing quality and genotyping quality. Using simulations, we showed that RAFFI is robust against phasing/genotyping errors, admixture events, and varying marker densities, and achieves higher accuracy than KING, the current leading method, especially for more distant relatives. When applied to the phased UK Biobank data with ~500K individuals, RAFFI is approximately 18 times faster than KING. We expect RAFFI will offer fast and accurate relatedness inference for even larger cohorts.
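The step from detected IBD segments to ϕ and π0 can be illustrated with the textbook relations ϕ = k1/4 + k2/2 and π0 = 1 - k1 - k2, where k1 and k2 are the genome fractions shared IBD1 and IBD2. The following is a minimal sketch under stated assumptions, not RAFFI's implementation: it omits the data-driven adjustment for phasing and genotyping quality described in the abstract, the names `PairSharing`, `kinship_and_pi0`, `degree`, and the map length `GENOME_CM` are hypothetical, and the degree cutoffs are the powers of two commonly used with KING.

```python
from dataclasses import dataclass

# Assumption: total autosomal genetic map length in centimorgans (illustrative value).
GENOME_CM = 3400.0


@dataclass
class PairSharing:
    """Total IBD sharing for one pair of individuals (hypothetical container)."""
    ibd1_cm: float  # length shared on exactly one haplotype
    ibd2_cm: float  # length shared on both haplotypes


def kinship_and_pi0(s: PairSharing, genome_cm: float = GENOME_CM) -> tuple[float, float]:
    """Return (phi, pi0) from the textbook relations; no error adjustment."""
    k1 = s.ibd1_cm / genome_cm
    k2 = s.ibd2_cm / genome_cm
    phi = k1 / 4 + k2 / 2
    pi0 = 1 - k1 - k2
    return phi, pi0


def degree(phi: float) -> str:
    """Classify by kinship cutoffs 2^-1.5, 2^-2.5, 2^-3.5, 2^-4.5 (as used with KING)."""
    if phi > 2 ** -1.5:
        return "duplicate or MZ twin"
    if phi > 2 ** -2.5:
        return "first degree"
    if phi > 2 ** -3.5:
        return "second degree"
    if phi > 2 ** -4.5:
        return "third degree"
    return "more distant or unrelated"


# Idealized parent-child pair: the whole genome is IBD1, none is IBD2.
parent_child = PairSharing(ibd1_cm=GENOME_CM, ibd2_cm=0.0)
# Idealized first cousins: one quarter of the genome is IBD1.
first_cousins = PairSharing(ibd1_cm=GENOME_CM / 4, ibd2_cm=0.0)
```

In this sketch, π0 separates relationships with the same ϕ: an idealized parent-child pair and an expected full-sibling pair both have ϕ = 0.25, but π0 is 0 for the former and 0.25 for the latter.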
id doaj.art-6f9029bfe84946b08c8180966d1612ba
institution Directory Open Access Journal
issn 1553-7390
1553-7404
spelling doaj.art-6f9029bfe84946b08c8180966d1612ba; 2022-12-21T19:13:17Z; eng; Public Library of Science (PLoS); PLoS Genetics; ISSN 1553-7390, 1553-7404; 2021-01-01; 17(1): e1009315; 10.1371/journal.pgen.1009315; RAFFI: Accurate and fast familial relationship inference in large scale biobank studies using RaPID.; Ardalan Naseri, Junjie Shi, Xihong Lin, Shaojie Zhang, Degui Zhi; https://doi.org/10.1371/journal.pgen.1009315