RAFFI: Accurate and fast familial relationship inference in large scale biobank studies using RaPID.
Main Authors: | Ardalan Naseri, Junjie Shi, Xihong Lin, Shaojie Zhang, Degui Zhi |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2021-01-01 |
Series: | PLoS Genetics |
Online Access: | https://doi.org/10.1371/journal.pgen.1009315 |
Field | Value |
---|---|
author | Ardalan Naseri, Junjie Shi, Xihong Lin, Shaojie Zhang, Degui Zhi |
collection | DOAJ |
description | Inference of relationships from whole-genome genetic data of a cohort is a crucial prerequisite for genome-wide association studies. Typically, relationships are inferred by computing the kinship coefficients (ϕ) and the genome-wide probability of zero IBD sharing (π0) among all pairs of individuals. Current leading methods are based on pairwise comparisons, which may not scale up to very large cohorts (e.g., sample size >1 million). Here, we propose an efficient relationship inference method, RAFFI. RAFFI leverages the efficient RaPID method to first call IBD segments and then estimate ϕ and π0 from the detected IBD segments. This inference is achieved by a data-driven approach that adjusts the estimation based on phasing quality and genotyping quality. Using simulations, we showed that RAFFI is robust against phasing/genotyping errors, admixture events, and varying marker densities, and that it achieves higher accuracy than KING, the current leading method, especially for more distant relatives. When applied to the phased UK Biobank data with ~500K individuals, RAFFI is approximately 18 times faster than KING. We expect that RAFFI will offer fast and accurate relatedness inference for even larger cohorts. |
first_indexed | 2024-12-21T06:20:09Z |
format | Article |
id | doaj.art-6f9029bfe84946b08c8180966d1612ba |
institution | Directory of Open Access Journals |
issn | 1553-7390, 1553-7404 |
language | English |
last_indexed | 2024-12-21T06:20:09Z |
publishDate | 2021-01-01 |
publisher | Public Library of Science (PLoS) |
record_format | Article |
series | PLoS Genetics |
title | RAFFI: Accurate and fast familial relationship inference in large scale biobank studies using RaPID. |
url | https://doi.org/10.1371/journal.pgen.1009315 |
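The description states that RAFFI estimates ϕ and π0 from IBD segments called by RaPID. The sketch below illustrates only the generic relationship behind that step, not RAFFI's implementation. It uses the standard identities ϕ = k1/4 + k2/2 and π0 = 1 − k1 − k2, where k1 and k2 are the genome fractions shared IBD1 and IBD2. Assumptions, all hypothetical: a `Segment` type holding one pair's haplotype-level segments in cM, a user-supplied genome length, and the simplification that any overlap of two segments counts as IBD2. The kinship cutoffs follow the KING-style degree thresholds, and the `pi0_cutoff` of 0.1 separating parent-offspring from full siblings is an illustrative choice. RAFFI's data-driven correction for phasing and genotyping quality is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical IBD segment shared by one haplotype of each individual.

    Positions are in centimorgans (cM); all segments belong to one pair.
    """
    start_cm: float
    end_cm: float


def ibd_fractions(segments, genome_length_cm):
    """Return (k1, k2): the genome fractions in IBD1 and IBD2 for one pair.

    A sweep over segment endpoints counts how many segments cover each
    interval. Coverage 1 is treated as IBD1 and coverage >= 2 as IBD2
    (simplifying assumption: overlapping segments involve different
    haplotype pairs).
    """
    events = []
    for seg in segments:
        events.append((seg.start_cm, +1))
        events.append((seg.end_cm, -1))
    events.sort()  # at equal positions, ends (-1) sort before starts (+1)

    ibd1 = ibd2 = 0.0
    depth = 0
    prev = None
    for pos, delta in events:
        if prev is not None and depth > 0:
            span = pos - prev
            if depth == 1:
                ibd1 += span
            else:
                ibd2 += span
        depth += delta
        prev = pos
    return ibd1 / genome_length_cm, ibd2 / genome_length_cm


def kinship_and_pi0(k1, k2):
    """Standard identities: phi = k1/4 + k2/2 and pi0 = 1 - k1 - k2."""
    return k1 / 4 + k2 / 2, 1.0 - k1 - k2


def classify(phi, pi0, pi0_cutoff=0.1):
    """Assign a relationship using KING-style kinship cutoffs.

    pi0_cutoff is an illustrative value separating parent-offspring
    (expected pi0 = 0) from full siblings (expected pi0 = 0.25).
    """
    if phi > 0.354:
        return "duplicate/MZ twin"
    if phi > 0.177:
        return "parent-offspring" if pi0 < pi0_cutoff else "full siblings"
    if phi > 0.0884:
        return "2nd degree"
    if phi > 0.0442:
        return "3rd degree"
    return "unrelated or more distant"


if __name__ == "__main__":
    # Toy pair on a 100 cM "genome": 50 cM in IBD1 and 25 cM in IBD2.
    segs = [Segment(0.0, 50.0), Segment(25.0, 75.0)]
    k1, k2 = ibd_fractions(segs, genome_length_cm=100.0)
    phi, pi0 = kinship_and_pi0(k1, k2)
    print(k1, k2, phi, pi0, classify(phi, pi0))
```

In the toy example the pair is IBD1 over half the genome and IBD2 over a quarter, which gives ϕ = 0.25 and π0 = 0.25, the expected values for full siblings. A pair sharing exactly one haplotype everywhere gives the same ϕ but π0 = 0, which is why π0 is needed to separate the two first-degree relationships.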