Efficient Two-Stage Analysis for Complex Trait Association with Arbitrary Depth Sequencing Data

Sequencing-based genetic association analysis is typically performed by first generating genotype calls from sequence data and then performing association tests on the called genotypes. Standard approaches require accurate genotype calling (GC), which can be achieved either with high sequencing dept...

Bibliographic Details
Main Authors: Zheng Xu, Song Yan, Shuai Yuan, Cong Wu, Sixia Chen, Zifang Guo, Yun Li
Format: Article
Language: English
Published: MDPI AG 2023-03-01
Series: Stats
Subjects: association study; next-generation sequencing; genotype; genotype likelihood function; testing
Online Access: https://www.mdpi.com/2571-905X/6/1/29
_version_ 1827747578017480704
author Zheng Xu
Song Yan
Shuai Yuan
Cong Wu
Sixia Chen
Zifang Guo
Yun Li
collection DOAJ
description Sequencing-based genetic association analysis is typically performed by first generating genotype calls from sequence data and then performing association tests on the called genotypes. Standard approaches require accurate genotype calling (GC), which can be achieved either with high sequencing depth (typically available in a small number of individuals) or via computationally intensive multi-sample linkage disequilibrium (LD)-aware methods. We propose a computationally efficient two-stage combination approach for association analysis, in which single-nucleotide polymorphisms (SNPs) are screened in the first stage via a rapid maximum likelihood (ML)-based method on sequence data directly (without first calling genotypes), and then the selected SNPs are evaluated in the second stage by performing association tests on genotypes from multi-sample LD-aware calling. Extensive simulation- and real data-based studies show that the proposed two-stage approaches can save 80% of the computational costs and still obtain more than 90% of the power of the classical method to genotype all markers at various depths d ≥ 2.
first_indexed 2024-03-11T05:54:43Z
format Article
id doaj.art-ce0fb09a0ce34ae7bbd2f475a9ea9796
institution Directory Open Access Journal
issn 2571-905X
language English
last_indexed 2024-03-11T05:54:43Z
publishDate 2023-03-01
publisher MDPI AG
record_format Article
series Stats
spelling doaj.art-ce0fb09a0ce34ae7bbd2f475a9ea9796 2023-11-17T13:54:07Z eng MDPI AG Stats 2571-905X 2023-03-01 6 1 468 481 10.3390/stats6010029
Efficient Two-Stage Analysis for Complex Trait Association with Arbitrary Depth Sequencing Data
Zheng Xu (0)
Song Yan (1)
Shuai Yuan (2)
Cong Wu (3)
Sixia Chen (4)
Zifang Guo (5)
Yun Li (6)
(0) Department of Mathematics and Statistics, Wright State University, Dayton, OH 45324, USA
(1) Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
(2) Glaxosmithkline, plc, Collegeville, PA 19426, USA
(3) Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE 68508, USA
(4) Department of Biostatistics and Epidemiology, University of Oklahoma Health Sciences Center, Oklahoma City, OK 73104, USA
(5) Merck & Co., Inc., Rahway, NJ 07065, USA
(6) Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
https://www.mdpi.com/2571-905X/6/1/29
association study
next-generation sequencing
genotype
genotype likelihood function
testing
title Efficient Two-Stage Analysis for Complex Trait Association with Arbitrary Depth Sequencing Data
topic association study
next-generation sequencing
genotype
genotype likelihood function
testing
url https://www.mdpi.com/2571-905X/6/1/29
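
The two-stage idea summarized in the description field can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a binomial read model with a fixed error rate, Hardy-Weinberg genotype priors, and a case-control likelihood-ratio test as the stage-1 screen. The caller-supplied `stage2_test` function is a stand-in for multi-sample LD-aware calling followed by an association test. All function names, the error rate `err`, and the threshold `alpha1` are hypothetical.

```python
import math


def genotype_likelihoods(alt, depth, err=0.01):
    """P(reads | g) for g = 0, 1, 2 alternate-allele copies (binomial read model)."""
    probs = (err, 0.5, 1.0 - err)  # chance that a single read shows the alternate allele
    coef = math.comb(depth, alt)
    return [coef * p ** alt * (1.0 - p) ** (depth - alt) for p in probs]


def hwe_prior(f):
    """Genotype frequencies under Hardy-Weinberg for alternate-allele frequency f."""
    return ((1.0 - f) ** 2, 2.0 * f * (1.0 - f), f ** 2)


def max_loglik(gls, iters=50):
    """Estimate the allele frequency by EM; return the log-likelihood at that estimate."""
    f = 0.2  # arbitrary starting value
    for _ in range(iters):
        expected_alt = 0.0
        for gl in gls:
            post = [l * p for l, p in zip(gl, hwe_prior(f))]
            expected_alt += (post[1] + 2.0 * post[2]) / sum(post)
        # Keep f strictly inside (0, 1) so every likelihood stays positive.
        f = min(max(expected_alt / (2.0 * len(gls)), 1e-6), 1.0 - 1e-6)
    prior = hwe_prior(f)
    return sum(math.log(sum(l * p for l, p in zip(gl, prior))) for gl in gls)


def stage1_pvalue(case_reads, ctrl_reads):
    """Likelihood-ratio test on read counts directly, with no genotype calls."""
    case_gl = [genotype_likelihoods(a, d) for a, d in case_reads]
    ctrl_gl = [genotype_likelihoods(a, d) for a, d in ctrl_reads]
    stat = 2.0 * (max_loglik(case_gl) + max_loglik(ctrl_gl)
                  - max_loglik(case_gl + ctrl_gl))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


def two_stage(snps, stage2_test, alpha1=0.05):
    """Screen every SNP cheaply, then run the costly test only on the survivors."""
    selected = [i for i, (cases, ctrls) in enumerate(snps)
                if stage1_pvalue(cases, ctrls) < alpha1]
    return {i: stage2_test(i) for i in selected}
```

Each SNP is passed as a pair of lists of `(alt_reads, depth)` tuples, one list for cases and one for controls. The computational saving comes from calling `stage2_test` only for SNPs that pass the screen, so a stricter `alpha1` lowers cost at the risk of discarding true signals.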