Efficient Two-Stage Analysis for Complex Trait Association with Arbitrary Depth Sequencing Data
Sequencing-based genetic association analysis is typically performed by first generating genotype calls from sequence data and then performing association tests on the called genotypes. Standard approaches require accurate genotype calling (GC), which can be achieved either with high sequencing dept...
Main Authors: | Zheng Xu, Song Yan, Shuai Yuan, Cong Wu, Sixia Chen, Zifang Guo, Yun Li |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-03-01 |
Series: | Stats |
Subjects: | association study; next-generation sequencing; genotype; genotype likelihood function; testing |
Online Access: | https://www.mdpi.com/2571-905X/6/1/29 |
_version_ | 1827747578017480704 |
---|---|
author | Zheng Xu; Song Yan; Shuai Yuan; Cong Wu; Sixia Chen; Zifang Guo; Yun Li |
author_sort | Zheng Xu |
collection | DOAJ |
description | Sequencing-based genetic association analysis is typically performed by first generating genotype calls from sequence data and then performing association tests on the called genotypes. Standard approaches require accurate genotype calling (GC), which can be achieved either with high sequencing depth (typically available in a small number of individuals) or via computationally intensive multi-sample linkage disequilibrium (LD)-aware methods. We propose a computationally efficient two-stage combination approach for association analysis, in which single-nucleotide polymorphisms (SNPs) are screened in the first stage via a rapid maximum likelihood (ML)-based method on sequence data directly (without first calling genotypes), and then the selected SNPs are evaluated in the second stage by performing association tests on genotypes from multi-sample LD-aware calling. Extensive simulation- and real data-based studies show that the proposed two-stage approaches can save 80% of the computational costs and still obtain more than 90% of the power of the classical method to genotype all markers at various depths d ≥ 2. |
first_indexed | 2024-03-11T05:54:43Z |
format | Article |
id | doaj.art-ce0fb09a0ce34ae7bbd2f475a9ea9796 |
institution | Directory Open Access Journal |
issn | 2571-905X |
language | English |
last_indexed | 2024-03-11T05:54:43Z |
publishDate | 2023-03-01 |
publisher | MDPI AG |
record_format | Article |
series | Stats |
spelling | doaj.art-ce0fb09a0ce34ae7bbd2f475a9ea9796; 2023-11-17T13:54:07Z; eng; MDPI AG; Stats; ISSN 2571-905X; 2023-03-01; vol. 6, no. 1, pp. 468-481; doi:10.3390/stats6010029; Efficient Two-Stage Analysis for Complex Trait Association with Arbitrary Depth Sequencing Data; Zheng Xu (Department of Mathematics and Statistics, Wright State University, Dayton, OH 45324, USA); Song Yan (Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA); Shuai Yuan (Glaxosmithkline, plc, Collegeville, PA 19426, USA); Cong Wu (Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE 68508, USA); Sixia Chen (Department of Biostatistics and Epidemiology, University of Oklahoma Health Sciences Center, Oklahoma City, OK 73104, USA); Zifang Guo (Merck & Co., Inc., Rahway, NJ 07065, USA); Yun Li (Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA) |
title | Efficient Two-Stage Analysis for Complex Trait Association with Arbitrary Depth Sequencing Data |
topic | association study; next-generation sequencing; genotype; genotype likelihood function; testing |
url | https://www.mdpi.com/2571-905X/6/1/29 |
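
The two-stage design summarized in the description can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `two_stage_screen`, the `keep_fraction` and `alpha` parameters, the rank-based selection rule, and the Bonferroni threshold over all markers are assumptions made for illustration. The stage-1 p-values stand in for the rapid ML-based test on sequence data, and `stage2_test` stands in for an association test on LD-aware genotype calls.

```python
def two_stage_screen(stage1_pvalues, stage2_test, keep_fraction=0.2, alpha=0.05):
    """Hypothetical two-stage screen.

    stage1_pvalues: one cheap screening p-value per SNP (assumed to come
        from a fast likelihood-based test on the sequence data).
    stage2_test: callable taking a SNP index and returning the p-value of
        the expensive test (assumed to use LD-aware genotype calls).
    keep_fraction: share of SNPs passed on to stage 2 (assumption).
    alpha: family-wise error level (assumption).
    """
    n = len(stage1_pvalues)
    n_keep = max(1, round(keep_fraction * n))

    # Stage 1: rank all SNPs by the cheap p-value and keep the top share.
    ranked = sorted(range(n), key=lambda i: stage1_pvalues[i])
    selected = ranked[:n_keep]

    # Stage 2: run the expensive test only on the selected SNPs.
    # Correcting for all n markers is a conservative assumption here.
    threshold = alpha / n
    hits = [i for i in selected if stage2_test(i) < threshold]
    return sorted(selected), sorted(hits)


# Toy inputs (made-up numbers, not data from the article).
stage1 = [0.9, 0.001, 0.5, 0.7, 0.0002, 0.3, 0.8, 0.6, 0.4, 0.2]
stage2 = {1: 0.2, 4: 1e-6}

selected, hits = two_stage_screen(stage1, lambda i: stage2[i])
print(selected, hits)
```

With `keep_fraction=0.2`, only two of the ten toy markers reach the expensive second stage, which is the source of the computational saving in a design of this kind; how much power is retained depends on how well the first-stage statistic ranks the truly associated markers.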