Combining an Evolution-guided Clustering Algorithm and Haplotype-based LRT in Family Association Studies

Abstract. Background: With the completion of the international HapMap project, many studies have been conducted to investigate the association between complex diseases and haplotype variants. Such haplotype-based association studies, however, often face t...


Bibliographic Details
Main Authors: Huang Su-Yun, Tzeng Jung-Ying, Lee Mei-Hsien, Hsiao Chuhsing
Format: Article
Language: English
Published: BMC 2011-05-01
Series: BMC Genetics
Online Access: http://www.biomedcentral.com/1471-2156/12/48
author Huang Su-Yun
Tzeng Jung-Ying
Lee Mei-Hsien
Hsiao Chuhsing
author_sort Huang Su-Yun
collection DOAJ
description Abstract
Background: With the completion of the international HapMap project, many studies have been conducted to investigate the association between complex diseases and haplotype variants. Such haplotype-based association studies, however, often face two difficulties: one is the large number of haplotype configurations in the chromosome region under study, and the other is the ambiguity in haplotype phase when only genotype data are observed. The latter can be handled with an EM algorithm that incorporates family data, whereas the former can be more problematic, especially when rare haplotypes are involved. Here, based on family data, we propose to cluster long haplotypes of linked SNPs in a biologically meaningful way, so that the number of haplotypes is reduced and the power of statistical tests of association is increased.
Results: In this paper we employ family genotype data and combine a clustering scheme with a likelihood ratio statistic to test the association between quantitative phenotypes and haplotype variants. Haplotypes are first grouped by their evolutionary closeness to establish a set of core haplotypes. Then, for each family, we construct the transmission and non-transmission phases in terms of these core haplotypes, using the phase ambiguity as weights. The likelihood ratio test (LRT) is then conducted with these weighted and clustered haplotypes to test for association with disease. Through its core-coding system, this combination of evolution-guided haplotype clustering and weighted assignment in the LRT incorporates both haplotype phase ambiguity and transmission uncertainty into the analysis. Simulation studies show that the proposed procedure is more informative and powerful than three family-based association tests: FAMHAP, FBAT, and an LRT with a group consisting exclusively of rare haplotypes.
Conclusions: The proposed procedure takes into account the uncertainty in phase determination and in transmission, utilizes the evolutionary information contained in haplotypes, reduces the dimension of the haplotype space and the degrees of freedom in tests, and performs better in association studies. This evolution-guided clustering procedure is particularly useful for long haplotypes containing linked SNPs, and is applicable to other haplotype-based association tests. The procedure is implemented in R and is freely available for download.
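The steps named in the abstract (grouping haplotypes around core haplotypes, weighting ambiguous phases, then running an LRT) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the authors' R implementation: Hamming distance stands in for evolutionary closeness, a frequency threshold (`min_core_freq`) defines the core set, individuals are treated as independent (family transmission is ignored), and the LRT is a one-covariate Gaussian regression. All function names are hypothetical.

```python
import math
from collections import defaultdict


def hamming(a, b):
    """Number of SNP positions at which two equal-length haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster_to_cores(freqs, min_core_freq=0.05):
    """Map every haplotype to a core haplotype.

    Assumption: cores are haplotypes with frequency >= min_core_freq, and a
    rare haplotype joins the core at the smallest Hamming distance (ties go
    to the more frequent core). The record does not specify the real rule.
    """
    cores = [h for h, f in freqs.items() if f >= min_core_freq]
    if not cores:
        raise ValueError("no haplotype reaches the core frequency threshold")
    mapping = {}
    for hap in freqs:
        if hap in cores:
            mapping[hap] = hap
        else:
            mapping[hap] = min(cores, key=lambda c: (hamming(hap, c), -freqs[c]))
    return mapping


def expected_core_dosage(phase_configs, mapping):
    """Expected copies of each core haplotype for one individual.

    phase_configs is a list of (hap1, hap2, weight) tuples, one per phase
    compatible with the genotype; the weights are assumed to sum to 1.
    """
    dosage = defaultdict(float)
    for hap1, hap2, weight in phase_configs:
        dosage[mapping[hap1]] += weight
        dosage[mapping[hap2]] += weight
    return dict(dosage)


def lrt_single_core(y, x):
    """LRT of y ~ intercept + x against y ~ intercept (Gaussian errors).

    Returns (statistic, p_value), with the statistic referred to a
    chi-square distribution with 1 degree of freedom.
    """
    n = len(y)
    my, mx = sum(y) / n, sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    rss0 = sum((yi - my) ** 2 for yi in y)
    if sxx == 0:
        raise ValueError("dosage has no variation")
    rss1 = rss0 - sxy * sxy / sxx  # residual sum of squares with the slope
    if rss1 <= 0:
        raise ValueError("perfect fit; the LRT statistic is unbounded")
    stat = n * math.log(rss0 / rss1)
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square(1) upper tail
    return stat, p_value


# Toy data, invented for this sketch.
freqs = {"0000": 0.45, "1111": 0.40, "0001": 0.06, "1110": 0.05, "0111": 0.04}
mapping = cluster_to_cores(freqs, min_core_freq=0.10)
print(mapping)

# Two phases compatible with one all-heterozygous genotype.
phases = [("0001", "1110", 0.7), ("0000", "1111", 0.3)]
print(expected_core_dosage(phases, mapping))

stat, p = lrt_single_core([1.0, 2.1, 2.9, 4.2, 5.1, 5.8], [0, 0, 1, 1, 2, 2])
print(stat, p)
```

In the toy example both candidate phases map to the same pair of core haplotypes, so the phase ambiguity no longer affects the coded dosage; this is the kind of dimension reduction the abstract attributes to clustering, shown here only under the stated assumptions.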
first_indexed 2024-04-13T04:25:13Z
format Article
id doaj.art-a7a3fd40d50b4f76b83e24bb2aae3a03
institution Directory of Open Access Journals
issn 1471-2156
language English
last_indexed 2024-04-13T04:25:13Z
publishDate 2011-05-01
publisher BMC
record_format Article
series BMC Genetics
spelling doaj.art-a7a3fd40d50b4f76b83e24bb2aae3a03; 2022-12-22T03:02:34Z; eng; BMC; BMC Genetics; 1471-2156; 2011-05-01; 12; 1; 48; 10.1186/1471-2156-12-48; Combining an Evolution-guided Clustering Algorithm and Haplotype-based LRT in Family Association Studies; Huang Su-Yun; Tzeng Jung-Ying; Lee Mei-Hsien; Hsiao Chuhsing; http://www.biomedcentral.com/1471-2156/12/48
title Combining an Evolution-guided Clustering Algorithm and Haplotype-based LRT in Family Association Studies
url http://www.biomedcentral.com/1471-2156/12/48