A Survey of SNP Data Analysis

Every person differs from every other in physical appearance, susceptibility to disease, response to medications, and so on, yet 99.9 percent of human DNA is identical across individuals. The differences in human genomes are therefore well worth studying. Single-Nucleotide Polymorphisms (SNPs) are...


Bibliographic Details
Main Authors: Xiaojun Ding, Xuan Guo
Format: Article
Language: English
Published: Tsinghua University Press 2018-09-01
Series: Big Data Mining and Analytics
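Subjects: snp interactions; snp combinations; gwas; case-control study; disease association analysis; cross-phenotype association studies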
Online Access: https://www.sciopen.com/article/10.26599/BDMA.2018.9020015
collection DOAJ
description Every person differs from every other in physical appearance, susceptibility to disease, response to medications, and so on, yet 99.9 percent of human DNA is identical across individuals. The differences in human genomes are therefore well worth studying. Single-Nucleotide Polymorphisms (SNPs) are the simplest form and most common source of genetic polymorphism. SNPs have been used successfully to identify defective genes that cause Mendelian diseases. However, most common human diseases are complex and are caused by multiple SNPs, each of which explains only a small fraction of the genetic causes. Experiments on individual SNPs may reveal their non-detectable effects on complex diseases. Pathogenesis is a complicated topic, and it is difficult to correctly predict multiple SNPs. As such, the analysis of SNP data is a critical task in the study of genetic diseases. In this paper, we divide the methods for genome-wide SNP data analysis into two categories: single-trait Genome-Wide Association Studies (GWAS), in which pathology is mined from data of a single phenotype, and multiple-trait GWAS, which identify cross-phenotype associations. For single-trait GWAS, we review methods ranging from the simple to the complex, including TEAM, BOOST, AntEpiSeeker, SNPRuler, EDCF, HiSeeker, ORF, MLR-tagging, MSCD, and MIC. For multiple-trait GWAS, we describe methods in terms of the regression models, dimension-reduction methods, and meta-analysis methods they employ. We also list the advantages and disadvantages of these methods. Finally, we discuss future directions of SNP data analysis for genome-wide association.
first_indexed 2024-04-12T16:04:23Z
id doaj.art-57deddb47a3a4e45be5ec65d5c0d3cb7
institution Directory of Open Access Journals
issn 2096-0654
last_indexed 2024-04-12T16:04:23Z
spelling doaj.art-57deddb47a3a4e45be5ec65d5c0d3cb7; 2022-12-22T03:26:07Z; eng; Tsinghua University Press; Big Data Mining and Analytics; ISSN 2096-0654; 2018-09-01; vol. 1, no. 3, pp. 173-190; doi 10.26599/BDMA.2018.9020015; A Survey of SNP Data Analysis
Xiaojun Ding: School of Computer Science and Engineering, Yulin Normal University, Yulin 537000, and School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, China.
Xuan Guo: Department of Computer Science and Engineering, University of North Texas, Denton, TX 76203-5017, USA.
topic snp interactions
snp combinations
gwas
case-control study
disease association analysis
cross-phenotype association studies