Integrative analysis of single nucleotide polymorphisms and gene expression efficiently distinguishes samples from closely related ethnic populations

Abstract. Background: Ancestry informative markers (AIMs) are a type of genetic marker that is informative for tracing the ancestral ethnicity of individuals. Application of AIMs has gained substantial attention in population genetics, forensic sciences,...


Bibliographic Details
Main Authors: Yang Hsin-Chou, Wang Pei-Li, Lin Chien-Wei, Chen Chien-Hsiun, Chen Chun-Houh
Format: Article
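Language: English
Published: BMC 2012-07-01
Series: BMC Genomics
Subjects: Single nucleotide polymorphism (SNP); Allele frequency; Gene expression; HapMap; Classification analysis; Ancestry informative marker (AIM)
Online Access: http://www.biomedcentral.com/1471-2164/13/346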
collection DOAJ
description Abstract
Background: Ancestry informative markers (AIMs) are a type of genetic marker that is informative for tracing the ancestral ethnicity of individuals. Application of AIMs has gained substantial attention in population genetics, forensic sciences, and medical genetics. Single nucleotide polymorphisms (SNPs), the materials of AIMs, are useful for classifying individuals from distinct continental origins but cannot discriminate individuals with subtle genetic differences from closely related ancestral lineages. Proof-of-principle studies have shown that gene expression (GE) is also a heritable human variation that exhibits differential intensity distributions among ethnic groups. GE supplies ethnic information supplemental to SNPs; this motivated us to integrate SNP and GE markers to construct AIM panels that require fewer markers and provide high accuracy in ancestry inference. Few studies in the literature have considered GE in this respect, and none have integrated SNP and GE markers to aid classification of samples from closely related ethnic populations.
Results: We integrated a forward variable selection procedure into flexible discriminant analysis to identify key SNP and/or GE markers with the highest cross-validation prediction accuracy. By analyzing genome-wide SNP and/or GE markers in 210 independent samples from four ethnic groups in the HapMap II Project, we found that average testing accuracies for a majority of classification analyses were quite high, except for SNP-only analyses performed to distinguish individuals from two closely related Asian populations. For the classification of samples from closely related Asian populations, the average testing accuracies ranged from 0.53 to 0.79 for SNP-only analyses and increased to around 0.90 when GE markers were integrated with SNP markers. Compared to GE-only analyses, integrative analyses of SNP and GE markers showed comparable testing accuracies and a reduced number of selected markers in AIM panels.
Conclusions: Integrative analysis of SNP and GE markers provides high-accuracy and/or cost-effective classification results for assigning samples from closely related or distantly related ancestral lineages to their original ancestral populations. User-friendly BIASLESS (Biomarkers Identification and Samples Subdivision) software was developed as an efficient tool for selecting key SNP and/or GE markers and then building models for sample subdivision. BIASLESS was programmed in R and R-GUI and is available online at http://www.stat.sinica.edu.tw/hsinchou/genetics/prediction/BIASLESS.htm.
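The forward variable selection described in the Results can be illustrated with a minimal Python sketch. It is not the BIASLESS implementation (which is written in R) and it does not use flexible discriminant analysis: a simple nearest-centroid classifier is swapped in so the example runs with the standard library alone. The data are synthetic, and the function names (`centroid_predict`, `cv_accuracy`, `forward_select`), the three SNP columns coded 0/1/2, the single GE column, and all numeric settings are assumptions made for illustration only.

```python
import random
from statistics import mean

def centroid_predict(train_X, train_y, test_X, cols):
    """Nearest-centroid classifier using only the marker columns in `cols`.
    (Stand-in for flexible discriminant analysis; an assumption of this sketch.)"""
    groups = {}
    for row, label in zip(train_X, train_y):
        groups.setdefault(label, []).append([row[c] for c in cols])
    centroids = {lab: [mean(col) for col in zip(*rows)]
                 for lab, rows in groups.items()}
    preds = []
    for row in test_X:
        x = [row[c] for c in cols]
        preds.append(min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab]))))
    return preds

def cv_accuracy(X, y, cols, k=5):
    """Mean k-fold cross-validation accuracy for the marker subset `cols`."""
    n = len(X)
    accs = []
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        preds = centroid_predict([X[i] for i in train_idx],
                                 [y[i] for i in train_idx],
                                 [X[i] for i in test_idx], cols)
        accs.append(mean(1.0 if p == y[i] else 0.0
                         for p, i in zip(preds, test_idx)))
    return mean(accs)

def forward_select(X, y, max_markers=3, k=5):
    """Greedily add the marker that most improves cross-validation accuracy;
    stop when no remaining marker improves it."""
    selected, best = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_markers:
        scores = {c: cv_accuracy(X, y, selected + [c], k) for c in remaining}
        cand = max(scores, key=scores.get)
        if scores[cand] <= best:
            break
        selected.append(cand)
        remaining.remove(cand)
        best = scores[cand]
    return selected, best

# Synthetic data (hypothetical): two closely related populations whose SNP
# allele frequencies barely differ, plus one GE marker with a shifted mean.
rng = random.Random(0)

def make_sample(pop):
    freq = 0.50 if pop == "A" else 0.55
    snps = [sum(rng.random() < freq for _ in range(2)) for _ in range(3)]  # 0/1/2
    ge = rng.gauss(0.0 if pop == "A" else 3.0, 1.0)
    return snps + [ge]

y = ["A" if i % 2 == 0 else "B" for i in range(80)]
X = [make_sample(pop) for pop in y]

selected, best = forward_select(X, y)
print("selected marker columns:", selected, "CV accuracy:", round(best, 2))
```

In this toy setting the GE column (index 3) carries most of the signal, which mirrors the abstract's point that SNPs alone separate closely related populations poorly. A faithful reimplementation would replace the nearest-centroid step with flexible discriminant analysis and use real HapMap genotype and expression data.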
first_indexed 2024-04-13T00:37:29Z
format Article
id doaj.art-23efc8fcf945490e9108a8389af459c9
institution Directory Open Access Journal
issn 1471-2164
spelling doaj.art-23efc8fcf945490e9108a8389af459c9; 2022-12-22T03:10:18Z; eng; BMC; BMC Genomics; ISSN 1471-2164; 2012-07-01; volume 13, issue 1, article 346; DOI 10.1186/1471-2164-13-346
topic Single nucleotide polymorphism (SNP)
Allele frequency
Gene expression
HapMap
Classification analysis
Ancestry informative marker (AIM)