Statistical Haplotypes Based on Functional Sequence Data Analysis for Genome-Wide Association Studies

Functional data analysis has demonstrated significant success in time series analysis. In recent biomedical research, it has also been used to analyze sequence variations in genome-wide association studies (GWAS). The observations of genetic variants, called single-nucleotide polymorphisms (SNPs), o...

Bibliographic Details
Main Authors: Pei-Yun Sun, Guoqi Qian
Format: Article
Language: English
Published: MDPI AG 2023-06-01
Series: Engineering Proceedings
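Subjects: stochastic process; functional data analysis; genome-wide association study; epistasis; haplotype; variable selection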
Online Access: https://www.mdpi.com/2673-4591/39/1/29
author Pei-Yun Sun
Guoqi Qian
collection DOAJ
description Functional data analysis (FDA) has demonstrated significant success in time series analysis. In recent biomedical research, it has also been used to analyze sequence variations in genome-wide association studies (GWAS). An individual's genetic variants, called single-nucleotide polymorphisms (SNPs), are observed over the loci of a DNA sequence, so they can be regarded as a realization of a stochastic process, much like a time series. However, SNPs are usually coded as the number of minor alleles, which is categorical. The usual least-squares smoothing in FDA works well only when the data are continuous and normally distributed, and the normality assumption is violated for categorical SNP data. In this work, we propose a two-step method: first smoothing the categorical SNPs with a novel method, then constructing haplotypes strongly associated with the disease using functional generalized linear models. We show its effectiveness on the real-world PennCATH dataset.
first_indexed 2024-03-10T22:48:27Z
format Article
id doaj.art-91b8e896f5be43419c47d5f8b3628aaa
institution Directory of Open Access Journals
issn 2673-4591
language English
last_indexed 2024-03-10T22:48:27Z
publishDate 2023-06-01
publisher MDPI AG
record_format Article
series Engineering Proceedings
spelling doaj.art-91b8e896f5be43419c47d5f8b3628aaa
2023-11-19T10:30:35Z
eng
MDPI AG
Engineering Proceedings
ISSN 2673-4591
2023-06-01
Volume 39, Issue 1, Article 29
DOI 10.3390/engproc2023039029
Statistical Haplotypes Based on Functional Sequence Data Analysis for Genome-Wide Association Studies
Pei-Yun Sun (School of Mathematics and Statistics, University of Melbourne, Parkville, VIC 3010, Australia)
Guoqi Qian (School of Mathematics and Statistics, University of Melbourne, Parkville, VIC 3010, Australia)
https://www.mdpi.com/2673-4591/39/1/29
stochastic process; functional data analysis; genome-wide association study; epistasis; haplotype; variable selection
title Statistical Haplotypes Based on Functional Sequence Data Analysis for Genome-Wide Association Studies
topic stochastic process
functional data analysis
genome-wide association study
epistasis
haplotype
variable selection
url https://www.mdpi.com/2673-4591/39/1/29