Statistical Haplotypes Based on Functional Sequence Data Analysis for Genome-Wide Association Studies
Functional data analysis has demonstrated significant success in time series analysis. In recent biomedical research, it has also been used to analyze sequence variations in genome-wide association studies (GWAS). The observations of genetic variants, called single-nucleotide polymorphisms (SNPs), o...
Main Authors: | Pei-Yun Sun, Guoqi Qian |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-06-01 |
Series: | Engineering Proceedings |
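Subjects: | stochastic process; functional data analysis; genome-wide association study; epistasis; haplotype; variable selection |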
Online Access: | https://www.mdpi.com/2673-4591/39/1/29 |
_version_ | 1797580325116706816 |
---|---|
author | Pei-Yun Sun Guoqi Qian |
author_facet | Pei-Yun Sun Guoqi Qian |
author_sort | Pei-Yun Sun |
collection | DOAJ |
description | Functional data analysis (FDA) has demonstrated significant success in time series analysis. In recent biomedical research, it has also been used to analyze sequence variations in genome-wide association studies (GWAS). An individual's genetic variants, called single-nucleotide polymorphisms (SNPs), are observed at loci distributed along a DNA sequence. The observations can thus be regarded as a realization of a stochastic process, much like a time series. However, SNPs are usually coded as the number of minor alleles, which is categorical. The usual least-squares smoothing in FDA works well only when the data are continuous and normally distributed, and the normality assumption is violated for categorical SNP data. In this work, we propose a two-step method: first smoothing the categorical SNPs with a novel method, then constructing haplotypes strongly associated with the disease using functional generalized linear models. We show its effectiveness on the real-world PennCATH dataset. |
first_indexed | 2024-03-10T22:48:27Z |
format | Article |
id | doaj.art-91b8e896f5be43419c47d5f8b3628aaa |
institution | Directory Open Access Journal |
issn | 2673-4591 |
language | English |
last_indexed | 2024-03-10T22:48:27Z |
publishDate | 2023-06-01 |
publisher | MDPI AG |
record_format | Article |
series | Engineering Proceedings |
spelling | doaj.art-91b8e896f5be43419c47d5f8b3628aaa; 2023-11-19T10:30:35Z; eng; MDPI AG; Engineering Proceedings; 2673-4591; 2023-06-01; 39; 1; 29; 10.3390/engproc2023039029; Statistical Haplotypes Based on Functional Sequence Data Analysis for Genome-Wide Association Studies; Pei-Yun Sun 0; Guoqi Qian 1; School of Mathematics and Statistics, University of Melbourne, Parkville, VIC 3010, Australia; School of Mathematics and Statistics, University of Melbourne, Parkville, VIC 3010, Australia; Functional data analysis has demonstrated significant success in time series analysis. In recent biomedical research, it has also been used to analyze sequence variations in genome-wide association studies (GWAS). The observations of genetic variants, called single-nucleotide polymorphisms (SNPs), of an individual are distributed over the loci of a DNA sequence. Thus, it can be regarded as a realization of a stochastic process, which is no different from a time series. However, SNPs are usually coded as the number of minor alleles, which are categorical. The usual least-square smoothing in FDA only works well when the data is continuous and normally distributed. The normality assumption will be violated for categorical SNP data. In this work, we propose a two-step method for smoothing categorical SNPs using a novel method and constructing haplotypes having strong associations with the disease using functional generalized linear models. We show its effectiveness through a real-world PennCATH dataset.; https://www.mdpi.com/2673-4591/39/1/29; stochastic process; functional data analysis; genome-wide association study; epistasis; haplotype; variable selection |
spellingShingle | Pei-Yun Sun Guoqi Qian Statistical Haplotypes Based on Functional Sequence Data Analysis for Genome-Wide Association Studies Engineering Proceedings stochastic process functional data analysis genome-wide association study epistasis haplotype variable selection |
title | Statistical Haplotypes Based on Functional Sequence Data Analysis for Genome-Wide Association Studies |
topic | stochastic process functional data analysis genome-wide association study epistasis haplotype variable selection |
url | https://www.mdpi.com/2673-4591/39/1/29 |
work_keys_str_mv | AT peiyunsun statisticalhaplotypesbasedonfunctionalsequencedataanalysisforgenomewideassociationstudies AT guoqiqian statisticalhaplotypesbasedonfunctionalsequencedataanalysisforgenomewideassociationstudies |
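
The coding scheme and the limitation of least-squares smoothing mentioned in the description can be illustrated with a short sketch. It is only an illustration under stated assumptions, not the authors' method: the function names, the toy genotypes and positions, and the cubic polynomial basis are all hypothetical, and the record does not describe the "novel" smoothing step itself.

```python
import numpy as np


def minor_allele_counts(genotypes, minor_alleles):
    """Code each genotype (e.g. "AG") as its number of minor alleles: 0, 1 or 2."""
    return np.array(
        [g.count(a) for g, a in zip(genotypes, minor_alleles)], dtype=float
    )


def least_squares_smooth(positions, values, degree=3):
    """Ordinary least-squares fit of a polynomial basis over the loci.

    The polynomial basis is an assumption made for brevity; FDA software
    more commonly uses B-spline or Fourier bases.
    """
    positions = np.asarray(positions, dtype=float)
    # Rescale loci to [0, 1] so the basis matrix is well conditioned.
    t = (positions - positions.min()) / (positions.max() - positions.min())
    basis = np.vander(t, degree + 1, increasing=True)
    coef, *_ = np.linalg.lstsq(basis, values, rcond=None)
    return basis @ coef


# Toy data (hypothetical): one individual, six loci, minor allele "G" throughout.
genotypes = ["AA", "AG", "GG", "AG", "AA", "AA"]
minor_alleles = ["G"] * 6
positions = [100, 250, 400, 520, 700, 900]

counts = minor_allele_counts(genotypes, minor_alleles)
smooth = least_squares_smooth(positions, counts)

print("coded SNPs:   ", counts)
print("smoothed curve:", np.round(smooth, 3))
```

The fitted curve takes arbitrary real values, whereas the coded SNPs take only the values 0, 1 and 2. This mismatch is the reason the description gives for replacing ordinary least-squares smoothing when the data are categorical.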