Annotating whole genome variants and constructing a multi-classifier based on samples of ADNI

Introduction: Alzheimer’s disease (AD) is the most common progressive neurodegenerative disorder in the elderly and eventually leads to dementia without effective prevention and treatment. As a typical complex disease, the mechanism of AD’s occurrence and development still lacks sufficient...

Bibliographic Details
Main Authors: Juan Zhou, Yangping Qiu, Xiangyu Liu, Ziruo Xie, Shanguo Lv, Yuanyuan Peng, Xiong Li
Format: Article
Language: English
Published: IMR Press 2022-01-01
Series: Frontiers in Bioscience-Landmark
Subjects: unbalanced data; multi-class classification; multi-objective optimization
Online Access: https://www.imrpress.com/journal/FBL/27/1/10.31083/j.fbl2701037
author Juan Zhou
Yangping Qiu
Xiangyu Liu
Ziruo Xie
Shanguo Lv
Yuanyuan Peng
Xiong Li
collection DOAJ
description Introduction: Alzheimer’s disease (AD) is the most common progressive neurodegenerative disorder in the elderly and eventually leads to dementia without effective prevention and treatment. As with other complex diseases, the mechanisms behind the onset and progression of AD are still insufficiently understood. Research design and methods: In this study, we aim to directly analyze the relationship between DNA variants and phenotypes based on whole genome sequencing data. First, to make the results biologically interpretable, we annotate the deleterious variants and map them to the nearest protein-coding genes. Then, to eliminate redundant features and reduce the burden of downstream analysis, a multi-objective evaluation strategy based on entropy theory is applied to rank all candidate genes. Finally, we use XGBoost as a multi-class classifier on unbalanced data composed of 46 AD samples, 483 mild cognitive impairment (MCI) samples and 279 cognitively normal (CN) samples. Results: The experimental results on real whole genome sequencing data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) show that our method not only achieves satisfactory classification performance but also finds a significant correlation between AD and RIN3, a known susceptibility gene of AD. In addition, pathway enrichment analysis was carried out using the top 20 feature genes, and three pathways were confirmed to be significantly related to the formation of AD. Conclusions: The experimental results demonstrate that our proposed method is effective and has practical significance.
first_indexed 2024-12-18T10:37:43Z
format Article
id doaj.art-60d27c4d8bb54d349c5bc4414ed611b1
institution Directory Open Access Journal
issn 2768-6701
language English
last_indexed 2024-12-18T10:37:43Z
publishDate 2022-01-01
publisher IMR Press
record_format Article
series Frontiers in Bioscience-Landmark
spelling doaj.art-60d27c4d8bb54d349c5bc4414ed611b1
2022-12-21T21:10:42Z
eng
IMR Press
Frontiers in Bioscience-Landmark
2768-6701
2022-01-01
27
1
037
10.31083/j.fbl2701037
S2768-6701(22)00375-6
Annotating whole genome variants and constructing a multi-classifier based on samples of ADNI
Juan Zhou (0)
Yangping Qiu (1)
Xiangyu Liu (2)
Ziruo Xie (3)
Shanguo Lv (4)
Yuanyuan Peng (5)
Xiong Li (6)
School of Software, East China Jiaotong University, 330013 Nanchang, Jiangxi, China (affiliation listed for all seven authors)
https://www.imrpress.com/journal/FBL/27/1/10.31083/j.fbl2701037
unbalanced data
multi-class classification
multi-objective optimization
title Annotating whole genome variants and constructing a multi-classifier based on samples of ADNI
topic unbalanced data
multi-class classification
multi-objective optimization
url https://www.imrpress.com/journal/FBL/27/1/10.31083/j.fbl2701037
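
The description above reports a three-class sample of 46 AD, 483 MCI and 279 CN subjects and calls the data unbalanced. The record does not say how the authors handled the imbalance, so the following is only an illustrative sketch of one common option, inverse-frequency ("balanced") class weights, computed from the counts given in the abstract. The function names are hypothetical and are not taken from the article.

```python
# Illustrative sketch only: inverse-frequency class weights for the
# class counts reported in the abstract. This is an assumed technique,
# not the method described by the authors.

# Class counts as stated in the record's description field.
CLASS_COUNTS = {"AD": 46, "MCI": 483, "CN": 279}


def balanced_class_weights(counts):
    """Return n_total / (n_classes * n_c) for each class c.

    With these weights every class contributes the same total weight,
    so rare classes are not swamped by frequent ones.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


def imbalance_ratio(counts):
    """Ratio of the largest class size to the smallest class size."""
    return max(counts.values()) / min(counts.values())


weights = balanced_class_weights(CLASS_COUNTS)
for label, weight in weights.items():
    print(f"{label}: n={CLASS_COUNTS[label]}, weight={weight:.3f}")
print(f"imbalance ratio (largest/smallest): {imbalance_ratio(CLASS_COUNTS):.1f}")
```

Per-class weights like these can be expanded into one weight per sample and passed to any classifier that accepts sample weights during training; whether that matches the pipeline in the article cannot be determined from this record.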