Hierarchical classification-based pan-cancer methylation analysis to classify primary cancer

Bibliographic Details
Main Authors: Youpeng Yang, Qiuhong Zeng, Gaotong Liu, Shiyao Zheng, Tianyang Luo, Yibin Guo, Jia Tang, Yi Huang
Format: Article
Language: English
Published: BMC 2023-12-01
Series: BMC Bioinformatics
Subjects: Cancer; Classification; Cluster analysis; Machine learning
Online Access: https://doi.org/10.1186/s12859-023-05529-0
collection DOAJ
description Abstract Hierarchical classification categorizes data more specifically and breaks large classification problems into subproblems, which improves prediction accuracy, adds predictive power for undefined categories, and mitigates the impact of poor-quality data. Despite these advantages, it is rarely applied to predicting primary cancer. To leverage the similarity among cancers and the specificity of their methylation patterns, we developed the Cancer Hierarchy Classification Tool (CHCT), based on hierarchical classification, using methylation data from 30 cancer types and 8239 methylome samples downloaded from publicly available databases (The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO)). We used unsupervised clustering to divide the classification into subproblems and screened differentially methylated sites using analysis of variance (ANOVA), the Tukey-Kramer test, and the Boruta algorithm to construct models for each classifier module. In validation, CHCT correctly classified 1568 of 1660 cases in the test set, an average accuracy of 94.46%. We further curated an independent validation cohort of 677 cancer samples from GEO and assigned diagnoses using CHCT, which showed high diagnostic potential (average accuracy of 91.40%). Moreover, CHCT demonstrated predictive capability for cancer types beyond its original classifier scope, as shown in the medulloblastoma and pituitary tumor datasets. In summary, CHCT hierarchically classifies primary cancer by methylation profile, splitting a large-scale classification of 30 cancer types into ten smaller classification problems. These results indicate that hierarchical classification has the potential to be an accurate and robust cancer classification method.
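The routing idea in the abstract (decide on a group of similar cancers first, then decide on the type within that group) can be illustrated with a minimal sketch. This is not the CHCT implementation: it swaps in a nearest-centroid rule for the per-module models, the group map is supplied by hand rather than derived by unsupervised clustering, and the toy methylation values, class names, and function names are all hypothetical.

```python
# Illustrative sketch only: a two-level (hierarchical) nearest-centroid classifier.
# Not the CHCT method; data, group map, and all names below are hypothetical.
from statistics import fmean


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [fmean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(sample, centroids):
    """Return the key of the centroid closest to the sample."""
    return min(centroids, key=lambda k: sq_dist(sample, centroids[k]))


def fit(samples, labels, group_of):
    """Build one top-level model over groups and one sub-model per group."""
    by_group, by_label = {}, {}
    for x, y in zip(samples, labels):
        by_group.setdefault(group_of[y], []).append(x)
        by_label.setdefault(y, []).append(x)
    top = {g: centroid(rows) for g, rows in by_group.items()}
    sub = {}
    for y, rows in by_label.items():
        sub.setdefault(group_of[y], {})[y] = centroid(rows)
    return top, sub


def predict(sample, model):
    """Route the sample to a group, then pick a class within that group."""
    top, sub = model
    group = nearest(sample, top)
    return nearest(sample, sub[group])


# Toy methylation-like values in [0, 1] for three hypothetical sites.
X = [
    [0.10, 0.90, 0.20], [0.15, 0.85, 0.25],  # type_A
    [0.10, 0.90, 0.80], [0.12, 0.88, 0.75],  # type_B
    [0.90, 0.10, 0.20], [0.85, 0.15, 0.25],  # type_C
    [0.90, 0.10, 0.80], [0.88, 0.12, 0.78],  # type_D
]
y = ["type_A", "type_A", "type_B", "type_B",
     "type_C", "type_C", "type_D", "type_D"]

# Assumption: the grouping is given; the abstract derives it by clustering.
GROUP_OF = {"type_A": "group_1", "type_B": "group_1",
            "type_C": "group_2", "type_D": "group_2"}

model = fit(X, y, GROUP_OF)
print(predict([0.11, 0.89, 0.78], model))  # routed to group_1, then type_B
print(predict([0.87, 0.13, 0.22], model))  # routed to group_2, then type_C

# Arithmetic check on the counts quoted in the abstract (1568 of 1660 correct).
print(round(100 * 1568 / 1660, 2))  # 94.46
```

Each sub-model only has to separate the classes inside its own group, which is the sense in which one large problem is split into smaller ones. A sample that reaches the wrong group at the top level cannot be recovered further down, so the quality of the grouping matters.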
first_indexed 2024-03-09T01:14:43Z
id doaj.art-716d0e747c9b4c8fbb7284abdaf37efc
institution Directory of Open Access Journals
issn 1471-2105
last_indexed 2024-03-09T01:14:43Z
Citation: BMC Bioinformatics 24(1):1-14, 2023-12-01
Author affiliations:
Youpeng Yang: Medicine School, Sun Yat-sen University
Qiuhong Zeng: Geneplus-Shenzhen Institute
Gaotong Liu: Geneplus-Shenzhen Institute
Shiyao Zheng: Medicine School, Sun Yat-sen University
Tianyang Luo: Medicine School, Sun Yat-sen University
Yibin Guo: Medicine School, Sun Yat-sen University
Jia Tang: NHC Key Laboratory of Male Reproduction and Genetics, Guangdong Provincial Reproductive Science Institute (Guangdong Provincial Fertility Hospital)
Yi Huang: Geneplus-Shenzhen Institute
topic Cancer
Classification
Cluster analysis
Machine learning