Hierarchical classification-based pan-cancer methylation analysis to classify primary cancer
Abstract Hierarchical classification offers a more specific categorization of data and breaks down large classification problems into subproblems, providing improved prediction accuracy and predictive power for undefined categories, while also mitigating the impact of poor-quality data. Despite thes...
Main Authors: | Youpeng Yang, Qiuhong Zeng, Gaotong Liu, Shiyao Zheng, Tianyang Luo, Yibin Guo, Jia Tang, Yi Huang |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2023-12-01 |
Series: | BMC Bioinformatics |
Subjects: | Cancer; Classification; Cluster analysis; Machine learning |
Online Access: | https://doi.org/10.1186/s12859-023-05529-0 |
_version_ | 1797397754565099520 |
---|---|
author | Youpeng Yang; Qiuhong Zeng; Gaotong Liu; Shiyao Zheng; Tianyang Luo; Yibin Guo; Jia Tang; Yi Huang |
author_sort | Youpeng Yang |
collection | DOAJ |
description | Abstract: Hierarchical classification offers a more specific categorization of data and breaks down large classification problems into subproblems, providing improved prediction accuracy and predictive power for undefined categories, while also mitigating the impact of poor-quality data. Despite these advantages, its application in predicting primary cancer is rare. To leverage the similarity of cancers and the specificity of methylation patterns among them, we developed the Cancer Hierarchy Classification Tool (CHCT) using the idea of hierarchical classification, with methylation data from 30 cancer types and 8239 methylome samples downloaded from publicly available databases (The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO)). We used unsupervised clustering to divide the classification subproblems and screened differentially methylated sites using analysis of variance (ANOVA), the Tukey-Kramer test, and the Boruta algorithm to construct models for each classifier module. After validation, CHCT accurately classified 1568 out of 1660 cases in the test set, with an average accuracy of 94.46%. We further curated an independent validation cohort of 677 cancer samples from GEO and assigned a diagnosis using CHCT, which showed high diagnostic potential with generally high accuracies (an average accuracy of 91.40%). Moreover, CHCT demonstrates predictive capability for additional cancer types beyond its original classifier scope, as demonstrated in the medulloblastoma and pituitary tumor datasets. In summary, CHCT can hierarchically classify primary cancer by methylation profile, by splitting a large-scale classification of 30 cancer types into ten smaller classification problems. These results indicate that cancer hierarchical classification has the potential to be an accurate and robust cancer classification method. |
first_indexed | 2024-03-09T01:14:43Z |
format | Article |
id | doaj.art-716d0e747c9b4c8fbb7284abdaf37efc |
institution | Directory Open Access Journal |
issn | 1471-2105 |
language | English |
last_indexed | 2024-03-09T01:14:43Z |
publishDate | 2023-12-01 |
publisher | BMC |
record_format | Article |
series | BMC Bioinformatics |
spelling | doaj.art-716d0e747c9b4c8fbb7284abdaf37efc; 2023-12-10T12:33:46Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2023-12-01; 24; 1; 1; 14; 10.1186/s12859-023-05529-0; Hierarchical classification-based pan-cancer methylation analysis to classify primary cancer; Youpeng Yang (Medicine School, Sun Yat-sen University); Qiuhong Zeng (Geneplus-Shenzhen Institute); Gaotong Liu (Geneplus-Shenzhen Institute); Shiyao Zheng (Medicine School, Sun Yat-sen University); Tianyang Luo (Medicine School, Sun Yat-sen University); Yibin Guo (Medicine School, Sun Yat-sen University); Jia Tang (NHC Key Laboratory of Male Reproduction and Genetics, Guangdong Provincial Reproductive Science Institute (Guangdong Provincial Fertility Hospital)); Yi Huang (Geneplus-Shenzhen Institute); https://doi.org/10.1186/s12859-023-05529-0; Cancer; Classification; Cluster analysis; Machine learning |
title | Hierarchical classification-based pan-cancer methylation analysis to classify primary cancer |
title_sort | hierarchical classification based pan cancer methylation analysis to classify primary cancer |
topic | Cancer; Classification; Cluster analysis; Machine learning |
url | https://doi.org/10.1186/s12859-023-05529-0 |
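
The abstract describes splitting one large classification problem into smaller subproblems arranged in a hierarchy. A minimal sketch of that general idea follows; it is not the CHCT implementation. It assumes a hand-written mapping from cancer type to group (the record says CHCT derived its groups by unsupervised clustering), uses a simple nearest-centroid rule at every node in place of the models the authors built, and runs on invented two-feature toy data. All names (`NearestCentroid`, `HierarchicalClassifier`, `GROUP_OF`, `type_A`, `G1`, and so on) are hypothetical.

```python
from collections import defaultdict
from math import dist


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


class NearestCentroid:
    """Toy classifier: predict the label whose class centroid is closest."""

    def fit(self, X, y):
        by_label = defaultdict(list)
        for row, label in zip(X, y):
            by_label[label].append(row)
        self.centroids = {lab: centroid(rows) for lab, rows in by_label.items()}
        return self

    def predict_one(self, row):
        return min(self.centroids, key=lambda lab: dist(row, self.centroids[lab]))


class HierarchicalClassifier:
    """Two-level hierarchy: a root model picks a group, a leaf model picks the type."""

    def __init__(self, group_of):
        self.group_of = group_of  # assumed mapping: leaf label -> group label

    def fit(self, X, y):
        groups = [self.group_of[label] for label in y]
        # Root subproblem: which group does a sample belong to?
        self.root = NearestCentroid().fit(X, groups)
        # One leaf subproblem per group, trained only on that group's samples.
        self.leaves = {}
        for g in set(groups):
            idx = [i for i, gi in enumerate(groups) if gi == g]
            self.leaves[g] = NearestCentroid().fit(
                [X[i] for i in idx], [y[i] for i in idx]
            )
        return self

    def predict_one(self, row):
        g = self.root.predict_one(row)
        return g, self.leaves[g].predict_one(row)


# Invented toy data: two methylation-like features per sample (not real values).
GROUP_OF = {"type_A": "G1", "type_B": "G1", "type_C": "G2", "type_D": "G2"}
X = [
    [0.10, 0.10], [0.12, 0.08],  # type_A
    [0.30, 0.12], [0.28, 0.10],  # type_B
    [0.80, 0.90], [0.82, 0.88],  # type_C
    [0.90, 0.60], [0.88, 0.62],  # type_D
]
y = ["type_A"] * 2 + ["type_B"] * 2 + ["type_C"] * 2 + ["type_D"] * 2

model = HierarchicalClassifier(GROUP_OF).fit(X, y)
print(model.predict_one([0.29, 0.11]))
print(model.predict_one([0.85, 0.85]))
```

Because each leaf model only has to separate the types inside its own group, it can rely on features that distinguish those types, which is one reason a hierarchy may help when some classes resemble each other closely.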