Classification of Breast Cancer Subtypes by combining Gene Expression and DNA Methylation Data

Bibliographic Details
Main Authors: List Markus, Hauschild Anne-Christin, Tan Qihua, Kruse Torben A., Baumbach Jan, Batra Richa
Format: Article
Language: English
Published: De Gruyter 2014-06-01
Series: Journal of Integrative Bioinformatics
Online Access: https://doi.org/10.1515/jib-2014-236
Collection: DOAJ
Description: Selecting the most promising treatment strategy for breast cancer depends crucially on determining the correct subtype. In recent years, gene expression profiling has been investigated as an alternative to histochemical methods. Since databases such as TCGA provide easy and unrestricted access to gene expression data for hundreds of patients, the challenge is to extract a minimal optimal set of genes with good prognostic properties from a large bulk of genes that make a moderate contribution to classification. Several studies have successfully applied machine learning algorithms to this so-called gene selection problem. However, more diverse data from other OMICS technologies, including methylation, are also available. We hypothesize that combining methylation and gene expression data could lead to a substantially improved classification model, since the resulting model reflects differences not only at the transcriptomic level but also at the epigenetic level. We compared random forest-derived classification models based on gene expression data alone and on methylation data alone with a model based on the combined features and with a model based on the gold standard PAM50. We obtained bootstrap errors of 10-20% and classification errors of 1-50%, depending on breast cancer subtype and model. The gene expression model was clearly superior to the methylation model; this was also reflected in the combined model, which mainly selected features from gene expression data. However, the methylation model identified unique features that the gene expression model did not consider relevant, which might provide deeper insight into breast cancer subtype differentiation at the epigenetic level.
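The comparison described in the abstract (one model per data type, one on the combined features, each scored by a bootstrap error) can be illustrated with a small sketch. This is not the authors' pipeline: to stay within the Python standard library, a nearest-centroid classifier stands in for the random forest, the cohort is synthetic, and every name (`make_synthetic_cohort`, `oob_bootstrap_error`, the subtype labels, the effect sizes) is a hypothetical chosen for illustration. The synthetic data are deliberately built so that the "expression" features separate the two classes better than the "methylation" features.

```python
import random


def fit_centroids(rows, labels):
    """Return {label: centroid}, where a centroid is the per-feature mean."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, row):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def oob_bootstrap_error(rows, labels, n_rounds=50, seed=0):
    """Average error on samples left out of each bootstrap resample."""
    rng = random.Random(seed)
    n = len(rows)
    errors = total = 0
    for _ in range(n_rounds):
        drawn = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        in_bag = set(drawn)
        held_out = [i for i in range(n) if i not in in_bag]
        model = fit_centroids([rows[i] for i in drawn], [labels[i] for i in drawn])
        for i in held_out:
            total += 1
            errors += int(predict(model, rows[i]) != labels[i])
    return errors / total if total else float("nan")


def make_synthetic_cohort(n_per_class=40, seed=1):
    """Synthetic two-class cohort; effect sizes are arbitrary assumptions."""
    rng = random.Random(seed)
    expression, methylation, labels = [], [], []
    for label, sign in (("subtype_A", -1.0), ("subtype_B", 1.0)):
        for _ in range(n_per_class):
            expression.append([rng.gauss(sign * 1.5, 1.0) for _ in range(3)])
            methylation.append([rng.gauss(sign * 0.15, 1.0) for _ in range(3)])
            labels.append(label)
    return expression, methylation, labels


expression, methylation, labels = make_synthetic_cohort()
# Combined model: concatenate both feature vectors for each sample.
combined = [e + m for e, m in zip(expression, methylation)]

results = {
    name: oob_bootstrap_error(rows, labels)
    for name, rows in (
        ("expression", expression),
        ("methylation", methylation),
        ("combined", combined),
    )
}
for name, err in results.items():
    print(f"{name}: out-of-bag error = {err:.3f}")
```

Scoring each feature set on the samples left out of a resample mirrors how a random forest reports its out-of-bag error, which is why that estimate was chosen for the sketch; the numbers it prints describe only the synthetic data and say nothing about the published results.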
ISSN: 1613-4516
Author Affiliations:
List Markus: Lundbeckfonden Center of Excellence in Nanomedicine (NanoCAN), University of Southern Denmark, 5000 Odense, Denmark
Hauschild Anne-Christin: Computational Systems Biology Group, Max Planck Institute for Informatics, 66123 Saarbrücken, Germany
Tan Qihua: Clinical Institute, University of Southern Denmark, 5000 Odense, Denmark
Kruse Torben A.: Lundbeckfonden Center of Excellence in Nanomedicine (NanoCAN), University of Southern Denmark, 5000 Odense, Denmark
Baumbach Jan: Department of Mathematics and Computer Science (IMADA), University of Southern Denmark, 5000 Odense, Denmark
Batra Richa: Department of Mathematics and Computer Science (IMADA), University of Southern Denmark, 5000 Odense, Denmark