Machine Learning Approaches to Classify Primary and Metastatic Cancers Using Tissue of Origin-Based DNA Methylation Profiles

Metastatic cancers account for up to 90% of cancer-related deaths. The clear differentiation of metastatic cancers from primary cancers is crucial for cancer type identification and developing targeted treatment for each cancer type. DNA methylation patterns are suggested to be an intriguing target...

Bibliographic Details
Main Authors: Vijayachitra Modhukur, Shakshi Sharma, Mainak Mondal, Ankita Lawarde, Keiu Kask, Rajesh Sharma, Andres Salumets
Format: Article
Language: English
Published: MDPI AG 2021-07-01
Series: Cancers
Subjects: DNA methylation; TCGA; biomarkers; clustering; differential methylation; metastasis
Online Access: https://www.mdpi.com/2072-6694/13/15/3768
collection DOAJ
description Metastatic cancers account for up to 90% of cancer-related deaths. The clear differentiation of metastatic cancers from primary cancers is crucial for cancer type identification and developing targeted treatment for each cancer type. DNA methylation patterns are suggested to be an intriguing target for cancer prediction and are also considered to be an important mediator for the transition to metastatic cancer. In the present study, we used 24 cancer types and 9303 methylome samples downloaded from publicly available data repositories, including The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO). We constructed machine learning classifiers to discriminate metastatic, primary, and non-cancerous methylome samples. We applied support vector machines (SVM), Naive Bayes (NB), extreme gradient boosting (XGBoost), and random forest (RF) machine learning models to classify the cancer types based on their tissue of origin. RF outperformed the other classifiers, with an average accuracy of 99%. Moreover, we applied local interpretable model-agnostic explanations (LIME) to explain important methylation biomarkers to classify cancer types.
first_indexed 2024-03-10T09:18:20Z
format Article
id doaj.art-e11b35165b294e2d8fb722c7a2170759
institution Directory Open Access Journal
issn 2072-6694
spelling doaj.art-e11b35165b294e2d8fb722c7a2170759
timestamp 2023-11-22T05:27:29Z
language eng
publisher MDPI AG
journal Cancers
issn 2072-6694
publishDate 2021-07-01
volume 13
issue 15
article 3768
doi 10.3390/cancers13153768
affiliations Vijayachitra Modhukur: Competence Centre on Health Technologies, 50411 Tartu, Estonia
Shakshi Sharma: Institute of Computer Science, University of Tartu, 51009 Tartu, Estonia
Mainak Mondal: Competence Centre on Health Technologies, 50411 Tartu, Estonia
Ankita Lawarde: Competence Centre on Health Technologies, 50411 Tartu, Estonia
Keiu Kask: Competence Centre on Health Technologies, 50411 Tartu, Estonia
Rajesh Sharma: Institute of Computer Science, University of Tartu, 51009 Tartu, Estonia
Andres Salumets: Competence Centre on Health Technologies, 50411 Tartu, Estonia
title Machine Learning Approaches to Classify Primary and Metastatic Cancers Using Tissue of Origin-Based DNA Methylation Profiles
topic DNA methylation
TCGA
biomarkers
clustering
differential methylation
metastasis
url https://www.mdpi.com/2072-6694/13/15/3768