Molecular Classification Models for Triple Negative Breast Cancer Subtype Using Machine Learning
Triple negative breast cancer (TNBC) lacks well-defined molecular targets and is highly heterogeneous, making treatment challenging. Using gene expression analysis, TNBC has been classified into four different subtypes: basal-like immune-activated (BLIA), basal-like immune-suppressed (BLIS), mesenchy...
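Main Authors: | Rassanee Bissanum, Sitthichok Chaichulee, Rawikant Kamolphiwong, Raphatphorn Navakanitworakul, Kanyanatt Kanokwiroon |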
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG 2021-09-01 |
Series: | Journal of Personalized Medicine |
Subjects: | TNBC subtype, machine learning, microarray, gene expression profile |
Online Access: | https://www.mdpi.com/2075-4426/11/9/881 |
_version_ | 1797518617669009408 |
---|---|
author | Rassanee Bissanum; Sitthichok Chaichulee; Rawikant Kamolphiwong; Raphatphorn Navakanitworakul; Kanyanatt Kanokwiroon |
author_sort | Rassanee Bissanum |
collection | DOAJ |
description | Triple negative breast cancer (TNBC) lacks well-defined molecular targets and is highly heterogeneous, making treatment challenging. Using gene expression analysis, TNBC has been classified into four subtypes: basal-like immune-activated (BLIA), basal-like immune-suppressed (BLIS), mesenchymal (MES), and luminal androgen receptor (LAR). However, there is currently no standardized method for classifying TNBC subtypes. We attempted to define a gene signature for each subtype and to develop a machine learning (ML) classification method for TNBC subtyping. Gene expression microarray data for TNBC patients were downloaded from the Gene Expression Omnibus database. Differentially expressed genes (DEGs) unique to 198 known TNBC cases were identified and selected as a training gene set for seven different classification models. The training set consisted of 719 DEGs selected from the uniquely expressed genes of all four subtypes. The highest average accuracy in classifying the BLIA, BLIS, MES, and LAR subtypes was achieved by the support vector machine (SVM) algorithm (accuracy 95–98.8%; AUC 0.99–1.00). For model validation, we used 334 samples of unknown TNBC subtype, of which 97 (29.04%), 73 (21.86%), 39 (11.68%), and 59 (17.66%) were predicted to be BLIA, BLIS, MES, and LAR, respectively. However, 66 TNBC samples (19.76%) could not be assigned to any subtype. These samples contained only three upregulated genes (EN1, PROM1, and CCL2). Each TNBC subtype had a unique gene expression pattern, which was confirmed by identification of DEGs and pathway analysis. These results indicated that our training gene set was suitable for developing classification models, and that the SVM algorithm could classify TNBC into four unique subtypes. Accurate and consistent classification of the TNBC subtypes is essential for personalized treatment and prognosis of TNBC. |
first_indexed | 2024-03-10T07:32:17Z |
format | Article |
id | doaj.art-c53ddd7e77b1466f8d0fb901c069f550 |
institution | Directory of Open Access Journals |
issn | 2075-4426 |
language | English |
last_indexed | 2024-03-10T07:32:17Z |
publishDate | 2021-09-01 |
publisher | MDPI AG |
record_format | Article |
series | Journal of Personalized Medicine |
spelling | doaj.art-c53ddd7e77b1466f8d0fb901c069f550; 2023-11-22T13:50:31Z; eng; MDPI AG; Journal of Personalized Medicine; ISSN 2075-4426; 2021-09-01; vol. 11, no. 9, art. 881; DOI 10.3390/jpm11090881; Molecular Classification Models for Triple Negative Breast Cancer Subtype Using Machine Learning; Rassanee Bissanum, Sitthichok Chaichulee, Rawikant Kamolphiwong, Raphatphorn Navakanitworakul, Kanyanatt Kanokwiroon (all: Department of Biomedical Sciences and Biomedical Engineering, Faculty of Medicine, Prince of Songkla University, Hat Yai, Songkhla 90110, Thailand); https://www.mdpi.com/2075-4426/11/9/881; TNBC subtype; machine learning; microarray; gene expression profile |
title | Molecular Classification Models for Triple Negative Breast Cancer Subtype Using Machine Learning |
topic | TNBC subtype; machine learning; microarray; gene expression profile |
url | https://www.mdpi.com/2075-4426/11/9/881 |
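
The validation percentages quoted in the description can be checked with a few lines of Python. This is only a minimal sketch: the counts are taken from the abstract, while the dictionary layout, the `unassigned` label, and the `percentages` helper are illustrative assumptions and not part of the study's pipeline.

```python
# Predicted subtype counts for the validation samples, as reported in the abstract.
# The "unassigned" key is a label chosen here for samples with no predicted subtype.
counts = {"BLIA": 97, "BLIS": 73, "MES": 39, "LAR": 59, "unassigned": 66}


def percentages(counts):
    """Return each group's share of the total, in percent, rounded to 2 decimals."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


total = sum(counts.values())
shares = percentages(counts)

# Print one line per group, e.g. the BLIA count over the total and its share.
for label, share in shares.items():
    print(f"{label}: {counts[label]} / {total} = {share:.2f}%")
```

The five groups together account for all 334 validation samples, and the computed shares match the figures given in the description.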