Molecular Classification Models for Triple Negative Breast Cancer Subtype Using Machine Learning

Triple negative breast cancer (TNBC) lacks well-defined molecular targets and is highly heterogeneous, making treatment challenging. Using gene expression analysis, TNBC has been classified into four different subtypes: basal-like immune-activated (BLIA), basal-like immune-suppressed (BLIS), mesenchy...

Bibliographic Details
Main Authors: Rassanee Bissanum, Sitthichok Chaichulee, Rawikant Kamolphiwong, Raphatphorn Navakanitworakul, Kanyanatt Kanokwiroon
Format: Article
Language: English
Published: MDPI AG 2021-09-01
Series: Journal of Personalized Medicine
Subjects: TNBC subtype, machine learning, microarray, gene expression profile
Online Access: https://www.mdpi.com/2075-4426/11/9/881
collection DOAJ
description Triple negative breast cancer (TNBC) lacks well-defined molecular targets and is highly heterogeneous, making treatment challenging. Using gene expression analysis, TNBC has been classified into four different subtypes: basal-like immune-activated (BLIA), basal-like immune-suppressed (BLIS), mesenchymal (MES), and luminal androgen receptor (LAR). However, there is currently no standardized method for classifying TNBC subtypes. We attempted to define a gene signature for each subtype, and to develop a classification method based on machine learning (ML) for TNBC subtyping. In these experiments, gene expression microarray data for TNBC patients were downloaded from the Gene Expression Omnibus database. Differentially expressed genes (DEGs) unique to 198 known TNBC cases were identified and selected as a training gene set for seven different classification models. We produced a training set consisting of 719 DEGs selected from uniquely expressed genes of all four subtypes. The highest average accuracy of classification of the BLIA, BLIS, MES, and LAR subtypes was achieved by the support vector machine (SVM) algorithm (accuracy 95–98.8%; AUC 0.99–1.00). For model validation, we used 334 samples of unknown TNBC subtypes, of which 97 (29.04%), 73 (21.86%), 39 (11.68%) and 59 (17.66%) were predicted to be BLIA, BLIS, MES, and LAR, respectively. However, 66 TNBC samples (19.76%) could not be assigned to any subtype. These samples contained only three upregulated genes (EN1, PROM1, and CCL2). Each TNBC subtype had a unique gene expression pattern, which was confirmed by identification of DEGs and pathway analysis. These results indicated that our training gene set was suitable for development of classification models, and that the SVM algorithm could classify TNBC into four unique subtypes. Accurate and consistent classification of the TNBC subtypes is essential for personalized treatment and prognosis of TNBC.
first_indexed 2024-03-10T07:32:17Z
id doaj.art-c53ddd7e77b1466f8d0fb901c069f550
institution Directory Open Access Journal
issn 2075-4426
last_indexed 2024-03-10T07:32:17Z
spelling doaj.art-c53ddd7e77b1466f8d0fb901c069f550 | 2023-11-22T13:50:31Z | eng | MDPI AG | Journal of Personalized Medicine | ISSN 2075-4426 | 2021-09-01 | vol. 11, no. 9, art. 881 | DOI 10.3390/jpm11090881
Rassanee Bissanum, Sitthichok Chaichulee, Rawikant Kamolphiwong, Raphatphorn Navakanitworakul, Kanyanatt Kanokwiroon (all authors: Department of Biomedical Sciences and Biomedical Engineering, Faculty of Medicine, Prince of Songkla University, Hat Yai, Songkhla 90110, Thailand)
TNBC subtype; machine learning; microarray; gene expression profile
topic TNBC subtype
machine learning
microarray
gene expression profile