A hierarchical clustering approach for colorectal cancer molecular subtypes identification from gene expression data

Bibliographic Details
Main Authors: Shivangi Raghav, Aastha Suri, Deepika Kumar, Aakansha Aakansha, Muskan Rathore, Sudipta Roy
Format: Article
Language: English
Published: Elsevier 2024-02-01
Series: Intelligent Medicine
Online Access: http://www.sciencedirect.com/science/article/pii/S2667102623000396
Description
Summary: Background: Colorectal cancer (CRC) is the second leading cause of cancer fatalities and the third most common human disease. Identifying molecular subgroups of CRC and treating patients accordingly could yield better therapeutic outcomes than treating all CRC patients alike. Studies have highlighted CRC as a major cause of mortality worldwide and the potential of molecular subtyping to tailor treatment strategies and improve patient outcomes. Methods: This study proposed an unsupervised learning approach that uses hierarchical clustering and feature selection to identify molecular subtypes, and compared its performance with that of conventional methods. The proposed model used gene expression data from CRC patients, obtained from Kaggle, and applied dimension reduction techniques followed by Z-score-based outlier removal. Agglomerative hierarchical clustering was used to identify molecular subtypes, with a P-value-based approach for feature selection. The performance of the model was evaluated using several classifiers, including a multilayer perceptron (MLP). Results: The proposed methodology outperformed conventional methods, with the MLP classifier achieving the highest accuracy, 89%, after feature selection. The model identified molecular subtypes of CRC and distinguished them by their gene expression profiles. Conclusion: This method could aid in developing tailored therapeutic strategies for CRC patients, although its clinical significance requires further validation and evaluation.
ISSN: 2667-1026
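
Illustrative sketch: The steps named in the Methods part of the summary (dimension reduction, Z-score-based outlier removal, agglomerative hierarchical clustering, P-value-based feature selection, MLP evaluation) could be arranged roughly as follows. This is not the authors' code. The synthetic data, the choice of PCA for dimension reduction, Ward linkage, one-way ANOVA as the source of P-values, three subtypes, and every threshold and function name are assumptions made only for illustration; the record does not specify any of them.

```python
import numpy as np
from scipy import stats
from sklearn.cluster import AgglomerativeClustering
from sklearn.decomposition import PCA
from sklearn.feature_selection import f_classif
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler


def make_synthetic_expression(n_per_group=60, n_genes=200, n_informative=20, seed=0):
    """Hypothetical stand-in for a samples x genes expression matrix."""
    rng = np.random.default_rng(seed)
    groups = []
    for g in range(3):  # three synthetic subtypes (assumed, not from the study)
        block = rng.normal(0.0, 1.0, size=(n_per_group, n_genes))
        # Shift a distinct set of genes upward for each synthetic subtype.
        block[:, g * n_informative:(g + 1) * n_informative] += 3.0
        groups.append(block)
    return np.vstack(groups)


def run_pipeline(X, n_subtypes=3, z_cutoff=3.0, alpha=0.01, seed=0):
    """Cluster samples, select genes by P-value, and score an MLP on the clusters."""
    X = StandardScaler().fit_transform(X)

    # Dimension reduction (PCA is an assumed choice), then drop samples whose
    # absolute Z-score exceeds the cutoff on any retained component.
    components = PCA(n_components=5, random_state=seed).fit_transform(X)
    z = np.abs(stats.zscore(components, axis=0))
    keep = (z < z_cutoff).all(axis=1)
    X, components = X[keep], components[keep]

    # Agglomerative hierarchical clustering assigns a putative subtype per sample.
    clustering = AgglomerativeClustering(n_clusters=n_subtypes, linkage="ward")
    labels = clustering.fit_predict(components)

    # P-value-based feature selection: one-way ANOVA of each gene across clusters.
    _, p_values = f_classif(X, labels)
    selected = np.flatnonzero(p_values < alpha)

    # Train an MLP on the selected genes and score it on held-out samples.
    X_train, X_test, y_train, y_test = train_test_split(
        X[:, selected], labels, test_size=0.3, stratify=labels, random_state=seed
    )
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=seed)
    clf.fit(X_train, y_train)
    return labels, selected, clf.score(X_test, y_test)


if __name__ == "__main__":
    X = make_synthetic_expression()
    labels, selected, accuracy = run_pipeline(X)
    print(f"samples kept: {len(labels)}, genes selected: {len(selected)}")
    print(f"held-out accuracy against cluster labels: {accuracy:.2f}")
```

In a sketch like this, the classifier is scored against the cluster assignments themselves, so the accuracy reflects how consistently the selected genes reproduce the clusters rather than any clinical ground truth.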