A bio-inspired convolution neural network architecture for automatic breast cancer detection and classification using RNA-Seq gene expression data

Bibliographic Details
Main Authors: Tehnan I. A. Mohamed, Absalom E. Ezugwu, Jean Vincent Fonou-Dombeu, Abiodun M. Ikotun, Mohanad Mohammed
Format: Article
Language: English
Published: Nature Portfolio 2023-09-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-023-41731-z
collection DOAJ
description Breast cancer is a major health challenge and one of the most prevalent and dangerous cancers affecting women worldwide. Early detection and diagnosis are crucial for effective treatment and personalized therapy: they can help patients and physicians identify new treatment options, improve quality of life, and increase survival rates. Detecting breast cancer from gene expression data is difficult because of the data's high dimensionality and complexity. This paper proposes a bio-inspired CNN model for breast cancer detection using gene expression data downloaded from The Cancer Genome Atlas (TCGA). The dataset contains 1208 clinical samples (113 normal and 1095 cancerous) covering 19,948 genes. In the proposed model, Array-Array Intensity Correlation (AAIC) is used during pre-processing to remove outliers, followed by normalization to avoid biases in the expression measures. Genes are then filtered using a threshold value of 0.25 to reduce their number. The pre-processed gene expression data are then converted into images, which are in turn converted to grayscale to meet the model's input requirements. The model is a hybrid that combines a CNN architecture with a metaheuristic algorithm, the Ebola Optimization Search Algorithm (EOSA), to improve breast cancer detection. Its classification results were compared with those of a traditional CNN and five competing hybrid algorithms: the Whale Optimization Algorithm (WOA-CNN), the Genetic Algorithm (GA-CNN), Satin Bowerbird Optimization (SBO-CNN), Life Choice-Based Optimization (LCBO-CNN), and the Multi-Verse Optimizer (MVO-CNN). The proposed model achieved an accuracy of 98.3%, a precision of 99%, a recall of 99%, an F1-score of 99%, a kappa of 90.3%, a specificity of 92.8%, and a sensitivity of 98.9% for the cancerous class. These results suggest that the proposed method could be a reliable and precise approach to breast cancer detection, supporting early diagnosis and personalized therapy.
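The abstract names the pre-processing steps but not their exact definitions. The sketch below illustrates the general idea of normalizing an expression matrix, filtering genes with a 0.25 threshold, and laying each sample out as a square grayscale image. It is a minimal illustration under stated assumptions, not the authors' implementation: min-max scaling stands in for the unspecified normalization, the 0.25 threshold is assumed to apply to each gene's standard deviation after scaling, each sample is zero-padded to the next square size, and neither AAIC outlier removal nor the EOSA-tuned CNN is reproduced. All function names are hypothetical.

```python
import math

import numpy as np


def minmax_normalize(expr: np.ndarray) -> np.ndarray:
    """Scale each gene (column) of a samples x genes matrix to [0, 1]."""
    lo = expr.min(axis=0)
    span = expr.max(axis=0) - lo
    span[span == 0] = 1  # constant genes map to 0 instead of dividing by zero
    return (expr - lo) / span


def filter_genes(norm: np.ndarray, threshold: float = 0.25) -> np.ndarray:
    """Keep genes whose standard deviation after scaling is >= threshold.

    ASSUMPTION: the abstract does not say which statistic the 0.25
    threshold applies to; standard deviation is used for illustration.
    """
    return norm[:, norm.std(axis=0) >= threshold]


def to_grayscale_images(norm: np.ndarray) -> np.ndarray:
    """Zero-pad each sample vector to a square and map [0, 1] to 0..255."""
    n_samples, n_genes = norm.shape
    side = math.ceil(math.sqrt(n_genes))
    padded = np.zeros((n_samples, side * side))
    padded[:, :n_genes] = norm
    pixels = np.round(padded * 255).astype(np.uint8)
    return pixels.reshape(n_samples, side, side)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    expr = rng.lognormal(size=(6, 50))  # synthetic data: 6 samples x 50 genes
    images = to_grayscale_images(filter_genes(minmax_normalize(expr)))
    print(images.shape, images.dtype)
```

Zero-padding to a square keeps every retained gene as exactly one pixel, so the image side grows with the square root of the number of genes that survive filtering.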
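The reported metrics are standard functions of a binary confusion matrix when the cancerous class is treated as positive. The following sketch uses the textbook definitions; the `Confusion` class and `classification_metrics` function are hypothetical names, and the counts in the demo are illustrative, not the paper's results. Recall and sensitivity are the same quantity, and Cohen's kappa corrects accuracy for the agreement expected by chance from the class totals, which matters for an imbalanced dataset such as 113 normal versus 1095 cancerous samples.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion matrix with 'cancerous' as the positive class."""
    tp: int  # cancerous samples predicted cancerous
    fp: int  # normal samples predicted cancerous
    fn: int  # cancerous samples predicted normal
    tn: int  # normal samples predicted normal


def classification_metrics(c: Confusion) -> dict:
    """Return standard metrics; assumes no denominator is zero."""
    n = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / n
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)  # also called sensitivity
    specificity = c.tn / (c.tn + c.fp)
    f1 = 2 * precision * recall / (precision + recall)
    # Chance agreement from predicted and actual class totals (Cohen's kappa).
    p_chance = ((c.tp + c.fp) * (c.tp + c.fn) + (c.fn + c.tn) * (c.fp + c.tn)) / n ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "sensitivity": recall,
        "specificity": specificity,
        "f1": f1,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    for name, value in classification_metrics(Confusion(tp=8, fp=1, fn=2, tn=9)).items():
        print(f"{name}: {value:.3f}")
```

Because accuracy is dominated by the majority class on imbalanced data, specificity and kappa are the more informative figures for how well the minority (normal) class is recognized.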
first_indexed 2024-03-10T22:01:43Z
id doaj.art-be337eb958634bfb8e11b1cc6fc0792b
institution Directory Open Access Journal
issn 2045-2322
last_indexed 2024-03-10T22:01:43Z
affiliation School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal (Tehnan I. A. Mohamed, Jean Vincent Fonou-Dombeu, Abiodun M. Ikotun, Mohanad Mohammed)
Unit for Data Science and Computing, North-West University (Absalom E. Ezugwu)
container_volume 13
container_issue 1