Prediction of breast cancer diagnosis using machine learning in Malaysian women

Breast cancer is the most prevalent cancer in the world and the main cause of cancer mortality in the twelve regions of the world. Thus, there is a need for efficient screening and diagnosis of the disease. Thus, this thesis aims to explore the use of machine learning (ML) for breast cancer risk est...

Full description

Bibliographic Details
Main Author: Mokhtar, Tengku Muhammad Hanis Tengku
Format: Thesis
Language:English
Published: 2024
Subjects:
Online Access:http://eprints.usm.my/60999/1/TENGKU%20MUHAMMAD%20HANIS%20BIN%20TENGKU%20MOKHTAR-E.pdf
Description
Summary:Breast cancer is the most prevalent cancer in the world and the main cause of cancer mortality in the twelve regions of the world. Thus, there is a need for efficient screening and diagnosis of the disease. Thus, this thesis aims to explore the use of machine learning (ML) for breast cancer risk estimation and prediction. This thesis included six interrelated projects starting from Chapter 2 to Chapter 7. Chapter 2 presents an overview of breast cancer research in Malaysia. A bibliometric analysis was used to describe the research activities of breast cancer research in Malaysia. This project revealed there was no dominant research area in breast cancer research in Malaysia. Additionally, the study found that two growing research themes related to breast cancer in Malaysia were precision medicine and deep learning. Chapter 3 explored the most cited global research related to breast cancer and ML. This project also utilised bibliometric analysis applied to the most cited papers related to breast cancer and ML. This project found that there was a strong interest in the application of ML to breast cancer in the last three decades. The three frequently used ML algorithms were deep learning, support vector machine (SVM), and cluster analysis. In Chapter 4, factors influencing mammographic density among Asian women including Malaysia women were investigated. The study utilised a multiple imputation approach to overcome a missing data issue and a logistic regression to analyse the data. Five factors affecting mammographic density were age, number of children, body mass index, menopause status, and breast imaging-reporting and data system (BI-RADS) classification. The study in Chapter 5 explored the use of patient registration records and ML for breast cancer risk estimation. The ML model developed in this chapter could be used as an over-the-counter screening (OTC) model for women attending breast clinics. Eight ML algorithms were explored in this project. k-nearest neighbour (kNN) models had a significantly better performance compared to the other seven models. Additionally, Chapter 6 presents a meta-analysis of ML models on breast cancer classification. This project seeks to establish the diagnostic accuracy of ML used on mammographic data. This project found that neural network, deep learning, tree-based models, and SVM performed well on mammographic data for breast cancer detection. The study established the good diagnostic accuracy of ML in this area of research, thus, further supporting the use of ML in this area, especially for screening and supplementary diagnostic tools. Lastly, the study in Chapter 7 explored the use of an ensemble of pre-trained networks for breast abnormality classification using digital mammograms. This project explored thirteen pre-trained networks as candidates for the ensemble model. Each network was further fine-tuned, and the top networks were used to develop the ensemble model. The ensemble pre-trained network displayed a good performance in classifying the normal and suspicious mammograms. In conclusion, this thesis highlights the potential of ML in breast cancer risk estimation and prediction. The findings of this thesis contribute to the growing body of literature on ML in breast cancer research and provide valuable insights for future research in this area.