Prediction of breast cancer diagnosis using machine learning in Malaysian women

Bibliographic Details
Main Author: Mokhtar, Tengku Muhammad Hanis Tengku
Format: Thesis
Language: English
Published: 2024
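Subjects: R Medicine; RA440-440.87 Study and teaching. Research; RC254-282 Neoplasms. Tumors. Oncology (including Cancer)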
Online Access: http://eprints.usm.my/60999/1/TENGKU%20MUHAMMAD%20HANIS%20BIN%20TENGKU%20MOKHTAR-E.pdf
author Mokhtar, Tengku Muhammad Hanis Tengku
collection USM
description Breast cancer is the most prevalent cancer in the world and the leading cause of cancer mortality in twelve regions of the world, so efficient screening and diagnosis of the disease are needed. This thesis therefore explores the use of machine learning (ML) for breast cancer risk estimation and prediction. It comprises six interrelated projects, presented in Chapters 2 to 7. Chapter 2 presents an overview of breast cancer research in Malaysia, using a bibliometric analysis to describe research activity. This project found no dominant research area in breast cancer research in Malaysia and identified precision medicine and deep learning as two growing research themes. Chapter 3 applied bibliometric analysis to the most cited global papers on breast cancer and ML. This project found strong interest in the application of ML to breast cancer over the last three decades. Three frequently used ML algorithms were deep learning, support vector machine (SVM), and cluster analysis. Chapter 4 investigated factors influencing mammographic density among Asian women, including Malaysian women. The study used multiple imputation to handle missing data and logistic regression to analyse the data. Five factors affected mammographic density: age, number of children, body mass index, menopause status, and Breast Imaging-Reporting and Data System (BI-RADS) classification. Chapter 5 explored the use of patient registration records and ML for breast cancer risk estimation. The ML model developed in this chapter could be used as an over-the-counter (OTC) screening model for women attending breast clinics. Eight ML algorithms were explored in this project, and k-nearest neighbour (kNN) models performed significantly better than the other seven. Chapter 6 presents a meta-analysis of ML models for breast cancer classification, which sought to establish the diagnostic accuracy of ML applied to mammographic data. This project found that neural networks, deep learning, tree-based models, and SVM performed well on mammographic data for breast cancer detection. The study established the good diagnostic accuracy of ML in this area of research, further supporting its use, especially in screening and as a supplementary diagnostic tool. Lastly, Chapter 7 explored an ensemble of pre-trained networks for breast abnormality classification using digital mammograms. Thirteen pre-trained networks were considered as candidates for the ensemble model. Each network was fine-tuned, and the top networks were used to develop the ensemble model. The ensemble of pre-trained networks performed well in classifying normal and suspicious mammograms. In conclusion, this thesis highlights the potential of ML in breast cancer risk estimation and prediction. Its findings contribute to the growing body of literature on ML in breast cancer research and provide insights for future research in this area.
first_indexed 2024-09-25T03:57:44Z
format Thesis
id usm.eprints-60999
institution Universiti Sains Malaysia
language English
last_indexed 2024-09-25T03:57:44Z
publishDate 2024
record_format dspace
spelling usm.eprints-60999 2024-08-21T03:20:41Z http://eprints.usm.my/60999/ Prediction of breast cancer diagnosis using machine learning in Malaysian women Mokhtar, Tengku Muhammad Hanis Tengku R Medicine RA440-440.87 Study and teaching. Research RC254-282 Neoplasms. Tumors. Oncology (including Cancer) 2024-03 Thesis NonPeerReviewed application/pdf en http://eprints.usm.my/60999/1/TENGKU%20MUHAMMAD%20HANIS%20BIN%20TENGKU%20MOKHTAR-E.pdf Mokhtar, Tengku Muhammad Hanis Tengku (2024) Prediction of breast cancer diagnosis using machine learning in Malaysian women. PhD thesis, Universiti Sains Malaysia.
title Prediction of breast cancer diagnosis using machine learning in Malaysian women
topic R Medicine
RA440-440.87 Study and teaching. Research
RC254-282 Neoplasms. Tumors. Oncology (including Cancer)
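
The description above says that, in Chapter 7, candidate pre-trained networks were fine-tuned and the top networks were combined into an ensemble. The record does not state how the networks were ranked or how their outputs were combined, so the following is only a minimal sketch under stated assumptions: networks are ranked by a validation score, and the ensemble averages their predicted probabilities (soft voting). The network names, scores, probabilities, and the 0.5 threshold are all hypothetical and are not taken from the thesis.

```python
from typing import Dict, List, Sequence


def select_top_models(val_scores: Dict[str, float], k: int) -> List[str]:
    """Return the names of the k models with the highest validation score."""
    ranked = sorted(val_scores, key=val_scores.get, reverse=True)
    return ranked[:k]


def soft_vote(probabilities: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-image 'suspicious' probability across models."""
    n_models = len(probabilities)
    return [sum(per_image) / n_models for per_image in zip(*probabilities)]


def classify(avg_probs: Sequence[float], threshold: float = 0.5) -> List[str]:
    """Label each image by comparing its averaged probability to a threshold."""
    return ["suspicious" if p >= threshold else "normal" for p in avg_probs]


# Hypothetical validation scores for three fine-tuned networks (made-up values).
val_scores = {"net_a": 0.81, "net_b": 0.78, "net_c": 0.70}

# Hypothetical predicted probabilities of "suspicious" for three mammograms.
probs = {
    "net_a": [0.9, 0.2, 0.6],
    "net_b": [0.7, 0.4, 0.3],
    "net_c": [0.1, 0.9, 0.9],
}

top = select_top_models(val_scores, k=2)          # keep the two best networks
avg = soft_vote([probs[name] for name in top])    # average their probabilities
labels = classify(avg)                            # threshold the averages
```

Soft voting is used here only because it is simple to illustrate; majority voting or a learned combiner would be equally plausible readings of "ensemble" in the description.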