The Application of Bi-clustering and Bayesian Network for Gene Sets Network Construction in Breast Cancer Microarray Data

Bibliographic Details
Main Authors: Ahmad Sohrabi, Neda Saraygord-Afshari, Masoud Roudbari
Format: Article
Language: English
Published: Shiraz University of Medical Sciences, 2022-10-01
Series: Middle East Journal of Cancer
Online Access:https://mejc.sums.ac.ir/article_48134_e6a7d4ca2a6eaa499a07d8817520aab4.pdf

Abstract
Background: Breast cancer is one of the most prevalent cancers among Iranian women and the second leading cause of death in women worldwide. Gene mutations are key determinants of the disease, so its genetic study is of paramount importance. One method for the genetic evaluation of this disease is microarray technology, which measures the expression of thousands of genes simultaneously. Clustering is a method for analyzing such high-dimensional data; in the present study, we used it to group similar genes into separate clusters.
Method: A descriptive and inferential statistical analysis was carried out to evaluate unsupervised learning models for gene expression analysis, comparing five bi-clustering methods: PLAID (PL), Fabia, Bimax, Cheng & Church (CC), and Xmotif. For this purpose, we obtained microarray gene expression data for lapatinib-resistant breast cancer cell lines from previously published research. The enrichment of the clusters was evaluated with gene ontology, and the results of the five models were compared using the Jaccard index, variance stability, least-square error, and goodness-of-fit indices. The results of the best model were then used to build a gene set network with Bayesian networks.
Results: After preprocessing, clustering was performed on a gene expression matrix of dimension 4710 × 18. All models except CC found bi-clusters in the data set. The evaluation showed that the models gave broadly similar results, but the PL model performed best, finding 11 bi-clusters; this model was therefore used to build the gene set network.
Conclusion: The PL method was suitable for clustering these data and can therefore be recommended for this type of analysis. In addition, the gene set network constructed from the gene expression data was inadequate.
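
The record lists the Jaccard index and a least-square error among the comparison criteria but does not define how either was computed. The Python sketch below is illustrative only: it assumes each bi-cluster is stored as a pair of gene (row) and sample (column) index sets, and the helper names `bicluster_jaccard`, `result_similarity`, and `mean_squared_residue` are hypothetical. It computes the Jaccard index of the matrix cells covered by two bi-clusters, a best-match average for comparing two sets of bi-clusters (one common convention, not necessarily the one used in the study), and the Cheng & Church mean squared residue, a least-squares coherence score that may or may not correspond to the study's least-square error index.

```python
from typing import FrozenSet, List, Sequence, Tuple

# Assumed representation (hypothetical, for illustration only):
# a bi-cluster is (set of gene/row indices, set of sample/column indices).
Bicluster = Tuple[FrozenSet[int], FrozenSet[int]]


def bicluster_jaccard(a: Bicluster, b: Bicluster) -> float:
    """Jaccard index of the matrix cells covered by two bi-clusters."""
    rows_a, cols_a = a
    rows_b, cols_b = b
    # Shared cells form the sub-matrix (rows_a & rows_b) x (cols_a & cols_b).
    inter = len(rows_a & rows_b) * len(cols_a & cols_b)
    union = len(rows_a) * len(cols_a) + len(rows_b) * len(cols_b) - inter
    return inter / union if union else 0.0


def result_similarity(found: Sequence[Bicluster], reference: Sequence[Bicluster]) -> float:
    """Mean, over bi-clusters in `found`, of the best Jaccard match in `reference`.

    This averaging rule is an assumed convention, not taken from the study.
    """
    if not found or not reference:
        return 0.0
    best = [max(bicluster_jaccard(f, r) for r in reference) for f in found]
    return sum(best) / len(best)


def mean_squared_residue(matrix: List[List[float]], rows: Sequence[int], cols: Sequence[int]) -> float:
    """Cheng & Church mean squared residue H(I, J); rows and cols must be non-empty."""
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_r, n_c = len(rows), len(cols)
    row_means = [sum(r) / n_c for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_r)) / n_r for j in range(n_c)]
    overall = sum(row_means) / n_r
    total = 0.0
    for i in range(n_r):
        for j in range(n_c):
            # Residue: deviation from a purely additive row + column model.
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_r * n_c)


if __name__ == "__main__":
    a = (frozenset({0, 1, 2}), frozenset({0, 1}))
    b = (frozenset({1, 2, 3}), frozenset({0, 1}))
    print(bicluster_jaccard(a, b))  # 0.5: 4 shared cells out of 8 covered cells
```

Because the residue subtracts row and column means, a purely additive sub-matrix (each row a shifted copy of the others) scores zero, which is the kind of pattern the Cheng & Church and Plaid models are designed to detect.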

Additional Details
ISSN: 2008-6709, 2008-6687
Volume: 13
Issue: 4
Pages: 624-640
DOI: 10.30476/mejc.2022.89998.1557
Affiliations:
Ahmad Sohrabi: Department of Biostatistics, School of Public Health, Iran University of Medical Sciences, Tehran, Iran
Neda Saraygord-Afshari: Department of Medical Biotechnology, Faculty of Allied Medical Sciences, Iran University of Medical Sciences, Tehran, Iran
Masoud Roudbari: Department of Biostatistics, School of Public Health, Iran University of Medical Sciences, Tehran, Iran
Keywords: breast cancer; bi-clustering; cluster analysis; microarray data; gene expression; neoplasms; bayesian network