RUBic: rapid unsupervised biclustering

Abstract Biclustering of biologically meaningful binary information is essential in many applications related to drug discovery, like protein–protein interactions and gene expressions. However, for robust performance in recently emerging large health datasets, it is important for new biclustering al...

Bibliographic Details
Main Authors:	Brijesh K. Sriwastava, Anup Kumar Halder, Subhadip Basu, Tapabrata Chakraborti
Format:	Article
Language:	English
Published:	BMC 2023-11-01
Series:	BMC Bioinformatics
Subjects:	Data mining, Algorithm design and analysis, Biclustering algorithms, Computational complexity
Online Access:	https://doi.org/10.1186/s12859-023-05534-3

_version_	1797556108106137600
author	Brijesh K. Sriwastava, Anup Kumar Halder, Subhadip Basu, Tapabrata Chakraborti
author_facet	Brijesh K. Sriwastava, Anup Kumar Halder, Subhadip Basu, Tapabrata Chakraborti
author_sort	Brijesh K. Sriwastava
collection	DOAJ
description	Abstract Biclustering of biologically meaningful binary information is essential in many applications related to drug discovery, such as protein–protein interactions and gene expression. However, for robust performance on recently emerging large health datasets, new biclustering algorithms must be scalable and fast. We present a rapid unsupervised biclustering (RUBic) algorithm that achieves this objective with a novel encoding and search strategy. RUBic significantly reduces computational overhead on both synthetic and experimental datasets relative to several state-of-the-art biclustering algorithms. On 100 synthetic binary datasets, our method took ~71.1 s to extract 494,872 biclusters. On the human PPI database of size 4085 × 4085, our method generates 1840 biclusters in ~48.6 s. On a central nervous system embryonic tumor gene expression dataset of size 712,940, our algorithm takes 101 min to produce 747,069 biclusters, while recent competing algorithms take significantly more time to produce the same result. RUBic is also evaluated on five different gene expression datasets and shows a significant speed-up in execution time over existing approaches when extracting significant KEGG-enriched biclusters. RUBic can operate in two modes, base and flex: base mode generates maximal biclusters, while flex mode runs faster and generates fewer biclusters, selected by their biological significance with respect to KEGG pathways. The code is available at https://github.com/CMATERJU-BIOINFO/RUBic for academic use only.
first_indexed	2024-03-10T16:57:05Z
format	Article
id	doaj.art-9c652dffadeb4f478820d968df4dcfdf
institution	Directory Open Access Journal
issn	1471-2105
language	English
last_indexed	2024-03-10T16:57:05Z
publishDate	2023-11-01
publisher	BMC
record_format	Article
series	BMC Bioinformatics
spelling	doaj.art-9c652dffadeb4f478820d968df4dcfdf; 2023-11-20T11:06:07Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2023-11-01; vol. 24, no. 1, pp. 1-16; 10.1186/s12859-023-05534-3; RUBic: rapid unsupervised biclustering; Brijesh K. Sriwastava (Computer Science and Engineering Department, Government College of Engineering and Leather Technology); Anup Kumar Halder (Faculty of Mathematics and Information Sciences, Warsaw University of Technology); Subhadip Basu (Department of Computer Science and Engineering, Jadavpur University); Tapabrata Chakraborti (The Alan Turing Institute and University College London); https://doi.org/10.1186/s12859-023-05534-3; Data mining; Algorithm design and analysis; Biclustering algorithms; Computational complexity
spellingShingle	Brijesh K. Sriwastava Anup Kumar Halder Subhadip Basu Tapabrata Chakraborti RUBic: rapid unsupervised biclustering BMC Bioinformatics Data mining Algorithm design and analysis Biclustering algorithms Computational complexity
title	RUBic: rapid unsupervised biclustering
title_full	RUBic: rapid unsupervised biclustering
title_fullStr	RUBic: rapid unsupervised biclustering
title_full_unstemmed	RUBic: rapid unsupervised biclustering
title_short	RUBic: rapid unsupervised biclustering
title_sort	rubic rapid unsupervised biclustering
topic	Data mining, Algorithm design and analysis, Biclustering algorithms, Computational complexity
url	https://doi.org/10.1186/s12859-023-05534-3
work_keys_str_mv	AT brijeshksriwastava rubicrapidunsupervisedbiclustering AT anupkumarhalder rubicrapidunsupervisedbiclustering AT subhadipbasu rubicrapidunsupervisedbiclustering AT tapabratachakraborti rubicrapidunsupervisedbiclustering
