RUBic: rapid unsupervised biclustering

Abstract Biclustering of biologically meaningful binary information is essential in many applications related to drug discovery, like protein–protein interactions and gene expressions. However, for robust performance in recently emerging large health datasets, it is important for new biclustering al...

Bibliographic Details
Main Authors: Brijesh K. Sriwastava, Anup Kumar Halder, Subhadip Basu, Tapabrata Chakraborti
Format: Article
Language: English
Published: BMC 2023-11-01
Series: BMC Bioinformatics
Subjects: Data mining; Algorithm design and analysis; Biclustering algorithms; Computational complexity
Online Access: https://doi.org/10.1186/s12859-023-05534-3
author Brijesh K. Sriwastava
Anup Kumar Halder
Subhadip Basu
Tapabrata Chakraborti
collection DOAJ
description Abstract Biclustering of biologically meaningful binary information is essential in many applications related to drug discovery, such as protein–protein interactions and gene expression. However, for robust performance on recently emerging large health datasets, new biclustering algorithms must be scalable and fast. We present a rapid unsupervised biclustering (RUBic) algorithm that achieves this objective with a novel encoding and search strategy. RUBic significantly reduces the computational overhead on both synthetic and experimental datasets with respect to several state-of-the-art biclustering algorithms. On 100 synthetic binary datasets, our method took ∼71.1 s to extract 494,872 biclusters. On the human PPI database of size 4085 × 4085, our method generates 1840 biclusters in ∼48.6 s. On a central nervous system embryonic tumor gene expression dataset of size 712,940, our algorithm takes 101 min to produce 747,069 biclusters, while recent competing algorithms take significantly more time to produce the same result. RUBic is also evaluated on five different gene expression datasets and shows a significant speed-up in execution time over existing approaches when extracting significant KEGG-enriched biclusters. RUBic can operate in two modes, base and flex: base mode generates maximal biclusters, while flex mode generates fewer biclusters, and does so faster, based on their biological significance with respect to KEGG pathways. The code is available at https://github.com/CMATERJU-BIOINFO/RUBic for academic use only.
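Illustrative note on "maximal biclusters" in binary data. The sketch below is not RUBic and does not reproduce its encoding or search strategy, which this record does not describe. It is a brute-force enumeration, exponential in the number of rows and usable only on toy inputs, of all-ones submatrices that cannot be extended by another row or column. The function name maximal_biclusters, the parameters min_rows and min_cols, and the matrix data are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def maximal_biclusters(matrix, min_rows=2, min_cols=2):
    """Brute-force enumeration of maximal all-ones biclusters.

    Illustrative only: this is NOT the RUBic algorithm. It tries every
    row subset, so it is exponential in the number of rows.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    found = set()
    for size in range(min_rows, n_rows + 1):
        for rows in combinations(range(n_rows), size):
            # Columns that are 1 in every chosen row.
            cols = tuple(
                c for c in range(n_cols) if all(matrix[r][c] for r in rows)
            )
            if len(cols) < min_cols:
                continue
            # Close the row set: include every row that is 1 on all shared
            # columns, so no further row or column can be added.
            closed_rows = tuple(
                r for r in range(n_rows) if all(matrix[r][c] for c in cols)
            )
            found.add((closed_rows, cols))
    return sorted(found)


# Hypothetical 4 x 4 binary matrix (rows could be genes, columns samples).
data = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
    [1, 1, 0, 1],
]

for rows, cols in maximal_biclusters(data):
    print("rows", rows, "cols", cols)
```

On this toy matrix the sketch reports three biclusters: rows (0, 1, 3) with columns (0, 1), rows (0, 2, 3) with columns (1, 3), and rows (0, 3) with columns (0, 1, 3). Each is maximal in the sense that adding any other row or column would introduce a zero.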
first_indexed 2024-03-10T16:57:05Z
format Article
id doaj.art-9c652dffadeb4f478820d968df4dcfdf
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-03-10T16:57:05Z
publishDate 2023-11-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-9c652dffadeb4f478820d968df4dcfdf; 2023-11-20T11:06:07Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2023-11-01; vol. 24, no. 1, pp. 1-16; 10.1186/s12859-023-05534-3; RUBic: rapid unsupervised biclustering
Brijesh K. Sriwastava (Computer Science and Engineering Department, Government College of Engineering and Leather Technology)
Anup Kumar Halder (Faculty of Mathematics and Information Sciences, Warsaw University of Technology)
Subhadip Basu (Department of Computer Science and Engineering, Jadavpur University)
Tapabrata Chakraborti (The Alan Turing Institute and University College London)
https://doi.org/10.1186/s12859-023-05534-3
Data mining
Algorithm design and analysis
Biclustering algorithms
Computational complexity
title RUBic: rapid unsupervised biclustering
topic Data mining
Algorithm design and analysis
Biclustering algorithms
Computational complexity
url https://doi.org/10.1186/s12859-023-05534-3