Blazing Signature Filter: a library for fast pairwise similarity comparisons

Bibliographic Details
Main Authors: Joon-Yong Lee, Grant M. Fujimoto, Ryan Wilson, H. Steven Wiley, Samuel H. Payne
Affiliations: Integrative Omics, Pacific Northwest National Laboratory (Lee, Fujimoto, Wilson, Payne); Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory (Wiley)
Format: Article
Language: English
Published: BMC, 2018-06-01
Series: BMC Bioinformatics
Volume: 19, Issue: 1, Pages: 1-12
ISSN: 1471-2105
DOI: 10.1186/s12859-018-2210-6
Subjects: Pairwise similarity comparison; Filtering; Large-scale data mining
Collection: DOAJ (Directory of Open Access Journals)
Online Access: http://link.springer.com/article/10.1186/s12859-018-2210-6

Abstract

Background: Identifying similarities between datasets is a fundamental task in data mining and has become an integral part of modern scientific investigation. Whether the task is to identify co-expressed genes in large-scale expression surveys or to predict combinations of gene knockouts which would elicit a similar phenotype, the underlying computational task is often a multi-dimensional similarity test. As datasets continue to grow, improvements to the efficiency, sensitivity or specificity of such computation will have broad impacts as it allows scientists to more completely explore the wealth of scientific data.

Results: The Blazing Signature Filter (BSF) is a highly efficient pairwise similarity algorithm which enables extensive data mining within a reasonable amount of time. The algorithm transforms datasets into binary metrics, allowing it to utilize the computationally efficient bit operators and provide a coarse measure of similarity. We demonstrate the utility of our algorithm using two common bioinformatics tasks: identifying data sets with similar gene expression profiles, and comparing annotated genomes.

Conclusions: The BSF is a highly efficient pairwise similarity algorithm that can scale to billions of comparisons without the need for specialized hardware.
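The abstract describes the BSF only at a high level: each dataset is converted to a binary form so that a coarse similarity can be computed with fast bitwise operators. The Python sketch below illustrates that general idea under stated assumptions; it is not the BSF library's API or the authors' exact method. The binarization rule (a value above a fixed threshold sets a bit), the function names `to_signature`, `shared_bits` and `all_pairs`, and the `min_shared` cutoff for keeping candidate pairs are all hypothetical.

```python
"""Hypothetical sketch of bit-signature similarity (not the BSF library API).

Assumption: each dataset is a vector of real values, binarized by setting
bit i when values[i] exceeds a threshold.
"""
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def to_signature(values: Sequence[float], threshold: float) -> int:
    """Pack a vector into an integer bit signature: bit i is set if values[i] > threshold."""
    sig = 0
    for i, v in enumerate(values):
        if v > threshold:
            sig |= 1 << i
    return sig


def popcount(x: int) -> int:
    """Count set bits (int.bit_count() does the same on Python 3.10+)."""
    return bin(x).count("1")


def shared_bits(a: int, b: int) -> int:
    """Coarse similarity: number of features set in both signatures (bitwise AND)."""
    return popcount(a & b)


def all_pairs(signatures: Dict[str, int], min_shared: int = 0) -> List[Tuple[str, str, int]]:
    """Score each unordered pair once (n * (n - 1) / 2 comparisons).

    Pairs scoring below the hypothetical min_shared cutoff are filtered out,
    leaving candidates for a finer, more expensive similarity test.
    """
    results = []
    for x, y in combinations(sorted(signatures), 2):
        score = shared_bits(signatures[x], signatures[y])
        if score >= min_shared:
            results.append((x, y, score))
    return results


if __name__ == "__main__":
    # Toy data: three samples measured on five features.
    data = {
        "sample_A": [2.1, 0.1, 3.5, 0.0, 1.9],
        "sample_B": [1.8, 0.2, 2.9, 0.3, 0.1],
        "sample_C": [0.0, 2.5, 0.1, 3.0, 0.2],
    }
    sigs = {name: to_signature(v, threshold=1.0) for name, v in data.items()}
    for x, y, score in all_pairs(sigs):
        print(x, y, score)
```

Run as a script, this prints `sample_A sample_B 2`, `sample_A sample_C 0` and `sample_B sample_C 0`: samples A and B are both high on features 0 and 2, while C shares no high features with either. Because scoring a pair reduces to one AND and one bit count, a filter of this kind is cheap enough to apply to every pair before spending time on a more precise similarity measure.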