Blazing Signature Filter: a library for fast pairwise similarity comparisons
Main Authors: | Joon-Yong Lee, Grant M. Fujimoto, Ryan Wilson, H. Steven Wiley, Samuel H. Payne |
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2018-06-01 |
Series: | BMC Bioinformatics |
Subjects: | Pairwise similarity comparison; Filtering; Large-scale data mining |
Online Access: | http://link.springer.com/article/10.1186/s12859-018-2210-6 |
author | Joon-Yong Lee, Grant M. Fujimoto, Ryan Wilson, H. Steven Wiley, Samuel H. Payne |
---|---|
collection | DOAJ |
description | Background: Identifying similarities between datasets is a fundamental task in data mining and has become an integral part of modern scientific investigation. Whether the goal is to identify co-expressed genes in large-scale expression surveys or to predict combinations of gene knockouts that would elicit a similar phenotype, the underlying computational task is often a multi-dimensional similarity test. As datasets continue to grow, improvements to the efficiency, sensitivity or specificity of such computations will have broad impact, because they allow scientists to explore the wealth of scientific data more completely. Results: The Blazing Signature Filter (BSF) is a highly efficient pairwise similarity algorithm that enables extensive data mining within a reasonable amount of time. The algorithm transforms datasets into binary representations, which lets it use computationally efficient bitwise operators to provide a coarse measure of similarity. We demonstrate the utility of the algorithm on two common bioinformatics tasks: identifying datasets with similar gene expression profiles, and comparing annotated genomes. Conclusions: The BSF can scale to billions of pairwise comparisons without the need for specialized hardware. |
id | doaj.art-dc6ed44605af44c883360d5a458797b6 |
institution | Directory of Open Access Journals |
issn | 1471-2105 |
affiliations | Joon-Yong Lee, Grant M. Fujimoto, Ryan Wilson, Samuel H. Payne: Integrative Omics, Pacific Northwest National Laboratory; H. Steven Wiley: Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory |
doi | 10.1186/s12859-018-2210-6 |
topic | Pairwise similarity comparison; Filtering; Large-scale data mining |
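
The abstract says only that BSF converts datasets into a binary form so that similarity can be estimated with bit operators. The Python sketch below illustrates that general idea under our own assumptions. Each numeric profile is binarized with a simple threshold, and the resulting signature is stored as a Python int used as a bitset. The coarse score is the number of bits two signatures share, computed as an AND followed by a popcount. The function names `binarize`, `shared_bits`, `pairwise_shared` and `candidate_pairs`, the threshold rule, and the shared-bit score are hypothetical. They are not the BSF library's API or its exact scoring method.

```python
from typing import List, Sequence, Tuple


def binarize(values: Sequence[float], threshold: float) -> int:
    """Pack a numeric profile into an int bitmask: bit i is set when values[i] > threshold.

    The thresholding rule is an assumption made for illustration.
    """
    mask = 0
    for i, v in enumerate(values):
        if v > threshold:
            mask |= 1 << i
    return mask


def shared_bits(a: int, b: int) -> int:
    """Coarse similarity: count features 'on' in both signatures (bitwise AND, then popcount)."""
    return bin(a & b).count("1")


def pairwise_shared(signatures: List[int]) -> List[List[int]]:
    """Symmetric all-pairs matrix of shared-bit counts; the diagonal is each signature's popcount."""
    n = len(signatures)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            matrix[i][j] = matrix[j][i] = shared_bits(signatures[i], signatures[j])
    return matrix


def candidate_pairs(signatures: List[int], min_shared: int) -> List[Tuple[int, int]]:
    """Filter step: keep index pairs (i < j) that share at least min_shared set bits."""
    n = len(signatures)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if shared_bits(signatures[i], signatures[j]) >= min_shared
    ]


if __name__ == "__main__":
    # Toy expression profiles (made-up numbers for illustration only).
    profiles = [
        [2.1, 0.3, 1.8, 0.0],
        [1.9, 0.1, 2.2, 0.4],
        [0.2, 1.5, 0.1, 2.0],
    ]
    sigs = [binarize(p, threshold=1.0) for p in profiles]
    print(sigs)                      # [5, 5, 10]
    print(candidate_pairs(sigs, 2))  # [(0, 1)]
```

A compiled implementation would typically pack signatures into fixed-width machine words. Each pair would then cost one AND and one population count per word. Pairs that pass such a coarse filter could be rescored with an exact similarity measure. This is an illustrative design, not a description of how BSF itself is implemented.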