Locality-sensitive hashing enables efficient and scalable signal classification in high-throughput mass spectrometry raw data

Bibliographic Details
Main Authors: Konstantin Bob, David Teschner, Thomas Kemmer, David Gomez-Zepeda, Stefan Tenzer, Bertil Schmidt, Andreas Hildebrandt
Format: Article
Language: English
Published: BMC 2022-07-01
Series: BMC Bioinformatics
Subjects: Mass spectrometry; Locality-sensitive hashing; Signal processing
Online Access: https://doi.org/10.1186/s12859-022-04833-5
author Konstantin Bob
David Teschner
Thomas Kemmer
David Gomez-Zepeda
Stefan Tenzer
Bertil Schmidt
Andreas Hildebrandt
author_sort Konstantin Bob
collection DOAJ
description Abstract. Background: Mass spectrometry is an important experimental technique in the field of proteomics. However, analysis of certain mass spectrometry data faces a combination of two challenges: first, even a single experiment produces a large amount of multi-dimensional raw data and, second, signals of interest are not single peaks but patterns of peaks that span the different dimensions. The rapidly growing amount of mass spectrometry data increases the demand for scalable solutions. Furthermore, existing approaches for signal detection usually rely on strong assumptions concerning the signals' properties. Results: In this study, it is shown that locality-sensitive hashing enables signal classification in mass spectrometry raw data at scale. Through an appropriate choice of algorithm parameters, it is possible to balance false-positive and false-negative rates. On synthetic data, the approach outperformed an intensity thresholding approach. Real data could be strongly reduced without losing relevant information. The implementation scaled up to 32 threads and supports acceleration by GPUs. Conclusions: Locality-sensitive hashing is a desirable approach for signal classification in mass spectrometry raw data. Availability: Generated data and code are available at https://github.com/hildebrandtlab/mzBucket . Raw data is available at https://zenodo.org/record/5036526 .
first_indexed 2024-04-13T03:18:30Z
format Article
id doaj.art-7fbe9bcd7cb8405d93beb2c2c61ca1d5
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-04-13T03:18:30Z
publishDate 2022-07-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-7fbe9bcd7cb8405d93beb2c2c61ca1d5
2022-12-22T03:04:50Z
eng
BMC
BMC Bioinformatics
1471-2105
2022-07-01
23(1):1-16
10.1186/s12859-022-04833-5
Locality-sensitive hashing enables efficient and scalable signal classification in high-throughput mass spectrometry raw data
Konstantin Bob (Institute of Computer Science, Johannes Gutenberg University Mainz)
David Teschner (Institute of Computer Science, Johannes Gutenberg University Mainz)
Thomas Kemmer (Institute of Computer Science, Johannes Gutenberg University Mainz)
David Gomez-Zepeda (Institute for Immunology, University Medical Center of the Johannes Gutenberg University Mainz)
Stefan Tenzer (Institute for Immunology, University Medical Center of the Johannes Gutenberg University Mainz)
Bertil Schmidt (Institute of Computer Science, Johannes Gutenberg University Mainz)
Andreas Hildebrandt (Institute of Computer Science, Johannes Gutenberg University Mainz)
https://doi.org/10.1186/s12859-022-04833-5
Mass spectrometry
Locality-sensitive hashing
Signal processing
title Locality-sensitive hashing enables efficient and scalable signal classification in high-throughput mass spectrometry raw data
topic Mass spectrometry
Locality-sensitive hashing
Signal processing
url https://doi.org/10.1186/s12859-022-04833-5