MCtandem : an efficient tool for large-scale peptide identification on many integrated core (MIC) architecture

Background: Tandem mass spectrometry (MS/MS)-based database searching is a widely acknowledged and widely used method for peptide identification in shotgun proteomics. However, due to the rapid growth of spectra data produced by advanced mass spectrometry and the greatly increased number of modified...

Bibliographic Details
Main Authors: Li, Chuang, Li, Kenli, Li, Keqin, Lin, Feng
Other Authors: School of Computer Science and Engineering
Format: Journal Article
Language: English
Published: 2019
Subjects: Engineering::Computer science and engineering; Peptide Identification; Tandem Mass Spectrometry
Online Access: https://hdl.handle.net/10356/84179
http://hdl.handle.net/10220/49783
_version_ 1811681161993256960
author Li, Chuang
Li, Kenli
Li, Keqin
Lin, Feng
author2 School of Computer Science and Engineering
author_sort Li, Chuang
collection NTU
description Background: Tandem mass spectrometry (MS/MS)-based database searching is a widely acknowledged and widely used method for peptide identification in shotgun proteomics. However, due to the rapid growth of spectra data produced by advanced mass spectrometry and the greatly increased number of modified and digested peptides identified in recent years, the current methods for peptide database searching cannot rapidly and thoroughly process large MS/MS spectra datasets. A breakthrough in efficient database search algorithms is crucial for peptide identification in computational proteomics. Results: This paper presents MCtandem, an efficient tool for large-scale peptide identification on the Intel Many Integrated Core (MIC) architecture. To support big data processing capability, a novel parallel match scoring algorithm, named MIC-SDP (spectrum dot product), and its two-level parallelization are presented in MCtandem’s design. In addition, a series of optimization strategies on both the host CPU side and the MIC side, which include pre-fetching, an optimized communication overlapping scheme, multithreading, and hyper-threading, are exploited to improve the execution performance. Conclusions: For fair comparisons, we first set up experiments and verified a 28-fold speedup on a single MIC against the original CPU-based implementation. We then executed MCtandem on a very large dataset on an MIC cluster (a component of the Tianhe-2 supercomputer) and achieved much higher scalability than a benchmark MapReduce-based program, MR-Tandem. MCtandem is an open-source software tool implemented in C++. The source code and the parameter settings are available at https://github.com/LogicZY/MCtandem.
first_indexed 2024-10-01T03:36:33Z
format Journal Article
id ntu-10356/84179
institution Nanyang Technological University
language English
last_indexed 2024-10-01T03:36:33Z
publishDate 2019
record_format dspace
spelling ntu-10356/84179 2020-03-07T11:50:48Z
Published version
2019-08-27T02:17:18Z
2019-12-06T15:39:58Z
2019
Journal Article
Li, C., Li, K., Li, K., & Lin, F. (2019). MCtandem: an efficient tool for large-scale peptide identification on many integrated core (MIC) architecture. BMC Bioinformatics, 20(1), 397. doi:10.1186/s12859-019-2980-5
https://hdl.handle.net/10356/84179
http://hdl.handle.net/10220/49783
10.1186/s12859-019-2980-5
en
BMC Bioinformatics
© 2019 The Author(s). This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
13 p.
application/pdf
title MCtandem : an efficient tool for large-scale peptide identification on many integrated core (MIC) architecture
topic Engineering::Computer science and engineering
Peptide Identification
Tandem Mass Spectrometry
url https://hdl.handle.net/10356/84179
http://hdl.handle.net/10220/49783
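
The description field names a spectrum dot product (SDP) match score but does not spell it out. The following is a minimal serial sketch of a generic binned spectrum dot product in C++, offered only as an illustration of the idea; it is not the MIC-SDP algorithm from the paper or code from the MCtandem repository. The names Peak, bin_spectrum, spectrum_dot_product, and best_candidate, and the fixed-width binning scheme, are assumptions made for this example.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical peak type: mass-to-charge ratio and intensity.
struct Peak {
    double mz;
    double intensity;
};

// Accumulate peak intensities on a fixed-width m/z grid covering [0, max_mz).
// Peaks outside that range are ignored. The binning scheme is an assumption.
std::vector<double> bin_spectrum(const std::vector<Peak>& peaks,
                                 double bin_width, double max_mz) {
    const std::size_t n_bins =
        static_cast<std::size_t>(std::ceil(max_mz / bin_width));
    std::vector<double> bins(n_bins, 0.0);
    for (const Peak& p : peaks) {
        if (p.mz < 0.0 || p.mz >= max_mz) continue;
        const std::size_t idx = static_cast<std::size_t>(p.mz / bin_width);
        if (idx < n_bins) bins[idx] += p.intensity;
    }
    return bins;
}

// Dot product of two binned spectra; a larger value means more shared signal.
double spectrum_dot_product(const std::vector<double>& a,
                            const std::vector<double>& b) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) sum += a[i] * b[i];
    return sum;
}

// Score one observed spectrum against every candidate (theoretical) spectrum
// and return the index of the best match, or -1 if there are no candidates.
// Each iteration is independent of the others, which is the kind of loop a
// parallel implementation could distribute across threads or devices.
int best_candidate(const std::vector<double>& observed,
                   const std::vector<std::vector<double>>& candidates) {
    int best = -1;
    double best_score = -1.0;
    for (std::size_t c = 0; c < candidates.size(); ++c) {
        const double score = spectrum_dot_product(observed, candidates[c]);
        if (score > best_score) {
            best_score = score;
            best = static_cast<int>(c);
        }
    }
    return best;
}
```

In use, an observed spectrum and each candidate peptide's theoretical fragment peaks would be passed through bin_spectrum with the same bin_width and max_mz, and best_candidate would then pick the highest-scoring candidate. How MCtandem itself bins spectra, weights intensities, or splits the work between host CPU and MIC is not stated in this record.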