MultiPro: DDA-PASEF and diaPASEF acquired cell line proteomic datasets with deliberate batch effects

Abstract Mass spectrometry-based proteomics plays a critical role in current biological and clinical research. Technical issues like data integration, missing value imputation, batch effect correction and the exploration of inter-connections amongst these technical issues, can produce errors but are...

Bibliographic Details
Main Authors: He Wang, Kai Peng Lim, Weijia Kong, Huanhuan Gao, Bertrand Jern Han Wong, Ser Xian Phua, Tiannan Guo, Wilson Wen Bin Goh
Format: Article
Language: English
Published: Nature Portfolio 2023-12-01
Series: Scientific Data
Online Access: https://doi.org/10.1038/s41597-023-02779-8
author He Wang
Kai Peng Lim
Weijia Kong
Huanhuan Gao
Bertrand Jern Han Wong
Ser Xian Phua
Tiannan Guo
Wilson Wen Bin Goh
collection DOAJ
description Abstract Mass spectrometry-based proteomics plays a critical role in current biological and clinical research. Technical issues like data integration, missing value imputation, batch effect correction and the exploration of inter-connections amongst these technical issues, can produce errors but are not well studied. Although proteomic technologies have improved significantly in recent years, this alone cannot resolve these issues. What is needed are better algorithms and data processing knowledge. But to obtain these, we need appropriate proteomics datasets for exploration, investigation, and benchmarking. To meet this need, we developed MultiPro (Multi-purpose Proteome Resource), a resource comprising four comprehensive large-scale proteomics datasets with deliberate batch effects using the latest parallel accumulation-serial fragmentation in both Data-Dependent Acquisition (DDA) and Data Independent Acquisition (DIA) modes. Each dataset contains a balanced two-class design based on well-characterized and widely studied cell lines (A549 vs K562 or HCC1806 vs HS578T) with 48 or 36 biological and technical replicates altogether, allowing for investigation of a multitude of technical issues. These datasets allow for investigation of inter-connections between class and batch factors, or to develop approaches to compare and integrate data from DDA and DIA platforms.
first_indexed 2024-03-09T05:59:51Z
format Article
id doaj.art-07c3c211bcf642b5ac75d20d35c89031
institution Directory of Open Access Journals
issn 2052-4463
language English
last_indexed 2024-03-09T05:59:51Z
publishDate 2023-12-01
publisher Nature Portfolio
record_format Article
series Scientific Data
spelling doaj.art-07c3c211bcf642b5ac75d20d35c89031
2023-12-03T12:10:03Z
eng
Nature Portfolio
Scientific Data
2052-4463
2023-12-01
Volume 10, issue 1, pages 1-11
10.1038/s41597-023-02779-8
MultiPro: DDA-PASEF and diaPASEF acquired cell line proteomic datasets with deliberate batch effects
He Wang (Lee Kong Chian School of Medicine, Nanyang Technological University)
Kai Peng Lim (Lee Kong Chian School of Medicine, Nanyang Technological University)
Weijia Kong (Lee Kong Chian School of Medicine, Nanyang Technological University)
Huanhuan Gao (Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine)
Bertrand Jern Han Wong (School of Biological Sciences, Nanyang Technological University)
Ser Xian Phua (Lee Kong Chian School of Medicine, Nanyang Technological University)
Tiannan Guo (Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine)
Wilson Wen Bin Goh (Lee Kong Chian School of Medicine, Nanyang Technological University)
https://doi.org/10.1038/s41597-023-02779-8
title MultiPro: DDA-PASEF and diaPASEF acquired cell line proteomic datasets with deliberate batch effects
url https://doi.org/10.1038/s41597-023-02779-8