MultiPro: DDA-PASEF and diaPASEF acquired cell line proteomic datasets with deliberate batch effects
Abstract Mass spectrometry-based proteomics plays a critical role in current biological and clinical research. Technical issues like data integration, missing value imputation, batch effect correction and the exploration of inter-connections amongst these technical issues, can produce errors but are...
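Main Authors: | He Wang, Kai Peng Lim, Weijia Kong, Huanhuan Gao, Bertrand Jern Han Wong, Ser Xian Phua, Tiannan Guo, Wilson Wen Bin Goh |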
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2023-12-01 |
Series: | Scientific Data |
Online Access: | https://doi.org/10.1038/s41597-023-02779-8 |
_version_ | 1797416187172225024 |
---|---|
author | He Wang, Kai Peng Lim, Weijia Kong, Huanhuan Gao, Bertrand Jern Han Wong, Ser Xian Phua, Tiannan Guo, Wilson Wen Bin Goh |
author_facet | He Wang, Kai Peng Lim, Weijia Kong, Huanhuan Gao, Bertrand Jern Han Wong, Ser Xian Phua, Tiannan Guo, Wilson Wen Bin Goh |
author_sort | He Wang |
collection | DOAJ |
description | Abstract Mass spectrometry-based proteomics plays a critical role in current biological and clinical research. Technical issues like data integration, missing value imputation, batch effect correction and the exploration of inter-connections amongst these technical issues, can produce errors but are not well studied. Although proteomic technologies have improved significantly in recent years, this alone cannot resolve these issues. What is needed are better algorithms and data processing knowledge. But to obtain these, we need appropriate proteomics datasets for exploration, investigation, and benchmarking. To meet this need, we developed MultiPro (Multi-purpose Proteome Resource), a resource comprising four comprehensive large-scale proteomics datasets with deliberate batch effects using the latest parallel accumulation-serial fragmentation in both Data-Dependent Acquisition (DDA) and Data Independent Acquisition (DIA) modes. Each dataset contains a balanced two-class design based on well-characterized and widely studied cell lines (A549 vs K562 or HCC1806 vs HS578T) with 48 or 36 biological and technical replicates altogether, allowing for investigation of a multitude of technical issues. These datasets allow for investigation of inter-connections between class and batch factors, or to develop approaches to compare and integrate data from DDA and DIA platforms. |
first_indexed | 2024-03-09T05:59:51Z |
format | Article |
id | doaj.art-07c3c211bcf642b5ac75d20d35c89031 |
institution | Directory of Open Access Journals |
issn | 2052-4463 |
language | English |
last_indexed | 2024-03-09T05:59:51Z |
publishDate | 2023-12-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Scientific Data |
spelling | doaj.art-07c3c211bcf642b5ac75d20d35c89031; 2023-12-03T12:10:03Z; eng; Nature Portfolio; Scientific Data; 2052-4463; 2023-12-01; 101111; 10.1038/s41597-023-02779-8; MultiPro: DDA-PASEF and diaPASEF acquired cell line proteomic datasets with deliberate batch effects; He Wang (0), Kai Peng Lim (1), Weijia Kong (2), Huanhuan Gao (3), Bertrand Jern Han Wong (4), Ser Xian Phua (5), Tiannan Guo (6), Wilson Wen Bin Goh (7); (0) Lee Kong Chian School of Medicine, Nanyang Technological University; (1) Lee Kong Chian School of Medicine, Nanyang Technological University; (2) Lee Kong Chian School of Medicine, Nanyang Technological University; (3) Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine; (4) School of Biological Sciences, Nanyang Technological University; (5) Lee Kong Chian School of Medicine, Nanyang Technological University; (6) Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine; (7) Lee Kong Chian School of Medicine, Nanyang Technological University; Abstract Mass spectrometry-based proteomics plays a critical role in current biological and clinical research. Technical issues like data integration, missing value imputation, batch effect correction and the exploration of inter-connections amongst these technical issues, can produce errors but are not well studied. Although proteomic technologies have improved significantly in recent years, this alone cannot resolve these issues. What is needed are better algorithms and data processing knowledge. But to obtain these, we need appropriate proteomics datasets for exploration, investigation, and benchmarking. To meet this need, we developed MultiPro (Multi-purpose Proteome Resource), a resource comprising four comprehensive large-scale proteomics datasets with deliberate batch effects using the latest parallel accumulation-serial fragmentation in both Data-Dependent Acquisition (DDA) and Data Independent Acquisition (DIA) modes. Each dataset contains a balanced two-class design based on well-characterized and widely studied cell lines (A549 vs K562 or HCC1806 vs HS578T) with 48 or 36 biological and technical replicates altogether, allowing for investigation of a multitude of technical issues. These datasets allow for investigation of inter-connections between class and batch factors, or to develop approaches to compare and integrate data from DDA and DIA platforms. https://doi.org/10.1038/s41597-023-02779-8 |
spellingShingle | He Wang, Kai Peng Lim, Weijia Kong, Huanhuan Gao, Bertrand Jern Han Wong, Ser Xian Phua, Tiannan Guo, Wilson Wen Bin Goh; MultiPro: DDA-PASEF and diaPASEF acquired cell line proteomic datasets with deliberate batch effects; Scientific Data |
title | MultiPro: DDA-PASEF and diaPASEF acquired cell line proteomic datasets with deliberate batch effects |
title_full | MultiPro: DDA-PASEF and diaPASEF acquired cell line proteomic datasets with deliberate batch effects |
title_fullStr | MultiPro: DDA-PASEF and diaPASEF acquired cell line proteomic datasets with deliberate batch effects |
title_full_unstemmed | MultiPro: DDA-PASEF and diaPASEF acquired cell line proteomic datasets with deliberate batch effects |
title_short | MultiPro: DDA-PASEF and diaPASEF acquired cell line proteomic datasets with deliberate batch effects |
title_sort | multipro dda pasef and diapasef acquired cell line proteomic datasets with deliberate batch effects |
url | https://doi.org/10.1038/s41597-023-02779-8 |
work_keys_str_mv | AT hewang multiproddapasefanddiapasefacquiredcelllineproteomicdatasetswithdeliberatebatcheffects AT kaipenglim multiproddapasefanddiapasefacquiredcelllineproteomicdatasetswithdeliberatebatcheffects AT weijiakong multiproddapasefanddiapasefacquiredcelllineproteomicdatasetswithdeliberatebatcheffects AT huanhuangao multiproddapasefanddiapasefacquiredcelllineproteomicdatasetswithdeliberatebatcheffects AT bertrandjernhanwong multiproddapasefanddiapasefacquiredcelllineproteomicdatasetswithdeliberatebatcheffects AT serxianphua multiproddapasefanddiapasefacquiredcelllineproteomicdatasetswithdeliberatebatcheffects AT tiannanguo multiproddapasefanddiapasefacquiredcelllineproteomicdatasetswithdeliberatebatcheffects AT wilsonwenbingoh multiproddapasefanddiapasefacquiredcelllineproteomicdatasetswithdeliberatebatcheffects |