Advanced bioinformatics methods for practical applications in proteomics

Mass spectrometry (MS)-based proteomics has undergone rapid advancements in recent years, creating challenging problems for bioinformatics. We focus on four aspects where bioinformatics plays a crucial role (and proteomics is needed for clinical application): peptide-spectra matching (PSM) based on...


Bibliographic Details
Main Authors:	Goh, Wilson Wen Bin; Wong, Limsoon
Other Authors:	School of Biological Sciences
Format:	Journal Article
Language:	English
Published:	2020
Subjects:	Science::Biological sciences; Proteomics; Networks
Online Access:	https://hdl.handle.net/10356/144722

_version_	1826129218236317696
author	Goh, Wilson Wen Bin; Wong, Limsoon
author2	School of Biological Sciences
author_facet	School of Biological Sciences; Goh, Wilson Wen Bin; Wong, Limsoon
author_sort	Goh, Wilson Wen Bin
collection	NTU
description	Mass spectrometry (MS)-based proteomics has undergone rapid advancements in recent years, creating challenging problems for bioinformatics. We focus on four aspects where bioinformatics plays a crucial role (and proteomics is needed for clinical application): peptide-spectra matching (PSM) based on the new data-independent acquisition (DIA) paradigm, resolving missing proteins (MPs), dealing with biological and technical heterogeneity in data and statistical feature selection (SFS). DIA is a brute-force strategy that provides greater width and depth but, because it indiscriminately captures spectra such that signal from multiple peptides is mixed, getting good PSMs is difficult. We consider two strategies: simplification of DIA spectra to pseudo-data-dependent acquisition spectra or, alternatively, brute-force search of each DIA spectrum against known reference libraries. The MP problem arises when proteins are never (or inconsistently) detected by MS. When observed in at least one sample, imputation methods can be used to guess the approximate protein expression level. If never observed at all, network/protein complex-based contextualization provides an independent prediction platform. Data heterogeneity is a difficult problem with two dimensions: technical (batch effects), which should be removed, and biological (including demography and disease subpopulations), which should be retained. Simple normalization is seldom sufficient, while batch effect-correction algorithms may create errors. Batch effect-resistant normalization methods are a viable alternative. Finally, SFS is vital for practical applications. While many methods exist, there is no best method, and both upstream (e.g. normalization) and downstream processing (e.g. multiple-testing correction) are performance confounders. We also discuss signal detection when class effects are weak.
first_indexed	2024-10-01T07:37:07Z
format	Journal Article
id	ntu-10356/144722
institution	Nanyang Technological University
language	English
last_indexed	2024-10-01T07:37:07Z
publishDate	2020
record_format	dspace
spelling	ntu-10356/144722 2023-02-28T16:57:20Z Advanced bioinformatics methods for practical applications in proteomics Goh, Wilson Wen Bin Wong, Limsoon School of Biological Sciences Science::Biological sciences Proteomics Networks Mass spectrometry (MS)-based proteomics has undergone rapid advancements in recent years, creating challenging problems for bioinformatics. We focus on four aspects where bioinformatics plays a crucial role (and proteomics is needed for clinical application): peptide-spectra matching (PSM) based on the new data-independent acquisition (DIA) paradigm, resolving missing proteins (MPs), dealing with biological and technical heterogeneity in data and statistical feature selection (SFS). DIA is a brute-force strategy that provides greater width and depth but, because it indiscriminately captures spectra such that signal from multiple peptides is mixed, getting good PSMs is difficult. We consider two strategies: simplification of DIA spectra to pseudo-data-dependent acquisition spectra or, alternatively, brute-force search of each DIA spectra against known reference libraries. The MP problem arises when proteins are never (or inconsistently) detected by MS. When observed in at least one sample, imputation methods can be used to guess the approximate protein expression level. If never observed at all, network/protein complex-based contextualization provides an independent prediction platform. Data heterogeneity is a difficult problem with two dimensions: technical (batch effects), which should be removed, and biological (including demography and disease subpopulations), which should be retained. Simple normalization is seldom sufficient, while batch effect-correction algorithms may create errors. Batch effect-resistant normalization methods are a viable alternative. Finally, SFS is vital for practical applications. While many methods exist, there is no best method, and both upstream (e.g. normalization) and downstream processing (e.g. multiple-testing correction) are performance confounders. We also discuss signal detection when class effects are weak. Ministry of Education (MOE) Accepted version A Singapore Ministry of Education tier-2 grant (MOE2012-T2-1-061) to L. W. 2020-11-20T06:56:49Z 2020-11-20T06:56:49Z 2018 Journal Article Goh, W. W. B., & Wong, L. (2017). Advanced bioinformatics methods for practical applications in proteomics. Briefings in Bioinformatics, 20(1), 347–355. doi:10.1093/bib/bbx128 1467-5463 https://hdl.handle.net/10356/144722 10.1093/bib/bbx128 30657890 1 20 347 355 en Briefings in bioinformatics © 2017 Oxford University Press. All rights reserved. This is a pre-copyedited, author-produced PDF of an article accepted for publication in Briefings in bioinformatics following peer review. The definitive publisher-authenticated version Goh, W. W. B., & Wong, L. (2017). Advanced bioinformatics methods for practical applications in proteomics. Briefings in Bioinformatics, 20(1), 347–355. is available online at: https://doi.org/10.1093/bib/bbx128. application/pdf
spellingShingle	Science::Biological sciences Proteomics Networks Goh, Wilson Wen Bin Wong, Limsoon Advanced bioinformatics methods for practical applications in proteomics
title	Advanced bioinformatics methods for practical applications in proteomics
title_full	Advanced bioinformatics methods for practical applications in proteomics
title_fullStr	Advanced bioinformatics methods for practical applications in proteomics
title_full_unstemmed	Advanced bioinformatics methods for practical applications in proteomics
title_short	Advanced bioinformatics methods for practical applications in proteomics
title_sort	advanced bioinformatics methods for practical applications in proteomics
topic	Science::Biological sciences; Proteomics; Networks
url	https://hdl.handle.net/10356/144722
work_keys_str_mv	AT gohwilsonwenbin advancedbioinformaticsmethodsforpracticalapplicationsinproteomics AT wonglimsoon advancedbioinformaticsmethodsforpracticalapplicationsinproteomics


Similar Items