Challenges in proteogenomics: a comparison of analysis methods with the case study of the DREAM proteogenomics sub-challenge

Bibliographic Details
Main Authors: Tara Eicher, Andrew Patt, Esko Kautto, Raghu Machiraju, Ewy Mathé, Yan Zhang
Format: Article
Language: English
Published: BMC 2019-12-01
Series: BMC Bioinformatics
Online Access: https://doi.org/10.1186/s12859-019-3253-z

Abstract
Background: Proteomic measurements, which closely reflect phenotypes, provide insights into gene expression regulation and the mechanisms underlying altered phenotypes. Further, integrating proteome- and transcriptome-level data can validate gene signatures associated with a phenotype. However, proteomic data are not as abundant as genomic data, so it is beneficial to use genomic features to predict protein abundances when matching proteomic samples, or measurements within samples, are lacking.
Results: We evaluate and compare four data-driven models for predicting proteomic data from mRNA measured in breast and ovarian cancers, using the 2017 DREAM Proteogenomics Challenge data. Our results show that Bayesian network, random forest, LASSO, and fuzzy logic approaches can predict protein abundance levels with median correlations between ground-truth and predicted values of 0.2 to 0.5. However, the most accurately predicted proteins differ considerably between approaches.
Conclusions: In addition to benchmarking the aforementioned machine learning approaches for predicting protein levels from transcript levels, we discuss challenges and potential solutions in state-of-the-art proteogenomic analyses.
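The Results report accuracy as a median correlation between ground-truth and predicted protein abundances, and note that the best-predicted proteins differ between methods. A minimal Python sketch of both measurements is given here, assuming a Pearson correlation computed per protein across samples and a Jaccard index over each method's top-k proteins; the record specifies neither choice, and all function names and toy values are hypothetical.

```python
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equal-length sequences, or None if undefined."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # a constant vector has no defined correlation
    return sxy / sqrt(sxx * syy)


def per_protein_correlations(truth, pred):
    """Correlate measured vs. predicted abundance for each protein across samples.

    Assumption (hypothetical layout): truth and pred map protein -> list of
    abundances in the same sample order; None marks a missing value and is
    dropped pairwise. Pearson is assumed; the record does not name the measure.
    """
    result = {}
    for protein, observed in truth.items():
        predicted = pred.get(protein)
        if predicted is None or len(predicted) != len(observed):
            continue  # no usable prediction for this protein
        pairs = [(o, p) for o, p in zip(observed, predicted)
                 if o is not None and p is not None]
        if len(pairs) < 2:
            continue
        r = pearson([o for o, _ in pairs], [p for _, p in pairs])
        if r is not None:
            result[protein] = r
    return result


def median_correlation(truth, pred):
    """Median of the per-protein correlations (raises if none are defined)."""
    return median(per_protein_correlations(truth, pred).values())


def top_k_overlap(corr_a, corr_b, k):
    """Jaccard index of the k best-predicted proteins of two methods."""
    if k < 1:
        raise ValueError("k must be >= 1")
    top_a = set(sorted(corr_a, key=corr_a.get, reverse=True)[:k])
    top_b = set(sorted(corr_b, key=corr_b.get, reverse=True)[:k])
    union = top_a | top_b
    return len(top_a & top_b) / len(union) if union else 0.0


# Hypothetical toy data: 3 proteins measured in 4 samples.
truth = {"P1": [1.0, 2.0, 3.0, 4.0], "P2": [2.0, 1.0, 4.0, 3.0], "P3": [1.0, 2.0, 3.0, 4.0]}
pred_a = {"P1": [1.1, 2.0, 2.9, 4.2], "P2": [1.0, 2.0, 3.0, 4.0], "P3": [4.0, 3.0, 2.0, 1.0]}
print(median_correlation(truth, pred_a))  # 0.6
```

Taking the median rather than the mean keeps a few very poorly predicted proteins (such as P3 here, with a correlation of -1) from dominating the summary; the choice of k in `top_k_overlap` is arbitrary in this sketch.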

ISSN: 1471-2105
Collection: Directory of Open Access Journals (DOAJ)

Affiliations:
Department of Computer Science and Engineering, The Ohio State University (Tara Eicher, Raghu Machiraju)
Department of Biomedical Informatics, College of Medicine, The Ohio State University (Andrew Patt, Esko Kautto, Ewy Mathé, Yan Zhang)
Keywords: Proteogenomics, mRNA, Random forests, Fuzzy logic, Bayesian networks