Proteomics in non-human primates: utilizing RNA-Seq data to improve protein identification by mass spectrometry in vervet monkeys

Abstract Background Shotgun proteomics utilizes a database search strategy to compare detected mass spectra to a library of theoretical spectra derived from reference genome information. As such, the robustness of proteomics results is contingent upon the completeness and accuracy of the gene annota...

Bibliographic Details
Main Authors: J. Michael Proffitt, Jeremy Glenn, Anthony J. Cesnik, Avinash Jadhav, Michael R. Shortreed, Lloyd M. Smith, Kylie Kavanagh, Laura A. Cox, Michael Olivier
Format: Article
Language: English
Published: BMC 2017-11-01
Series: BMC Genomics
Subjects:
Online Access: http://link.springer.com/article/10.1186/s12864-017-4279-0
collection DOAJ
description Abstract
Background: Shotgun proteomics uses a database search strategy to compare detected mass spectra to a library of theoretical spectra derived from reference genome information. As such, the robustness of proteomics results is contingent upon the completeness and accuracy of the gene annotation in the reference genome. For animal models of disease where genomic annotation is incomplete, such as non-human primates, proteogenomic methods can improve the detection of proteins by incorporating transcriptional data from RNA-Seq into the proteomics search databases used for peptide spectral matching. Customized search databases derived from RNA-Seq data are capable of identifying unannotated genetic and splice variants while simultaneously restricting the comparisons to only those transcripts actively expressed in the tissue.
Results: We collected RNA-Seq and proteomic data from 10 vervet monkey liver samples and used the RNA-Seq data to curate sample-specific search databases, which were analyzed in the program Morpheus. We compared these results against those from a search database generated from the reference vervet genome. A total of 284 previously unannotated splice junctions were predicted by the RNA-Seq data, 92 of which were confirmed by peptide spectral matches. More than half (53/92) of these unannotated splice variants had orthologs in other non-human primates, suggesting that failure to match these peptides in the reference analyses likely arose from incomplete gene model information. The sample-specific databases also identified 101 unique peptides containing single amino acid substitutions that were missed by the reference database. Because the sample-specific searches were restricted to actively expressed transcripts, the search databases were smaller, more computationally efficient, and identified more peptides at the empirically derived 1% false discovery rate.
Conclusion: Proteogenomic approaches are ideally suited to facilitate the discovery and annotation of proteins in less widely studied animal models such as non-human primates. We expect that these approaches will help to improve existing genome annotations of non-human primate species such as vervet.
id doaj.art-17845563f9824873ac245b29ea628bd8
institution Directory Open Access Journal
issn 1471-2164
citation BMC Genomics, vol. 18, no. 1, pp. 1-10 (2017-11-01), doi:10.1186/s12864-017-4279-0
affiliation J. Michael Proffitt: Department of Genetics, Texas Biomedical Research Institute
Jeremy Glenn: Department of Genetics, Texas Biomedical Research Institute
Anthony J. Cesnik: Department of Chemistry, University of Wisconsin
Avinash Jadhav: Department of Genetics, Texas Biomedical Research Institute
Michael R. Shortreed: Department of Chemistry, University of Wisconsin
Lloyd M. Smith: Department of Chemistry, University of Wisconsin
Kylie Kavanagh: Department of Pathology and Comparative Medicine, Wake Forest School of Medicine
Laura A. Cox: Department of Genetics, Texas Biomedical Research Institute
Michael Olivier: Department of Genetics, Texas Biomedical Research Institute
topic Proteogenomics
Proteomics
Liver
Vervet
RNA-Seq
Morpheus