The Integration of Data from Different Long-Read Sequencing Platforms Enhances Proteoform Characterization in Arabidopsis


Bibliographic Details
Main Authors: Lara García-Campa, Luis Valledor, Jesús Pascual
Format: Article
Language: English
Published: MDPI AG 2023-01-01
Series: Plants
Online Access: https://www.mdpi.com/2223-7747/12/3/511
Abstract: The increasing availability of massive omics data demands higher-quality reference databases and annotations. Combining full-length isoform sequencing (Iso-Seq) with short-read transcriptomics and proteomics has been used successfully to improve proteoform characterization, a major ongoing goal in biology. However, the potential of including Oxford Nanopore Technologies Direct RNA Sequencing (ONT-DRS) data has not been explored. In this paper, we analyzed the impact of combining Iso-Seq- and ONT-DRS-derived data on the identification of proteoforms in Arabidopsis mass spectrometry (MS) proteomics data. To this end, we selected a proteomics dataset from senescent leaves and performed protein searches against three protein databases: AtRTD2 and AtRTD3, built from the homonymous transcriptomes and regarded as the most complete and up-to-date available for the species; and a custom hybrid database combining AtRTD3 with publicly available ONT-DRS transcriptomics data generated from Arabidopsis leaves. Our results show that incorporating long-read sequencing data from both Iso-Seq and ONT-DRS into a proteogenomic workflow enhances proteoform characterization and discovery in bottom-up proteomics studies. This opens a major opportunity to investigate biological systems at an unprecedented scale, although it poses challenges for current protein search algorithms.
Journal: Plants, vol. 12, no. 3, article 511 (ISSN 2223-7747)
DOI: 10.3390/plants12030511
Affiliation (all authors): Plant Physiology, Department of Organisms and Systems Biology, University of Oviedo, 33003 Oviedo, Spain
Keywords: proteogenomics, long-read, sequencing, nanopore, PacBio, protein database