The Integration of Data from Different Long-Read Sequencing Platforms Enhances Proteoform Characterization in Arabidopsis
Main Authors: | Lara García-Campa, Luis Valledor, Jesús Pascual |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-01-01 |
Series: | Plants |
Subjects: | |
Online Access: | https://www.mdpi.com/2223-7747/12/3/511 |
_version_ | 1827759638410428416 |
---|---|
author | Lara García-Campa, Luis Valledor, Jesús Pascual |
collection | DOAJ |
description | The increasing availability of massive omics data requires improving the quality of reference databases and their annotations. The combination of full-length isoform sequencing (Iso-Seq) with short-read transcriptomics and proteomics has been successfully used for increasing proteoform characterization, which is a main ongoing goal in biology. However, the potential of including Oxford Nanopore Technologies Direct RNA Sequencing (ONT-DRS) data has not been explored. In this paper, we analyzed the impact of combining Iso-Seq- and ONT-DRS-derived data on the identification of proteoforms in Arabidopsis MS proteomics data. To this end, we selected a proteomics dataset corresponding to senescent leaves and we performed protein searches using three different protein databases: AtRTD2 and AtRTD3, built from the homonymous transcriptomes, regarded as the most complete and up-to-date available for the species; and a custom hybrid database combining AtRTD3 with publicly available ONT-DRS transcriptomics data generated from Arabidopsis leaves. Our results show that the inclusion and combination of long-read sequencing data from Iso-Seq and ONT-DRS into a proteogenomic workflow enhances proteoform characterization and discovery in bottom-up proteomics studies. This represents a great opportunity to further investigate biological systems at an unprecedented scale, although it brings challenges to current protein searching algorithms. |
first_indexed | 2024-03-11T09:29:34Z |
format | Article |
id | doaj.art-95a9f01c90884b1e8c42156c86ddcfdc |
institution | Directory of Open Access Journals |
issn | 2223-7747 |
language | English |
last_indexed | 2024-03-11T09:29:34Z |
publishDate | 2023-01-01 |
publisher | MDPI AG |
record_format | Article |
series | Plants |
doi | 10.3390/plants12030511 |
citation | Plants, vol. 12, no. 3, article 511 (2023) |
affiliation | Plant Physiology, Department of Organisms and Systems Biology, University of Oviedo, 33003 Oviedo, Spain (all three authors) |
title | The Integration of Data from Different Long-Read Sequencing Platforms Enhances Proteoform Characterization in Arabidopsis |
topic | proteogenomics; long-read sequencing; nanopore; PacBio; protein database |
url | https://www.mdpi.com/2223-7747/12/3/511 |