Transcript assembly and annotations: Bias and adjustment.

Transcript annotations play a critical role in gene expression analysis as they serve as a reference for quantifying isoform-level expression. The two main sources of annotations are RefSeq and Ensembl/GENCODE, but discrepancies between their methodologies and information resources can lead to signi...


Bibliographic Details
Main Authors: Qimin Zhang, Mingfu Shao
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2023-12-01
Series: PLoS Computational Biology
Online Access: https://doi.org/10.1371/journal.pcbi.1011734
author Qimin Zhang
Mingfu Shao
collection DOAJ
description Transcript annotations play a critical role in gene expression analysis as they serve as a reference for quantifying isoform-level expression. The two main sources of annotations are RefSeq and Ensembl/GENCODE, but discrepancies between their methodologies and information resources can lead to significant differences. It has been demonstrated that the choice of annotation can have a significant impact on gene expression analysis. Furthermore, transcript assembly is closely linked to annotations: assembling large-scale available RNA-seq data is an effective data-driven way to construct annotations, and annotations often serve as benchmarks for evaluating the accuracy of assembly methods. However, the influence of different annotations on transcript assembly is not yet fully understood. We investigate the impact of annotations on transcript assembly. Surprisingly, we observe that opposite conclusions can arise when evaluating assemblers with different annotations. To understand this striking phenomenon, we compare the structural similarity of annotations at various levels and find that the primary structural difference across annotations occurs at the intron-chain level. Next, we examine the biotypes of annotated and assembled transcripts and uncover a significant bias towards annotating and assembling transcripts with intron retentions, which explains the contradictory conclusions above. We develop a standalone tool, available at https://github.com/Shao-Group/irtool, that can be combined with an assembler to generate an assembly without intron retentions. We evaluate the performance of such a pipeline and offer guidance on selecting appropriate assembly tools for different application scenarios.
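The two structural notions the description relies on, the intron chain and intron retention, can be illustrated with a minimal sketch. This is not the algorithm used by irtool; the function names, the inclusive GTF-style coordinates, and the simplified retention rule (an exon of one transcript fully covering an intron of another) are assumptions made for illustration only.

```python
from typing import List, Tuple

# Hypothetical representation: an exon is (start, end) with inclusive
# coordinates, as in GTF; a transcript is a list of exons on one strand.
Exon = Tuple[int, int]


def intron_chain(exons: List[Exon]) -> Tuple[Tuple[int, int], ...]:
    """Return the introns lying between consecutive exons, in genomic order."""
    ordered = sorted(exons)
    return tuple(
        (a_end + 1, b_start - 1)
        for (_, a_end), (b_start, _) in zip(ordered, ordered[1:])
    )


def has_intron_retention(candidate: List[Exon], reference: List[Exon]) -> bool:
    """True if some exon of `candidate` fully covers an intron of `reference`.

    Simplified rule for illustration; real tools apply further conditions.
    """
    return any(
        e_start <= i_start and i_end <= e_end
        for (i_start, i_end) in intron_chain(reference)
        for (e_start, e_end) in candidate
    )


spliced = [(100, 200), (300, 400), (500, 600)]
retained = [(100, 400), (500, 600)]  # the first intron of `spliced` is kept

print(intron_chain(spliced))
print(has_intron_retention(retained, spliced))
print(intron_chain(retained) == intron_chain(spliced))
```

Under this simplified rule, the two example transcripts have different intron chains, so an evaluation at the intron-chain level would count them as distinct transcripts even though they share most of their exonic sequence.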
first_indexed 2024-03-08T01:59:59Z
format Article
id doaj.art-9affb20fcd454252a08f86d85045e38f
institution Directory Open Access Journal
issn 1553-734X
1553-7358
language English
last_indexed 2024-03-08T01:59:59Z
publishDate 2023-12-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS Computational Biology
spelling doaj.art-9affb20fcd454252a08f86d85045e38f | 2024-02-14T05:31:22Z | eng | Public Library of Science (PLoS) | PLoS Computational Biology | 1553-734X | 1553-7358 | 2023-12-01 | 19 | 12 | e1011734 | 10.1371/journal.pcbi.1011734 | Transcript assembly and annotations: Bias and adjustment. | Qimin Zhang | Mingfu Shao | https://doi.org/10.1371/journal.pcbi.1011734
title Transcript assembly and annotations: Bias and adjustment.
title_sort transcript assembly and annotations bias and adjustment
url https://doi.org/10.1371/journal.pcbi.1011734