Fake IDs? Widespread misannotation of DNA transposons as a general transcription factor

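Abstract Accurate annotation of genes and transposable elements (TEs) is vital for understanding genomes, but current annotation pipelines often misannotate TEs as genes. This study reveals how the general transcription factor II-I repeat domain-containing protein 2 (GTF2IRD2) erroneously annotated DNA transposons in non-mammalian species, as it contains a 3′ fused hAT transposase domain. We also demonstrate the generality of this problem by identifying misannotated TEs as genes in other vertebrate genomes. Such misannotations can lead to errors in phylogenetic analyses and wasted time for investigators. The study proposes adding a final TE-check to gene annotation pipelines to mitigate this problem.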

Bibliographic Details
Main Authors: Nozhat T. Hassan, David L. Adelson
Format: Article
Language: English
Published: BMC 2023-11-01
Series: Genome Biology
Subjects: Transposable element; Genome; Annotation; Transcription factor; GTF2; DNA transposon
Online Access: https://doi.org/10.1186/s13059-023-03102-9
_version_ 1797559373641285632
author Nozhat T. Hassan
David L. Adelson
author_facet Nozhat T. Hassan
David L. Adelson
author_sort Nozhat T. Hassan
collection DOAJ
description Abstract Accurate annotation of genes and transposable elements (TEs) is vital for understanding genomes, but current annotation pipelines often misannotate TEs as genes. This study reveals how the general transcription factor II-I repeat domain-containing protein 2 (GTF2IRD2) erroneously annotated DNA transposons in non-mammalian species, as it contains a 3′ fused hAT transposase domain. We also demonstrate the generality of this problem by identifying misannotated TEs as genes in other vertebrate genomes. Such misannotations can lead to errors in phylogenetic analyses and wasted time for investigators. The study proposes adding a final TE-check to gene annotation pipelines to mitigate this problem.
first_indexed 2024-03-10T17:44:27Z
format Article
id doaj.art-ffd46fdf531d4ed292c4d4953c9cda7c
institution Directory of Open Access Journals
issn 1474-760X
language English
last_indexed 2024-03-10T17:44:27Z
publishDate 2023-11-01
publisher BMC
record_format Article
series Genome Biology
spelling doaj.art-ffd46fdf531d4ed292c4d4953c9cda7c; 2023-11-20T09:35:21Z; eng; BMC; Genome Biology; 1474-760X; 2023-11-01; 24118; 10.1186/s13059-023-03102-9; Fake IDs? Widespread misannotation of DNA transposons as a general transcription factor; Nozhat T. Hassan (0); David L. Adelson (1); School of Biological Sciences, University of Adelaide; School of Biological Sciences, University of Adelaide; Abstract Accurate annotation of genes and transposable elements (TEs) is vital for understanding genomes, but current annotation pipelines often misannotate TEs as genes. This study reveals how the general transcription factor II-I repeat domain-containing protein 2 (GTF2IRD2) erroneously annotated DNA transposons in non-mammalian species, as it contains a 3′ fused hAT transposase domain. We also demonstrate the generality of this problem by identifying misannotated TEs as genes in other vertebrate genomes. Such misannotations can lead to errors in phylogenetic analyses and wasted time for investigators. The study proposes adding a final TE-check to gene annotation pipelines to mitigate this problem.; https://doi.org/10.1186/s13059-023-03102-9; Transposable element; Genome; Annotation; Transcription factor; GTF2; DNA transposon
spellingShingle Nozhat T. Hassan
David L. Adelson
Fake IDs? Widespread misannotation of DNA transposons as a general transcription factor
Genome Biology
Transposable element
Genome
Annotation
Transcription factor
GTF2
DNA transposon
title Fake IDs? Widespread misannotation of DNA transposons as a general transcription factor
title_sort fake ids widespread misannotation of dna transposons as a general transcription factor
topic Transposable element
Genome
Annotation
Transcription factor
GTF2
DNA transposon
url https://doi.org/10.1186/s13059-023-03102-9
work_keys_str_mv AT nozhatthassan fakeidswidespreadmisannotationofdnatransposonsasageneraltranscriptionfactor
AT davidladelson fakeidswidespreadmisannotationofdnatransposonsasageneraltranscriptionfactor