DARTS: An Algorithm for Domain-Associated Retrotransposon Search in Genome Assemblies

Bibliographic Details
Main Authors: Mikhail Biryukov, Kirill Ustyantsev
Format: Article
Language: English
Published: MDPI AG, 2021-12-01
Series: Genes
Subjects: LTR retrotransposons; retroelements; domain annotation; software; automatic pipeline
Online Access:https://www.mdpi.com/2073-4425/13/1/9
collection DOAJ
description Retrotransposons comprise a substantial fraction of eukaryotic genomes, reaching the highest proportions in plants. Therefore, identification and annotation of retrotransposons is an important task in studying the regulation and evolution of plant genomes. The majority of computational tools for mining transposable elements (TEs) are designed for subsequent genome repeat masking, often leaving aside the lineage classification of the elements and their protein domain composition. Additionally, studies focused on the diversity and evolution of a particular group of retrotransposons often require substantial customization efforts from researchers to adapt existing software to their needs. Here, we developed DARTS (Domain-Associated Retrotransposon Search), a computational pipeline that mines sequences of protein-coding retrotransposons based on the sequences of their conserved protein domains. Using long terminal repeat (LTR) retrotransposons (LTR-RTs), the most abundant group of TEs in plants, we show that DARTS has radically higher sensitivity for LTR-RT identification than the widely accepted tool LTRharvest. DARTS can be easily customized for specific user needs. As output, DARTS returns a set of structurally annotated nucleotide and amino acid sequences that can be readily used in subsequent comparative and phylogenetic analyses. DARTS may assist researchers interested in the discovery and detailed analysis of the diversity and evolution of retrotransposons, LTR-RTs, and other protein-coding TEs.
first_indexed 2024-03-10T01:25:46Z
id doaj.art-554d03dee09648cb917a4a859b23a528
institution Directory of Open Access Journals
issn 2073-4425
last_indexed 2024-03-10T01:25:46Z
record_timestamp 2023-11-23T13:50:50Z
volume 13
issue 1
article_number 9
doi 10.3390/genes13010009
affiliation Mikhail Biryukov, Kirill Ustyantsev: Sector of Molecular and Genetic Mechanisms of Regeneration, Institute of Cytology and Genetics SB RAS, 630090 Novosibirsk, Russia
topic LTR retrotransposons
retroelements
domain annotation
software
automatic pipeline