Transposable element subfamily annotation has a reproducibility problem

Abstract. Background: Transposable element (TE) sequences are classified into families based on the reconstructed history of replication, and into subfamilies based on more fine-grained features that are often intended to capture family history. We evaluate the reliability of annotation with common su...

Bibliographic Details
Main Authors: Kaitlin M. Carey, Gilia Patterson, Travis J. Wheeler
Format: Article
Language: English
Published: BMC 2021-01-01
Series: Mobile DNA
Subjects: Transposable elements; Interspersed repeats; Subfamilies; Segmental duplications
Online Access: https://doi.org/10.1186/s13100-021-00232-4
author Kaitlin M. Carey
Gilia Patterson
Travis J. Wheeler
collection DOAJ
description Abstract. Background: Transposable element (TE) sequences are classified into families based on the reconstructed history of replication, and into subfamilies based on more fine-grained features that are often intended to capture family history. We evaluate the reliability of annotation with common subfamilies by assessing the extent to which subfamily annotation is reproducible in replicate copies created by segmental duplications in the human genome, and in homologous copies shared by human and chimpanzee. Results: We find that standard methods annotate over 10% of replicates as belonging to different subfamilies, despite the fact that they are expected to be annotated as belonging to the same subfamily. Point mutations and homologous recombination appear to be responsible for some of this discordant annotation (particularly in the young Alu family), but are unlikely to fully explain the annotation unreliability. Conclusions: The surprisingly high level of disagreement in subfamily annotation of homologous sequences highlights a need for further research into definition of TE subfamilies, methods for representing subfamily annotation confidence of TE instances, and approaches to better utilizing such nuanced annotation data in downstream analysis.
first_indexed 2024-12-18T23:29:04Z
format Article
id doaj.art-eb56fb5f89374c3080815a65ab3c9ba6
institution Directory Open Access Journal
issn 1759-8753
language English
last_indexed 2024-12-18T23:29:04Z
publishDate 2021-01-01
publisher BMC
record_format Article
series Mobile DNA
spelling doaj.art-eb56fb5f89374c3080815a65ab3c9ba6; 2022-12-21T20:47:44Z; eng; BMC; Mobile DNA; 1759-8753; 2021-01-01; 12; 1; 1; 19; 10.1186/s13100-021-00232-4; Transposable element subfamily annotation has a reproducibility problem; Kaitlin M. Carey (Department of Computer Science, University of Montana); Gilia Patterson (Department of Computer Science, University of Montana); Travis J. Wheeler (Department of Computer Science, University of Montana); https://doi.org/10.1186/s13100-021-00232-4; Transposable elements; Interspersed repeats; Subfamilies; Segmental duplications
title Transposable element subfamily annotation has a reproducibility problem
topic Transposable elements
Interspersed repeats
Subfamilies
Segmental duplications
url https://doi.org/10.1186/s13100-021-00232-4