Computational and Statistical Analyses of Insertional Polymorphic Endogenous Retroviruses in a Non-Model Organism

Bibliographic Details
Main Authors: Le Bao, Daniel Elleder, Raunaq Malhotra, Michael DeGiorgio, Theodora Maravegias, Lindsay Horvath, Laura Carrel, Colin Gillin, Tomáš Hron, Helena Fábryová, David R. Hunter, Mary Poss
Format: Article
Language: English
Published: MDPI AG 2014-11-01
Series: Computation
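Subjects: endogenous retrovirus; insertional polymorphism; mixture models; de novo clustering; mule deer; population history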
Online Access: http://www.mdpi.com/2079-3197/2/4/221
_version_ 1811276846218608640
author Le Bao
Daniel Elleder
Raunaq Malhotra
Michael DeGiorgio
Theodora Maravegias
Lindsay Horvath
Laura Carrel
Colin Gillin
Tomáš Hron
Helena Fábryová
David R. Hunter
Mary Poss
author_facet Le Bao
Daniel Elleder
Raunaq Malhotra
Michael DeGiorgio
Theodora Maravegias
Lindsay Horvath
Laura Carrel
Colin Gillin
Tomáš Hron
Helena Fábryová
David R. Hunter
Mary Poss
author_sort Le Bao
collection DOAJ
description Endogenous retroviruses (ERVs) are a class of transposable elements found in all vertebrate genomes that contribute substantially to genomic functional and structural diversity. A host species acquires an ERV when an exogenous retrovirus infects a germ cell of an individual and becomes part of the genome inherited by viable progeny. ERVs that colonized ancestral lineages are fixed in contemporary species. However, in some extant species, ERV colonization is ongoing, which results in variation in ERV frequency in the population. To study the consequences of ERV colonization of a host genome, methods are needed to assign each ERV to a location in a species’ genome and determine which individuals have acquired each ERV by descent. Because well annotated reference genomes are not widely available for all species, de novo clustering approaches provide an alternative to reference mapping that are insensitive to differences between query and reference and that are amenable to mobile element studies in both model and non-model organisms. However, there is substantial uncertainty in both identifying ERV genomic position and assigning each unique ERV integration site to individuals in a population. We present an analysis suitable for detecting ERV integration sites in species without the need for a reference genome. Our approach is based on improved de novo clustering methods and statistical models that take the uncertainty of assignment into account and yield a probability matrix of shared ERV integration sites among individuals. We demonstrate that polymorphic integrations of a recently identified endogenous retrovirus in deer reflect contemporary relationships among individuals and populations.
first_indexed 2024-04-13T00:05:11Z
format Article
id doaj.art-c8d23ac26268428fa8b85dc08ced9921
institution Directory of Open Access Journals
issn 2079-3197
language English
last_indexed 2024-04-13T00:05:11Z
publishDate 2014-11-01
publisher MDPI AG
record_format Article
series Computation
spelling doaj.art-c8d23ac26268428fa8b85dc08ced9921
2022-12-22T03:11:15Z
eng
MDPI AG
Computation
2079-3197
2014-11-01
2
4
221
245
10.3390/computation2040221
computation2040221
Computational and Statistical Analyses of Insertional Polymorphic Endogenous Retroviruses in a Non-Model Organism
Le Bao [0]
Daniel Elleder [1]
Raunaq Malhotra [2]
Michael DeGiorgio [3]
Theodora Maravegias [4]
Lindsay Horvath [5]
Laura Carrel [6]
Colin Gillin [7]
Tomáš Hron [8]
Helena Fábryová [9]
David R. Hunter [10]
Mary Poss [11]
[0] Department of Statistics, The Pennsylvania State University, University Park, PA 16802, USA
[1] Institute of Molecular Genetics, Academy of Sciences of the Czech Republic, Videnska 1083, Prague, Czech Republic
[2] Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802, USA
[3] Department of Biology, The Pennsylvania State University, University Park, PA 16802, USA
[4] Department of Biology, The Pennsylvania State University, University Park, PA 16802, USA
[5] Department of Pathology, Johns Hopkins University, Baltimore, MD 21287, USA
[6] Department of Biochemistry and Molecular Biology, Penn State College of Medicine, Hershey, PA 17033, USA
[7] Department of Fish and Wildlife, 4034 Fairview Industrial Dr. S., Salem, OR 97302, USA
[8] Institute of Molecular Genetics, Academy of Sciences of the Czech Republic, Videnska 1083, Prague, Czech Republic
[9] Institute of Molecular Genetics, Academy of Sciences of the Czech Republic, Videnska 1083, Prague, Czech Republic
[10] Department of Statistics, The Pennsylvania State University, University Park, PA 16802, USA
[11] Department of Biology, The Pennsylvania State University, University Park, PA 16802, USA
Endogenous retroviruses (ERVs) are a class of transposable elements found in all vertebrate genomes that contribute substantially to genomic functional and structural diversity. A host species acquires an ERV when an exogenous retrovirus infects a germ cell of an individual and becomes part of the genome inherited by viable progeny. ERVs that colonized ancestral lineages are fixed in contemporary species. However, in some extant species, ERV colonization is ongoing, which results in variation in ERV frequency in the population. To study the consequences of ERV colonization of a host genome, methods are needed to assign each ERV to a location in a species’ genome and determine which individuals have acquired each ERV by descent. Because well annotated reference genomes are not widely available for all species, de novo clustering approaches provide an alternative to reference mapping that are insensitive to differences between query and reference and that are amenable to mobile element studies in both model and non-model organisms. However, there is substantial uncertainty in both identifying ERV genomic position and assigning each unique ERV integration site to individuals in a population. We present an analysis suitable for detecting ERV integration sites in species without the need for a reference genome. Our approach is based on improved de novo clustering methods and statistical models that take the uncertainty of assignment into account and yield a probability matrix of shared ERV integration sites among individuals. We demonstrate that polymorphic integrations of a recently identified endogenous retrovirus in deer reflect contemporary relationships among individuals and populations.
http://www.mdpi.com/2079-3197/2/4/221
endogenous retrovirus
insertional polymorphism
mixture models
de novo clustering
mule deer
population history
spellingShingle Le Bao
Daniel Elleder
Raunaq Malhotra
Michael DeGiorgio
Theodora Maravegias
Lindsay Horvath
Laura Carrel
Colin Gillin
Tomáš Hron
Helena Fábryová
David R. Hunter
Mary Poss
Computational and Statistical Analyses of Insertional Polymorphic Endogenous Retroviruses in a Non-Model Organism
Computation
endogenous retrovirus
insertional polymorphism
mixture models
de novo clustering
mule deer
population history
title Computational and Statistical Analyses of Insertional Polymorphic Endogenous Retroviruses in a Non-Model Organism
title_full Computational and Statistical Analyses of Insertional Polymorphic Endogenous Retroviruses in a Non-Model Organism
title_fullStr Computational and Statistical Analyses of Insertional Polymorphic Endogenous Retroviruses in a Non-Model Organism
title_full_unstemmed Computational and Statistical Analyses of Insertional Polymorphic Endogenous Retroviruses in a Non-Model Organism
title_short Computational and Statistical Analyses of Insertional Polymorphic Endogenous Retroviruses in a Non-Model Organism
title_sort computational and statistical analyses of insertional polymorphic endogenous retroviruses in a non model organism
topic endogenous retrovirus
insertional polymorphism
mixture models
de novo clustering
mule deer
population history
url http://www.mdpi.com/2079-3197/2/4/221