Genome annotation and selectional analysis of viral evolution

Bibliographic Details
Main Author: de Groot, S
Other Authors: Hein, J
Format: Thesis
Language: English
Published: 2008
description In the past few years we have witnessed an explosion in the amount of viral genomic data available. GenBank alone holds over 80,000 near-complete viral genomes, and the number is rising fast. For example, since the submission of the first SARS genome in May 2003, over 140 more have been published. With this genomic data at hand, we hope finally to improve our understanding of viruses. Several papers have been dedicated to genome annotation and selection on viral genomes, focusing in particular on the evolutionary behaviour of overlapping reading frames. These are a common feature of viruses: because the genetic code is read in triplets, up to three genes may be encoded simultaneously in one direction. The constraints placed on a nucleotide in such a multiple coding region naturally affect its mutational behaviour, and as a result the pattern of evolution is more complex. Additionally, because viruses evolve rapidly, we observe changes in gene structure between viruses of the same family. Finally, as a result of this high divergence, alignments between two genomes tend to be unreliable, which further complicates comparative analysis. Our goal is to present methods that deal with these complications. First, we introduce an ab initio pairwise comparative annotation method, which accounts not only for the presence of overlapping reading frames in genomes, but also for differences in gene structure between the two compared sequences. Secondly, we develop a hidden Markov model for the annotation of selection strengths across a viral genome, accommodating inter- as well as intragenic differences in selection. Thirdly, we investigate the effect of using a fixed alignment on the inference of selection by incorporating statistical alignment into our selection analysis. All three methods presented here improve on their respective equivalents in the field.
We investigate the nature of selection in overlapping regions in several studies, in particular on the genomes of Hepatitis B and HIV2. We provide a full nucleotide-level annotation of selection strengths for both viral sequences, highlighting fast-evolving regions such as the gp120 protein. We also analyse the mutational behaviour of overlapping regions in both genomes and find that in Hepatitis B selection seems to be of equal strength for single and double coding regions. In HIV2, however, single coding regions appear to be under selection twice as stringent as that on double coding regions, with a tendency for a fast-evolving region to overlap a slow-evolving one. Each chapter of our work relates to one of our publications. For each method in turn, we introduce the method, its academic context and its results. In chapter 5 we then discuss, for each method, its achievements, its shortcomings, and possible future extensions and improvements.
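The idea of overlapping reading frames mentioned in the description can be illustrated with a small sketch. It is not taken from the thesis and does not reproduce any of its methods; the function names `split_codons` and `codon_positions` and the example sequence are hypothetical, chosen only to show how one nucleotide occupies a different codon position in each of the three forward frames.

```python
# Illustrative sketch only: shows the three forward reading frames of a
# sequence. Function names and the example sequence are hypothetical and
# are not part of the thesis described in this record.

def split_codons(seq, frame):
    """Return the complete codons of `seq` read in forward frame 0, 1 or 2."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def codon_positions(index):
    """Map each forward frame to the codon position (1, 2 or 3) that the
    nucleotide at 0-based `index` occupies in that frame."""
    return {frame: (index - frame) % 3 + 1 for frame in range(3)}


if __name__ == "__main__":
    seq = "ATGGCATGA"  # made-up example sequence
    for frame in range(3):
        print(frame, split_codons(seq, frame))
    # The nucleotide at index 5 has a different codon position in each frame.
    print(codon_positions(5))
```

Because the same nucleotide is, for instance, a third codon position in one frame and a first or second position in another, a substitution there may be silent in one gene yet change the amino acid in the overlapping gene, which is one way to picture the extra constraint the description refers to.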
id oxford-uuid:9bc1f480-5556-4f44-8700-8c230a5dbda9
institution University of Oxford
topic Mathematical genetics and bioinformatics (statistics)