Statistical challenges in the detection of mutation and variation using high throughput sequencing

The aim of this thesis is to obtain a better understanding of mutation rates within as well as between the genomes of humans and chimpanzees using data generated by high throughput sequencers. I will start with a review of the field and an overview of the technologies and protocols used to generate...

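Bibliographic Details
Author: Pfeifer, S
Other Authors: McVean, G
Format: Thesis
Language: English
Published: 2012
Subjects: Genetics (life sciences); Mathematical genetics and bioinformatics (statistics); Bioinformatics (life sciences)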
_version_ 1826310042413957120
author Pfeifer, S
author2 McVean, G
author_facet McVean, G
Pfeifer, S
author_sort Pfeifer, S
collection OXFORD
description The aim of this thesis is to obtain a better understanding of mutation rates within as well as between the genomes of humans and chimpanzees using data generated by high throughput sequencers. I will start with a review of the field and an overview of the technologies and protocols used to generate and analyse high throughput sequencing data. I apply some of the discussed techniques to show that there is evidence of a selective advantage of pathogenic de novo mutations in the Fibroblast Growth Factor Receptor 3 gene in the male germ line of humans. Furthermore, I use some of the methods to generate a map of genome-wide sequence variation in Western chimpanzees. Ever since Darwin [Darwin, 1871] and Huxley [Huxley, 1863] postulated more than a century ago that African great apes are our closest living evolutionary relatives, the study of chimpanzees has been of great scientific interest from an evolutionary point of view, as comparisons between the human and chimpanzee genomes may help to explain the molecular basis for similarities and differences between the two species. I use the generated data to explore the breadth of nucleotide diversity in the chimpanzee genome in order to shed light on whether the local variation in mutation rate has been conserved since the divergence of the two species, and to place human nucleotide diversity into perspective with an evolutionarily closely related species. I explore the relationship of nucleotide diversity in chimpanzees with specific large-scale genome features to reveal a number of highly significant correlations which explain over 40% of the observed variation. I use data from the 1000 Genomes Project to examine the occurrence of ancestral polymorphisms shared between human and chimpanzee on a genome-wide scale. These ancestral polymorphisms not only influence fine-scale divergence rates across the genome in very closely related species but are also good candidates for regions under balancing selection, and are thus a useful tool for studying long-term population demographics and speciation. Using these variants, I postulate that long-term balancing selection may be more common than previously believed. I conclude with a discussion of the results contained in the body of the thesis and suggest a number of areas for future research.
first_indexed 2024-03-07T07:44:39Z
format Thesis
id oxford-uuid:e49ce2fa-aa2c-42d7-bb54-c63e50d14afb
institution University of Oxford
language English
last_indexed 2024-03-07T07:44:39Z
publishDate 2012
record_format dspace
spelling oxford-uuid:e49ce2fa-aa2c-42d7-bb54-c63e50d14afb | 2023-05-24T15:09:43Z | Statistical challenges in the detection of mutation and variation using high throughput sequencing | Thesis | http://purl.org/coar/resource_type/c_db06 | uuid:e49ce2fa-aa2c-42d7-bb54-c63e50d14afb | Genetics (life sciences) | Mathematical genetics and bioinformatics (statistics) | Bioinformatics (life sciences) | English | Oxford University Research Archive - Valet | 2012 | Pfeifer, S | McVean, G
title Statistical challenges in the detection of mutation and variation using high throughput sequencing
topic Genetics (life sciences)
Mathematical genetics and bioinformatics (statistics)
Bioinformatics (life sciences)