Analysis of complex genetic variation using population reference graphs
<p>In this thesis, I study the problem of genome inference from short-read DNA sequencing data, with the goal of accurate characterisation of genomic regions with high sequence diversity. I describe a set of novel algorithms based on a generalised reference genome that captures sequence variat...
Main author: | |
---|---|
Other authors: | |
Format: | Thesis |
Language: | English |
Published: | 2017 |
_version_ | 1826301606911541248 |
---|---|
author | Maciuca, S |
author2 | McVean, G |
author_facet | McVean, G Maciuca, S |
author_sort | Maciuca, S |
collection | OXFORD |
description | <p>In this thesis, I study the problem of genome inference from short-read DNA sequencing data, with the goal of accurate characterisation of genomic regions with high sequence diversity. I describe a set of novel algorithms based on a generalised reference genome that captures sequence variation within a species. In Chapter 3, I propose a novel data structure that extends the traditional reference genome with known variants, providing a compressed representation of genetic diversity. I present algorithms to match sequencing reads to this extended reference structure and infer a personalised reference genome within close genetic distance from the sample under analysis. Coupled with existing variant calling tools, this personalised reference confers increased power to detect complex variants in diverse regions, compared to the traditional reference genome. In Chapter 4, I evaluate the performance of the method on simulated data and show that it is viable for megabase-sized genomes such as the malaria parasite Plasmodium falciparum -- a typical sample can be analysed in 5.7h on a single CPU, using a small amount of memory. I suggest a number of future optimisations to improve computational efficiency.</p>
<p>In Chapter 5, I apply my method to 1300 Plasmodium falciparum samples from across the world and study two hyper-diverse genes that encode surface antigens in Plasmodium falciparum. I show that the personalised references recover variants of these genes that are missed by standard techniques of mapping reads to the traditional reference genome. Next, I build the first global variation catalogue incorporating dimorphic alleles of a region of functional interest and study their frequency patterns.</p> |
first_indexed | 2024-03-07T05:34:56Z |
format | Thesis |
id | oxford-uuid:e3944c5d-87ab-4d9c-8727-a83111e1f5da |
institution | University of Oxford |
language | English |
last_indexed | 2024-03-07T05:34:56Z |
publishDate | 2017 |
record_format | dspace |
spelling | oxford-uuid:e3944c5d-87ab-4d9c-8727-a83111e1f5da 2022-03-27T10:10:01Z Analysis of complex genetic variation using population reference graphs Thesis http://purl.org/coar/resource_type/c_db06 uuid:e3944c5d-87ab-4d9c-8727-a83111e1f5da English ORA Deposit 2017 Maciuca, S McVean, G Iqbal, Z <p>In this thesis, I study the problem of genome inference from short-read DNA sequencing data, with the goal of accurate characterisation of genomic regions with high sequence diversity. I describe a set of novel algorithms based on a generalised reference genome that captures sequence variation within a species. In Chapter 3, I propose a novel data structure that extends the traditional reference genome with known variants, providing a compressed representation of genetic diversity. I present algorithms to match sequencing reads to this extended reference structure and infer a personalised reference genome within close genetic distance from the sample under analysis. Coupled with existing variant calling tools, this personalised reference confers increased power to detect complex variants in diverse regions, compared to the traditional reference genome. In Chapter 4, I evaluate the performance of the method on simulated data and show that it is viable for megabase-sized genomes such as the malaria parasite Plasmodium falciparum -- a typical sample can be analysed in 5.7h on a single CPU, using a small amount of memory. I suggest a number of future optimisations to improve computational efficiency.</p> <p>In Chapter 5, I apply my method to 1300 Plasmodium falciparum samples from across the world and study two hyper-diverse genes that encode surface antigens in Plasmodium falciparum. I show that the personalised references recover variants of these genes that are missed by standard techniques of mapping reads to the traditional reference genome. Next, I build the first global variation catalogue incorporating dimorphic alleles of a region of functional interest and study their frequency patterns.</p> |
spellingShingle | Maciuca, S Analysis of complex genetic variation using population reference graphs |
title | Analysis of complex genetic variation using population reference graphs |
title_full | Analysis of complex genetic variation using population reference graphs |
title_fullStr | Analysis of complex genetic variation using population reference graphs |
title_full_unstemmed | Analysis of complex genetic variation using population reference graphs |
title_short | Analysis of complex genetic variation using population reference graphs |
title_sort | analysis of complex genetic variation using population reference graphs |
work_keys_str_mv | AT maciucas analysisofcomplexgeneticvariationusingpopulationreferencegraphs |