Analysis of complex genetic variation using population reference graphs

In this thesis, I study the problem of genome inference from short-read DNA sequencing data, with the goal of accurate characterisation of genomic regions with high sequence diversity. I describe a set of novel algorithms based on a generalised reference genome that captures sequence variat...


Bibliographic details
Main author: Maciuca, S
Other authors: McVean, G
Format: Thesis
Language: English
Published: 2017
_version_ 1826301606911541248
author Maciuca, S
author2 McVean, G
author_facet McVean, G; Maciuca, S
author_sort Maciuca, S
collection OXFORD
description In this thesis, I study the problem of genome inference from short-read DNA sequencing data, with the goal of accurate characterisation of genomic regions with high sequence diversity. I describe a set of novel algorithms based on a generalised reference genome that captures sequence variation within a species. In Chapter 3, I propose a novel data structure that extends the traditional reference genome with known variants, providing a compressed representation of genetic diversity. I present algorithms to match sequencing reads to this extended reference structure and infer a personalised reference genome within close genetic distance of the sample under analysis. Coupled with existing variant calling tools, this personalised reference confers increased power to detect complex variants in diverse regions, compared to the traditional reference genome. In Chapter 4, I evaluate the performance of the method on simulated data and show that it is viable for megabase-sized genomes such as the malaria parasite Plasmodium falciparum: a typical sample can be analysed in 5.7 h on a single CPU, using a small amount of memory. I suggest a number of future optimisations to improve computational efficiency. In Chapter 5, I apply my method to 1300 Plasmodium falciparum samples from across the world and study two hyper-diverse genes that encode surface antigens in Plasmodium falciparum. I show that the personalised references recover variants of these genes that are missed by standard techniques of mapping reads to the traditional reference genome. Next, I build the first global variation catalogue incorporating dimorphic alleles of a region of functional interest and study their frequency patterns.
first_indexed 2024-03-07T05:34:56Z
format Thesis
id oxford-uuid:e3944c5d-87ab-4d9c-8727-a83111e1f5da
institution University of Oxford
language English
last_indexed 2024-03-07T05:34:56Z
publishDate 2017
record_format dspace
spelling oxford-uuid:e3944c5d-87ab-4d9c-8727-a83111e1f5da | 2022-03-27T10:10:01Z | Analysis of complex genetic variation using population reference graphs | Thesis | http://purl.org/coar/resource_type/c_db06 | uuid:e3944c5d-87ab-4d9c-8727-a83111e1f5da | English | ORA Deposit | 2017 | Maciuca, S | McVean, G | Iqbal, Z
spellingShingle Maciuca, S; Analysis of complex genetic variation using population reference graphs
title Analysis of complex genetic variation using population reference graphs
title_full Analysis of complex genetic variation using population reference graphs
title_fullStr Analysis of complex genetic variation using population reference graphs
title_full_unstemmed Analysis of complex genetic variation using population reference graphs
title_short Analysis of complex genetic variation using population reference graphs
title_sort analysis of complex genetic variation using population reference graphs
work_keys_str_mv AT maciucas analysisofcomplexgeneticvariationusingpopulationreferencegraphs