Low-coverage sequencing cost-effectively detects known and novel variation in underrepresented populations

Genetic studies in underrepresented populations identify disproportionate numbers of novel associations. However, most genetic studies use genotyping arrays and sequenced reference panels that best capture variation most common in European ancestry populations. To compare data generation strategies...

Bibliographic Details
Main Authors: Martin, A, Atkinson, E, Chapman, S, Stevenson, A, Stroud, R, Abebe, T, Akena, D, Alemayehu, M, Ashaba, F, Atwoli, L, Bowers, T, Chibnik, L, Daly, M, DeSmet, T, Dodge, S, Fekadu, A, Ferriera, S, Gelaye, B, Gichuru, S, Injera, W, James, R, Kariuki, S, Kigen, G, Koenen, K, Kwobah, E, Kyebuzibwa, J, Majara, L, Musinguzi, H, Mwema, R, Neale, B, Newman, C, Newton, C, Pickrell, J, Ramesar, R, Shiferaw, W, Stein, D, Teferra, S, van der Merwe, C, Zingela, Z, NeuroGAP-Psychosis Consortium
Format: Internet publication
Language: English
Published: biorxiv 2020
collection OXFORD
description Genetic studies in underrepresented populations identify disproportionate numbers of novel associations. However, most genetic studies use genotyping arrays and sequenced reference panels that best capture variation most common in European ancestry populations. To compare data generation strategies best suited for underrepresented populations, we sequenced the whole genomes of 91 individuals to high coverage as part of the Neuropsychiatric Genetics of African Population-Psychosis (NeuroGAP-Psychosis) study with participants from Ethiopia, Kenya, South Africa, and Uganda. We used a downsampling approach to evaluate the quality of two cost-effective data generation strategies, GWAS arrays versus low-coverage sequencing, by calculating the concordance of imputed variants from these technologies with those from deep whole-genome sequencing data. We show that low-coverage sequencing at a depth of ≥4× captures variants of all frequencies more accurately than all commonly used GWAS arrays investigated and at a comparable cost. Lower depths of sequencing (0.5–1×) performed comparably to commonly used low-density GWAS arrays. Low-coverage sequencing is also sensitive to novel variation; 4× sequencing detects 45% of singletons and 95% of common variants identified in high-coverage African whole genomes. Low-coverage sequencing approaches surmount the problems induced by the ascertainment of common genotyping arrays, effectively identify novel variation particularly in underrepresented populations, and present opportunities to enhance variant discovery at a cost similar to traditional approaches.
id oxford-uuid:a198e963-acf3-4524-b4f4-0effe2993602
institution University of Oxford