Low-coverage sequencing cost-effectively detects known and novel variation in underrepresented populations
Genetic studies in underrepresented populations identify disproportionate numbers of novel associations. However, most genetic studies use genotyping arrays and sequenced reference panels that best capture variation most common in European ancestry populations. To compare data generation strategies...
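Main Authors: | Martin, A; Atkinson, E; Chapman, S; Stevenson, A; Stroud, R; Abebe, T; Akena, D; Alemayehu, M; Ashaba, F; Atwoli, L; Bowers, T; Chibnik, L; Daly, M; DeSmet, T; Dodge, S; Fekadu, A; Ferriera, S; Gelaye, B; Gichuru, S; Injera, W; James, R; Kariuki, S; Kigen, G; Koenen, K; Kwobah, E; Kyebuzibwa, J; Majara, L; Musinguzi, H; Mwema, R; Neale, B; Newman, C; Newton, C; Pickrell, J; Ramesar, R; Shiferaw, W; Stein, D; Teferra, S; van der Merwe, C; Zingela, Z; NeuroGAP-Psychosis Consortium |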
---|---|
Format: | Internet publication |
Language: | English |
Published: | biorxiv 2020 |
_version_ | 1817931575470325760 |
---|---|
author | Martin, A Atkinson, E Chapman, S Stevenson, A Stroud, R Abebe, T Akena, D Alemayehu, M Ashaba, F Atwoli, L Bowers, T Chibnik, L Daly, M DeSmet, T Dodge, S Fekadu, A Ferriera, S Gelaye, B Gichuru, S Injera, W James, R Kariuki, S Kigen, G Koenen, K Kwobah, E Kyebuzibwa, J Majara, L Musinguzi, H Mwema, R Neale, B Newman, C Newton, C Pickrell, J Ramesar, R Shiferaw, W Stein, D Teferra, S van der Merwe, C Zingela, Z NeuroGAP-Psychosis Consortium |
author_sort | Martin, A |
collection | OXFORD |
description | Genetic studies in underrepresented populations identify disproportionate numbers of novel associations. However, most genetic studies use genotyping arrays and sequenced reference panels that best capture variation most common in European ancestry populations. To compare data generation strategies best suited for underrepresented populations, we sequenced the whole genomes of 91 individuals to high coverage as part of the Neuropsychiatric Genetics of African Population-Psychosis (NeuroGAP-Psychosis) study with participants from Ethiopia, Kenya, South Africa, and Uganda. We used a downsampling approach to evaluate the quality of two cost-effective data generation strategies, GWAS arrays versus low-coverage sequencing, by calculating the concordance of imputed variants from these technologies with those from deep whole-genome sequencing data. We show that low-coverage sequencing at a depth of ≥4× captures variants of all frequencies more accurately than all commonly used GWAS arrays investigated and at a comparable cost. Lower depths of sequencing (0.5–1×) performed comparably to commonly used low-density GWAS arrays. Low-coverage sequencing is also sensitive to novel variation; 4× sequencing detects 45% of singletons and 95% of common variants identified in high-coverage African whole genomes. Low-coverage sequencing approaches surmount the problems induced by the ascertainment of common genotyping arrays, effectively identify novel variation particularly in underrepresented populations, and present opportunities to enhance variant discovery at a cost similar to traditional approaches. |
first_indexed | 2024-12-09T03:24:12Z |
format | Internet publication |
id | oxford-uuid:a198e963-acf3-4524-b4f4-0effe2993602 |
institution | University of Oxford |
language | English |
last_indexed | 2024-12-09T03:24:12Z |
publishDate | 2020 |
publisher | biorxiv |
record_format | dspace |
spelling | oxford-uuid:a198e963-acf3-4524-b4f4-0effe2993602 2024-11-25T13:50:45Z Low-coverage sequencing cost-effectively detects known and novel variation in underrepresented populations Internet publication http://purl.org/coar/resource_type/c_7ad9 uuid:a198e963-acf3-4524-b4f4-0effe2993602 English Symplectic Elements biorxiv 2020 |
title | Low-coverage sequencing cost-effectively detects known and novel variation in underrepresented populations |