Systematic analysis of paralogous regions in 41,755 exomes uncovers clinically relevant variation

Abstract The short lengths of short-read sequencing reads challenge the analysis of paralogous genomic regions in exome and genome sequencing data. Most genetic variants within these homologous regions therefore remain unidentified in standard analyses. Here, we present a method (Chameleolyser) that...

Bibliographic Details
Main Authors: Wouter Steyaert, Lonneke Haer-Wigman, Rolph Pfundt, Debby Hellebrekers, Marloes Steehouwer, Juliet Hampstead, Elke de Boer, Alexander Stegmann, Helger Yntema, Erik-Jan Kamsteeg, Han Brunner, Alexander Hoischen, Christian Gilissen
Format: Article
Language: English
Published: Nature Portfolio 2023-10-01
Series: Nature Communications
Online Access: https://doi.org/10.1038/s41467-023-42531-9
author Wouter Steyaert
Lonneke Haer-Wigman
Rolph Pfundt
Debby Hellebrekers
Marloes Steehouwer
Juliet Hampstead
Elke de Boer
Alexander Stegmann
Helger Yntema
Erik-Jan Kamsteeg
Han Brunner
Alexander Hoischen
Christian Gilissen
collection DOAJ
description Abstract The short lengths of short-read sequencing reads challenge the analysis of paralogous genomic regions in exome and genome sequencing data. Most genetic variants within these homologous regions therefore remain unidentified in standard analyses. Here, we present a method (Chameleolyser) that accurately identifies single nucleotide variants and small insertions/deletions (SNVs/Indels), copy number variants and ectopic gene conversion events in duplicated genomic regions using whole-exome sequencing data. Application to a cohort of 41,755 exome samples yields 20,432 rare homozygous deletions and 2,529,791 rare SNVs/Indels, of which we show that 338,084 are due to gene conversion events. None of the SNVs/Indels are detectable using regular analysis techniques. Validation by high-fidelity long-read sequencing in 20 samples confirms >88% of called variants. Focusing on variation in known disease genes leads to a direct molecular diagnosis in 25 previously undiagnosed patients. Our method can readily be applied to existing exome data.
first_indexed 2024-03-11T15:14:04Z
format Article
id doaj.art-b6807c8412374d2aa25e893290d2b4c6
institution Directory of Open Access Journals
issn 2041-1723
language English
last_indexed 2024-03-11T15:14:04Z
publishDate 2023-10-01
publisher Nature Portfolio
record_format Article
series Nature Communications
spelling doaj.art-b6807c8412374d2aa25e893290d2b4c6
2023-10-29T12:29:37Z
eng
Nature Portfolio
Nature Communications
2041-1723
2023-10-01
volume 14, issue 1, pages 1-13
10.1038/s41467-023-42531-9
Systematic analysis of paralogous regions in 41,755 exomes uncovers clinically relevant variation
Wouter Steyaert (Department of Human Genetics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center)
Lonneke Haer-Wigman (Department of Human Genetics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center)
Rolph Pfundt (Department of Human Genetics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center)
Debby Hellebrekers (Maastricht University Medical Center+, Department of Clinical Genetics)
Marloes Steehouwer (Department of Human Genetics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center)
Juliet Hampstead (Department of Human Genetics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center)
Elke de Boer (Department of Human Genetics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center)
Alexander Stegmann (Maastricht University Medical Center+, Department of Clinical Genetics)
Helger Yntema (Department of Human Genetics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center)
Erik-Jan Kamsteeg (Department of Human Genetics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center)
Han Brunner (Department of Human Genetics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center)
Alexander Hoischen (Department of Human Genetics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center)
Christian Gilissen (Department of Human Genetics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center)
title Systematic analysis of paralogous regions in 41,755 exomes uncovers clinically relevant variation
url https://doi.org/10.1038/s41467-023-42531-9