IMPROVE-DD: Integrating multiple phenotype resources optimizes variant evaluation in genetically determined developmental disorders

Summary: Diagnosing rare developmental disorders using genome-wide sequencing data commonly necessitates review of multiple plausible candidate variants, often using ontologies of categorical clinical terms. We show that Integrating Multiple Phenotype Resources Optimizes Variant Evaluation in Develo...

Bibliographic Details
Main Authors: Stuart Aitken, Helen V. Firth, Caroline F. Wright, Matthew E. Hurles, David R. FitzPatrick, Colin A. Semple
Format: Article
Language: English
Published: Elsevier 2023-01-01
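Series: HGG Advances
Subjects: human phenotype ontology, phenotype, genotype, developmental disease, growth, developmental milestones
Online Access: http://www.sciencedirect.com/science/article/pii/S2666247722000793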
_version_ 1811187169910325248
author Stuart Aitken
Helen V. Firth
Caroline F. Wright
Matthew E. Hurles
David R. FitzPatrick
Colin A. Semple
collection DOAJ
description Summary: Diagnosing rare developmental disorders using genome-wide sequencing data commonly necessitates review of multiple plausible candidate variants, often using ontologies of categorical clinical terms. We show that Integrating Multiple Phenotype Resources Optimizes Variant Evaluation in Developmental Disorders (IMPROVE-DD) by incorporating additional classes of data commonly available to clinicians and recorded in health records. In doing so, we quantify the distinct contributions of sex, growth, and development in addition to Human Phenotype Ontology (HPO) terms and demonstrate added value from these readily available information sources. We use likelihood ratios for nominal and quantitative data and propose a classifier for HPO terms in this framework. This Bayesian framework results in more robust diagnoses. Using data systematically collected in the Deciphering Developmental Disorders study, we considered 77 genes with pathogenic/likely pathogenic variants in ≥10 individuals. All genes showed at least a satisfactory prediction by receiver operating characteristic when testing on training data (AUC ≥ 0.6), and HPO terms were the best predictor for the majority of genes, though a minority (13/77) of genes were better predicted by other phenotypic data types. Overall, classifiers based upon multiple integrated phenotypic data sources performed better than those based upon any individual source, and importantly, integrated models produced notably fewer false positives. Finally, we show that IMPROVE-DD models with good predictive performance on cross-validation can be constructed from relatively few individuals. This suggests new strategies for candidate gene prioritization and highlights the value of systematic clinical data collection to support diagnostic programs.
first_indexed 2024-04-11T13:58:47Z
format Article
id doaj.art-02b9675f33534c6bae4aba0033a026ad
institution Directory Open Access Journal
issn 2666-2477
language English
last_indexed 2024-04-11T13:58:47Z
publishDate 2023-01-01
publisher Elsevier
record_format Article
series HGG Advances
spelling doaj.art-02b9675f33534c6bae4aba0033a026ad
2022-12-22T04:20:11Z
eng
Elsevier
HGG Advances
2666-2477
2023-01-01
4(1), 100162
IMPROVE-DD: Integrating multiple phenotype resources optimizes variant evaluation in genetically determined developmental disorders
Stuart Aitken (0)
Helen V. Firth (1)
Caroline F. Wright (2)
Matthew E. Hurles (3)
David R. FitzPatrick (4)
Colin A. Semple (5)
(0) MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh EH4 2XU, UK
(1) Wellcome Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK; Clinical Genetics Department, Addenbrooke’s Hospital Cambridge University Hospitals, Cambridge CB2 0QQ, UK
(2) University of Exeter Medical School, Royal Devon & Exeter Hospital, Barrack Road, Exeter EX2 5DW, UK
(3) Wellcome Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK
(4) MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh EH4 2XU, UK
(5) MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh EH4 2XU, UK; Corresponding author
http://www.sciencedirect.com/science/article/pii/S2666247722000793
human phenotype ontology
phenotype
genotype
developmental disease
growth
developmental milestones
title IMPROVE-DD: Integrating multiple phenotype resources optimizes variant evaluation in genetically determined developmental disorders
title_sort improve dd integrating multiple phenotype resources optimizes variant evaluation in genetically determined developmental disorders
topic human phenotype ontology
phenotype
genotype
developmental disease
growth
developmental milestones
url http://www.sciencedirect.com/science/article/pii/S2666247722000793
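
The description above mentions likelihood ratios for nominal and quantitative data combined in a Bayesian framework. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes the evidence sources are conditionally independent and models the quantitative measure with a Gaussian. All function names, frequencies, and parameters are hypothetical and chosen only for illustration; none are taken from the study.

```python
import math


def nominal_lr(value, freq_in_gene, freq_in_background):
    """Likelihood ratio for a categorical observation such as sex.

    Both frequency tables are hypothetical inputs mapping category -> probability.
    """
    return freq_in_gene[value] / freq_in_background[value]


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution (an assumed model for growth z-scores)."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def quantitative_lr(x, gene_params, background_params):
    """Likelihood ratio for a quantitative observation.

    Each params tuple is (mean, sd) for the assumed Gaussian.
    """
    return gaussian_pdf(x, *gene_params) / gaussian_pdf(x, *background_params)


def posterior_probability(prior, likelihood_ratios):
    """Combine independent likelihood ratios with a prior via Bayes' rule.

    posterior odds = prior odds * product of likelihood ratios,
    computed in log space for numerical stability.
    """
    log_odds = math.log(prior / (1 - prior))
    log_odds += sum(math.log(lr) for lr in likelihood_ratios)
    odds = math.exp(log_odds)
    return odds / (1 + odds)


if __name__ == "__main__":
    # Illustrative values only: a female patient with a height z-score of -3.
    sex_lr = nominal_lr("F", {"F": 0.8, "M": 0.2}, {"F": 0.5, "M": 0.5})
    height_lr = quantitative_lr(-3.0, (-2.5, 1.0), (0.0, 1.0))
    print(posterior_probability(0.01, [sex_lr, height_lr]))
```

Working in log odds keeps the product of many ratios numerically stable, and a ratio of exactly 1 leaves the prior unchanged, which is the behaviour expected when a data source carries no information about the gene.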