Predicting variant deleteriousness in non-human species: applying the CADD approach in mouse

Bibliographic Details
Main Authors: Christian Groß, Dick de Ridder, Marcel Reinders
Format: Article
Language: English
Published: BMC 2018-10-01
Series: BMC Bioinformatics
Subjects: Genomics, Genome annotation, Variant annotation, Sequence annotation, Mouse genetics
Online Access: http://link.springer.com/article/10.1186/s12859-018-2337-5
author Christian Groß
Dick de Ridder
Marcel Reinders
collection DOAJ
description Abstract
Background: Predicting the deleteriousness of observed genomic variants has taken a step forward with the introduction of the Combined Annotation Dependent Depletion (CADD) approach, which trains a classifier on the wealth of available human genomic information. This raises the question whether it can be done with less data for non-human species. Here, we investigate the prerequisites to construct a CADD-based model for a non-human species.
Results: Performance of the mouse model is competitive with that of the human CADD model and better than established methods like PhastCons conservation scores and SIFT. Like in the human case, performance varies for different genomic regions and is best for coding regions. We also show the benefits of generating a species-specific model over lifting variants to a different species or applying a generic model. With fewer genomic annotations, performance on the test set as well as on the three validation sets is still good.
Conclusions: It is feasible to construct species-specific CADD models even when annotations such as epigenetic markers are not available. The minimal requirement for these models is the availability of a set of genomes of closely related species that can be used to infer an ancestor genome and substitution rates for the data generation.
first_indexed 2024-04-12T23:08:02Z
format Article
id doaj.art-76a63ace48834db182ae8707dae12f2d
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-04-12T23:08:02Z
publishDate 2018-10-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-76a63ace48834db182ae8707dae12f2d
2022-12-22T03:12:52Z
eng
BMC
BMC Bioinformatics
1471-2105
2018-10-01
19(1):1-10
10.1186/s12859-018-2337-5
Christian Groß (Delft Bioinformatics Lab, University of Technology Delft)
Dick de Ridder (Bioinformatics Group, Wageningen University & Research)
Marcel Reinders (Delft Bioinformatics Lab, University of Technology Delft)
title Predicting variant deleteriousness in non-human species: applying the CADD approach in mouse
topic Genomics
Genome annotation
Variant annotation
Sequence annotation
Mouse genetics
url http://link.springer.com/article/10.1186/s12859-018-2337-5