Rare disease variant curation from literature: assessing gaps with creatine transport deficiency in focus


Bibliographic Details
Main Authors: Erica L. Lyons, Daniel Watson, Mohammad S. Alodadi, Sharie J. Haugabook, Gregory J. Tawa, Fady Hannah-Shmouni, Forbes D. Porter, Jack R. Collins, Elizabeth A. Ottinger, Uma S. Mudunuri
Format: Article
Language: English
Published: BMC 2023-08-01
Series: BMC Genomics
Subjects: Rare disease, Gene variant, Literature curation, CTD, SLC6A8, Variant database
Online Access: https://doi.org/10.1186/s12864-023-09561-5
author Erica L. Lyons
Daniel Watson
Mohammad S. Alodadi
Sharie J. Haugabook
Gregory J. Tawa
Fady Hannah-Shmouni
Forbes D. Porter
Jack R. Collins
Elizabeth A. Ottinger
Uma S. Mudunuri
collection DOAJ
description Abstract
Background: Approximately 4–8% of the world's population suffers from a rare disease. Rare diseases are often difficult to diagnose, and many do not have approved therapies. Genetic sequencing has the potential to shorten the current diagnostic process, increase mechanistic understanding, and facilitate research on therapeutic approaches, but it is limited by the difficulty of interpreting the pathogenicity of novel variants and of communicating known causative variants. It is unknown how many published rare disease variants are currently accessible in the public domain.
Results: This study investigated how well knowledge of variants reported in published manuscripts is translated into publicly accessible variant databases. Variants, symptoms, biochemical assay results, and protein function from literature on the SLC6A8 gene, which is associated with X-linked Creatine Transporter Deficiency (CTD), were curated and reported as a highly annotated dataset of variants with clinical context and functional details. Variants were harmonized, their availability in existing variant databases was analyzed, and pathogenicity assignments were compared with impact algorithm predictions. Of the pathogenic variants found in PubMed articles, 24% were not captured in any database used in this analysis, while only 65% of the published variants received an accurate pathogenicity prediction from at least one impact prediction algorithm.
Conclusions: Despite being published in the literature, pathogenicity data on patient variants may remain inaccessible for genetic diagnosis, therapeutic target identification, mechanistic understanding, or hypothesis generation. Clinical and functional details presented in the literature are important for making pathogenicity assessments. Impact predictions remain imperfect but are improving, especially for single nucleotide exonic variants; however, such predictions are less accurate or unavailable for intronic and multi-nucleotide variants. Text mining workflows that use natural language processing to identify diseases, genes, and variants, combined with impact prediction algorithms and integrated with details on clinical phenotypes and functional assessments, might be a promising approach to scaling literature mining of variants and assigning correct pathogenicity. The curated variant list created by this effort includes contextual details to support such variant curation efforts for rare diseases.
first_indexed 2024-03-09T15:27:46Z
format Article
id doaj.art-e690a548a1d4449da411eb08d584528a
institution Directory Open Access Journal
issn 1471-2164
language English
last_indexed 2024-03-09T15:27:46Z
publishDate 2023-08-01
publisher BMC
record_format Article
series BMC Genomics
spelling doaj.art-e690a548a1d4449da411eb08d584528a; 2023-11-26T12:25:59Z; eng; BMC; BMC Genomics; 1471-2164; 2023-08-01; volume 24, issue 1, pages 1–18; 10.1186/s12864-023-09561-5
affiliation Erica L. Lyons: Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research
Daniel Watson: Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research
Mohammad S. Alodadi: Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research
Sharie J. Haugabook: Division of Preclinical Innovation, Therapeutic Development Branch, Therapeutics for Rare and Neglected Diseases (TRND) Program, National Center for Advancing Translational Sciences, National Institutes of Health
Gregory J. Tawa: Division of Preclinical Innovation, Therapeutic Development Branch, Therapeutics for Rare and Neglected Diseases (TRND) Program, National Center for Advancing Translational Sciences, National Institutes of Health
Fady Hannah-Shmouni: Division of Translational Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health
Forbes D. Porter: Division of Translational Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health
Jack R. Collins: Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research
Elizabeth A. Ottinger: Division of Preclinical Innovation, Therapeutic Development Branch, Therapeutics for Rare and Neglected Diseases (TRND) Program, National Center for Advancing Translational Sciences, National Institutes of Health
Uma S. Mudunuri: Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research
title Rare disease variant curation from literature: assessing gaps with creatine transport deficiency in focus
topic Rare disease
Gene variant
Literature curation
CTD
SLC6A8
Variant database
url https://doi.org/10.1186/s12864-023-09561-5