Rare disease variant curation from literature: assessing gaps with creatine transport deficiency in focus
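Main Authors: | Erica L. Lyons, Daniel Watson, Mohammad S. Alodadi, Sharie J. Haugabook, Gregory J. Tawa, Fady Hannah-Shmouni, Forbes D. Porter, Jack R. Collins, Elizabeth A. Ottinger, Uma S. Mudunuri |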
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2023-08-01 |
Series: | BMC Genomics |
Subjects: | Rare disease, Gene variant, Literature curation, CTD, SLC6A8, Variant database |
Online Access: | https://doi.org/10.1186/s12864-023-09561-5 |

author | Erica L. Lyons, Daniel Watson, Mohammad S. Alodadi, Sharie J. Haugabook, Gregory J. Tawa, Fady Hannah-Shmouni, Forbes D. Porter, Jack R. Collins, Elizabeth A. Ottinger, Uma S. Mudunuri |
---|---|
collection | DOAJ |
description | Abstract. Background: Approximately 4–8% of the world's population has a rare disease. Rare diseases are often difficult to diagnose, and many have no approved therapies. Genetic sequencing has the potential to shorten the diagnostic process, increase mechanistic understanding, and facilitate research on therapeutic approaches, but it is limited by the difficulty of interpreting the pathogenicity of novel variants and of communicating known causative variants. It is unknown how many published rare disease variants are currently accessible in the public domain. Results: This study investigated how knowledge of variants reported in published manuscripts is translated into publicly accessible variant databases. Variants, symptoms, biochemical assay results, and protein function reported in the literature on the SLC6A8 gene, which is associated with X-linked Creatine Transporter Deficiency (CTD), were curated and reported as a highly annotated dataset of variants with clinical context and functional details. Variants were harmonized, their availability in existing variant databases was analyzed, and their pathogenicity assignments were compared with impact algorithm predictions. Of the pathogenic variants found in PubMed articles, 24% were not captured in any database used in this analysis, while only 65% of the published variants received an accurate pathogenicity prediction from at least one impact prediction algorithm. Conclusions: Despite being published in the literature, pathogenicity data on patient variants may remain inaccessible for genetic diagnosis, therapeutic target identification, mechanistic understanding, or hypothesis generation. Clinical and functional details presented in the literature are important for making pathogenicity assessments. Impact predictions remain imperfect but are improving, especially for single-nucleotide exonic variants; however, such predictions are less accurate or unavailable for intronic and multi-nucleotide variants. Developing text mining workflows that use natural language processing to identify diseases, genes, and variants, combined with impact prediction algorithms and integrated with details on clinical phenotypes and functional assessments, might be a promising approach to scaling literature mining of variants and assigning correct pathogenicity. The curated variant list created by this effort includes contextual details that can support such variant curation efforts for rare diseases. |
first_indexed | 2024-03-09T15:27:46Z |
format | Article |
id | doaj.art-e690a548a1d4449da411eb08d584528a |
institution | Directory Open Access Journal |
issn | 1471-2164 |
language | English |
last_indexed | 2024-03-09T15:27:46Z |
publishDate | 2023-08-01 |
publisher | BMC |
record_format | Article |
series | BMC Genomics |
affiliation | Erica L. Lyons, Daniel Watson, Mohammad S. Alodadi, Jack R. Collins, Uma S. Mudunuri: Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research; Sharie J. Haugabook, Gregory J. Tawa, Elizabeth A. Ottinger: Division of Preclinical Innovation, Therapeutic Development Branch, Therapeutics for Rare and Neglected Diseases (TRND) Program, National Center for Advancing Translational Sciences, National Institutes of Health; Fady Hannah-Shmouni, Forbes D. Porter: Division of Translational Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health |
title | Rare disease variant curation from literature: assessing gaps with creatine transport deficiency in focus |
topic | Rare disease, Gene variant, Literature curation, CTD, SLC6A8, Variant database |
url | https://doi.org/10.1186/s12864-023-09561-5 |
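
The two headline figures in the abstract (the share of literature-reported pathogenic variants absent from every database checked, and the share of published variants that at least one impact prediction algorithm classified correctly) reduce to simple set arithmetic once the curated variants are harmonized. The Python 3.9+ sketch that follows is illustrative only: the `CuratedVariant` record, both function names, the database and tool labels, and the toy values are hypothetical and are not the authors' code or data. It assumes the literature-based pathogenicity call is treated as ground truth and that a variant with no available prediction counts as not accurately predicted.

```python
from dataclasses import dataclass, field


@dataclass
class CuratedVariant:
    """One harmonized, literature-curated variant (hypothetical record layout)."""

    variant_id: str  # harmonized identifier; toy placeholders are used here
    pathogenic: bool  # pathogenicity call made from the literature evidence
    # Hypothetical labels of the variant databases that list this variant.
    databases: set[str] = field(default_factory=set)
    # Hypothetical tool name -> True if that tool predicts the variant pathogenic.
    predictions: dict[str, bool] = field(default_factory=dict)


def fraction_missing_from_databases(variants: list[CuratedVariant]) -> float:
    """Share of literature-pathogenic variants absent from every database checked."""
    pathogenic = [v for v in variants if v.pathogenic]
    if not pathogenic:
        return 0.0
    missing = [v for v in pathogenic if not v.databases]
    return len(missing) / len(pathogenic)


def fraction_with_accurate_prediction(variants: list[CuratedVariant]) -> float:
    """Share of variants where at least one tool agrees with the curated call.

    A variant with no available prediction counts as not accurately predicted.
    """
    if not variants:
        return 0.0
    hits = [
        v for v in variants
        if any(pred == v.pathogenic for pred in v.predictions.values())
    ]
    return len(hits) / len(variants)


# Toy input for illustration only; these are not values from the study.
toy = [
    CuratedVariant("variant_A", True, {"db1"}, {"tool_x": True, "tool_y": False}),
    CuratedVariant("variant_B", True, set(), {"tool_x": False}),
    CuratedVariant("variant_C", True, {"db1", "db2"}, {}),
    CuratedVariant("variant_D", False, {"db2"}, {"tool_y": False}),
]

print(f"{fraction_missing_from_databases(toy):.0%}")    # prints 33%
print(f"{fraction_with_accurate_prediction(toy):.0%}")  # prints 50%
```

Counting a missing prediction as a miss is a design choice of this sketch; excluding such variants from the denominator instead would raise the second figure, so the counting rule should match whatever the curation protocol specifies.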