Humans and machines in biomedical knowledge curation: hypertrophic cardiomyopathy molecular mechanisms’ representation

Abstract Background Biomedical knowledge is dispersed in scientific literature and is growing constantly. Curation is the extraction of knowledge from unstructured data into a computable form and could be done manually or automatically. Hypertrophic cardiomyopathy (HCM) is the most common inherited...

Bibliographic Details
Main Authors: Mila Glavaški, Lazar Velicki
Format: Article
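Language: English
Published: BMC 2021-10-01
Series: BioData Mining
Subjects: Data mining; Curation; Automated curation; Hypertrophic cardiomyopathy; Signaling pathways; Knowledge graphs
Online Access: https://doi.org/10.1186/s13040-021-00279-2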
author Mila Glavaški
Lazar Velicki
collection DOAJ
description Abstract
Background: Biomedical knowledge is dispersed across the scientific literature and is growing constantly. Curation is the extraction of knowledge from unstructured data into a computable form, and it can be done manually or automatically. Hypertrophic cardiomyopathy (HCM) is the most common inherited cardiac disease, with genotype–phenotype associations still incompletely understood. We compared human- and machine-curated models of HCM molecular mechanisms and examined the performance of different machine approaches for that task.
Results: We created six models representing HCM molecular mechanisms using different approaches and made them publicly available, analyzed them as networks, and tried to explain the differences between the models by analyzing factors that affect the quality of machine-curated models (query constraints and the performance of reading systems). Another result of this work is the Interactive HCM map, the only publicly available knowledge resource dedicated to HCM. The sizes and topological parameters of the networks differed notably, and consensus between networks was low in terms of centrality measures. Consensus about the most important nodes was achieved for only one element (calcium). Models with a reduced level of noise were generated, and cooperatively working elements were detected. The REACH and TRIPS reading systems showed much higher accuracy than Sparser, but at the cost of extraction performance. TRIPS proved to be the best single reading system for text segments about HCM in terms of the compromise between accuracy and extraction performance.
Conclusions: Different curation approaches can produce models of the same disease with diverse characteristics, and these models give rise to utterly different conclusions in subsequent analysis. The final purpose of the model should direct the choice of curation techniques. Manual curation represents the gold standard for information extraction in biomedical research and is most suitable when only high-quality elements are required for models. Automated curation provides more substance, but a high level of noise is expected. Different curation strategies can reduce the level of human input needed. Biomedical knowledge would benefit greatly, especially given its rapid growth, if computers were able to assist in analysis on a larger scale.
first_indexed 2024-12-20T05:12:11Z
format Article
id doaj.art-c2859dfcc83f4970b302a45e7df32ce2
institution Directory Open Access Journal
issn 1756-0381
language English
last_indexed 2024-12-20T05:12:11Z
publishDate 2021-10-01
publisher BMC
record_format Article
series BioData Mining
spelling doaj.art-c2859dfcc83f4970b302a45e7df32ce2 2022-12-21T19:52:15Z eng BMC BioData Mining 1756-0381 2021-10-01 141125 10.1186/s13040-021-00279-2
Humans and machines in biomedical knowledge curation: hypertrophic cardiomyopathy molecular mechanisms’ representation
Mila Glavaški (Faculty of Medicine, University of Novi Sad)
Lazar Velicki (Faculty of Medicine, University of Novi Sad)
https://doi.org/10.1186/s13040-021-00279-2
Data mining; Curation; Automated curation; Hypertrophic cardiomyopathy; Signaling pathways; Knowledge graphs
title Humans and machines in biomedical knowledge curation: hypertrophic cardiomyopathy molecular mechanisms’ representation
title_sort humans and machines in biomedical knowledge curation hypertrophic cardiomyopathy molecular mechanisms representation
topic Data mining
Curation
Automated curation
Hypertrophic cardiomyopathy
Signaling pathways
Knowledge graphs
url https://doi.org/10.1186/s13040-021-00279-2