Deep Denoising of Raw Biomedical Knowledge Graph From COVID-19 Literature, LitCovid, and Pubtator: Framework Development and Validation

Background: Multiple types of biomedical associations of knowledge graphs, including COVID-19–related ones, are constructed based on co-occurring biomedical entities retrieved from recent literature. However, the applications derived from these raw graphs (eg, association predi...


Bibliographic Details
Main Authors:	Chao Jiang, Victoria Ngo, Richard Chapman, Yue Yu, Hongfang Liu, Guoqian Jiang, Nansu Zong
Format:	Article
Language:	English
Published:	JMIR Publications 2022-07-01
Series:	Journal of Medical Internet Research
Online Access:	https://www.jmir.org/2022/7/e38584

collection	DOAJ
description	Background: Multiple types of biomedical associations of knowledge graphs, including COVID-19–related ones, are constructed based on co-occurring biomedical entities retrieved from recent literature. However, the applications derived from these raw graphs (eg, association predictions among genes, drugs, and diseases) have a high probability of false-positive predictions as co-occurrences in the literature do not always mean there is a true biomedical association between two entities.
Objective: Data quality plays an important role in training deep neural network models; however, most of the current work in this area has been focused on improving a model’s performance with the assumption that the preprocessed data are clean. Here, we studied how to remove noise from raw knowledge graphs with limited labeled information.
Methods: The proposed framework used generative-based deep neural networks to generate a graph that can distinguish the unknown associations in the raw training graph. Two generative adversarial network models, NetGAN and Cross-Entropy Low-rank Logits (CELL), were adopted for the edge classification (ie, link prediction), leveraging unlabeled link information based on a real knowledge graph built from LitCovid and Pubtator.
Results: The performance of link prediction, especially in the extreme case of training data versus test data at a ratio of 1:9, demonstrated that the proposed method still achieved favorable results (area under the receiver operating characteristic curve >0.8 for the synthetic data set and 0.7 for the real data set), despite the limited amount of testing data available.
Conclusions: Our preliminary findings showed the proposed framework achieved promising results for removing noise during data preprocessing of the biomedical knowledge graph, potentially improving the performance of downstream applications by providing cleaner data.
issn	1438-8871
spelling	doaj.art-6407370fa4434939be6a512d0c617f61; 2023-08-28T22:42:28Z; eng; JMIR Publications; Journal of Medical Internet Research; ISSN 1438-8871; 2022-07-01; volume 24, issue 7, e38584; DOI 10.2196/38584
Chao Jiang (https://orcid.org/0000-0002-0467-6177)
Victoria Ngo (https://orcid.org/0000-0001-9973-8379)
Richard Chapman (https://orcid.org/0000-0002-3600-0286)
Yue Yu (https://orcid.org/0000-0002-3900-1217)
Hongfang Liu (https://orcid.org/0000-0003-2570-3741)
Guoqian Jiang (https://orcid.org/0000-0003-2940-0019)
Nansu Zong (https://orcid.org/0000-0003-0066-9524)


Similar Items