A corpus of plant-disease relations in the biomedical domain.

Background: Many new medicines have been derived from natural sources such as plants, which have a long history of being used for disease treatment. Thus, their benefits and side effects have been studied, and plant-related information including plant and disease relations have be...


Bibliographic Details
Main Authors: Baeksoo Kim, Wonjun Choi, Hyunju Lee
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2019-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0221582
collection DOAJ
description Background: Many new medicines have been derived from natural sources such as plants, which have a long history of use in disease treatment. Their benefits and side effects have therefore been studied, and plant-related information, including plant-disease relations, has accumulated in Medline articles. Because these articles are numerous and written in natural language, text mining is important. However, no corpus of plant-disease relations is available yet. Thus, we aimed to construct such a corpus.
Methods and results: In this study, we designed and annotated a plant-disease relations corpus and proposed a computational model to predict plant-disease relations using the corpus. We categorized plant-disease relations into four types: treatments of diseases, causes of diseases, associations, and negative relations. To construct the corpus, we first created annotation guidelines and randomly selected 200 Medline abstracts. From these abstracts, we identified 1,405 plant mentions and 1,755 disease mentions, annotated to 105 unique plant identifiers and 237 unique disease identifiers, respectively. Selecting sentences that contain at least one plant mention and one disease mention yielded 878 plant entities and 1,077 disease entities, and a final corpus of 1,309 plant-disease relations from 199 abstracts. To verify the effectiveness of the corpus, we proposed a convolutional neural network model with the shortest dependency path (SDP-CNN) and applied it to the constructed corpus. The micro F-score with ten-fold cross-validation was 0.764. We also applied the proposed SDP-CNN model to all Medline abstracts. Measured on 483 randomly selected sentences in which a plant and a disease co-occur, the model showed a precision of 0.707.
Conclusion: The plant-disease relations corpus is unique and represents an important resource for biomedical text mining. The corpus of plant and disease relations is available at http://gcancer.org/pdr/.
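As an illustration of the evaluation metric named in the abstract, the following is a minimal sketch of a micro-averaged F-score over the four relation types. The label strings, the function name `micro_f_score`, and the choice of which labels to pool are assumptions made for this example; the record does not state how the authors computed their score, so this is not their implementation.

```python
# Hypothetical label names for the four relation types listed in the abstract.
LABELS = ["treatment", "cause", "association", "negative"]


def micro_f_score(gold, predicted, labels=LABELS):
    """Pool true positives, false positives, and false negatives over
    `labels`, then compute precision, recall, and their harmonic mean."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        for label in labels:
            if p == label and g == label:
                tp += 1  # predicted this label and it was correct
            elif p == label:
                fp += 1  # predicted this label but the gold label differs
            elif g == label:
                fn += 1  # gold has this label but the prediction missed it

    if tp == 0:
        return 0.0  # avoids division by zero when nothing was correct
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = ["treatment", "cause", "association", "negative"]
    pred = ["treatment", "cause", "negative", "negative"]
    print(micro_f_score(gold, pred))
```

When every sentence pair receives exactly one label and all labels are pooled, the micro F-score equals plain accuracy. Passing a subset through `labels` (for example, excluding `"negative"`) changes the result, which is one reason the pooled label set should be stated alongside any reported score.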
first_indexed 2024-12-22T11:36:01Z
id doaj.art-c550d38ddf82477fbe7bcf837d91df64
institution Directory Open Access Journal
issn 1932-6203
citation Baeksoo Kim, Wonjun Choi, Hyunju Lee. A corpus of plant-disease relations in the biomedical domain. PLoS ONE 14(8): e0221582 (2019). https://doi.org/10.1371/journal.pone.0221582