Summary: | Abstract Background VerbNet, an extensive computational verb lexicon for English, has proved useful for supporting a wide range of Natural Language Processing tasks requiring information about the behaviour and meaning of verbs. Biomedical text processing and mining could benefit from a similar resource. We take the first step towards the development of BioVerbNet: A VerbNet specifically aimed at describing verbs in the area of biomedicine. Because VerbNet-style classification is extremely time consuming, we start from a small manual classification of biomedical verbs and apply a state-of-the-art neural representation model, specifically developed for class-based optimization, to expand the classification with new verbs, using all the PubMed abstracts and the full articles in the PubMed Central Open Access subset as data. Results Direct evaluation of the resulting classification against BioSimVerb (verb similarity judgement data in biomedicine) shows promising results when representation learning is performed using verb class-based contexts. Human validation by linguists and biologists reveals that the automatically expanded classification is highly accurate. Including novel, valid member verbs and classes, our method can be used to facilitate cost-effective development of BioVerbNet. Conclusion This work constitutes the first effort on applying a state-of-the-art architecture for neural representation learning to biomedical verb classification. While we discuss future optimization of the method, our promising results suggest that the automatic classification released with this article can be used to readily support application tasks in biomedicine.
|