Composition-driven symptom phrase recognition for Chinese medical consultation corpora
Abstract. Background: Symptom phrase recognition is essential for making better use of unstructured medical consultation corpora in the development of automated question answering systems. Most previous work requires either sufficient manually annotated training data or as complete a symptom dic...
Main Authors: | Xuan Gu, Zhengya Sun, Wensheng Zhang |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2021-12-01 |
Series: | BMC Medical Informatics and Decision Making |
Subjects: | Symptom phrase recognition, Named entity recognition, Medical consultation, Composition driven |
Online Access: | https://doi.org/10.1186/s12911-021-01716-2 |
_version_ | 1819173971396919296 |
---|---|
author | Xuan Gu, Zhengya Sun, Wensheng Zhang |
author_sort | Xuan Gu |
collection | DOAJ |
description | Abstract. Background: Symptom phrase recognition is essential for making better use of unstructured medical consultation corpora in the development of automated question answering systems. Most previous work requires either sufficient manually annotated training data or as complete a symptom dictionary as possible. In real scenarios, however, such approaches are hampered by the scarcity of annotated text and the diversity of spoken-language expressions. Methods: We propose a composition-driven method that recognizes symptom phrases in Chinese medical consultation corpora without any manual annotations. The basic idea is to directly learn models that capture the composition, i.e., the arrangement of the symptom components (semantic units of words). We introduce an automatic annotation strategy for standard symptom phrases collected from multiple data sources. In particular, we combine position information with the interaction scores between symptom components to characterize symptom phrases. Equipped with such models, the method can robustly extract symptom phrases that have not been seen before. Results: Without any manual annotations, our method achieves strong results on symptom phrase recognition tasks. Experiments also show that the method has considerable potential when plenty of corpora are available. Conclusions: Compositionality offers a feasible solution for extracting information from unstructured free text with scarce labels. |
first_indexed | 2024-12-22T20:31:33Z |
format | Article |
id | doaj.art-ab07a17259404a6c89df7e4d0519b1a5 |
institution | Directory of Open Access Journals |
issn | 1472-6947 |
language | English |
last_indexed | 2024-12-22T20:31:33Z |
publishDate | 2021-12-01 |
publisher | BMC |
record_format | Article |
series | BMC Medical Informatics and Decision Making |
spelling | doaj.art-ab07a17259404a6c89df7e4d0519b1a5; 2022-12-21T18:13:36Z; eng; BMC; BMC Medical Informatics and Decision Making; 1472-6947; 2021-12-01; 211115; 10.1186/s12911-021-01716-2; Composition-driven symptom phrase recognition for Chinese medical consultation corpora; Xuan Gu, Zhengya Sun, Wensheng Zhang (all University of Chinese Academy of Sciences); https://doi.org/10.1186/s12911-021-01716-2; Symptom phrase recognition; Named entity recognition; Medical consultation; Composition driven |
title | Composition-driven symptom phrase recognition for Chinese medical consultation corpora |
topic | Symptom phrase recognition, Named entity recognition, Medical consultation, Composition driven |
url | https://doi.org/10.1186/s12911-021-01716-2 |