Composition-driven symptom phrase recognition for Chinese medical consultation corpora

Bibliographic Details
Main Authors: Xuan Gu, Zhengya Sun, Wensheng Zhang
Format: Article
Language: English
Published: BMC 2021-12-01
Series: BMC Medical Informatics and Decision Making
Subjects: Symptom phrase recognition; Named entity recognition; Medical consultation; Composition driven
Online Access: https://doi.org/10.1186/s12911-021-01716-2
author Xuan Gu
Zhengya Sun
Wensheng Zhang
collection DOAJ
description Abstract
Background: Symptom phrase recognition is essential for making better use of unstructured medical consultation corpora in the development of automated question answering systems. Most previous work requires either sufficient manually annotated training data or a symptom dictionary that is as complete as possible. When applied to real scenarios, however, such approaches face a dilemma, because annotated textual resources are scarce and spoken-language expressions are diverse.
Methods: We propose a composition-driven method that recognizes symptom phrases in Chinese medical consultation corpora without any manual annotations. The basic idea is to directly learn models that capture the composition, i.e., the arrangement of the symptom components (semantic units of words). We introduce an automatic annotation strategy for standard symptom phrases collected from multiple data sources. In particular, we combine position information with the interaction scores between symptom components to characterize symptom phrases. Equipped with such models, we can robustly extract symptom phrases that have not been seen before.
Results: Without any manual annotations, our method achieves strong positive results on symptom phrase recognition tasks. Experiments also show that the method has great potential when given access to plentiful corpora.
Conclusions: Compositionality offers a feasible solution for extracting information from unstructured free text with scarce labels.
first_indexed 2024-12-22T20:31:33Z
format Article
id doaj.art-ab07a17259404a6c89df7e4d0519b1a5
institution Directory Open Access Journal
issn 1472-6947
language English
last_indexed 2024-12-22T20:31:33Z
publishDate 2021-12-01
publisher BMC
record_format Article
series BMC Medical Informatics and Decision Making
spelling doaj.art-ab07a17259404a6c89df7e4d0519b1a5
2022-12-21T18:13:36Z
eng
BMC
BMC Medical Informatics and Decision Making
1472-6947
2021-12-01
211115
10.1186/s12911-021-01716-2
Composition-driven symptom phrase recognition for Chinese medical consultation corpora
Xuan Gu (University of Chinese Academy of Sciences)
Zhengya Sun (University of Chinese Academy of Sciences)
Wensheng Zhang (University of Chinese Academy of Sciences)
title Composition-driven symptom phrase recognition for Chinese medical consultation corpora
topic Symptom phrase recognition
Named entity recognition
Medical consultation
Composition driven
url https://doi.org/10.1186/s12911-021-01716-2