Bio-SCoRes: A Smorgasbord Architecture for Coreference Resolution in Biomedical Text.

Coreference resolution is one of the fundamental and challenging tasks in natural language processing. Resolving coreference successfully can have a significant positive effect on downstream natural language processing tasks, such as information extraction and question answering. The importance of c...

Bibliographic Details
Main Authors: Halil Kilicoglu, Dina Demner-Fushman
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2016-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC4774913?pdf=render
_version_ 1819175814315376640
author Halil Kilicoglu
Dina Demner-Fushman
collection DOAJ
description Coreference resolution is one of the fundamental and challenging tasks in natural language processing. Resolving coreference successfully can have a significant positive effect on downstream natural language processing tasks, such as information extraction and question answering. The importance of coreference resolution for biomedical text analysis applications has increasingly been acknowledged. One of the difficulties in coreference resolution stems from the fact that distinct types of coreference (e.g., anaphora, appositive) are expressed with a variety of lexical and syntactic means (e.g., personal pronouns, definite noun phrases), and that resolution of each combination often requires a different approach. In the biomedical domain, it is common for coreference annotation and resolution efforts to focus on specific subcategories of coreference deemed important for the downstream task. In the current work, we aim to address some of these concerns regarding coreference resolution in biomedical text. We propose a general, modular framework underpinned by a smorgasbord architecture (Bio-SCoRes), which incorporates a variety of coreference types, their mentions and allows fine-grained specification of resolution strategies to resolve coreference of distinct coreference type-mention pairs. For development and evaluation, we used a corpus of structured drug labels annotated with fine-grained coreference information. In addition, we evaluated our approach on two other corpora (i2b2/VA discharge summaries and protein coreference dataset) to investigate its generality and ease of adaptation to other biomedical text types. Our results demonstrate the usefulness of our novel smorgasbord architecture. The specific pipelines based on the architecture perform successfully in linking coreferential mention pairs, while we find that recognition of full mention clusters is more challenging. The corpus of structured drug labels (SPL) as well as the components of Bio-SCoRes and some of the pipelines based on it are publicly available at https://github.com/kilicogluh/Bio-SCoRes. We believe that Bio-SCoRes can serve as a strong and extensible baseline system for coreference resolution of biomedical text.
first_indexed 2024-12-22T21:00:51Z
format Article
id doaj.art-655a32d22b124816b490bca3fe6a95dd
institution Directory Open Access Journal
issn 1932-6203
language English
last_indexed 2024-12-22T21:00:51Z
publishDate 2016-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
spelling doaj.art-655a32d22b124816b490bca3fe6a95dd 2022-12-21T18:12:50Z eng Public Library of Science (PLoS) PLoS ONE 1932-6203 2016-01-01 11 3 e0148538 10.1371/journal.pone.0148538
title Bio-SCoRes: A Smorgasbord Architecture for Coreference Resolution in Biomedical Text.
url http://europepmc.org/articles/PMC4774913?pdf=render