A multi-BERT hybrid system for named entity recognition in Spanish radiology reports

Bibliographic Details
Main Authors: Suárez-Paniagua, V; Dong, H; Casey, A
Format: Conference item
Language: English
Published: CEUR Workshop Proceedings 2021
collection OXFORD
description This work describes the methods proposed by the EdIE-KnowLab team for the Information Extraction Task of CLEF eHealth 2021, SpRadIE Task 1. The task focuses on detecting and classifying relevant mentions in ultrasonography reports. The architecture is an ensemble of multiple BERT (multi-BERT) systems, one per entity type, combined with a generated dictionary and off-the-shelf tools (the Google Healthcare Natural Language API and GATECloud's Measurement Expression Annotator) applied to the documents translated into English with word alignment by the neural machine translation tool Microsoft Translator API. Our best system configuration (multi-BERT with a dictionary) achieves 85.51% and 80.04% F1 on the Lenient and Exact metrics, respectively, ranking first out of 17 submissions from 7 teams that participated in this shared task. Our system also achieved the best Recall (83.87% Lenient match and 78.71% Exact match) by merging the previous predictions with the results obtained from the English-translated texts and cross-lingual word alignment. The overall results demonstrate the potential of pre-trained language models and cross-lingual word alignment for limited-corpus, low-resource NER in the clinical domain.
id oxford-uuid:f184bdfe-f91e-4b83-a27a-f5c360a61ee4
institution University of Oxford