A multi-BERT hybrid system for named entity recognition in Spanish radiology reports
Main Authors: | Suárez-Paniagua, V; Dong, H; Casey, A |
---|---|
Format: | Conference item |
Language: | English |
Published: | CEUR Workshop Proceedings, 2021 |
_version_ | 1826308923378892800 |
---|---|
author | Suárez-Paniagua, V; Dong, H; Casey, A |
author_sort | Suárez-Paniagua, V |
collection | OXFORD |
description | This work describes the methods proposed by the EdIE-KnowLab team for the Information Extraction Task of CLEF eHealth 2021, SpRadIE Task 1. The task focuses on detecting and classifying relevant mentions in ultrasonography reports. The developed architecture is an ensemble of multiple BERT (multi-BERT) systems, one per entity type, combined with a generated dictionary and two off-the-shelf tools, the Google Healthcare Natural Language API and GATECloud's Measurement Expression Annotator system. These tools are applied to the documents after they have been translated into English, with word alignment, by the Microsoft Translator API, a neural machine translation tool. Our best system configuration (multi-BERT with a dictionary) achieves 85.51% and 80.04% F1 for the Lenient and Exact metrics, respectively, ranking first out of 17 submissions from the 7 teams that participated in this shared task. Our system also achieved the best Recall (83.87% Lenient match and 78.71% Exact match) by merging the previous predictions with the results obtained from the English-translated texts through cross-lingual word alignment. The overall results demonstrate the potential of pre-trained language models and cross-lingual word alignment for low-resource NER with limited corpora in the clinical domain. |
first_indexed | 2024-03-07T07:26:32Z |
format | Conference item |
id | oxford-uuid:f184bdfe-f91e-4b83-a27a-f5c360a61ee4 |
institution | University of Oxford |
language | English |
last_indexed | 2024-03-07T07:26:32Z |
publishDate | 2021 |
publisher | CEUR Workshop Proceedings |
record_format | dspace |
spelling | oxford-uuid:f184bdfe-f91e-4b83-a27a-f5c360a61ee4; 2022-11-18T08:56:55Z; A multi-BERT hybrid system for named entity recognition in Spanish radiology reports; Conference item; http://purl.org/coar/resource_type/c_5794; uuid:f184bdfe-f91e-4b83-a27a-f5c360a61ee4; English; Symplectic Elements; CEUR Workshop Proceedings; 2021; Suárez-Paniagua, V; Dong, H; Casey, A; The present work describes the proposed methods by the EdIE-KnowLab team in Information Extraction Task of CLEF eHealth 2021, SpRadIE Task 1. This task focuses on detecting and classifying relevant mentions in ultrasonography reports. The architecture developed is an ensemble of multiple BERT (multi-BERT) systems, one per each entity type, together with a generated dictionary and available off-the-shelf tools, Google Healthcare Natural Language API and GATECloud's Measurement Expression Annotator system, applied to the documents translated into English with word alignment from the neural machine translation tool, Microsoft Translator API. Our best system configuration (multi-BERT with a dictionary) achieves 85.51% and 80.04% F1 for Lenient and Exact metrics, respectively. Thus, the system ranked first out of 17 submissions from 7 teams that participated in this shared task. Our system also achieved the best Recall merging the previous predictions to the results given by English-translated texts and cross-lingual word alignment (83.87% Lenient match and 78.71% Exact match). The overall results demonstrate the potential of pre-trained language models and cross-lingual word alignment for limited corpus and low-resource NER in the clinical domain. |
title | A multi-BERT hybrid system for named entity recognition in Spanish radiology reports |
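The abstract describes combining predictions from one BERT model per entity type with dictionary matches, and reports both Lenient and Exact F1. The record specifies neither the merging rule nor the precise metric definitions, so the Python sketch below is illustrative only. It assumes a plain union of predicted spans, and the common convention that an Exact match needs identical boundaries and entity type while a Lenient match only needs the same type and overlapping character offsets. The `Span` type, the function names and the entity labels used here are hypothetical.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class Span(NamedTuple):
    """Hypothetical entity mention: character offsets [start, end) plus a type label."""
    start: int
    end: int
    label: str


def merge_predictions(*sources: Iterable[Span]) -> Set[Span]:
    """Assumed merge rule: union of all spans (e.g. per-type models plus a dictionary).

    Identical spans proposed by several sources are kept only once.
    """
    merged: Set[Span] = set()
    for source in sources:
        merged.update(source)
    return merged


def overlaps(a: Span, b: Span) -> bool:
    """True if the half-open character ranges of a and b share at least one character."""
    return a.start < b.end and b.start < a.end


def precision_recall_f1(gold: Set[Span], pred: Set[Span],
                        lenient: bool = False) -> Tuple[float, float, float]:
    """Exact: identical offsets and label. Lenient (assumed): same label, overlapping offsets."""

    def matches(p: Span, g: Span) -> bool:
        if lenient:
            return p.label == g.label and overlaps(p, g)
        return p == g

    # A prediction is correct if it matches some gold span; a gold span is found
    # if some prediction matches it.
    correct_pred = sum(1 for p in pred if any(matches(p, g) for g in gold))
    found_gold = sum(1 for g in gold if any(matches(p, g) for p in pred))
    precision = correct_pred / len(pred) if pred else 0.0
    recall = found_gold / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    # Illustrative labels only; offsets refer to an imaginary report string.
    gold = {Span(0, 7, "Anatomical_Entity"), Span(12, 20, "Finding")}
    per_type_anatomy = [Span(0, 7, "Anatomical_Entity")]
    per_type_finding = [Span(12, 18, "Finding")]  # right type, truncated boundary
    dictionary_hits = [Span(0, 7, "Anatomical_Entity")]  # duplicates a model span
    pred = merge_predictions(per_type_anatomy, per_type_finding, dictionary_hits)
    print("exact  ", precision_recall_f1(gold, pred))
    print("lenient", precision_recall_f1(gold, pred, lenient=True))
```

In this toy example the truncated Finding span counts as an error under Exact matching (precision, recall and F1 all 0.5) but as a hit under Lenient matching (all 1.0), which illustrates how boundary errors open a gap between the two metrics.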