Identifying Alcohol-Related Information From Unstructured Bilingual Clinical Notes With Multilingual Transformers

As a key modifiable risk factor, alcohol consumption is clinically crucial information that allows medical professionals to further understand their patients’ medical conditions and suggest appropriate lifestyle-modifying interventions. However, identifying alcohol-related information fro...

Bibliographic Details
Main Authors: Han Kyul Kim, Yujin Park, Yeju Park, Eunji Choi, Sodam Kim, Hahyun You, Ye Seul Bae
Format: Article
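Language: English
Published: IEEE 2023-01-01
Series: IEEE Access
Subjects: Clinical informatics; alcohol information extraction; natural language processing; information extraction from clinical notes; multilingual transformers
Online Access: https://ieeexplore.ieee.org/document/10044673/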
_version_ 1797900210844729344
author Han Kyul Kim
Yujin Park
Yeju Park
Eunji Choi
Sodam Kim
Hahyun You
Ye Seul Bae
collection DOAJ
description As a key modifiable risk factor, alcohol consumption is clinically crucial information that allows medical professionals to better understand their patients’ medical conditions and suggest appropriate lifestyle-modifying interventions. However, identifying alcohol-related information from unstructured free-text clinical notes is often challenging. Not only are the formats of the notes inconsistent, but the notes also include a massive amount of non-alcohol-related information. Furthermore, at medical institutions outside English-speaking countries, these clinical notes contain a mixture of English and local languages, which makes extraction more difficult. Thanks to the increasing availability of electronic medical records (EMRs), several previous works explored using natural language processing (NLP) to train machine learning models that automatically identify alcohol-related information from unstructured clinical notes. However, all of these previous works are limited to English clinical notes and can therefore leverage various large-scale external ontologies during text preprocessing. Furthermore, they rely on simple NLP techniques such as bag-of-words models, which suffer from high dimensionality and out-of-vocabulary issues. To address these issues, we fine-tune multilingual transformers. By leveraging the linguistically rich contextual information learned during pre-training, we are able to extract alcohol-related information from unstructured clinical notes without preprocessing the notes with any external ontologies. Furthermore, our work is the first to explore the use of transformers on bilingual clinical notes to extract alcohol-related information. Even with minimal text preprocessing, we achieve a macro F1 score of 84.70%.
first_indexed 2024-04-10T08:42:25Z
format Article
id doaj.art-412d1c65c02e473286742f8f1bb21502
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-04-10T08:42:25Z
publishDate 2023-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-412d1c65c02e473286742f8f1bb21502
2023-02-23T00:00:27Z
eng
IEEE
IEEE Access
2169-3536
2023-01-01
11
16066
16075
10.1109/ACCESS.2023.3245523
10044673
Identifying Alcohol-Related Information From Unstructured Bilingual Clinical Notes With Multilingual Transformers
Han Kyul Kim 0 https://orcid.org/0000-0002-4854-7211
Yujin Park 1 https://orcid.org/0000-0002-7936-9307
Yeju Park 2
Eunji Choi 3
Sodam Kim 4
Hahyun You 5
Ye Seul Bae 6 https://orcid.org/0000-0003-0763-5458
Daniel J. Epstein Department of Industrial and Systems Engineering, University of Southern California, Los Angeles, CA, USA
Department of Biomedical Engineering, Seoul National University College of Medicine, Seoul, South Korea
Office of Hospital Information, Seoul National University Hospital, Seoul, South Korea
Office of Hospital Information, Seoul National University Hospital, Seoul, South Korea
Office of Hospital Information, Seoul National University Hospital, Seoul, South Korea
Department of Biomedical Engineering, Seoul National University College of Medicine, Seoul, South Korea
Office of Hospital Information, Seoul National University Hospital, Seoul, South Korea
https://ieeexplore.ieee.org/document/10044673/
Clinical informatics
alcohol information extraction
natural language processing
information extraction from clinical notes
multilingual transformers
title Identifying Alcohol-Related Information From Unstructured Bilingual Clinical Notes With Multilingual Transformers
topic Clinical informatics
alcohol information extraction
natural language processing
information extraction from clinical notes
multilingual transformers
url https://ieeexplore.ieee.org/document/10044673/