Automated ICD coding for coronary heart diseases by a deep learning method

Bibliographic Details
Main Authors: Shuai Zhao, Xiaolin Diao, Yun Xia, Yanni Huo, Meng Cui, Yuxin Wang, Jing Yuan, Wei Zhao
Format: Article
Language: English
Published: Elsevier 2023-03-01
Series: Heliyon
Subjects: ICD coding; Coronary heart diseases; Deep learning; BERT; Interpretability
Online Access: http://www.sciencedirect.com/science/article/pii/S2405844023012446
author Shuai Zhao
Xiaolin Diao
Yun Xia
Yanni Huo
Meng Cui
Yuxin Wang
Jing Yuan
Wei Zhao
collection DOAJ
description Automated ICD coding via machine learning for specific diseases has been an active research topic. Although coronary heart diseases (CHD) are among the leading causes of death, they have seldom been studied specifically in this line of research, probably because of a lack of data targeting these diseases. This study addressed automated CHD coding with a deep learning method, using two datasets: Fuwai-CHD, a private dataset from Fuwai Hospital, and MIMIC-III-CHD, the CHD-related subset of the public MIMIC-III dataset. The method consists mainly of three modules. The first is a BERT variant module that encodes clinical text. In this module, we fine-tuned BERT variants on clinical text with a masked language modeling objective and proposed a truncation method to address the limitation that BERT variants generally cannot handle sequences longer than 512 tokens. The second is a word2vec module that encodes code titles, and the third is a label-attention module that integrates the embeddings of clinical text and code titles. We named the method BW_att. We compared BW_att against several widely studied baselines and found that it performed best in most of the coding tasks. Specifically, BW_att reached a Macro-F1 of 96.2% and a Macro-AUC of 98.9% for the top-100 most frequent codes in Fuwai-CHD, which covered 89.2% of the total code occurrences. When predicting the top-50 most frequent codes in MIMIC-III-CHD, BW_att reached a Macro-F1 of 40.5% and a Macro-AUC of 66.1%. Moreover, BW_att was capable of locating informative tokens in clinical text for predicting the target codes. In summary, BW_att not only suggests CHD codes accurately but also offers robust interpretability, and hence has great potential for facilitating CHD coding in practice.
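The label-attention idea in the description can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the scoring function, so plain dot-product attention is assumed here, token and code-title embeddings are assumed to share one dimension, and the names `softmax` and `label_attention` are hypothetical. The per-token weights it returns are the kind of signal that could be inspected to see which tokens drive a given code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_attention(token_vecs, label_vecs):
    """For each label vector, attend over the token vectors.

    token_vecs: one embedding per clinical-text token (assumed: text encoder output).
    label_vecs: one embedding per code title (assumed: averaged word vectors).
    Returns a list of (weights, context) pairs, one per label.
    """
    dim = len(token_vecs[0])
    results = []
    for label in label_vecs:
        # Assumed scoring: dot product between each token and the label.
        scores = [sum(t * l for t, l in zip(tok, label)) for tok in token_vecs]
        weights = softmax(scores)
        # Label-specific document vector: attention-weighted sum of tokens.
        context = [
            sum(w * tok[d] for w, tok in zip(weights, token_vecs))
            for d in range(dim)
        ]
        results.append((weights, context))
    return results


if __name__ == "__main__":
    # Toy embeddings, invented purely for illustration.
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    labels = [[2.0, 0.0], [0.0, 2.0]]
    for weights, context in label_attention(tokens, labels):
        print([round(w, 3) for w in weights], [round(c, 3) for c in context])
```

In a full model each label-specific context vector would typically feed a per-code classifier; that step is omitted here because the record does not describe it.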
first_indexed 2024-04-09T19:24:09Z
format Article
id doaj.art-1b910c8cde52487b85a07edc1506496e
institution Directory Open Access Journal
issn 2405-8440
language English
last_indexed 2024-04-09T19:24:09Z
publishDate 2023-03-01
publisher Elsevier
record_format Article
series Heliyon
spelling doaj.art-1b910c8cde52487b85a07edc1506496e
2023-04-05T08:20:28Z
eng
Elsevier
Heliyon
2405-8440
2023-03-01
vol. 9, no. 3, e14037
Automated ICD coding for coronary heart diseases by a deep learning method
Shuai Zhao: Department of Information Center, Fuwai Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100037, China
Xiaolin Diao: Department of Information Center, Fuwai Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100037, China
Yun Xia: Department of Information Center, Fuwai Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100037, China
Yanni Huo: Department of Information Center, Fuwai Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100037, China
Meng Cui: Medical Record Department, Fuwai Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100037, China
Yuxin Wang: Department of Information Center, Fuwai Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100037, China
Jing Yuan: Department of Information Center, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100037, China
Wei Zhao (corresponding author): Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100037, China
title Automated ICD coding for coronary heart diseases by a deep learning method
topic ICD coding
Coronary heart diseases
Deep learning
BERT
Interpretability
url http://www.sciencedirect.com/science/article/pii/S2405844023012446