Automated ICD coding for coronary heart diseases by a deep learning method

Automated ICD coding with machine learning for specific diseases has been an active research topic. Although coronary heart diseases (CHD) are among the leading causes of death, they have seldom been studied specifically in this line of research, probably because data targeting these diseases are lacking. Ba...

Bibliographic Details
Main Authors:	Shuai Zhao, Xiaolin Diao, Yun Xia, Yanni Huo, Meng Cui, Yuxin Wang, Jing Yuan, Wei Zhao
Format:	Article
Language:	English
Published:	Elsevier 2023-03-01
Series:	Heliyon
Subjects:	ICD coding; Coronary heart diseases; Deep learning; BERT; Interpretability
Online Access:	http://www.sciencedirect.com/science/article/pii/S2405844023012446

author	Shuai Zhao, Xiaolin Diao, Yun Xia, Yanni Huo, Meng Cui, Yuxin Wang, Jing Yuan, Wei Zhao
collection	DOAJ
description	Automated ICD coding with machine learning for specific diseases has been an active research topic. Although coronary heart diseases (CHD) are among the leading causes of death, they have seldom been studied specifically in this line of research, probably because data targeting these diseases are lacking. Using Fuwai-CHD, a private dataset from Fuwai Hospital, and MIMIC-III-CHD, the CHD-related subset of the public MIMIC-III dataset, this study aimed to automate CHD coding with a deep learning method that consists mainly of three modules. The first is a BERT variant module that encodes clinical text. In this module, we fine-tuned BERT variants with masked language modeling on clinical text and proposed a truncation method to address the limitation that BERT variants generally cannot handle sequences longer than 512 tokens. The second is a word2vec module that encodes code titles, and the third is a label-attention module that integrates the embeddings of clinical text and code titles. We named the method BW_att. We compared BW_att against several widely studied baselines and found that it performed best in most of the coding tasks. Specifically, BW_att reached a Macro-F1 of 96.2% and a Macro-AUC of 98.9% for the top-100 most frequent codes in Fuwai-CHD, which covered 89.2% of the total code occurrences. When predicting the top-50 most frequent codes in MIMIC-III-CHD, BW_att reached a Macro-F1 of 40.5% and a Macro-AUC of 66.1%. Moreover, BW_att was able to locate informative tokens in clinical text for predicting the target codes. In summary, BW_att not only suggests CHD codes accurately but also offers robust interpretability, and hence has great potential to facilitate CHD coding in practice.
first_indexed	2024-04-09T19:24:09Z
format	Article
id	doaj.art-1b910c8cde52487b85a07edc1506496e
institution	Directory of Open Access Journals
issn	2405-8440
language	English
last_indexed	2024-04-09T19:24:09Z
publishDate	2023-03-01
publisher	Elsevier
record_format	Article
series	Heliyon
spelling	doaj.art-1b910c8cde52487b85a07edc1506496e; 2023-04-05T08:20:28Z; eng; Elsevier; Heliyon; 2405-8440; 2023-03-01; vol. 9, no. 3, e14037; Automated ICD coding for coronary heart diseases by a deep learning method; Shuai Zhao (0), Xiaolin Diao (1), Yun Xia (2), Yanni Huo (3), Meng Cui (4), Yuxin Wang (5), Jing Yuan (6), Wei Zhao (7); (0)-(3), (5) Department of Information Center, Fuwai Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100037, China; (4) Medical Record Department, Fuwai Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100037, China; (6) Department of Information Center, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100037, China; (7) Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100037, China, corresponding author; http://www.sciencedirect.com/science/article/pii/S2405844023012446; ICD coding; Coronary heart diseases; Deep learning; BERT; Interpretability
title	Automated ICD coding for coronary heart diseases by a deep learning method
topic	ICD coding; Coronary heart diseases; Deep learning; BERT; Interpretability
url	http://www.sciencedirect.com/science/article/pii/S2405844023012446
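
The description says a label-attention module integrates the embeddings of clinical text and code titles, and that the method can locate informative tokens for each code. The record does not give the scoring function, so the following is only a minimal sketch of generic dot-product label attention, not the authors' implementation. The token and label vectors are hand-made toy values, and the names `tokens`, `labels`, and `label_attention` are hypothetical; in the described method they would presumably come from the BERT variant module and the word2vec module.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def label_attention(token_vecs, label_vecs):
    """For each label, return (attention weights over tokens, context vector).

    Assumption: plain dot-product scoring between token and label embeddings
    of the same dimension. The paper's actual scoring may differ.
    """
    dim = len(token_vecs[0])
    results = []
    for label in label_vecs:
        weights = softmax([dot(tok, label) for tok in token_vecs])
        # Label-specific document representation: weighted sum of token vectors.
        context = [
            sum(w * tok[d] for w, tok in zip(weights, token_vecs))
            for d in range(dim)
        ]
        results.append((weights, context))
    return results


# Toy inputs (hypothetical, 2-dimensional for readability).
tokens = ["angina", "stent", "daily"]
token_vecs = [[1.0, 0.0], [0.0, 1.0], [0.2, 0.1]]
labels = ["code_A", "code_B"]
label_vecs = [[2.0, 0.0], [0.0, 2.0]]

results = label_attention(token_vecs, label_vecs)

# The highest-weighted token per label is one simple interpretability readout.
top_tokens = [
    tokens[max(range(len(tokens)), key=lambda i: weights[i])]
    for weights, _ in results
]
for label, token in zip(labels, top_tokens):
    print(label, "attends most to", token)
```

Each label gets its own weighting of the same token sequence, so the per-label weights can be inspected directly to see which tokens drove a prediction. A classifier head over each context vector, which this sketch omits, would produce the per-code probabilities.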

Similar Items