Using the distance between sets of hierarchical taxonomic clinical concepts to measure patient similarity

Abstract Background Many clinical concepts are standardized under a categorical and hierarchical taxonomy such as ICD-10, ATC, etc. These taxonomic clinical concepts provide insight into semantic meaning and similarity among clinical concepts and have been applied to patient similarity measures. How...

Bibliographic Details
Main Authors: Zheng Jia, Xudong Lu, Huilong Duan, Haomin Li
Format: Article
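Language: English
Published: BMC 2019-04-01
Series: BMC Medical Informatics and Decision Making
Subjects: Taxonomic concept; Patient similarity; Concept similarity; Predictive model; ICD-10; Data visualization
Online Access: http://link.springer.com/article/10.1186/s12911-019-0807-y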
author Zheng Jia
Xudong Lu
Huilong Duan
Haomin Li
collection DOAJ
description Abstract
Background: Many clinical concepts are standardized under categorical, hierarchical taxonomies such as ICD-10 and ATC. These taxonomic clinical concepts provide insight into the semantic meaning of, and similarity among, clinical concepts and have been applied to patient similarity measures. However, how the varying sizes of taxonomic concept sets affect similarity at the patient level has not been well studied.
Methods: In this paper, ICD-10, the most widely used taxonomic clinical concept system, was studied as a representative taxonomy. The distance between ICD-10-coded diagnosis sets is an integrated estimate that combines the information content of each concept, the similarity between each pair of concepts, and the similarity between the sets of concepts. We proposed a novel set-level similarity method that calculates the distance between sets of hierarchical taxonomic clinical concepts to measure patient similarity. A real-world clinical dataset with ICD-10-coded diagnoses and hospital length of stay (HLOS) information was used to evaluate the performance of various algorithms and their combinations in predicting whether a patient needs long-term hospitalization. Four subpopulation prototypes, defined by age and HLOS and having different diagnosis set sizes, were used as the targets for similarity analysis. The F-score was used to evaluate the performance of the different algorithms while controlling for other factors. We also evaluated the effect of prototype set size on prediction precision.
Results: The results identified the strengths and weaknesses of different algorithms for computing information content, code-level similarity and set-level similarity under different contexts, such as set size and concept set background. The minimum weighted bipartite matching approach, which has not been fully recognized previously, showed unique advantages in measuring concept-based patient similarity.
Conclusions: This study provides a systematic benchmark evaluation of previous and novel algorithms used in taxonomic concept-based patient similarity, and it provides a basis for selecting appropriate methods under different clinical scenarios.
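The abstract describes a set-level distance built on code-level similarity and minimum weighted bipartite matching. The sketch below illustrates that general idea only; it is not the authors' implementation. Everything in it is an assumption made for illustration: the function names (path_from_root, code_distance, set_distance), the three-level hierarchy derived from the code string (first letter, three-character category, full code) as a crude stand-in for the real ICD-10 tree, the Wu-Palmer-style code distance, the penalty of 1 per unmatched code, and the brute-force search over matchings. Information content weighting, which the abstract also mentions, is left out.

```python
from itertools import permutations


def path_from_root(code):
    """Return the assumed hierarchy path for an ICD-10 code string.

    Assumption: first letter -> three-character category -> full code.
    Real ICD-10 chapters are code ranges, so this is only a stand-in.
    """
    code = code.strip().upper()
    category = code.split(".")[0]
    path = [category[0], category]
    if "." in code:
        path.append(code)
    return path


def code_distance(a, b):
    """Wu-Palmer-style distance: 0 for identical codes, 1 for unrelated ones."""
    pa, pb = path_from_root(a), path_from_root(b)
    shared = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        shared += 1
    return 1.0 - 2.0 * shared / (len(pa) + len(pb))


def set_distance(set_a, set_b):
    """Distance between two code sets via minimum weight bipartite matching.

    Each code in the smaller set is matched to a distinct code in the larger
    set so that the summed code distance is minimal. Unmatched codes cost 1
    each (an assumption), and the total is divided by the larger set size.
    Brute force over permutations, so suitable for small sets only.
    """
    small, large = sorted((list(set_a), list(set_b)), key=len)
    if not large:
        return 0.0
    best = min(
        sum(code_distance(s, l) for s, l in zip(small, chosen))
        for chosen in permutations(large, len(small))
    )
    unmatched = len(large) - len(small)
    return (best + unmatched) / len(large)


if __name__ == "__main__":
    patient = {"I21.0", "E11.9"}
    prototype = {"I21.1", "E11.9", "J45.0"}
    print(set_distance(patient, prototype))
```

For realistic diagnosis set sizes, the permutation search would need to be replaced by a polynomial-time assignment solver such as the Hungarian algorithm (for example, scipy.optimize.linear_sum_assignment applied to the pairwise distance matrix).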
first_indexed 2024-12-22T19:08:52Z
format Article
id doaj.art-ed1a95217fec42c7a07de8ce1201100e
institution Directory Open Access Journal
issn 1472-6947
language English
last_indexed 2024-12-22T19:08:52Z
publishDate 2019-04-01
publisher BMC
record_format Article
series BMC Medical Informatics and Decision Making
spelling doaj.art-ed1a95217fec42c7a07de8ce1201100e 2022-12-21T18:15:44Z eng BMC BMC Medical Informatics and Decision Making 1472-6947 2019-04-01 191111 10.1186/s12911-019-0807-y
Zheng Jia: College of Biomedical Engineering and Instrument Science, Zhejiang University
Xudong Lu: College of Biomedical Engineering and Instrument Science, Zhejiang University
Huilong Duan: College of Biomedical Engineering and Instrument Science, Zhejiang University
Haomin Li: The Children’s Hospital, Zhejiang University School of Medicine
title Using the distance between sets of hierarchical taxonomic clinical concepts to measure patient similarity
topic Taxonomic concept
Patient similarity
Concept similarity
Predictive model
ICD-10
Data visualization
url http://link.springer.com/article/10.1186/s12911-019-0807-y