Descriptive forest: experiments on a novel tree-structure-generalization method for describing cardiovascular diseases

Abstract Background A decision tree is a crucial tool for describing the factors related to cardiovascular disease (CVD) risk and for predicting and explaining it for patients. Notably, the decision tree must be simplified because patients may have different primary topics or factors related to the...


Bibliographic Details
Main Author:	Peera Liewlom
Format:	Article
Language:	English
Published:	BMC 2023-07-01
Series:	BMC Medical Informatics and Decision Making
Subjects:	Information Science Medical Informatics Data Mining Cardiovascular Diseases
Online Access:	https://doi.org/10.1186/s12911-023-02228-x

_version_	1797769469294018560
author	Peera Liewlom
collection	DOAJ
description	Abstract
Background: A decision tree is a crucial tool for describing the factors related to cardiovascular disease (CVD) risk and for predicting and explaining it for patients. Notably, the decision tree must be simplified because patients may have different primary topics or factors related to the CVD risk. Many decision trees can describe the data collected from multiple environmental heart disease risk datasets or a forest, where each tree describes the CVD risk for each primary topic.
Methods: We demonstrate the presence of trees, or a forest, using an integrated CVD dataset obtained from multiple datasets. Moreover, we apply a novel method to an association-rule tree to discover each primary topic hidden within a dataset. To generalize the tree structure for descriptive tasks, each primary topic is a boundary node acting as a root node of a C4.5 tree with the least prodigality for the tree structure (PTS). All trees are assigned to a descriptive forest describing the CVD risks in a dataset. A descriptive forest is used to describe each CVD patient’s primary risk topics and related factors. We describe eight primary topics in a descriptive forest acquired from 918 records of a heart failure–prediction dataset with 11 features obtained from five datasets. We apply the proposed method to 253,680 records with 22 features from imbalanced classes of a heart disease health–indicators dataset.
Results: The usability of the descriptive forest is demonstrated by a comparative study (on qualitative and quantitative tasks of the CVD-risk explanation) with a C4.5 tree generated from the same dataset but with the least PTS. The qualitative descriptive task confirms that compared to a single C4.5 tree, the descriptive forest is more flexible and can better describe the CVD risk, whereas the quantitative descriptive task confirms that it achieved higher coverage (recall) and correctness (accuracy and precision) and provided more detailed explanations. Additionally, for these tasks, the descriptive forest still outperforms the C4.5 tree. To reduce the problem of imbalanced classes, the ratio of classes in each subdataset generating each tree is investigated.
Conclusion: The results provide confidence for using the descriptive forest.
first_indexed	2024-03-12T21:09:28Z
format	Article
id	doaj.art-e2bc240a347d41bc8b19237fefba2f6b
institution	Directory Open Access Journal
issn	1472-6947
language	English
last_indexed	2024-03-12T21:09:28Z
publishDate	2023-07-01
publisher	BMC
record_format	Article
series	BMC Medical Informatics and Decision Making
spelling	doaj.art-e2bc240a347d41bc8b19237fefba2f6b | 2023-07-30T11:17:20Z | eng | BMC | BMC Medical Informatics and Decision Making | 1472-6947 | 2023-07-01 | 231125 | 10.1186/s12911-023-02228-x | Descriptive forest: experiments on a novel tree-structure-generalization method for describing cardiovascular diseases | Peera Liewlom (0) | 0: Department of Computer and Information Science, Faculty of Science and Engineering, Kasetsart University | https://doi.org/10.1186/s12911-023-02228-x | Information Science | Medical Informatics | Data Mining | Cardiovascular Diseases
title	Descriptive forest: experiments on a novel tree-structure-generalization method for describing cardiovascular diseases
topic	Information Science Medical Informatics Data Mining Cardiovascular Diseases
url	https://doi.org/10.1186/s12911-023-02228-x


Similar Items