Descriptive forest: experiments on a novel tree-structure-generalization method for describing cardiovascular diseases

Bibliographic Details
Main Author: Peera Liewlom
Format: Article
Language: English
Published: BMC 2023-07-01
Series: BMC Medical Informatics and Decision Making
Subjects:
Online Access: https://doi.org/10.1186/s12911-023-02228-x
collection DOAJ
description Abstract
Background: A decision tree is a crucial tool for describing the factors related to cardiovascular disease (CVD) risk and for predicting and explaining that risk for patients. Notably, the decision tree must be simplified because patients may have different primary topics or factors related to the CVD risk. Data collected from multiple environmental heart disease risk datasets can be described by many decision trees, that is, a forest, in which each tree describes the CVD risk for one primary topic.
Methods: We demonstrate the presence of trees, or a forest, using an integrated CVD dataset obtained from multiple datasets. Moreover, we apply a novel method to an association-rule tree to discover each primary topic hidden within a dataset. To generalize the tree structure for descriptive tasks, each primary topic is a boundary node acting as the root node of a C4.5 tree with the least prodigality for the tree structure (PTS). All trees are assigned to a descriptive forest describing the CVD risks in a dataset. The descriptive forest is used to describe each CVD patient’s primary risk topics and related factors. We describe eight primary topics in a descriptive forest acquired from 918 records of a heart failure–prediction dataset with 11 features obtained from five datasets. We also apply the proposed method to 253,680 records with 22 features from imbalanced classes of a heart disease health–indicators dataset.
Results: The usability of the descriptive forest is demonstrated by a comparative study (on qualitative and quantitative tasks of the CVD-risk explanation) with a C4.5 tree generated from the same dataset but with the least PTS. The qualitative descriptive task confirms that, compared to a single C4.5 tree, the descriptive forest is more flexible and can better describe the CVD risk, whereas the quantitative descriptive task confirms that it achieves higher coverage (recall) and correctness (accuracy and precision) and provides more detailed explanations. Additionally, for these tasks, the descriptive forest still outperforms the C4.5 tree. To reduce the problem of imbalanced classes, the ratio of classes in each subdataset generating each tree is investigated.
Conclusion: The results provide confidence for using the descriptive forest.
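The idea of a forest whose trees are each rooted at one primary topic, and the coverage and correctness measures named in the abstract, can be illustrated with a small sketch. This is not the author's implementation: the topics, feature names (`exercise_angina`, `age`, `max_hr`), thresholds, toy records, and the rule that a record is flagged when any applicable tree flags it are all assumptions made for illustration, and the hand-written rules stand in for learned C4.5 trees.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]

@dataclass
class TopicTree:
    """One tree of the forest, rooted at a (hypothetical) primary topic."""
    topic: str
    applies: Callable[[Record], bool]  # boundary condition at the root node
    predict: Callable[[Record], int]   # stand-in for the learned subtree

def describe(forest: List[TopicTree], record: Record) -> Dict[str, int]:
    """Return {topic: predicted label} for every tree whose root matches."""
    return {t.topic: t.predict(record) for t in forest if t.applies(record)}

def forest_predict(forest: List[TopicTree], record: Record) -> int:
    """Assumed aggregation: at risk (1) if any applicable tree says so."""
    return int(any(describe(forest, record).values()))

def metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Recall (coverage), precision, and accuracy from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "accuracy": (tp + tn) / len(y_true) if y_true else 0.0,
    }

# Invented topics and thresholds, for illustration only.
forest = [
    TopicTree("exercise angina present",
              applies=lambda r: bool(r["exercise_angina"]),
              predict=lambda r: int(r["max_hr"] < 140)),
    TopicTree("age 60 or older",
              applies=lambda r: r["age"] >= 60,
              predict=lambda r: int(r["max_hr"] < 150)),
]

# Invented toy records; "label" is the true class (1 = at risk).
records = [
    {"exercise_angina": True,  "age": 62, "max_hr": 110, "label": 1},
    {"exercise_angina": False, "age": 45, "max_hr": 170, "label": 0},
    {"exercise_angina": False, "age": 70, "max_hr": 120, "label": 1},
    {"exercise_angina": True,  "age": 50, "max_hr": 160, "label": 0},
    {"exercise_angina": False, "age": 55, "max_hr": 100, "label": 1},
]

y_true = [int(r["label"]) for r in records]
y_pred = [forest_predict(forest, r) for r in records]
scores = metrics(y_true, y_pred)
```

In this toy setup the first record falls under both topics, so `describe` returns one explanation per topic, while the last record matches no topic and is therefore not covered, which lowers recall without affecting precision.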
first_indexed 2024-03-12T21:09:28Z
format Article
id doaj.art-e2bc240a347d41bc8b19237fefba2f6b
institution Directory Open Access Journal
issn 1472-6947
language English
last_indexed 2024-03-12T21:09:28Z
publishDate 2023-07-01
publisher BMC
record_format Article
series BMC Medical Informatics and Decision Making
spelling doaj.art-e2bc240a347d41bc8b19237fefba2f6b; 2023-07-30T11:17:20Z; eng; BMC; BMC Medical Informatics and Decision Making; 1472-6947; 2023-07-01; 231125; 10.1186/s12911-023-02228-x; Descriptive forest: experiments on a novel tree-structure-generalization method for describing cardiovascular diseases; Peera Liewlom (Department of Computer and Information Science, Faculty of Science and Engineering, Kasetsart University); https://doi.org/10.1186/s12911-023-02228-x; Information Science; Medical Informatics; Data Mining; Cardiovascular Diseases
topic Information Science
Medical Informatics
Data Mining
Cardiovascular Diseases
url https://doi.org/10.1186/s12911-023-02228-x