Data Mining Techniques for Iraqi Biochemical Dataset Analysis

This research analyzes and simulates real biochemical test data to uncover the relationships among the tests and how each of them affects the others. The data were acquired from an Iraqi private biochemical laboratory. However, these data have many dimensions with a high rate of null values, an...


Bibliographic Details
Main Authors: Sarah Sameer, Suhad Faisal Behadili
Format: Article
Language: Arabic
Published: College of Science for Women, University of Baghdad 2022-04-01
Series: Baghdad Science Journal
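Subjects: Biomedical, Classification And Regression Tree (CART), Data mining, Hierarchical clustering, K-means.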
Online Access: https://bsj.uobaghdad.edu.iq/index.php/BSJ/article/view/5407
_version_ 1818385191939342336
author Sarah Sameer
Suhad Faisal Behadili
author_facet Sarah Sameer
Suhad Faisal Behadili
author_sort Sarah Sameer
collection DOAJ
description This research analyzes and simulates real biochemical test data to uncover the relationships among the tests and how each of them affects the others. The data were acquired from an Iraqi private biochemical laboratory. The data have many dimensions, a high rate of null values, and a large number of patients. Several experiments were run on these data, beginning with unsupervised techniques such as hierarchical clustering and k-means, but the results were not clear. A preprocessing step was then performed to make the dataset analyzable by supervised techniques: Linear Discriminant Analysis (LDA), Classification And Regression Tree (CART), Logistic Regression (LR), K-Nearest Neighbor (K-NN), Naïve Bayes (NB), and Support Vector Machine (SVM). Among the six supervised algorithms, CART gave clear results with high accuracy. Preprocessing this type of data required considerable effort: null values make up 94.8% of the raw dataset and 0% after preprocessing. To apply the CART algorithm, several selected tests were treated as classes, and the tests were chosen according to the accuracy obtained for them. The results enable physicians to trace and connect test results with one another, extending the impact on patients’ health.
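The description reports a null ratio of 94.8% before preprocessing and 0% after, but does not say which preprocessing steps were used. The sketch below is only an illustration of how such a ratio can be measured and driven to zero. The toy table, the function names, the 0.5 column threshold, and the choice of median imputation are all assumptions, not the authors' method.

```python
from statistics import median


def null_ratio(rows):
    """Fraction of cells that are None across the whole table."""
    total = sum(len(r) for r in rows)
    nulls = sum(1 for r in rows for v in r if v is None)
    return nulls / total if total else 0.0


def drop_sparse_columns(rows, max_null=0.5):
    """Keep columns whose null fraction is at most max_null (assumed threshold)."""
    n_cols = len(rows[0])
    keep = [c for c in range(n_cols)
            if sum(r[c] is None for r in rows) / len(rows) <= max_null]
    return [[r[c] for c in keep] for r in rows], keep


def impute_median(rows):
    """Replace each remaining None with its column median (assumed strategy).

    Assumes every column has at least one non-null value.
    """
    n_cols = len(rows[0])
    medians = [median(r[c] for r in rows if r[c] is not None)
               for c in range(n_cols)]
    return [[medians[c] if r[c] is None else r[c] for c in range(n_cols)]
            for r in rows]


# Hypothetical toy table: one row per patient, one column per test.
raw = [
    [5.1, None, None],
    [None, None, 90.0],
    [4.8, None, 110.0],
    [5.5, 1.2, None],
]

reduced, keep = drop_sparse_columns(raw, max_null=0.5)
clean = impute_median(reduced)

print(f"null ratio before: {null_ratio(raw):.1%}")
print(f"null ratio after:  {null_ratio(clean):.1%}")
```

Dropping very sparse columns before imputing is one common choice for tables where most tests are missing for most patients. Whether it suits laboratory data like that described here depends on which tests need to be retained as classes.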
first_indexed 2024-12-14T03:34:14Z
format Article
id doaj.art-b680201da6bf4db2bf076d900ef9736a
institution Directory Open Access Journal
issn 2078-8665
2411-7986
language Arabic
last_indexed 2024-12-14T03:34:14Z
publishDate 2022-04-01
publisher College of Science for Women, University of Baghdad
record_format Article
series Baghdad Science Journal
spelling doaj.art-b680201da6bf4db2bf076d900ef9736a
2022-12-21T23:18:40Z
ara
College of Science for Women, University of Baghdad
Baghdad Science Journal
2078-8665
2411-7986
2022-04-01
19
2
10.21123/bsj.2022.19.2.0385
Data Mining Techniques for Iraqi Biochemical Dataset Analysis
Sarah Sameer (Computer Science Department, College of Science, University of Baghdad, Baghdad, Iraq)
Suhad Faisal Behadili (Computer Science Department, College of Science, University of Baghdad, Baghdad, Iraq)
https://bsj.uobaghdad.edu.iq/index.php/BSJ/article/view/5407
Biomedical, Classification And Regression Tree (CART), Data mining, Hierarchical clustering, K-means.
spellingShingle Sarah Sameer
Suhad Faisal Behadili
Data Mining Techniques for Iraqi Biochemical Dataset Analysis
Baghdad Science Journal
Biomedical, Classification And Regression Tree (CART), Data mining, Hierarchical clustering, K-means.
title Data Mining Techniques for Iraqi Biochemical Dataset Analysis
title_full Data Mining Techniques for Iraqi Biochemical Dataset Analysis
title_fullStr Data Mining Techniques for Iraqi Biochemical Dataset Analysis
title_full_unstemmed Data Mining Techniques for Iraqi Biochemical Dataset Analysis
title_short Data Mining Techniques for Iraqi Biochemical Dataset Analysis
title_sort data mining techniques for iraqi biochemical dataset analysis
topic Biomedical, Classification And Regression Tree (CART), Data mining, Hierarchical clustering, K-means.
url https://bsj.uobaghdad.edu.iq/index.php/BSJ/article/view/5407
work_keys_str_mv AT sarahsameer dataminingtechniquesforiraqibiochemicaldatasetanalysis
AT suhadfaisalbehadili dataminingtechniquesforiraqibiochemicaldatasetanalysis