Analyzing the Data Completeness of Patients’ Records Using a Random Variable Approach to Predict the Incompleteness of Electronic Health Records

The purpose of this article is to illustrate an investigation of methods that can be effectively used to predict the data incompleteness of a dataset. Here, the investigators have conceptualized data incompleteness as a random variable, with the overall goal behind experimentation providing a 360-de...


Bibliographic Details
Main Authors: Varadraj P. Gurupur, Paniz Abedin, Sahar Hooshmand, Muhammed Shelleh
Format: Article
Language: English
Published: MDPI AG 2022-10-01
Series: Applied Sciences
Subjects:
Online Access: https://www.mdpi.com/2076-3417/12/21/10746
author Varadraj P. Gurupur
Paniz Abedin
Sahar Hooshmand
Muhammed Shelleh
collection DOAJ
description This article investigates methods that can be used to predict the data incompleteness of a dataset. The investigators conceptualize data incompleteness as a random variable; the overall goal of the experiments is to provide a 360-degree view of this concept by treating the incompleteness of a dataset as either a continuous or a discrete random variable, depending on the aspect of the analysis required. In the course of the experiments, the investigators identified the Kolmogorov–Smirnov goodness-of-fit test, the Mielke distribution, and beta distributions as key methods for analyzing the incompleteness of the datasets used for experimentation. These methods were also compared with a mixture density network. Overall, the investigators provide key insights into methods and algorithms that can be used to predict data incompleteness and outline a pathway for further exploration of its prediction.
first_indexed 2024-03-09T19:19:16Z
format Article
id doaj.art-9ad8ec5f00434514b4fc63d53bb2b6f0
institution Directory Open Access Journal
issn 2076-3417
language English
last_indexed 2024-03-09T19:19:16Z
publishDate 2022-10-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj.art-9ad8ec5f00434514b4fc63d53bb2b6f0
2023-11-24T03:32:09Z
eng
MDPI AG
Applied Sciences
ISSN 2076-3417
2022-10-01
vol. 12, no. 21, art. 10746
DOI 10.3390/app122110746
Analyzing the Data Completeness of Patients’ Records Using a Random Variable Approach to Predict the Incompleteness of Electronic Health Records
Varadraj P. Gurupur (School of Global Health Management and Informatics, University of Central Florida, Orlando, FL 32816, USA)
Paniz Abedin (Department of Computer Science, Florida Polytechnic University, Lakeland, FL 33805, USA)
Sahar Hooshmand (Department of Computer Science, California State University-Dominguez Hills, Carson, CA 90747, USA)
Muhammed Shelleh (Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA)
https://www.mdpi.com/2076-3417/12/21/10746
health informatics
big data models
data completeness
probability density
Kolmogorov–Smirnov test
title Analyzing the Data Completeness of Patients’ Records Using a Random Variable Approach to Predict the Incompleteness of Electronic Health Records
topic health informatics
big data models
data completeness
probability density
Kolmogorov–Smirnov test
url https://www.mdpi.com/2076-3417/12/21/10746
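
The description above names beta distributions and the Kolmogorov–Smirnov goodness-of-fit test as ways to analyze incompleteness treated as a random variable. The following is a minimal sketch of that general idea, not the authors' implementation: it assumes incompleteness is measured as the fraction of missing fields per record, fits a beta distribution by the method of moments, and computes a one-sample Kolmogorov–Smirnov statistic against the fitted distribution. The field names, the synthetic Beta(2, 5) data, and all function names are hypothetical and chosen only for illustration.

```python
import math
import random


def incompleteness(record, fields):
    """Fraction of the expected fields that are missing (None or empty string)."""
    missing = sum(1 for f in fields if record.get(f) in (None, ""))
    return missing / len(fields)


def fit_beta_moments(xs):
    """Method-of-moments estimates (a, b) of a beta distribution for data in (0, 1)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    scale = mean * (1 - mean) / var - 1
    return mean * scale, (1 - mean) * scale


def beta_pdf(x, a, b):
    """Beta density at x in (0, 1), computed in log space for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def beta_cdf(x, a, b, steps=400):
    """Midpoint-rule approximation of the beta CDF (adequate for a sketch)."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    h = x / steps
    return h * sum(beta_pdf((i + 0.5) * h, a, b) for i in range(steps))


def ks_statistic(xs, cdf):
    """One-sample Kolmogorov-Smirnov statistic: the largest gap between the
    empirical CDF of xs and the model CDF."""
    xs = sorted(xs)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Per-record incompleteness for one toy record (hypothetical field names).
FIELDS = ["age", "blood_pressure", "diagnosis", "medication"]
toy_record = {"age": 40, "blood_pressure": None, "diagnosis": ""}
toy_fraction = incompleteness(toy_record, FIELDS)

# Synthetic incompleteness fractions; Beta(2, 5) is an arbitrary assumption,
# not a distribution reported in the article.
rng = random.Random(0)
fractions = [rng.betavariate(2, 5) for _ in range(500)]

a_hat, b_hat = fit_beta_moments(fractions)
d_stat = ks_statistic(fractions, lambda x: beta_cdf(x, a_hat, b_hat))

print(f"toy record incompleteness: {toy_fraction:.2f}")
print(f"fitted beta parameters: a={a_hat:.2f}, b={b_hat:.2f}")
print(f"KS statistic against the fitted beta: {d_stat:.3f}")
```

A small statistic indicates that the fitted beta distribution tracks the empirical distribution closely. Because the parameters here are estimated from the same sample, the standard Kolmogorov–Smirnov critical values do not strictly apply, so the statistic is better read as a descriptive measure of fit than as a formal test result.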