Can the Use of Bayesian Analysis Methods Correct for Incompleteness in Electronic Health Records Diagnosis Data? Development of a Novel Method Using Simulated and Real-Life Clinical Data

Bibliographic Details
Main Authors: Elizabeth Ford, Philip Rooney, Peter Hurley, Seb Oliver, Stephen Bremner, Jackie Cassell
Format: Article
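Language: English
Published: Frontiers Media S.A., 2020-03-01
Series: Frontiers in Public Health
Subjects: electronic health records; patient data; data quality; missing data; Bayesian analysis; methodology
Online Access: https://www.frontiersin.org/article/10.3389/fpubh.2020.00054/full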

Abstract

Background: Patient health information is collected routinely in electronic health records (EHRs) and used for research purposes; however, many health conditions are known to be under-diagnosed or under-recorded in EHRs. In research, missing diagnoses result in under-ascertainment of true cases, which attenuates estimated associations between variables and biases them toward the null. Bayesian approaches allow prior information, such as the likely rates of missingness in the data, to be specified in the model. This paper describes a Bayesian analysis approach which aimed to reduce attenuation of associations in EHR studies focussed on conditions characterized by under-diagnosis.

Methods: Study 1: We created synthetic data designed to mimic structured EHR data in which diagnoses were under-recorded. We fitted logistic regression (LR) models with and without Bayesian priors representing rates of misclassification in the data, and compared the LR parameters estimated by the two sets of models. Study 2: We used EHR data from UK primary care in a case-control design with dementia as the outcome. We fitted LR models examining risk factors for dementia, with and without generic prior information on misclassification rates. We compared the LR parameters estimated by models with and without the priors, and estimated classification accuracy using the area under the receiver operating characteristic curve (AUROC).

Results: Study 1: In synthetic data, estimates of LR parameters were much closer to the true parameter values when Bayesian priors were added to the model; with no priors, parameters were substantially attenuated by under-diagnosis. Study 2: The Bayesian approach ran well on real-life clinical data from UK primary care, and adding prior information increased LR parameter values in all cases. In multivariate regression models, however, Bayesian methods showed no improvement in classification accuracy over traditional LR.

Conclusions: The Bayesian approach showed promise but faced implementation challenges in real clinical data: prior information on rates of misclassification was difficult to find. Our simple model made a number of assumptions, such as diagnoses being missing at random. Further development is needed to integrate the method into studies using real-life EHR data. Our findings nevertheless highlight the importance of developing methods to address missing diagnoses in EHR data.
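
The attenuation mechanism and the prior-based correction summarized above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' code or model: it assumes one continuous risk factor, true cases recorded at random with a sensitivity of 0.5 and no false-positive diagnoses (specificity of 1), a Beta(50, 50) prior on sensitivity that happens to be centered on the true value (the optimistic case), and a plain random-walk Metropolis sampler written in NumPy. The helpers `fit_logistic`, `log_posterior` and `metropolis` are hypothetical names introduced for illustration.

```python
import numpy as np

rng = np.random.default_rng(2020)

# --- Synthetic "EHR" data. ASSUMPTION: all settings are illustrative. ---
N = 4000
TRUE_B0, TRUE_B1 = 0.0, 1.5        # true log-odds intercept and slope
SENSITIVITY = 0.5                  # assumed: only half of true cases are recorded

x = rng.normal(size=N)                                   # one continuous risk factor
p_true = 1.0 / (1.0 + np.exp(-(TRUE_B0 + TRUE_B1 * x)))
y_true = rng.random(N) < p_true                          # true disease status
# Under-recording at random; no false-positive diagnoses are generated.
y_obs = (y_true & (rng.random(N) < SENSITIVITY)).astype(float)
X = np.column_stack([np.ones(N), x])


def fit_logistic(X, y, n_iter=25):
    """Hypothetical helper: naive logistic regression fitted by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta


def log_posterior(theta, X, y, a=50.0, b=50.0, prior_sd=10.0):
    """Hypothetical helper: log posterior of LR with outcome misclassification.

    theta = (b0, b1, logit_sens). ASSUMPTION: P(recorded case | x) =
    sens * expit(b0 + b1 * x), i.e. specificity is 1. Priors: Beta(a, b)
    on sens and N(0, prior_sd^2) on b0 and b1.
    """
    beta, s_logit = theta[:2], theta[2]
    log_sens = -np.logaddexp(0.0, -s_logit)              # log(sens)
    log1m_sens = -np.logaddexp(0.0, s_logit)             # log(1 - sens)
    eta = X @ beta
    log_q = log_sens - np.logaddexp(0.0, -eta)           # log P(y_obs = 1 | x)
    log_1mq = np.log1p(-np.exp(log_q))                   # log P(y_obs = 0 | x)
    loglik = np.sum(np.where(y > 0.5, log_q, log_1mq))
    # Beta prior on sens plus the Jacobian of the logit transform:
    # (a-1)log s + (b-1)log(1-s) + log s + log(1-s) = a log s + b log(1-s)
    log_prior_sens = a * log_sens + b * log1m_sens
    log_prior_beta = -0.5 * np.sum((beta / prior_sd) ** 2)
    return loglik + log_prior_sens + log_prior_beta


def metropolis(log_post, theta0, n_iter=15000, step=0.08, burn_in=5000):
    """Hypothetical helper: random-walk Metropolis; returns post-burn-in draws."""
    theta = np.asarray(theta0, dtype=float)
    lp = log_post(theta)
    draws = []
    for i in range(n_iter):
        proposal = theta + step * rng.normal(size=theta.size)
        lp_new = log_post(proposal)
        if np.log(rng.random()) < lp_new - lp:           # accept/reject step
            theta, lp = proposal, lp_new
        if i >= burn_in:
            draws.append(theta.copy())
    return np.array(draws)


naive = fit_logistic(X, y_obs)
draws = metropolis(lambda t: log_posterior(t, X, y_obs), [naive[0], naive[1], 0.0])
bayes = draws[:, :2].mean(axis=0)                        # posterior means of b0, b1
print(f"true slope {TRUE_B1:.2f} | naive LR {naive[1]:.2f} | "
      f"with sensitivity prior {bayes[1]:.2f}")
```

Under these assumptions the naive slope is pulled toward zero, while the posterior mean obtained with the sensitivity prior should land much nearer the true value of 1.5. A prior centered away from the true sensitivity would shift the corrected estimate accordingly, which is one way to see why the abstract flags the difficulty of finding reliable misclassification rates for real EHR data.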

Additional Details
Volume: 8
DOI: 10.3389/fpubh.2020.00054
ISSN: 2296-2565
Indexed in: Directory of Open Access Journals (DOAJ)
Author affiliations:
Elizabeth Ford, Stephen Bremner, Jackie Cassell: Department of Primary Care and Public Health, Brighton and Sussex Medical School, Brighton, United Kingdom
Philip Rooney, Peter Hurley, Seb Oliver: Department of Physics and Astronomy, University of Sussex, Brighton, United Kingdom