Modelling Diagnostic Validity Estimates from Administrative Health Data

ABSTRACT Objectives Validation studies compare diagnostic information in linked administrative and reference (i.e., gold standard) data; they are an essential tool to develop accurate case definitions, the rules used to identify individuals in administrative data with a specific health condition....

Bibliographic Details
Main Authors: Kristine Kroeker, Lisa M. Lix, Depeng Jiang, Saman Muthukumarana
Format: Article
Language: English
Published: Swansea University 2017-04-01
Series: International Journal of Population Data Science
Online Access: https://ijpds.org/article/view/171
collection DOAJ
description ABSTRACT
Objectives: Validation studies compare diagnostic information in linked administrative and reference (i.e., gold standard) data; they are an essential tool for developing accurate case definitions, the rules used to identify individuals with a specific health condition in administrative data. Validation studies often estimate the accuracy of multiple case definitions in order to identify the data features (e.g., diagnosis codes, type of data source) that influence accuracy estimates. Descriptive analyses are commonly used to select the case definition(s) with the greatest accuracy estimates, but they fail to account for uncertainty in those estimates. The objectives were to: (1) compare the performance of regression-based approaches to testing for differences in diagnostic accuracy estimates, and (2) demonstrate how to apply and use these models.
Approach: Computer simulation was used to compare three regression models: (a) univariate fixed-effects models applied to estimates of sensitivity and specificity; (b) a univariate fixed-effects model for Youden's index, the sum of sensitivity and specificity minus one; and (c) a bivariate random-effects joint model of sensitivity and specificity. The simulations varied the means and variances of sensitivity and specificity, the correlation between these parameters, and the number of case definitions. Performance was compared using: (a) bias, the difference between the estimated mean and the population mean; (b) mean squared error (MSE), the sum of the variance of the estimate and the squared bias; and (c) 95% confidence interval (CI) coverage, the proportion of times the population mean is contained in the 95% CI. For objective 2, we applied the models to estimates of diagnostic accuracy from a published rheumatoid arthritis (RA) validation study with 61 case definitions.
Results: Univariate models of sensitivity and specificity had lower bias than the bivariate model (e.g., univariate=1.8%, bivariate=2.2%). The bivariate model had a smaller MSE than the univariate models when sample size was large and the correlation between sensitivity and specificity was small (e.g., univariate=3.4%, bivariate=2.6%). Across all scenarios, the univariate model for Youden's index showed small bias (average=2.4%) and small MSE (average=2.1%). For objective 2, the univariate models of sensitivity, specificity, and Youden's index revealed multiple case definition features that were associated with estimates of RA diagnostic accuracy: 1+ diagnoses in hospital records, >1 diagnosis in physician claims, and 1+ diagnoses by a specialist physician.
Conclusions: We recommend the bivariate model when a validation study contains a large number of case definitions. When the data contain a small number of case definitions, univariate models are recommended.
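The three performance measures named in the abstract (bias, MSE, and 95% CI coverage) can be illustrated with a minimal sketch. This is not the authors' simulation design and it does not fit their regression models: it simply estimates a mean accuracy value from simulated case-definition estimates. The function names, the normal sampling distribution, the normal-approximation CI, and all parameter values are assumptions chosen for illustration only.

```python
import math
import random
import statistics


def youden_index(sensitivity, specificity):
    """Youden's index J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def simulate_performance(true_mean, sd, n_definitions, n_reps, seed=0):
    """Estimate bias, MSE and 95% CI coverage of a sample mean.

    Hypothetical setup: each replicate draws n_definitions accuracy
    estimates from a normal distribution centred on true_mean, then
    checks whether a normal-approximation 95% CI contains true_mean.
    """
    rng = random.Random(seed)
    estimates = []
    covered = 0
    for _ in range(n_reps):
        sample = [rng.gauss(true_mean, sd) for _ in range(n_definitions)]
        est = statistics.fmean(sample)
        se = statistics.stdev(sample) / math.sqrt(n_definitions)
        lower, upper = est - 1.96 * se, est + 1.96 * se
        covered += lower <= true_mean <= upper
        estimates.append(est)

    # Bias: average estimate minus the population (true) mean.
    bias = statistics.fmean(estimates) - true_mean
    # MSE: variance of the estimates plus squared bias.
    mse = statistics.pvariance(estimates) + bias ** 2
    return {"bias": bias, "mse": mse, "coverage": covered / n_reps}


result = simulate_performance(
    true_mean=0.80, sd=0.05, n_definitions=20, n_reps=2000
)
print(result)
print(youden_index(0.90, 0.80))
```

With a small number of case definitions, the normal-approximation interval in this sketch tends to cover slightly less often than the nominal 95%, which is the kind of behaviour a coverage measure is meant to expose.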
first_indexed 2024-03-09T09:37:12Z
format Article
id doaj.art-d30764b4dd5f46f084cbe7ae9dca0195
institution Directory Open Access Journal
issn 2399-4908
language English
last_indexed 2024-03-09T09:37:12Z
publishDate 2017-04-01
publisher Swansea University
record_format Article
series International Journal of Population Data Science
spelling doaj.art-d30764b4dd5f46f084cbe7ae9dca0195; 2023-12-02T01:20:52Z; eng; Swansea University; International Journal of Population Data Science; 2399-4908; 2017-04-01; 1; 1; 10.23889/ijpds.v1i1.171; 171; Modelling Diagnostic Validity Estimates from Administrative Health Data; Kristine Kroeker, Lisa M. Lix, Depeng Jiang, Saman Muthukumarana (all University of Manitoba); https://ijpds.org/article/view/171