Pooling of primary care electronic health record (EHR) data on Huntington’s disease (HD) and cancer: establishing comparability of two large UK databases

Bibliographic Details
Main Authors: Rachael Williams, Krishnan Bhaskaran, Ian J Douglas, Daniel Dedman
Format: Article
Language: English
Published: BMJ Publishing Group 2024-02-01
Series: BMJ Open
Online Access: https://bmjopen.bmj.com/content/14/2/e070258.full
collection DOAJ
description
Objectives: To explore whether UK primary care databases arising from two different software systems can be feasibly combined, by comparing rates of Huntington’s disease (HD, which is rare) and 14 common cancers in the two databases, as well as characteristics of people with these conditions.
Design: Descriptive study.
Setting: Primary care electronic health records from Clinical Practice Research Datalink (CPRD) GOLD and CPRD Aurum databases, with linked hospital admission and death registration data.
Participants: 4986 patients with HD and 1 294 819 with an incident cancer between 1990 and 2019.
Primary and secondary outcome measures: Incidence and prevalence of HD by calendar period, age group and region, and annual age-standardised incidence of 14 common cancers in each database, and in a subset of ‘overlapping’ practices which contributed to both databases. Characteristics of patients with HD or incident cancer: medical history, recent prescribing, healthcare contacts and database follow-up.
Results: Incidence and prevalence of HD were slightly higher in CPRD GOLD than CPRD Aurum, but with similar trends over time. Cancer incidence in the two databases differed between 1990 and 2000, but converged and was very similar thereafter. Participants in each database were most similar in terms of medical history (median standardised difference, MSD 0.03 (IQR 0.01–0.03)), recent prescribing (MSD 0.06 (0.03–0.10)) and demographics and general health variables (MSD 0.05 (0.01–0.09)). Larger differences were seen for healthcare contacts (MSD 0.27 (0.10–0.41)) and database follow-up (MSD 0.39 (0.19–0.56)).
Conclusions: Differences in cancer incidence trends between 1990 and 2000 may relate to use of a practice-level data quality filter (the ‘up-to-standard’ date) in CPRD GOLD only. As well as the impact of data curation methods, differences in underlying data models can make it more challenging to define exactly equivalent clinical concepts in each database. Researchers should be aware of these potential sources of variability when planning combined database studies and interpreting results.
first_indexed 2024-03-08T00:47:38Z
id doaj.art-fdedd0166d2f47b9be8bb2f4edea25ef
institution Directory Open Access Journal
issn 2044-6055
last_indexed 2024-03-08T00:47:38Z
spelling doaj.art-fdedd0166d2f47b9be8bb2f4edea25ef; 2024-02-15T05:50:09Z; eng; BMJ Publishing Group; BMJ Open; ISSN 2044-6055; published 2024-02-01; volume 14, issue 2; DOI 10.1136/bmjopen-2022-070258; https://bmjopen.bmj.com/content/14/2/e070258.full
Rachael Williams: Clinical Practice Research Datalink, Medicines and Healthcare Products Regulatory Agency, London, UK
Krishnan Bhaskaran: Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, London, UK
Ian J Douglas: Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, London, UK
Daniel Dedman: Clinical Practice Research Datalink, Medicines and Healthcare Products Regulatory Agency, London, UK