Challenges Encountered and Lessons Learned when Using a Novel Anonymised Linked Dataset of Health and Social Care Records for Public Health Intelligence: The Sussex Integrated Dataset

Bibliographic Details
Main Authors: Elizabeth Ford, Richard Tyler, Natalie Johnston, Vicki Spencer-Hughes, Graham Evans, Jon Elsom, Anotida Madzvamuse, Jacqueline Clay, Kate Gilchrist, Melanie Rees-Roberts
Format: Article
Language: English
Published: MDPI AG, 2023-02-01
Series: Information
Online Access:https://www.mdpi.com/2078-2489/14/2/106
Collection: DOAJ (Directory of Open Access Journals)

Abstract
Background: In the United Kingdom National Health Service (NHS), digital transformation programmes have resulted in the creation of pseudonymised linked datasets of patient-level medical records across all NHS and social care services. In the counties of East and West Sussex in south-east England, public health intelligence analysts based in local authorities (LAs) aimed to use the newly created “Sussex Integrated Dataset” (SID) to identify cohorts of patients at risk of early-onset multiple long-term conditions (MLTCs). Analysts from the LAs were among the first to have access to this new dataset.
Methods: Data access was assured because the analysts were employed within joint data controller organisations; following approval of a data access request, they logged into the data via virtual machines. Analysts examined the demographics and medical history of patients against multiple external sources, identifying data quality issues and developing methods to establish true values for cases with multiple conflicting entries. Service use was plotted over timelines for individual patients.
Results: Early evaluation of the data revealed multiple conflicting within-patient values for age, sex, ethnicity and date of death. This was partially resolved by creating a “demographic milestones” table, capturing demographic details for each patient for each year of data available in the SID. Older data (≥5 years old) were found to be sparse in events and diagnoses. Open-source code lists for defining long-term conditions performed poorly at identifying the expected number of patients, so bespoke code lists were developed by hand and validated against other sources of data. Initially, the age and sex distributions of patients submitted by GP practices differed substantially from those published by NHS Digital; errors in data processing were identified and rectified.
Conclusions: While new NHS linked datasets appear to be a promising resource for tracking multi-service use, MLTCs and health inequalities, substantial investment in data analysis and data architect time is necessary to ensure data of high enough quality for meaningful analysis. Our team made conceptual progress in identifying the skills needed for programming analyses and in understanding the types of questions that can be asked and answered reliably in these datasets.
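
The abstract does not specify how conflicting within-patient values were reconciled when building the “demographic milestones” table. As an illustration only, the Python sketch below shows one way a patient-by-year table could be assembled, assuming a hypothetical record schema (`Record` with `patient_id`, `year`, `sex` and `ethnicity`) and a simple majority rule in which the most frequently recorded value for each patient-year wins, with ties going to the latest-listed entry. These names and the resolution rule are assumptions, not the SID schema or the authors' method; age and date of death are omitted for brevity.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One demographic entry from a source feed (hypothetical schema, not the SID's)."""
    patient_id: str
    year: int
    sex: str
    ethnicity: str


def resolve(values):
    """Return the most frequent value; on a tie, prefer the latest-listed one.

    Assumes `values` is non-empty and listed in the order records were received.
    """
    counts = Counter(values)
    best = max(counts.values())
    # Walk backwards so that, among tied values, the most recently listed wins.
    for value in reversed(values):
        if counts[value] == best:
            return value


def build_milestones(records):
    """Build a hypothetical milestones table: {(patient_id, year): {field: value}}."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[(rec.patient_id, rec.year)].append(rec)
    milestones = {}
    for key, recs in sorted(grouped.items()):
        milestones[key] = {
            "sex": resolve([r.sex for r in recs]),
            "ethnicity": resolve([r.ethnicity for r in recs]),
        }
    return milestones


# Illustrative, made-up records for one patient with conflicting 2019 entries.
records = [
    Record("p1", 2019, "F", "White British"),
    Record("p1", 2019, "F", "Not stated"),
    Record("p1", 2019, "M", "White British"),
    Record("p1", 2020, "F", "White British"),
]
milestones = build_milestones(records)
print(milestones[("p1", 2019)])  # {'sex': 'F', 'ethnicity': 'White British'}
```

A majority rule of this kind only partially resolves conflicts: a consistently mis-recorded value will still win, and ties depend on an assumed record order. Keeping one row per patient per year, rather than one row per patient, lets values that change over time be tracked without forcing a single answer across the whole record.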

ISSN: 2078-2489
DOI: 10.3390/info14020106
Citation: Information 2023, 14(2), 106

Affiliations
Elizabeth Ford: Department of Primary Care and Public Health, Brighton and Sussex Medical School, Room 104 Watson Building, Village Way, Falmer, Brighton BN1 9PH, UK
Richard Tyler: West Sussex County Council, Chichester PO19 1RQ, UK
Natalie Johnston: Brighton and Hove City Council, Brighton BN3 3BQ, UK
Vicki Spencer-Hughes: East Sussex County Council, Lewes BN7 1UE, UK
Graham Evans: East Sussex County Council, Lewes BN7 1UE, UK
Jon Elsom: East Sussex Health Trust, St Leonards-on-Sea, East Sussex TN37 7PT, UK
Anotida Madzvamuse: Department of Mathematics, University of Sussex, Brighton BN1 9PH, UK
Jacqueline Clay: West Sussex County Council, Chichester PO19 1RQ, UK
Kate Gilchrist: Brighton and Hove City Council, Brighton BN3 3BQ, UK
Melanie Rees-Roberts: Centre for Health Services Studies, University of Kent, Canterbury CT2 7NZ, UK

Keywords: health data; electronic health records; data linkage; data quality; public health