Assessing data quality from the Clinical Practice Research Datalink: a methodological approach applied to the full blood count blood test


Bibliographic Details
Main Authors: Virdee, P, Fuller, A, Jacobs, M, Holt, T, Birks, J
Format: Journal article
Language: English
Published: SpringerOpen 2020
A Full Blood Count (FBC) is a common blood test including 20 parameters, such as haemoglobin and platelets. FBCs from Electronic Health Record (EHR) databases provide a large sample of anonymised individual patient data and are increasingly used in research. We describe the quality of the FBC data in one EHR. We accessed the Test dataset from the Clinical Practice Research Datalink (CPRD), which contains the results of tests performed in primary care, such as FBC blood tests. Medical codes and entity codes, two coding systems used within CPRD to identify FBC records, were compared, and we report the level of mismatched coding and the number of mismatches that could be rectified. The reliability of the units of measurement is also described, and missing data are discussed. There were 14 entity codes and 138 medical codes for the FBC in the data. Medical and entity codes consistently corresponded to the same FBC parameter in 95.2% (n = 217,752,448) of parameters. Among the 4.8% (n = 10,955,006) of parameters with mismatched codes, the parameter most commonly rectified was mean platelet volume (n = 2,041,360); 1,191,540 could not be rectified and were removed. Units of measurement were often missing, partially entered, or did not appear to correspond to the blood value. The final dataset contained 16,537,017 FBC tests. Applying mathematical equations to derive some missing parameters in these FBCs left an average of 15 of the 20 parameters available per FBC, with 0.3% of FBCs having all 20 parameters. Performing data quality checks can help researchers understand the extent of any issues in a dataset. We emphasise balancing large sample sizes against the reliability of the data.
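The abstract does not list the equations used to derive missing parameters. As a minimal sketch, assuming the standard red-cell index identities (MCV from haematocrit and red cell count, MCH from haemoglobin and red cell count, MCHC from haemoglobin and haematocrit) and values already converted to UK SI-style units, the derivation step might look like the Python below. The parameter names and the `RULES` table are hypothetical; they are not CPRD field names or the authors' code.

```python
"""Minimal sketch: derive missing red-cell FBC parameters from the others.

Assumptions (not taken from the paper): parameter names are hypothetical
("hb", "rbc", "hct", "mcv", "mch", "mchc") and values are already in
UK SI-style units: Hb g/L, RBC 10^12/L, Hct L/L, MCV fL, MCH pg, MCHC g/L.
The formulas are the standard red-cell index identities.
"""
from typing import Callable, Dict, Optional, Tuple

Fbc = Dict[str, Optional[float]]

# (target parameter, input parameters, formula): hypothetical rule table
RULES: Tuple[Tuple[str, Tuple[str, ...], Callable[..., float]], ...] = (
    ("mcv", ("hct", "rbc"), lambda hct, rbc: hct / rbc * 1000),  # fL
    ("mch", ("hb", "rbc"), lambda hb, rbc: hb / rbc),            # pg
    ("mchc", ("hb", "hct"), lambda hb, hct: hb / hct),           # g/L
    ("hct", ("mcv", "rbc"), lambda mcv, rbc: mcv * rbc / 1000),  # L/L
    ("hb", ("mch", "rbc"), lambda mch, rbc: mch * rbc),          # g/L
    ("rbc", ("hb", "mch"), lambda hb, mch: hb / mch),            # 10^12/L
)


def derive_missing(fbc: Fbc) -> Fbc:
    """Return a copy of one FBC with derivable missing parameters filled in."""
    out = dict(fbc)
    changed = True
    while changed:  # repeat, since one derived value may enable another
        changed = False
        for target, inputs, formula in RULES:
            if out.get(target) is not None:
                continue  # never overwrite a recorded value
            values = [out.get(name) for name in inputs]
            # Skip missing or non-positive inputs (implausible; avoids /0).
            if any(v is None or v <= 0 for v in values):
                continue
            out[target] = formula(*values)
            changed = True
    return out


if __name__ == "__main__":
    print(derive_missing({"hb": 150.0, "rbc": 5.0, "mcv": 90.0}))
```

The rules are applied repeatedly because one derived value can enable another: with only Hb, RBC and MCV recorded, Hct is derived first and MCHC is then derived from Hb and Hct. Recorded values are never overwritten, so a disagreement between a recorded index and the one implied by the other parameters would need a separate consistency check.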
Identifier: oxford-uuid:243a78af-70b0-470b-931c-221781873403
Institution: University of Oxford