Assessing data quality from the Clinical Practice Research Datalink: a methodological approach applied to the full blood count blood test

A Full Blood Count (FBC) is a common blood test including 20 parameters, such as haemoglobin and platelets. FBCs from Electronic Health Record (EHR) databases provide a large sample of anonymised individual patient data and are increasingly used in research. We describe the quality of the FBC data i...


Bibliographic Details
Main Authors:	Virdee, P, Fuller, A, Jacobs, M, Holt, T, Birks, J
Format:	Journal article
Language:	English
Published:	SpringerOpen 2020

_version_	1797058548797014016
author	Virdee, P Fuller, A Jacobs, M Holt, T Birks, J
collection	OXFORD
description	A Full Blood Count (FBC) is a common blood test including 20 parameters, such as haemoglobin and platelets. FBCs from Electronic Health Record (EHR) databases provide a large sample of anonymised individual patient data and are increasingly used in research. We describe the quality of the FBC data in one EHR. The Test dataset from the Clinical Practice Research Datalink (CPRD), which contains results of tests performed in primary care, such as FBC blood tests, was accessed. Medical codes and entity codes, two coding systems used within CPRD to identify FBC records, were compared, and the level of mismatched coding and the number of mismatches that could be rectified were reported. The reliability of units of measurement is also described and missing data are discussed. There were 14 entity codes and 138 medical codes for the FBC in the data. Medical and entity codes consistently corresponded to the same FBC parameter in 95.2% (n = 217,752,448) of parameters. Among the 4.8% (n = 10,955,006) of mismatches, the most common parameter rectified was mean platelet volume (n = 2,041,360); 1,191,540 records could not be rectified and were removed. Units of measurement were often missing, partially entered, or did not appear to correspond to the blood value. The final dataset contained 16,537,017 FBC tests. Applying mathematical equations to derive some missing parameters in these FBCs resulted in 15 of 20 parameters available per FBC on average, with 0.3% of FBCs having all 20 parameters. Performing data quality checks can help to understand the extent of any issues in the dataset. We emphasise balancing large sample sizes with reliability of the data.
first_indexed	2024-03-06T19:51:51Z
format	Journal article
id	oxford-uuid:243a78af-70b0-470b-931c-221781873403
institution	University of Oxford
language	English
last_indexed	2024-03-06T19:51:51Z
publishDate	2020
publisher	SpringerOpen
record_format	dspace
title	Assessing data quality from the Clinical Practice Research Datalink: a methodological approach applied to the full blood count blood test
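The description mentions applying mathematical equations to derive some missing FBC parameters, but this record does not say which equations were used. As a minimal sketch only, the Python below assumes the standard red cell index relationships (MCV, MCH and MCHC computed from haemoglobin, haematocrit and red blood cell count). The field names (`hb`, `hct`, `rbc`, `mcv`, `mch`, `mchc`), the units, and both helper functions are hypothetical and are not taken from the paper or from CPRD.

```python
from typing import Dict, Optional

FBCRecord = Dict[str, Optional[float]]


def derive_red_cell_indices(fbc: FBCRecord) -> FBCRecord:
    """Return a copy of one FBC record with missing red cell indices filled in.

    Assumed (hypothetical) field names and units:
      hb   haemoglobin, g/L
      hct  haematocrit, L/L (a fraction such as 0.45)
      rbc  red blood cell count, 10^12/L
      mcv  mean corpuscular volume, fL
      mch  mean corpuscular haemoglobin, pg
      mchc mean corpuscular haemoglobin concentration, g/L
    """
    out = dict(fbc)
    hb, hct, rbc = out.get("hb"), out.get("hct"), out.get("rbc")

    # Only fill a value that is missing, and only when its inputs are
    # present and the divisor is non-zero.
    if out.get("mcv") is None and hct is not None and rbc:
        out["mcv"] = hct * 1000 / rbc
    if out.get("mch") is None and hb is not None and rbc:
        out["mch"] = hb / rbc
    if out.get("mchc") is None and hb is not None and hct:
        out["mchc"] = hb / hct
    return out


def count_available(fbc: FBCRecord) -> int:
    """Count how many parameters in the record have a recorded value."""
    return sum(value is not None for value in fbc.values())


if __name__ == "__main__":
    record = {"hb": 150.0, "hct": 0.45, "rbc": 5.0,
              "mcv": None, "mch": None, "mchc": None}
    filled = derive_red_cell_indices(record)
    print(count_available(record), count_available(filled))
    print(filled)
```

Recorded values are never overwritten in this sketch, so a derived value cannot replace a measured one. Any real analysis would also need to confirm the units of each input first, since the description notes that units were often missing or inconsistent.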


Similar Items