Linking of the 2021 Census to massive linked administrative data to understand coverage and quality

The Office for National Statistics has built a vast, composite dataset for population statistics by linking data from health, education, and employment sources, known as the Demographic Index (DI). It attempts to contain a record ‘cluster’ for each person in England and Wales. To understand the cov...


Bibliographic Details
Main Authors: Josie Plachta, Sarah Collyer
Format: Article
Language: English
Published: Swansea University 2023-09-01
Series: International Journal of Population Data Science
Online Access: https://ijpds.org/article/view/2201
collection DOAJ
description The Office for National Statistics (ONS) has built a vast, composite dataset for population statistics, known as the Demographic Index (DI), by linking data from health, education, and employment sources. The DI attempts to contain a record ‘cluster’ for each person in England and Wales. To understand the coverage and quality of the DI, it has been linked to the 2021 Census to a high standard, enabling review of persons captured incorrectly as well as of over- and under-coverage. Massive data techniques were used to apply deterministic, probabilistic, and associative linkage methods. High quality was achieved by applying clerical matching methods to cases that could not be confirmed by automatic techniques. Due to resource limitations, only a subsample was linked to this high standard. The resulting links were flagged to indicate cases where the DI had correctly captured persons or had made errors. Errors included capturing persons at the wrong address, accidentally splitting a person’s records across two clusters, or incorrectly capturing two persons in the same cluster. Unlinked records were flagged as under-coverage (census) or over-coverage (DI). The 2021 Census was linked to the DI with an estimated precision of 99.4% to 99.7% and recall of 99.1% to 99.7%. This exceptional quality allows ONS analysts to use the linked dataset with high confidence when analysing the quality of the DI and its impact on statistics. In general, DI under-coverage was low, with 0.9% of census records in the subsample not present on the DI. However, DI over-coverage was much higher, with 29.5% of DI records in the subsample not present on the census. 2.3% of census persons in the subsample had been incorrectly split across multiple clusters, and 0.3% had been merged into a cluster with multiple other persons. The ONS successfully linked the 2021 Census to the DI to a high standard of quality. The linkage suggests that the DI captures most of the current population correctly but also captures many persons who are not part of that population. These insights must be considered by any users of the data.
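The flagging described in the abstract can be illustrated with a minimal Python sketch. It is not the ONS implementation: the function name `coverage_flags`, the identifiers, and the toy data are all hypothetical, and it assumes the linkage output is simply a list of (census person, DI cluster) pairs. Unlinked census persons count as DI under-coverage, unlinked DI clusters as DI over-coverage, a census person linked to more than one cluster as split, and census persons sharing one cluster as merged.

```python
from collections import defaultdict


def coverage_flags(census_ids, di_cluster_ids, links):
    """Flag coverage errors from (census_id, di_cluster_id) link pairs.

    Hypothetical helper for illustration only; the source does not
    describe the data structures used by the ONS.
    """
    clusters_by_person = defaultdict(set)
    persons_by_cluster = defaultdict(set)
    for person, cluster in links:
        clusters_by_person[person].add(cluster)
        persons_by_cluster[cluster].add(person)

    census_ids = set(census_ids)
    di_cluster_ids = set(di_cluster_ids)

    # Census persons with no link: missing from the DI (under-coverage).
    under = census_ids - set(clusters_by_person)
    # DI clusters with no link: not found on the census (over-coverage).
    over = di_cluster_ids - set(persons_by_cluster)
    # Census persons whose records sit in more than one DI cluster.
    split = {p for p, cs in clusters_by_person.items() if len(cs) > 1}
    # Census persons who share a DI cluster with another census person.
    merged = {
        p
        for persons in persons_by_cluster.values()
        if len(persons) > 1
        for p in persons
    }
    return {
        "under_coverage": under,
        "over_coverage": over,
        "split": split,
        "merged": merged,
        "under_coverage_rate": len(under) / len(census_ids),
        "over_coverage_rate": len(over) / len(di_cluster_ids),
    }


# Toy data (invented): four census persons, five DI clusters.
flags = coverage_flags(
    census_ids=["c1", "c2", "c3", "c4"],
    di_cluster_ids=["d1", "d2", "d3", "d4", "d5"],
    links=[("c1", "d1"), ("c1", "d2"), ("c2", "d3"), ("c3", "d3")],
)
print(flags["under_coverage_rate"], flags["over_coverage_rate"])
```

In this sketch the under-coverage rate uses census records as its denominator and the over-coverage rate uses DI records, mirroring how the abstract reports its two percentages.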
id doaj.art-4b39df59767c47ba9087cca430b1c573
institution Directory Open Access Journal
issn 2399-4908
doi 10.23889/ijpds.v8i2.2201
volume 8
issue 2
author_affiliation Office for National Statistics, Newport, United Kingdom (both authors)