Use of name recognition software, census data and multiple imputation to predict missing data on ethnicity: application to cancer registry records

<p>Abstract</p> <p>Background</p> <p>Information on ethnicity is commonly used by health services and researchers to plan services, ensure equality of access, and for epidemiological studies. In common with other important demographic and clinical data it is often incom...


Bibliographic Details
Main Authors: Ryan Ronan, Vernon Sally, Lawrence Gill, Wilson Sue
Format: Article
Language: English
Published: BMC 2012-01-01
Series: BMC Medical Informatics and Decision Making
Online Access: http://www.biomedcentral.com/1472-6947/12/3
collection DOAJ
description <p>Abstract</p> <p>Background</p> <p>Information on ethnicity is commonly used by health services and researchers to plan services, ensure equality of access, and conduct epidemiological studies. In common with other important demographic and clinical data, it is often incompletely recorded. This paper presents a method for imputing missing data on the ethnicity of cancer patients, developed for a regional cancer registry in the UK.</p> <p>Methods</p> <p>Routine records from cancer screening services, name recognition software (Nam Pehchan and Onomap), 2001 national Census data, and multiple imputation were used to predict the ethnicity of the 23% of cases that were still missing following linkage with self-reported ethnicity from inpatient hospital records.</p> <p>Results</p> <p>The name recognition software packages were good predictors of ethnicity for South Asian cancer cases when compared with data on ethnicity derived from hospital inpatient records, especially when combined (sensitivity 90.5%; specificity 99.9%; PPV 93.3%). Onomap was a poor predictor of ethnicity for other minority ethnic groups (sensitivity 4.4% for Black cases and 0.0% for Chinese/Other ethnic groups). Area-based data derived from the national Census were also a poor predictor of non-White ethnicity (sensitivity: South Asian 7.4%; Black 2.3%; Chinese/Other 0.0%; Mixed 0.0%).</p> <p>Conclusions</p> <p>Currently, neither method for assigning individuals to an ethnic group (name recognition or ethnic distribution of area of residence) performs well across all ethnic groups. We recommend further development of name recognition applications and the identification of additional methods for predicting ethnicity to improve their precision and accuracy for comparisons of health outcomes. However, real improvements can only come from better recording of ethnicity by health services.</p>
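The abstract reports sensitivity, specificity and positive predictive value (PPV) for each predictor against ethnicity from hospital inpatient records. As a minimal sketch of how those three measures are conventionally computed for one ethnic group, the Python below tabulates predicted against reference labels. The function and class names and the example labels are hypothetical illustrations; they are not taken from the paper and do not reproduce its figures.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 counts for one target group versus everyone else."""
    tp: int  # predicted in group, reference in group
    fp: int  # predicted in group, reference not in group
    fn: int  # predicted not in group, reference in group
    tn: int  # predicted not in group, reference not in group


def tabulate(predicted: Sequence[str], reference: Sequence[str],
             group: str) -> ConfusionCounts:
    """Count agreement between predicted and reference labels for `group`."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    tp = fp = fn = tn = 0
    for pred, ref in zip(predicted, reference):
        pred_pos = pred == group
        ref_pos = ref == group
        if pred_pos and ref_pos:
            tp += 1
        elif pred_pos:
            fp += 1
        elif ref_pos:
            fn += 1
        else:
            tn += 1
    return ConfusionCounts(tp, fp, fn, tn)


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def sensitivity(c: ConfusionCounts) -> float:
    """Share of true group members that the predictor identifies."""
    return _ratio(c.tp, c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Share of non-members that the predictor correctly excludes."""
    return _ratio(c.tn, c.tn + c.fp)


def ppv(c: ConfusionCounts) -> float:
    """Share of predicted group members who truly belong to the group."""
    return _ratio(c.tp, c.tp + c.fp)


if __name__ == "__main__":
    # Hypothetical labels, for illustration only.
    predicted = ["South Asian", "White", "White", "South Asian", "White"]
    reference = ["South Asian", "South Asian", "White", "White", "White"]
    counts = tabulate(predicted, reference, "South Asian")
    print(counts, sensitivity(counts), specificity(counts), ppv(counts))
```

Read this way, a predictor with very low sensitivity for a group misses most true members of that group, even if the cases it does flag are mostly correct.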
first_indexed 2024-04-12T16:19:40Z
format Article
id doaj.art-ff54fe739fa245739254780eb26222db
institution Directory Open Access Journal
issn 1472-6947
language English
last_indexed 2024-04-12T16:19:40Z
publishDate 2012-01-01
publisher BMC
record_format Article
series BMC Medical Informatics and Decision Making
title Use of name recognition software, census data and multiple imputation to predict missing data on ethnicity: application to cancer registry records
url http://www.biomedcentral.com/1472-6947/12/3