Evaluation of record linkage of two large administrative databases in a middle income country: stillbirths and notifications of dengue during pregnancy in Brazil

Abstract
Background: Due to the increasing availability of individual-level information across different electronic datasets, record linkage has become an efficient and important research tool. High quality linkage is essential for producing robust results. The objective of this study was to describe...


Bibliographic Details
Main Authors: Enny S Paixão, Katie Harron, Kleydson Andrade, Maria Glória Teixeira, Rosemeire L. Fiaccone, Maria da Conceição N. Costa, Laura C. Rodrigues
Format: Article
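Language: English
Published: BMC 2017-07-01
Series: BMC Medical Informatics and Decision Making
Subjects: Data linkage; Routine data; Electronic health records; Linkage quality; Linkage accuracy; Stillbirth
Online Access: http://link.springer.com/article/10.1186/s12911-017-0506-5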
_version_ 1818539745750286336
author Enny S Paixão
Katie Harron
Kleydson Andrade
Maria Glória Teixeira
Rosemeire L. Fiaccone
Maria da Conceição N. Costa
Laura C. Rodrigues
author_sort Enny S Paixão
collection DOAJ
description Abstract
Background: Due to the increasing availability of individual-level information across different electronic datasets, record linkage has become an efficient and important research tool. High quality linkage is essential for producing robust results. The objective of this study was to describe the process of preparing and linking national Brazilian datasets, and to compare the accuracy of different linkage methods for assessing the risk of stillbirth due to dengue in pregnancy.
Methods: We linked mothers and stillbirths in two routinely collected datasets from Brazil for 2009–2010: for dengue in pregnancy, notifications of infectious diseases (SINAN); for stillbirths, mortality (SIM). Since there was no unique identifier, we used probabilistic linkage based on maternal name, age and municipality. We compared two probabilistic approaches, each with two thresholds: 1) a bespoke linkage algorithm; 2) a standard linkage software widely used in Brazil (ReclinkIII), and used manual review to identify further links. Sensitivity and positive predictive value (PPV) were estimated using a subset of gold-standard data created through manual review. We examined the characteristics of false-matches and missed-matches to identify any sources of bias.
Results: From records of 678,999 dengue cases and 62,373 stillbirths, the gold-standard linkage identified 191 cases. The bespoke linkage algorithm with a conservative threshold produced 131 links, with sensitivity = 64.4% (68 missed-matches) and PPV = 92.5% (8 false-matches). Manual review of uncertain links identified an additional 37 links, increasing sensitivity to 83.7%. The bespoke algorithm with a relaxed threshold identified 132 true matches (sensitivity = 69.1%), but introduced 61 false-matches (PPV = 68.4%). ReclinkIII produced lower sensitivity and PPV than the bespoke linkage algorithm. Linkage error was not associated with any recorded study variables.
Conclusion: Despite a lack of unique identifiers for linking mothers and stillbirths, we demonstrate a high standard of linkage of large routine databases from a middle income country. Probabilistic linkage and manual review were essential for accurately identifying cases for a case-control study, but this approach may not be feasible for larger databases or for linkage of more common outcomes.
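The sensitivity and PPV figures in the description follow from simple ratios of link counts against the gold standard. The sketch below is illustrative only and is not the authors' code: the function name `linkage_metrics` and its parameter names are assumptions, and the inputs are the relaxed-threshold counts quoted in the abstract (132 true matches, 61 false-matches, 191 gold-standard cases).

```python
def linkage_metrics(true_matches, false_matches, gold_standard_total):
    """Return sensitivity and PPV for a set of links (hypothetical helper).

    sensitivity = true matches / all gold-standard matches
    PPV         = true matches / all links produced (true + false)
    """
    missed_matches = gold_standard_total - true_matches
    sensitivity = true_matches / gold_standard_total
    ppv = true_matches / (true_matches + false_matches)
    return {
        "missed_matches": missed_matches,
        "sensitivity": sensitivity,
        "ppv": ppv,
    }


# Relaxed-threshold counts as quoted in the abstract.
relaxed = linkage_metrics(true_matches=132, false_matches=61,
                          gold_standard_total=191)
print(f"sensitivity = {relaxed['sensitivity']:.1%}, PPV = {relaxed['ppv']:.1%}")
```

With these inputs the ratios round to the 69.1% sensitivity and 68.4% PPV reported for the relaxed threshold.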
first_indexed 2024-12-11T21:46:10Z
format Article
id doaj.art-79d0073404d546a58251abced0d6f812
institution Directory Open Access Journal
issn 1472-6947
language English
last_indexed 2024-12-11T21:46:10Z
publishDate 2017-07-01
publisher BMC
record_format Article
series BMC Medical Informatics and Decision Making
spelling doaj.art-79d0073404d546a58251abced0d6f812
2022-12-22T00:49:37Z
eng
BMC
BMC Medical Informatics and Decision Making
1472-6947
2017-07-01
17 1 1 9
10.1186/s12911-017-0506-5
Evaluation of record linkage of two large administrative databases in a middle income country: stillbirths and notifications of dengue during pregnancy in Brazil
Enny S Paixão 0
Katie Harron 1
Kleydson Andrade 2
Maria Glória Teixeira 3
Rosemeire L. Fiaccone 4
Maria da Conceição N. Costa 5
Laura C. Rodrigues 6
London School of Hygiene and Tropical Medicine
London School of Hygiene and Tropical Medicine
Instituto de Saúde Coletiva, Rua Basílio da Gama, s/n. Canela
Instituto de Saúde Coletiva, Rua Basílio da Gama, s/n. Canela
Departamento de Estatística
Instituto de Saúde Coletiva, Rua Basílio da Gama, s/n. Canela
London School of Hygiene and Tropical Medicine
http://link.springer.com/article/10.1186/s12911-017-0506-5
Data linkage
Routine data
Electronic health records
Linkage quality
Linkage accuracy
Stillbirth
title Evaluation of record linkage of two large administrative databases in a middle income country: stillbirths and notifications of dengue during pregnancy in Brazil
topic Data linkage
Routine data
Electronic health records
Linkage quality
Linkage accuracy
Stillbirth
url http://link.springer.com/article/10.1186/s12911-017-0506-5