Data cleaning and management protocols for linked perinatal research data: a good practice example from the Smoking MUMS (Maternal Use of Medications and Safety) Study

Bibliographic Details
Main Authors: Duong Thuy Tran, Alys Havard, Louisa R. Jorm
Format: Article
Language: English
Published: BMC 2017-07-01
Series: BMC Medical Research Methodology
Subjects: Data cleaning methods; Data consistency; Perinatal; Record linkage
Online Access: http://link.springer.com/article/10.1186/s12874-017-0385-6
collection DOAJ
description Abstract
Background: Data cleaning is an important quality assurance in data linkage research studies. This paper presents the data cleaning and preparation process for a large-scale cross-jurisdictional Australian study (the Smoking MUMS Study) to evaluate the utilisation and safety of smoking cessation pharmacotherapies during pregnancy.
Methods: Perinatal records for all deliveries (2003–2012) in the States of New South Wales (NSW) and Western Australia were linked to State-based data collections including hospital separation, emergency department and death data (mothers and babies) and congenital defect notifications (babies in NSW) by State-based data linkage units. A national data linkage unit linked pharmaceutical dispensing data for the mothers. All linkages were probabilistic. Twenty-two steps assessed the uniqueness of records and consistency of items within and across data sources, resolved discrepancies in the linkages between units, and identified women having records in both States.
Results: State-based linkages yielded a cohort of 783,471 mothers and 1,232,440 babies. Likely false positive links relating to 3703 mothers were identified. Corrections of baby’s date of birth and age, and parity were made for 43,578 records while 1996 records were flagged as duplicates. Checks for the uniqueness of the matches between State and national linkages detected 3404 ID clusters, suggestive of missed links in the State linkages, and identified 1986 women who had records in both States.
Conclusions: Analysis of content data can identify inaccurate links that cannot be detected by data linkage units that have access to personal identifiers only. Perinatal researchers are encouraged to adopt the methods presented to ensure quality and consistency among studies using linked administrative data.
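The uniqueness check between State and national linkages described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the record layout `(state, state_id, national_id)`, the function names and the sample IDs are all hypothetical, chosen only to show how one national-level ID matching several State-level IDs forms an "ID cluster", and how clusters spanning both States point to a woman with records in both.

```python
from collections import defaultdict


def find_id_clusters(links):
    """Return national IDs that match more than one State-level ID.

    `links` is an iterable of (state, state_id, national_id) tuples.
    This layout is an assumption for illustration, not the study's schema.
    """
    by_national = defaultdict(set)
    for state, state_id, national_id in links:
        by_national[national_id].add((state, state_id))
    # A cluster is any national ID linked to two or more State-level IDs.
    return {nid: sorted(ids) for nid, ids in by_national.items() if len(ids) > 1}


def split_clusters(clusters):
    """Separate within-State clusters from clusters spanning several States."""
    same_state, cross_state = {}, {}
    for nid, ids in clusters.items():
        states = {state for state, _ in ids}
        target = cross_state if len(states) > 1 else same_state
        target[nid] = ids
    return same_state, cross_state


# Made-up example links (hypothetical IDs).
links = [
    ("NSW", "N001", "P1"),
    ("NSW", "N002", "P1"),  # two NSW IDs for one national ID: possible missed link
    ("NSW", "N003", "P2"),
    ("WA", "W001", "P2"),   # same national ID in both States
    ("WA", "W002", "P3"),   # unique match, not a cluster
]

clusters = find_id_clusters(links)
same_state, cross_state = split_clusters(clusters)
```

In this sketch a within-State cluster would be a candidate missed link in the State linkage, while a cross-State cluster would be a candidate for a woman with records in both States. Because all linkages in the study were probabilistic, either kind would still need review against content data before records are merged.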
first_indexed 2024-12-11T09:49:38Z
id doaj.art-d1c1f3da24f745c1af30a097392609db
institution Directory Open Access Journal
issn 1471-2288
last_indexed 2024-12-11T09:49:38Z
spelling doaj.art-d1c1f3da24f745c1af30a097392609db | 2022-12-22T01:12:27Z | eng | BMC | BMC Medical Research Methodology | 1471-2288 | 2017-07-01 | 17 | 1 | 1 | 15 | 10.1186/s12874-017-0385-6
Duong Thuy Tran; Alys Havard; Louisa R. Jorm (all: Centre for Big Data Research in Health, Faculty of Medicine, UNSW Sydney (The University of New South Wales))
topic Data cleaning methods
Data consistency
Perinatal
Record linkage