Finding Maternal Siblings in Birth Registration Data to form a Pregnancy Spine – Data Linkage & Graph Based Methods for Unknown Cluster Sizes

Bibliographic Details
Main Authors: Shelley Gammon, Charles Morris
Format: Article
Language: English
Published: Swansea University 2018-09-01
Series: International Journal of Population Data Science
Online Access: https://ijpds.org/article/view/894
author Shelley Gammon
Charles Morris
collection DOAJ
description Introduction We have developed an innovative methodology to link maternal siblings within 2000–2005 England and Wales Birth Registration data, to form a Pregnancy Spine: a unification of all births to each unique mother. Key challenges in this many-to-many linkage scenario are: • Blocking (reduction of record pair comparisons) • Cluster resolution Objectives and Approach Probabilistic data linkage (Python) was followed by generation of clusters (using igraph in R) and graph theory community detection techniques. To optimise geographical blocking and increase accuracy, we incorporated Internal Migration data to map the likely geographic movement of mothers between births. Maternal sibling clusters were modelled as a graph, and the structure of clusters was optimised using community detection methods to link, split and evaluate sibling groups. We also incorporated childhood statistics data relating to child date of birth to evaluate the likely accuracy of sibling pairs and remove false edges (links). Results Our development has resulted in a new blocking method and a new cluster resolution method. In addition, we developed new ways to assess and measure the accuracy of sibling groups, beyond traditional classifier metrics, and to infer error rates. For quality assurance, we applied our method to Registration Data used in earlier studies. Using this, and by comparing against other statistics on maternal sibling composition, we will present results showing that a high degree of accuracy was obtained on precision, recall, and our new evaluation checks. Conclusion/Implications These methods will improve other linkage projects with unknown cluster sizes, such as de-duplicating datasets, linking multiple datasets, or incorporating data from a longer time period through longitudinal linkage. Researchers can now append and link other data sources to this Spine to answer questions about maternal and child health outcomes.
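The pipeline described in the abstract (blocking, pairwise linkage, removal of implausible edges, then grouping records into clusters of unknown size) can be illustrated with a small sketch. This is not the authors' implementation: the record layout, the blocking key (mother's date of birth), the `MIN_GAP_DAYS` threshold and all function names are hypothetical, a deterministic rule stands in for the probabilistic scoring, and plain connected components (via union-find) stand in for the igraph community detection named in the abstract.

```python
from collections import defaultdict
from datetime import date
from itertools import combinations

# Hypothetical birth records: (record_id, mother_dob, child_dob).
births = [
    ("b1", date(1975, 3, 2), date(2000, 5, 1)),
    ("b2", date(1975, 3, 2), date(2002, 8, 15)),
    ("b3", date(1980, 7, 9), date(2001, 1, 10)),
    ("b4", date(1980, 7, 9), date(2001, 4, 1)),
    ("b5", date(1982, 1, 5), date(2003, 6, 6)),
    ("b6", date(1982, 1, 5), date(2003, 6, 6)),
]

# Assumed minimum gap between two separate deliveries to one mother.
# Illustrative only; the source does not state a threshold.
MIN_GAP_DAYS = 180


def plausible_siblings(dob_a, dob_b):
    """Multiple births (at most one day apart) or well-spaced births are plausible."""
    gap = abs((dob_a - dob_b).days)
    return gap <= 1 or gap >= MIN_GAP_DAYS


def candidate_pairs(records):
    """Blocking: only compare records that share a mother's date of birth."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[rec[1]].append(rec)
    for block in blocks.values():
        yield from combinations(block, 2)


def sibling_clusters(records):
    """Keep plausible edges only, then return connected components as clusters."""
    parent = {rec[0]: rec[0] for rec in records}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in candidate_pairs(records):
        if plausible_siblings(a[2], b[2]):
            parent[find(a[0])] = find(b[0])

    clusters = defaultdict(set)
    for rec_id in parent:
        clusters[find(rec_id)].add(rec_id)
    return sorted(sorted(members) for members in clusters.values())


clusters = sibling_clusters(births)
print(clusters)
```

In this toy data, `b3` and `b4` share a block but are born 81 days apart, so their edge is dropped and they stay in separate clusters, while the same-day pair `b5` and `b6` is kept as a multiple birth. Connected components alone cannot split a cluster in which two implausible siblings are both linked to a third record, which is the kind of case the community detection step in the abstract is meant to address.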
first_indexed 2024-03-09T08:35:15Z
format Article
id doaj.art-b989932c0d8947018e2bcbf11de8c6b8
institution Directory of Open Access Journals
issn 2399-4908
language English
last_indexed 2024-03-09T08:35:15Z
publishDate 2018-09-01
publisher Swansea University
record_format Article
series International Journal of Population Data Science
spelling doaj.art-b989932c0d8947018e2bcbf11de8c6b8; 2023-12-02T18:21:50Z; eng; Swansea University; International Journal of Population Data Science; 2399-4908; 2018-09-01; Volume 3, Issue 4; 10.23889/ijpds.v3i4.894; Finding Maternal Siblings in Birth Registration Data to form a Pregnancy Spine – Data Linkage & Graph Based Methods for Unknown Cluster Sizes; Shelley Gammon (Office for National Statistics); Charles Morris (Office for National Statistics); https://ijpds.org/article/view/894
title Finding Maternal Siblings in Birth Registration Data to form a Pregnancy Spine – Data Linkage & Graph Based Methods for Unknown Cluster Sizes
url https://ijpds.org/article/view/894