Finding Maternal Siblings in Birth Registration Data to form a Pregnancy Spine – Data Linkage & Graph Based Methods for Unknown Cluster Sizes
Main Authors: | Shelley Gammon, Charles Morris |
---|---|
Format: | Article |
Language: | English |
Published: | Swansea University, 2018-09-01 |
Series: | International Journal of Population Data Science |
Online Access: | https://ijpds.org/article/view/894 |
author | Shelley Gammon, Charles Morris |
collection | DOAJ |
description | Introduction
We have developed an innovative methodology to link maternal siblings within 2000–2005 England and Wales Birth Registration data to form a Pregnancy Spine: a unification of all births to each unique mother.
Key challenges in this many-to-many linkage scenario are:
• Blocking (reduction of record pair comparisons)
• Cluster resolution
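Blocking matters here because comparing every birth record with every other grows quadratically. The sketch below is illustrative only: the record fields and the blocking key (mother's year of birth plus area) are hypothetical, not the key used in this work. It contrasts the full set of pairs with the pairs left after blocking.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical birth records: (record_id, mother_birth_year, area_code)
records = [
    (1, 1975, "E01"), (2, 1975, "E01"), (3, 1980, "W02"),
    (4, 1975, "E01"), (5, 1980, "W02"), (6, 1990, "E03"),
]

def all_pairs(recs):
    """Every unordered pair of records: n * (n - 1) / 2 comparisons."""
    return list(combinations((r[0] for r in recs), 2))

def blocked_pairs(recs, key):
    """Only compare records that share the same blocking key."""
    blocks = defaultdict(list)
    for rec in recs:
        blocks[key(rec)].append(rec[0])
    pairs = []
    for ids in blocks.values():
        pairs.extend(combinations(ids, 2))
    return pairs

# Assumed blocking key, for illustration only: mother's birth year and area.
candidates = blocked_pairs(records, key=lambda r: (r[1], r[2]))
print(len(all_pairs(records)), len(candidates))  # 15 4
```

A purely geographic key like this misses mothers who moved area between births, which is the gap the migration-aware blocking described below is meant to close.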
Objectives and Approach
Probabilistic data linkage (in Python) was followed by cluster generation (using igraph in R) and graph-theoretic community detection.
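A minimal sketch of the two stages, assuming a Fellegi–Sunter style scoring model: each candidate pair gets a log2 likelihood-ratio weight from per-field m- and u-probabilities, pairs above a threshold become edges, and connected components of that graph are the initial sibling clusters. The field names, probabilities and threshold are hypothetical. The abstract does not specify the scoring model, and it reports clustering with igraph in R (where this step would typically be `components()`); union-find is used here only to keep the example dependency-free.

```python
import math

# Hypothetical m- and u-probabilities per comparison field (assumed values).
M_U = {"mother_surname": (0.95, 0.01), "mother_dob": (0.98, 0.001), "postcode": (0.70, 0.005)}

def match_weight(agreements):
    """Fellegi-Sunter style weight: sum of log2(m/u) on agreement, log2((1-m)/(1-u)) otherwise."""
    w = 0.0
    for field, agree in agreements.items():
        m, u = M_U[field]
        w += math.log2(m / u) if agree else math.log2((1 - m) / (1 - u))
    return w

def connected_components(nodes, edges):
    """Union-find: each component is a candidate maternal sibling cluster."""
    parent = {n: n for n in nodes}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for a, b in edges:
        parent[find(a)] = find(b)
    clusters = {}
    for n in nodes:
        clusters.setdefault(find(n), set()).add(n)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])

# Scored candidate pairs: (record_a, record_b, field agreements)
scored = [
    (1, 2, {"mother_surname": True, "mother_dob": True, "postcode": False}),
    (2, 4, {"mother_surname": True, "mother_dob": True, "postcode": True}),
    (3, 5, {"mother_surname": False, "mother_dob": False, "postcode": True}),
]
THRESHOLD = 10.0  # assumed link threshold
links = [(a, b) for a, b, agr in scored if match_weight(agr) >= THRESHOLD]
print(connected_components([1, 2, 3, 4, 5], links))  # [[1, 2, 4], [3], [5]]
```

Because clusters are read off the graph rather than fixed in advance, nothing assumes how many births each mother has, which is the "unknown cluster size" setting the title refers to.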
To optimise geographical blocking and increase accuracy, we incorporated Internal Migration data to map the likely geographic movement of mothers between births.
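One way migration data could widen a geographic block is sketched below, under loudly stated assumptions. `FLOWS` is a hypothetical table of mothers moving between areas between births, `likely_destinations` keeps the most common destinations until an assumed share (`coverage`) of moves is covered, and a later birth is only compared with an earlier one if it falls in one of those destinations. This is an illustration of the idea, not the blocking method developed in this work.

```python
from collections import defaultdict

# Hypothetical counts of mothers moving between areas between births
# (origin -> destination -> count); staying put is counted as origin == destination.
FLOWS = {
    "A": {"A": 800, "B": 150, "C": 50},
    "B": {"B": 850, "A": 100, "C": 50},
    "C": {"C": 950, "B": 50},
}

def likely_destinations(origin, coverage=0.9):
    """Smallest set of destinations, taken in order of flow size, covering `coverage` of moves."""
    flows = FLOWS[origin]
    total = sum(flows.values())
    dests, covered = set(), 0
    for dest, count in sorted(flows.items(), key=lambda kv: -kv[1]):
        dests.add(dest)
        covered += count
        if covered >= coverage * total:
            break
    return dests

def migration_blocked_pairs(births, coverage=0.9):
    """births: (record_id, iso_birth_date, area). A later birth is only compared with an
    earlier one if its area is a likely destination of the earlier birth's area."""
    by_area = defaultdict(list)
    for rec in births:
        by_area[rec[2]].append(rec)
    pairs = []
    for rid, born, area in births:
        for dest in likely_destinations(area, coverage):
            for other_id, other_born, _ in by_area.get(dest, []):
                if other_born > born:  # ISO dates compare correctly as strings
                    pairs.append((rid, other_id))
    return sorted(pairs)

births = [(1, "2001-03-02", "A"), (2, "2003-07-19", "B"),
          (3, "2004-01-05", "C"), (4, "2005-06-01", "A")]
print(migration_blocked_pairs(births))  # [(1, 2), (1, 4), (2, 4)]
```

Birth 3 (area C) is never compared here, because moves from A or B to C fall outside the assumed 90% coverage; raising `coverage` recovers such pairs at the cost of more comparisons.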
Maternal sibling clusters were modelled as graphs, and cluster structure was optimised using community detection methods to link, split and evaluate sibling groups.
We also used childhood statistics data on child date of birth to evaluate the likely accuracy of sibling pairs and to remove false edges (links).
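The abstract does not name the community detection algorithm used. As a stand-in, the sketch below uses label propagation (igraph offers this and others, for example `cluster_label_prop` and `cluster_louvain` in R) to show how one connected component can be split: two strongly linked triangles joined by a single weak, hypothetical false edge separate into two communities. Node order and tie-breaking are fixed so the result is reproducible; the edge weights are assumed.

```python
from collections import defaultdict

def label_propagation(edges, max_iter=100):
    """edges: (a, b, weight). Each node repeatedly adopts the label with the largest
    total edge weight among its neighbours (ties -> smallest label) until stable."""
    adj = defaultdict(dict)
    for a, b, w in edges:
        adj[a][b] = adj[a].get(b, 0) + w
        adj[b][a] = adj[b].get(a, 0) + w
    labels = {n: n for n in adj}  # every node starts in its own community
    for _ in range(max_iter):
        changed = False
        for node in sorted(adj):  # fixed order for reproducibility
            scores = defaultdict(float)
            for nbr, w in adj[node].items():
                scores[labels[nbr]] += w
            best = max(scores.values())
            new = min(lab for lab, s in scores.items() if s == best)
            if new != labels[node]:
                labels[node] = new
                changed = True
        if not changed:
            break
    return labels

def communities(labels):
    """Group nodes by final label into sorted sibling groups."""
    groups = defaultdict(set)
    for node, lab in labels.items():
        groups[lab].add(node)
    return sorted(sorted(g) for g in groups.values())

# One connected component: two strong triangles plus one weak (assumed false) edge 3-4.
edges = [(1, 2, 20), (1, 3, 20), (2, 3, 20),
         (4, 5, 20), (4, 6, 20), (5, 6, 20), (3, 4, 2)]
print(communities(label_propagation(edges)))  # [[1, 2, 3], [4, 5, 6]]
```

Plain connected components would keep all six records as one mother; the community step is what allows an over-merged cluster to be split.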
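A simplified sketch of removing false edges with child date of birth, assuming a hard rule rather than whatever evidence from childhood statistics the authors used: two births to the same mother should be either a multiple birth (same day or very close) or separated by a plausible gap between pregnancies. Both thresholds below are hypothetical.

```python
from datetime import date

MULTIPLE_BIRTH_WINDOW_DAYS = 1  # assumption: multiple births registered within a day of each other
MIN_INTER_BIRTH_DAYS = 180      # assumption: minimum plausible gap between separate pregnancies

def plausible_sibling_interval(dob_a, dob_b):
    """True if the two child dates of birth could belong to the same mother."""
    gap = abs((dob_b - dob_a).days)
    return gap <= MULTIPLE_BIRTH_WINDOW_DAYS or gap >= MIN_INTER_BIRTH_DAYS

def prune_edges(edges, dob):
    """Keep only sibling edges whose child dates of birth are a plausible interval apart."""
    return [(a, b) for a, b in edges if plausible_sibling_interval(dob[a], dob[b])]

dob = {1: date(2001, 3, 2), 2: date(2001, 3, 2), 3: date(2001, 6, 20), 4: date(2003, 7, 19)}
edges = [(1, 2), (1, 3), (2, 4), (3, 4)]
print(prune_edges(edges, dob))  # [(1, 2), (2, 4), (3, 4)]
```

Edge (1, 3) is dropped because the births are 110 days apart, too close for separate pregnancies and too far apart for twins under these assumed thresholds.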
Results
This work has produced a new blocking method and a new cluster resolution method. We also developed new ways to assess and measure the accuracy of sibling groups beyond traditional classifier metrics, and to infer error rates.
For quality assurance (QA), we applied our method to registration data used in earlier studies.
Using these data, and by comparing against other statistics on maternal sibling composition, we will present results showing that a high degree of accuracy was obtained on precision, recall and our new evaluation checks.
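The abstract does not describe its new accuracy checks, so the following is only a hypothetical illustration of a cluster-level check that needs no labelled pairs: flag resolved clusters that are internally inconsistent (here, conflicting mother's dates of birth, or more births than an assumed cap) and use the flagged share as a rough proxy for a false-merge rate. The cluster contents, field and cap are all assumptions.

```python
# Hypothetical resolved clusters: cluster_id -> list of (record_id, mother_dob_iso)
clusters = {
    "c1": [(1, "1975-04-02"), (2, "1975-04-02"), (4, "1975-04-02")],
    "c2": [(3, "1980-11-30"), (5, "1982-01-15")],  # conflicting mother dates of birth
    "c3": [(6, "1990-06-01")],
}
MAX_BIRTHS = 6  # assumed cap on registrations per mother within the 2000-2005 window

def cluster_flags(members):
    """Consistency checks on one sibling group; an empty list means no problems found."""
    flags = []
    if len({dob for _, dob in members}) > 1:
        flags.append("conflicting_mother_dob")
    if len(members) > MAX_BIRTHS:
        flags.append("too_many_births")
    return flags

def flagged_share(clusters):
    """Share of clusters failing at least one check, plus their ids (a rough error-rate proxy)."""
    flagged = [cid for cid, members in clusters.items() if cluster_flags(members)]
    return len(flagged) / len(clusters), sorted(flagged)

print(flagged_share(clusters))  # (0.3333333333333333, ['c2'])
```

A strict rule like this also flags true clusters that contain recording errors, so the flagged share is best read as an upper-bound style signal rather than an exact error rate.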
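A minimal sketch of the two kinds of comparison mentioned above, assuming a set of known sibling groups is available as a reference (the groups below are hypothetical): pairwise precision and recall, where a pair counts as linked if both records share a cluster, and a comparison of the distribution of births per mother using total variation distance.

```python
from collections import Counter
from itertools import combinations

def within_cluster_pairs(clusters):
    """All unordered record pairs that fall in the same cluster."""
    return {tuple(sorted(p)) for c in clusters for p in combinations(c, 2)}

def pairwise_precision_recall(predicted, reference):
    """Pairwise precision/recall of predicted sibling groups against reference groups."""
    pred, ref = within_cluster_pairs(predicted), within_cluster_pairs(reference)
    tp = len(pred & ref)
    precision = tp / len(pred) if pred else 1.0
    recall = tp / len(ref) if ref else 1.0
    return precision, recall

def size_profile(clusters):
    """Share of mothers (clusters) by number of births: a simple sibling-composition profile."""
    counts = Counter(len(c) for c in clusters)
    total = sum(counts.values())
    return {size: n / total for size, n in sorted(counts.items())}

def total_variation(p, q):
    """Total variation distance between two size profiles (0 means identical)."""
    sizes = set(p) | set(q)
    return 0.5 * sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in sizes)

predicted = [[1, 2, 4], [3, 5], [6]]
reference = [[1, 2], [4], [3, 5], [6]]
print(pairwise_precision_recall(predicted, reference))  # (0.5, 1.0)
print(total_variation(size_profile(predicted), size_profile(reference)))
```

Here the over-merged cluster [1, 2, 4] halves precision while recall stays perfect, and it also shifts the births-per-mother profile away from the reference.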
Conclusion/Implications
These methods will improve other linkage projects with unknown cluster sizes, such as de-duplicating datasets, linking multiple datasets, or incorporating data from a longer time period through longitudinal linkage.
Researchers can now append and link other data sources to this Spine to answer questions about maternal and child health outcomes. |
id | doaj.art-b989932c0d8947018e2bcbf11de8c6b8 |
institution | Directory Open Access Journal |
issn | 2399-4908 |
doi | 10.23889/ijpds.v3i4.894 |
volume / issue | 3 / 4 |
affiliation | Office for National Statistics (both authors) |