Linking Synthetic Populations to Household Geolocations: A Demonstration in Namibia
Whether evaluating gridded population dataset estimates (e.g., WorldPop, LandScan) or household survey sample designs, a population census linked to residential locations is needed. Geolocated census microdata, however, are almost never available and are thus best simulated. In this paper, we...
Main Authors: | Dana R. Thomson, Lieke Kools, Warren C. Jochem |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2018-08-01 |
Series: | Data |
Subjects: | simulation, census, simPop, LMIC |
Online Access: | http://www.mdpi.com/2306-5729/3/3/30 |
_version_ | 1811185946107838464 |
---|---|
author | Dana R. Thomson; Lieke Kools; Warren C. Jochem |
author_facet | Dana R. Thomson; Lieke Kools; Warren C. Jochem |
author_sort | Dana R. Thomson |
collection | DOAJ |
description | Whether evaluating gridded population dataset estimates (e.g., WorldPop, LandScan) or household survey sample designs, a population census linked to residential locations is needed. Geolocated census microdata, however, are almost never available and are thus best simulated. In this paper, we simulate a close-to-reality population of individuals nested in households geolocated to realistic building locations. Using the R simPop package and ArcGIS, multiple realizations of a geolocated synthetic population are derived from the Namibia 2011 census 20% microdata sample, Namibia census enumeration area boundaries, the Namibia 2013 Demographic and Health Survey (DHS), and dozens of spatial covariates derived from publicly available datasets. Realistic household latitude-longitude coordinates are manually generated based on public satellite imagery. Simulated households are linked to latitude-longitude coordinates by identifying distinct household types with multivariate k-means analysis and modelling a probability surface for each household type using Random Forest machine learning methods. We simulate five realizations of a synthetic population in Namibia’s Oshikoto region, including demographic, socioeconomic, and outcome characteristics at the level of household, woman, and child. Variables in the synthetic population were compared with the 2011 census 20% sample and 2013 DHS data by primary sampling unit/enumeration area. We found that synthetic population variable distributions matched the observed distributions and followed expected spatial patterns. We outline a novel process to simulate a close-to-reality microdata census geolocated to realistic building locations in a low- or middle-income country setting to support spatial demographic research and survey methodological development while avoiding disclosure risk of individuals. |
first_indexed | 2024-04-11T13:37:57Z |
format | Article |
id | doaj.art-c1c65360bbe04ea8ad22e699d0e3a0c8 |
institution | Directory of Open Access Journals |
issn | 2306-5729 |
language | English |
last_indexed | 2024-04-11T13:37:57Z |
publishDate | 2018-08-01 |
publisher | MDPI AG |
record_format | Article |
series | Data |
spelling | doaj.art-c1c65360bbe04ea8ad22e699d0e3a0c8; 2022-12-22T04:21:23Z; eng; MDPI AG; Data; 2306-5729; 2018-08-01; volume 3, issue 3, article 30; 10.3390/data3030030; data3030030; Linking Synthetic Populations to Household Geolocations: A Demonstration in Namibia; Dana R. Thomson (Flowminder Foundation, SE-11355 Stockholm, Sweden); Lieke Kools (Department of Economics, Leiden University, 2311 EZ Leiden, The Netherlands); Warren C. Jochem (Flowminder Foundation, SE-11355 Stockholm, Sweden); http://www.mdpi.com/2306-5729/3/3/30; simulation; census; simPop; LMIC |
spellingShingle | Dana R. Thomson Lieke Kools Warren C. Jochem Linking Synthetic Populations to Household Geolocations: A Demonstration in Namibia Data simulation census simPop LMIC |
title | Linking Synthetic Populations to Household Geolocations: A Demonstration in Namibia |
title_full | Linking Synthetic Populations to Household Geolocations: A Demonstration in Namibia |
title_fullStr | Linking Synthetic Populations to Household Geolocations: A Demonstration in Namibia |
title_full_unstemmed | Linking Synthetic Populations to Household Geolocations: A Demonstration in Namibia |
title_short | Linking Synthetic Populations to Household Geolocations: A Demonstration in Namibia |
title_sort | linking synthetic populations to household geolocations a demonstration in namibia |
topic | simulation; census; simPop; LMIC |
url | http://www.mdpi.com/2306-5729/3/3/30 |
work_keys_str_mv | AT danarthomson linkingsyntheticpopulationstohouseholdgeolocationsademonstrationinnamibia AT liekekools linkingsyntheticpopulationstohouseholdgeolocationsademonstrationinnamibia AT warrencjochem linkingsyntheticpopulationstohouseholdgeolocationsademonstrationinnamibia |