Providing Geospatial Covariate Data for Use with DHS Datasets
When users of The DHS Program’s survey data request access to our geospatial data, they usually intend to link survey cluster locations to outside datasets, such as rainfall measurements, population density, and distance to road networks. These additional data, when tied to geographic location, are known as spatial covariates and may shed light on how location affects health outcomes. However, linking these covariates to cluster locations can be challenging: multiple sources of covariate data exist, and their quality varies. Researchers may find it difficult to know which source will best complement the cluster GPS data they obtain from The DHS Program.
Recognizing both the demand for DHS geospatial data and the challenge of linking them to spatial covariates, the DHS Geospatial team set out to prepare, and make freely available, a set of standardized geospatial covariate datasets that remove the need for users to link covariates to cluster GPS locations themselves. This allows individuals with little or no Geographic Information Systems (GIS) experience to conduct geospatial statistical analysis in software such as Stata, SAS, or SPSS. Even experienced GIS analysts may benefit from these datasets, because they no longer need to spend time sourcing appropriate covariate data and linking it to cluster GPS data on their own.
After gathering input from users and experts, we identified the covariates most commonly used alongside The DHS Program’s survey data in published literature across key topic areas. We also reached out to users to learn how they would use, and benefit from, a set of spatial covariates prepared in-house. Through these two activities, we identified dozens of potential covariates that users already use, or would like to use, with our geospatial data.
Working closely with our partners at Blue Raster, we then extracted values of the selected geospatial covariates at each DHS survey cluster, using the displaced cluster coordinates (which are randomly offset to protect respondent confidentiality). Covariates were selected if they: a) had global or regional extent, b) were publicly available, c) had well-documented acquisition or creation processes with detailed metadata, and d) were available for relevant time frames.
We aimed to include the covariates most in demand among our users, including rainfall, insecticide-treated net (ITN) coverage, malaria cases, travel time to the nearest city, urbanization, and more. A detailed description of the extraction methodology is available on the Spatial Data Repository website.
We hope the spatial covariate datasets will prove valuable to a wide range of DHS data users. We continue to explore ways to improve the datasets, including the extraction process used to create these files, and to release similar extracts for covariates not covered in this first round. User feedback will be critical in helping us understand what users need from these datasets, so we strongly encourage everyone who downloads and uses these files to email us with their thoughts, advice, and requests for future covariates.
Photo Caption: GIS participants at the 2017 Regional Health Data Mapping Workshop in Cambodia.