Luminare: Providing Sample Weights for Multilevel Analysis while Protecting Confidentiality
This blog post is part of Luminare, our blog series exploring innovative solutions to data collection, quality assurance, biomarker measurement, data use, and further analysis.
The DHS Program recently published a Methodological Report providing a framework for estimating “level-weights” in DHS surveys – weights that correspond to each stage of sampling. These weights are required for multilevel modeling. While the audience for the framework itself is academic researchers, the challenge of protecting respondent confidentiality while supporting data analysis is of general interest.
We sat down with two of the authors, Mahmoud Elkasabi, Senior Sampling Statistician, and Tom Pullum, Senior Advisor for Research and Analysis, to learn more about this innovative strategy.
How did the idea for this activity come about?
Post from Data User:
I have been reading the posts on the forum regarding the use of weights with multilevel analyses and wanted to check to see if there were any updates on recommendations on how to go about this. . . Since we cannot separate out the household weights from the cluster weights to incorporate them in the statistical coding, does the DHS have any recommendations on how to go about running multilevel models with DHS data? . . . I would like to run multilevel models looking at childhood vaccinations and want to make sure I am going about it in the most proper way. Any help or guidance on this from those at DHS or out in the forum would be greatly appreciated!
Mahmoud: There has been huge user demand for DHS survey level-weights. Over the years, we have seen many posts on The DHS Program User Forum from analysts trying to apply weights in multilevel analysis. Using multilevel modeling to understand how cluster-level characteristics, such as region, affect individual-level outcomes, such as contraceptive use or children’s nutritional status, is a common type of research question.
For those of us who aren’t statistically inclined, why do researchers need to include sampling weights in their analysis?
Mahmoud: Sampling weights compensate for different probabilities of selection within the sample and for different levels of non-response. Providing weights at each level of sampling makes the units at that level, such as clusters or households, as representative as possible. That is, the data from each interviewed woman become as representative as possible of similar women in the population. That is ultimately the goal of a survey: to obtain data that are nationally and subnationally representative without interviewing the entire population.
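For readers who find a small calculation easier to follow, here is a minimal sketch of the idea in Python. It assumes the simplest textbook form of a sampling weight, the inverse of the selection probability inflated by the inverse of the response rate. The function name and all numbers are hypothetical and are not taken from any DHS survey.

```python
def sampling_weight(selection_prob: float, response_rate: float) -> float:
    """Inverse-probability weight, adjusted for non-response.

    Simplified illustration only: real survey weights involve several
    sampling stages and adjustments that are not shown here.
    """
    if not 0 < selection_prob <= 1:
        raise ValueError("selection_prob must be in (0, 1]")
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be in (0, 1]")
    return 1.0 / (selection_prob * response_rate)


# Hypothetical example: two groups sampled at different rates.
# A woman in the group with the lower selection probability and the
# lower response rate "stands in" for more women in the population.
oversampled_group = sampling_weight(selection_prob=0.002, response_rate=0.95)
undersampled_group = sampling_weight(selection_prob=0.001, response_rate=0.90)

print(f"Weight in oversampled group:  {oversampled_group:.1f}")
print(f"Weight in undersampled group: {undersampled_group:.1f}")
```

The lower the chance of being selected and responding, the larger the weight, which is how the weighted sample is brought back in line with the population.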
Why aren’t level-weights standardly provided with DHS datasets?
Mahmoud: After a survey is completed, The DHS Program destroys the information required for exact calculation of the cluster weights. Providing the true cluster-level weight for each cluster would pose a risk to respondent confidentiality—anyone with access to the sampling frame could use the cluster-level weights to identify the specific clusters that were drawn in the sample—and then, potentially, identify households or individuals. For that reason, The DHS Program only releases the final survey weights in the datasets.
How does the level-weights framework respond to the challenge of protecting confidentiality?
Tom: We propose a framework that uses publicly available data from DHS datasets and Final Reports, along with a process to estimate other inputs. The framework starts with the household final weight from the household recode file or the woman final weight from the woman recode file. Most of the numbers required to separate the final weight into a cluster-level weight and a household-level (or woman-level) weight are included in the data files or in Appendix A – Sample Design of DHS Final Reports. Some of the required information is not available there (see Table 1), but we provide guidance on how to estimate these inputs with other publicly available data. In this way, we can estimate or approximate the level-weights for the clusters and households (or women).
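The core idea of splitting a final weight into two level-weights can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the formulas from the Methodological Report: it assumes a two-stage design in which clusters are selected with probability proportional to size, so that the final weight is the product of a cluster-level weight and a household-level weight, and it ignores any scaling or normalization applied to released weights. All function names and input values are hypothetical.

```python
def approx_cluster_weight(clusters_selected: int,
                          households_in_cluster: float,
                          households_in_stratum: float) -> float:
    """Approximate cluster-level weight as 1 / P(cluster is selected).

    Assumption: probability-proportional-to-size selection, so
    P = clusters_selected * households_in_cluster / households_in_stratum.
    """
    prob = clusters_selected * households_in_cluster / households_in_stratum
    if not 0 < prob <= 1:
        raise ValueError("inputs imply a selection probability outside (0, 1]")
    return 1.0 / prob


def split_final_weight(final_weight: float, cluster_weight: float):
    """Split a final weight into (cluster-level, household-level) weights.

    The household-level weight is whatever remains after dividing out the
    cluster-level weight, so the two always multiply back to final_weight.
    """
    if final_weight <= 0 or cluster_weight <= 0:
        raise ValueError("weights must be positive")
    return cluster_weight, final_weight / cluster_weight


# Hypothetical inputs: 20 clusters drawn from a stratum of 100,000
# households, and a cluster containing 100 households.
w_cluster = approx_cluster_weight(20, 100, 100_000)
w_cluster, w_household = split_final_weight(final_weight=600.0,
                                            cluster_weight=w_cluster)

print(f"Cluster-level weight:   {w_cluster:.1f}")
print(f"Household-level weight: {w_household:.1f}")
```

Because the household-level weight is defined as the remainder, any error in the approximated cluster-level weight carries over into it, which is why the quality of the estimated inputs matters.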
Have these level-weights been used in any DHS analysis?
Tom: This report shows how to use data from the 2015 Zimbabwe DHS to estimate level-weights and then include them in a multilevel regression model. We fitted several regression models with data for married women in 400 clusters to examine modern contraceptive use, with age, education, residence, and number of children as covariates. We provide the Stata code for this example.
The recently released Analytical Study “Contraceptive Use, Method Mix, and Method Availability” is the first DHS research to use the proposed methodology. This analysis used the method described here to estimate cluster-level and woman-level weights and then to assess the effect of cluster-level and woman-level factors on contraceptive use in Haiti and Malawi.
Can this approach be used for other surveys?
Yes! The approximation approach is valid for other household surveys, such as the Malaria Indicator Survey (MIS) and UNICEF’s Multiple Indicator Cluster Survey (MICS), so long as the inputs for the framework are available.
Dr. Mahmoud Elkasabi is a Sampling Statistician at The DHS Program. He joined The DHS Program in 2013 after earning his Ph.D. in Survey Methodology from the University of Michigan at Ann Arbor, with a specialty in Survey Statistics and Sampling. Dr. Elkasabi is responsible for the sampling design for the DHS surveys as well as building sampling capacity in many countries, such as Ghana, Egypt, Nigeria, India, Malawi, Zambia, Bangladesh, and Afghanistan. Dr. Elkasabi likes to work closely with the sampling statisticians in different countries. In these win-win relationships, he shares his knowledge of sampling and gains new knowledge and experience.
Dr. Tom Pullum directs the research program, including the analysis of DHS data beyond the country reports, such as the analytical studies, comparative reports, further analysis studies, and methodological reports. He also has overall responsibility for The DHS Fellows Program and workshops. Current interests include maternal mortality and the measurement of child vulnerability. A continuing effort is the adaptation of demographic methods to statistical frameworks and software. His work with DHS has included methodological reports on data quality. He joined the DHS staff in 2011, following a lengthy career in academia, primarily at the University of Texas at Austin. Dr. Pullum has a Ph.D. in sociology from the University of Chicago.