Category Archives: User Forum

07 Oct 2020

Luminare: Providing Sample Weights for Multilevel Analysis while Protecting Confidentiality

This blog post is part of Luminare, our blog series exploring innovative solutions to data collection, quality assurance, biomarker measurement, data use, and further analysis.

The DHS Program recently published a Methodological Report providing a framework for estimating “level-weights” in DHS surveys – weights that correspond to each stage of sampling. These weights are required for multilevel modeling. While the audience for the framework itself is academic researchers, the challenge of protecting respondent confidentiality while supporting data analysis is of general interest. 

We sat down with two of the authors, Mahmoud Elkasabi, Senior Sampling Statistician, and Tom Pullum, Senior Advisor for Research and Analysis, to learn more about this innovative strategy.

How did the idea for this activity come about?

Post from Data User:

I have been reading the posts on the forum regarding the use of weights with multilevel analyses and wanted to check to see if there were any updates on recommendations on how to go about this. . . Since we cannot separate out the household weights from the cluster weights to incorporate them in the statistical coding, does the DHS have any recommendations on how to go about running multilevel models with DHS data? . . . I would like to run multilevel models looking at childhood vaccinations and want to make sure I am going about it in the most proper way.  Any help or guidance on this from those at DHS or out in the forum would be greatly appreciated!

Mahmoud: There has been huge user demand for DHS survey level-weights. We have seen many posts on The DHS Program User Forum over the years, where analysts are trying to apply weights in multilevel analysis. A common type of research question uses multilevel modeling to understand the effects of cluster-level characteristics, such as region, on individual-level outcomes, such as contraceptive use or children’s nutritional status.

For those of us who aren’t statistically inclined, why do researchers need to include sampling weights in their analysis?

Mahmoud: Sampling weights compensate for different probabilities of selection within the samples, and for different levels of non-response. Providing weights at multiple levels allows for the best level of representativeness for that unit. That is, the data from each interviewed woman becomes as representative as possible of similar women in the population. That is ultimately the goal of a survey: to obtain data that are nationally and subnationally representative without interviewing the entire population.
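
To make the idea concrete, here is a minimal Python sketch, not DHS code, of how a weight built from the inverse of the selection probability and the response rate changes an estimate. The function names and all of the numbers are hypothetical and chosen only for illustration.

```python
def sampling_weight(selection_prob, response_rate=1.0):
    """Inverse of the selection probability, inflated for non-response."""
    if not (0 < selection_prob <= 1 and 0 < response_rate <= 1):
        raise ValueError("probabilities and rates must be in (0, 1]")
    return 1.0 / (selection_prob * response_rate)


def weighted_mean(values, weights):
    """Weighted mean of an outcome, for example a 0/1 indicator."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical sample: urban women are oversampled (1 in 100);
# rural women are sampled at 1 in 400 with an 80% response rate.
sample = [
    # (uses_modern_method, selection_prob, response_rate)
    (1, 0.01, 1.0),
    (1, 0.01, 1.0),
    (0, 0.01, 1.0),
    (0, 0.0025, 0.8),
    (1, 0.0025, 0.8),
]

outcomes = [row[0] for row in sample]
weights = [sampling_weight(p, r) for _, p, r in sample]

print("unweighted share:", sum(outcomes) / len(outcomes))
print("weighted share:  ", weighted_mean(outcomes, weights))
```

Because the rural respondents each stand in for more women in the population, they carry more weight in the weighted share than in the simple average.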

Why aren’t level-weights standardly provided with DHS datasets?

Mahmoud: After a survey is completed, The DHS Program destroys the information required for exact calculation of the cluster weights. Providing the true cluster-level weight for each cluster would pose a risk to respondent confidentiality—anyone with access to the sampling frame could use the cluster-level weights to identify the specific clusters that were drawn in the sample—and then, potentially, identify households or individuals. For that reason, The DHS Program only releases the final survey weights in the datasets.

How does the level-weights framework respond to the challenge of protecting confidentiality? 

Tom: We propose a framework that uses publicly available data from DHS datasets and Final Reports, along with a process to estimate other inputs. The framework starts with the household final weight from the household recode file or the woman final weight from the woman recode file. Most of the numbers required to separate the final weight into a cluster-level weight and a household-level (or woman-level) weight are included in the data files or in Appendix A – Sample Design of DHS Final Reports. Some of the required information is not available there (see Table 1), but we provide guidance on how to estimate these inputs with other publicly available data. In this way, we can estimate or approximate the level-weights for the clusters and households (or women).
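
The arithmetic behind separating a final weight can be sketched in Python as follows. This illustrates the general idea only; it is not the procedure or the code from the Methodological Report. It assumes a two-stage design in which the final weight is simply the product of a cluster-level weight and a within-cluster weight, and it approximates the cluster-level weight as if clusters were drawn with equal probability within a stratum. The function names and counts are hypothetical.

```python
def approx_cluster_weight(clusters_in_stratum, clusters_sampled):
    """Assumed approximation: equal-probability selection of clusters in a stratum."""
    if clusters_sampled <= 0 or clusters_in_stratum < clusters_sampled:
        raise ValueError("need 0 < clusters_sampled <= clusters_in_stratum")
    return clusters_in_stratum / clusters_sampled


def split_final_weight(final_weight, cluster_weight):
    """Split a final weight into (cluster-level, within-cluster) parts.

    Assumes final_weight = cluster_weight * within_cluster_weight.
    """
    if final_weight <= 0 or cluster_weight <= 0:
        raise ValueError("weights must be positive")
    return cluster_weight, final_weight / cluster_weight


# Hypothetical stratum: 2,000 clusters in the frame, 50 of them sampled,
# and a household whose (de-normalized) final weight is 1,200.
final_weight = 1200.0
cluster_w = approx_cluster_weight(2000, 50)
level2_weight, level1_weight = split_final_weight(final_weight, cluster_w)

print("cluster-level weight:  ", level2_weight)
print("household-level weight:", level1_weight)
```

In a multilevel model, the cluster-level weight would then be attached to the cluster level and the within-cluster weight to the household (or woman) level.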

Have these level-weights been used in any DHS analysis? 

Tom: This report shows how to use data from the 2015 Zimbabwe DHS to estimate level-weights and then include them in a multilevel regression model. We fitted several regression models with data for married women in 400 clusters to examine modern contraceptive use with age, education, residence, and number of children as covariates. We provide the Stata code for this example.

The recently released Analytical Study Contraceptive Use, Method Mix, and Method Availability is the first DHS research to use the proposed methodology. This analysis used the method described here to estimate cluster-level and woman-level weights and then to assess the effect of cluster-level and woman-level factors on contraceptive use in Haiti and Malawi.

Can this approach be used for other surveys?

Yes! The approximation approach is valid for other household surveys, such as the Malaria Indicator Survey (MIS) and UNICEF’s Multiple Indicator Cluster Surveys (MICS), so long as the inputs for the framework are available.


Dr. Mahmoud Elkasabi is a Sampling Statistician at The DHS Program. He joined The DHS Program in 2013 after earning his Ph.D. in Survey Methodology from the University of Michigan at Ann Arbor, with a specialty in Survey Statistics and Sampling. Dr. Elkasabi is responsible for the sampling design for the DHS surveys as well as building sampling capacity in many countries, such as Ghana, Egypt, Nigeria, India, Malawi, Zambia, Bangladesh, and Afghanistan. Dr. Elkasabi likes to work closely with the sampling statisticians in different countries. In these win-win relationships, he shares his knowledge of sampling and gains new knowledge and experience.

Dr. Tom Pullum directs the research program, including the analysis of DHS data beyond the country reports, such as the analytical studies, comparative reports, further analysis studies, and methodological reports. He also has overall responsibility for The DHS Fellows Program and workshops. Current interests include maternal mortality and the measurement of child vulnerability. A continuing effort is the adaptation of demographic methods to statistical frameworks and software. His work with DHS has included methodological reports on data quality. He joined the DHS staff in 2011, following a lengthy career in academia, primarily at the University of Texas at Austin. Dr. Pullum has a Ph.D. in sociology from the University of Chicago.

30 Mar 2016

Model Datasets to the Rescue

Have you ever wanted to start immediately working on a DHS dataset, but didn’t have a research topic? Or didn’t want to take the time to register for access? Well, The DHS Program now has the cure for all your data analysis woes!

The DHS Program has created model datasets so users can become familiar with datasets without having to register for access. These datasets have been created strictly for practice and do not represent any country’s actual data. Model datasets are based on the DHS 6 Questionnaire and Recode. They include data on all standard survey characteristics, as well as data on domestic violence, female genital cutting, adult and maternal mortality, and child labor.

You might be thinking, how can I use these datasets? Model datasets can be used for many different purposes, including:

  • Replicating standard final report tables
  • Practicing calculating complex indicators
  • Teaching statistical concepts and procedures

Team members from Nigeria participating in the 2016 Regional DHS/MIS Malaria Analysis Workshop

Recently, the model datasets were used in the 2016 Regional DHS/MIS Malaria Analysis Workshops in Uganda and Senegal. Since participants attending the workshop came from different countries with different DHS/MIS datasets, the curriculum and workshop exercises were standardized using the model datasets. After going through the model dataset examples, participants then worked with their country’s specific data to match numbers in the final report. This was a great way for facilitators to make sure everyone was mastering the skill before participants worked on their own country’s data.

Model datasets have already had a starring role in our sampling and weighting tutorial videos. Future videos will also feature the model datasets, allowing users to follow along with the examples in the tutorial with their own statistical program.

Visit the Model Datasets page on The DHS Program website for more information. Users can pick and choose which data files to download, as well as download the full set of final report tables and sampling errors to check their work. Again, unlike datasets for specific surveys, users do not need to register in order to gain access.

If you have recently used the model datasets we want to hear from you! Comment below or email modeldatasets@dhsprogram.com to share your experiences with the model datasets or how you plan on using them in the future. You can also post questions about the model datasets on the User Forum.

22 Jul 2015

Linking DHS Data with Health Facility Data: Opportunities and Challenges

For 30 years, The DHS Program has asked women hundreds of questions about their utilization of various health care services, including family planning, antenatal and delivery care, vaccination and treatment of sick children, malaria treatment, and HIV prevention and treatment. In 1999, The DHS Program started collecting facility-level data through the Service Provision Assessment (SPA) survey. The SPA interviews providers and clients, takes stock of facility supplies and equipment, and observes provider-client consultations.  

© Ibou GUISSE/ANSD

Many people hoped that the two datasets would be easy to link for a deeper understanding of how people access services, the quality of services, and the association between access to services and health outcomes in a given country.  And because most recent DHS and SPA surveys are geo-coded (DHS since 2000 and SPA since 2009), that is, clusters and facilities are identified with their latitude and longitude, linking the data through a geographic information system should be easy, right?

Several studies have looked at using geospatial analysis to link DHS and SPA data to answer these larger questions about access to and utilization of health care services.  There are several challenges to this type of linkage. A major concern is sampling: the DHS and SPA surveys have different sampling frames and are rarely conducted in the same year. Most SPAs are samples of the health care facilities in the country, not a census. Many individuals surveyed in a DHS likely visit some of the health facilities that were not selected for the SPA. So just because a woman’s cluster is closest to a certain facility included in the SPA does not mean that that is the facility the woman visits.

In addition, to protect the identity of respondents, the GPS locations of DHS cluster points are geo-masked. In densely populated areas, this means that clusters may be moved away from their closest health facilities, making linkage based on geographic location less accurate. There’s also a practical concern: the DHS does not ask where individuals receive health care but rather only the type of facility where they sought care. While some people probably use their closest health facility, this is not always the case. People may choose health facilities based on quality, specialty, cost, or anonymity, not just proximity.
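
For readers who want to see what the simplest form of geographic linkage looks like, here is a minimal Python sketch that assigns a cluster to its closest sampled facility by great-circle (haversine) distance. It is an illustration under stated assumptions, not a method from the report: the coordinates, facility names, and function names are all hypothetical, and the caveats above (geo-masked cluster points, facilities missing from the SPA sample, and people not always using the nearest facility) apply in full to any result it produces.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def nearest_facility(cluster, facilities):
    """Return (facility_id, distance_km) for the closest facility.

    cluster is a (lat, lon) tuple; facilities maps an id to (lat, lon).
    """
    if not facilities:
        raise ValueError("no facilities to link to")
    distances = (
        (fid, haversine_km(cluster[0], cluster[1], lat, lon))
        for fid, (lat, lon) in facilities.items()
    )
    return min(distances, key=lambda pair: pair[1])


# Hypothetical (already displaced) cluster point and sampled facilities.
cluster_point = (14.70, -17.45)
sampled_facilities = {
    "clinic_a": (14.72, -17.46),
    "hospital_b": (14.60, -17.30),
}

facility_id, distance_km = nearest_facility(cluster_point, sampled_facilities)
print(facility_id, round(distance_km, 1), "km")
```

A linkage like this is best read as "the closest facility in the sample to the masked cluster location," which is not necessarily the facility that respondents in the cluster actually use.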

Despite the challenges, there have been several successful analyses linking DHS and SPA data, and program managers and researchers continue to explore the best use cases for DHS-SPA linkage. This will be the topic of our upcoming DHS webinar on July 28th, “Considerations when Linking DHS Household Data to Data on Health Facilities and Infrastructure.” Clara Burgert, The DHS Program’s GIS Coordinator and author of the recently released Spatial Analysis Report “Linking DHS Household and SPA Facility Surveys: Data Considerations and Geospatial Methods,” will present on the DHS-SPA linkage opportunities and challenges. Interested participants can register for the webinar here, and are encouraged to read SAR10 and post discussion questions for Clara and her co-authors on The DHS Program User Forum here.

Update: This webinar event has ended. Please visit this feed on The DHS Program User Forum for the presentation, discussion, and additional resources.

20 May 2015

Your Questions on Weighting Answered: The DHS Program User Forum and Webinar

The DHS Program regularly gets questions from users about sampling and weighting, such as “How do I apply sample weights in multilevel analyses?” or “What is the difference between self-weighting data and non-self-weighting data?” On June 3, 2015, three DHS experts will answer users’ questions on “Weights and other adjustments for the survey design” in our first-ever live webinar. Drs. Tom Pullum, Ruilin Ren, and Mahmoud Elkasabi will be discussing common questions about sampling and weighting in DHS data collection and analysis.

Launched in 2013, The DHS Program User Forum was created to provide a transparent discussion platform for users of DHS data to ask questions and receive feedback from the broader community and DHS Program staff. To date, more than 1,700 users have registered on the User Forum and posted over 3,000 messages. To quote one registered user:

“The forum is helping millions of DHS data users around the world to understand data and sort data management issues. I personally managed to merge DHS data with the help of the forum contributor.”

Members can post questions in dozens of threads in three main categories: Topics (e.g., child health, mortality, and wealth index), Countries (India and Bangladesh are currently the most active), or Data (merging, sampling and weighting, geographic data, dataset use in Stata and SPSS). While we encourage users to answer each other’s questions, The DHS Program staff members moderate the forum and provide answers when others do not. Increasingly, members are able to find the answer to their question simply by searching the 3,000+ messages that are already in the forum. Registered users say:

“I’m likely to post again to the User Forum because, when I post not only do I get a quick and helpful answer, but also it lets me see what other users have posted and been given as answers, which opens my eyes and mind to other future research.”

Participate in the User Forum and the Webinar on June 3, 2015
To post a question on weighting for the webinar, simply visit the User Forum thread “Sampling and Weighting Webinar June 2015.” Then, on June 3, 2015, at 10am EDT (UTC/GMT-4), join us live in our Adobe Connect room. A recording of the webinar will be available on the User Forum for those who cannot participate live, and a summary of the questions and answers will also be entered into the User Forum as individual messages for future reference.

The information provided on this Web site is not official U.S. Government information and does not represent the views or positions of the U.S. Agency for International Development or the U.S. Government.

The DHS Program, ICF
530 Gaither Road, Suite 500, Rockville, MD 20850
Tel: +1 (301) 407-6500 • Fax: +1 (301) 407-6501
dhsprogram.com