Sampling and Weighting with DHS Data

Written by: Mahmoud Elkasabi

16 Sep, 2015

At long last, The DHS Program has released two videos which demonstrate how to weight DHS data, concluding the Sampling and Weighting video series.

2012 Tajikistan DHS

The first video in the series, Introduction to DHS Sampling Procedures, and the second video, Introduction of Principles of DHS Sampling Weights, explained the basic concepts of sampling and weighting in DHS Program surveys, using the 2012 Tajikistan DHS as an example. Read our introductory blog post for more details.

In contrast, the third and fourth videos use an Example Practice Dataset, so viewers can practice weighting DHS data and replicate what is being shown in the videos while they are watching. The Example Practice Dataset was specifically created for DHS data users to have hands-on practice using DHS data in different statistical packages (Stata, SPSS and SAS) and does not represent the data of any actual country.

The third video, How to Weight DHS Data in Stata, explains which weight to use based on the unit of analysis, describes the steps of weighting DHS data in Stata and demonstrates both ways to weight DHS data in Stata (simple weighting and weighting that accounts for the complex survey design).
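To make "which weight to use based on the unit of analysis" and simple weighting a little more concrete, here is a minimal Python sketch; it is not code from the videos. It assumes the usual DHS recode conventions: household and household-member files (HR, PR) use hv005, women's and children's files (IR, KR) use v005, men's files (MR) use mv005, and the weights are stored as integers with six implied decimals, so they are divided by 1,000,000. The mapping, the function names and the four toy records are hypothetical, and v106 is assumed to be coded 0 = no education, 1 = primary, 2 = secondary, 3 = higher.

```python
# Hypothetical mapping from DHS recode file type to its weight variable
# (assumed convention: hv005 for household files, v005 for women and
# children, mv005 for men).
WEIGHT_VAR = {
    "HR": "hv005",  # households
    "PR": "hv005",  # household members
    "IR": "v005",   # women
    "KR": "v005",   # children under 5 (mother's weight)
    "MR": "mv005",  # men
}


def weight_of(record, file_type):
    """Sampling weight of one record, rescaled from six implied decimals."""
    return record[WEIGHT_VAR[file_type]] / 1_000_000


def weighted_proportion(records, file_type, indicator):
    """Share of the weighted total for which indicator(record) is True."""
    total_w = sum(weight_of(r, file_type) for r in records)
    yes_w = sum(weight_of(r, file_type) for r in records if indicator(r))
    return yes_w / total_w


def unweighted_proportion(records, indicator):
    """Plain share of records for which indicator(record) is True."""
    return sum(1 for r in records if indicator(r)) / len(records)


def has_secondary(record):
    """Assumed v106 coding: 2 = secondary, 3 = higher."""
    return record["v106"] >= 2


# Four made-up women's records (IR file).
women = [
    {"v005": 500_000, "v106": 2},
    {"v005": 500_000, "v106": 3},
    {"v005": 2_000_000, "v106": 0},
    {"v005": 1_000_000, "v106": 1},
]

print(unweighted_proportion(women, has_secondary))      # 0.5
print(weighted_proportion(women, "IR", has_secondary))  # 0.25
```

Rescaling by 1,000,000 does not change a ratio such as this proportion, but it does matter whenever weighted counts or totals are reported. The point estimate is also the same whether or not the complex design is declared; only the standard errors change.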

The fourth video, Demonstration on How to Weight DHS Data in SPSS and SAS, is the same as the third video, except it uses the statistical software packages SPSS and SAS instead of Stata.

After watching these videos, you will be able to answer the following questions:

  • Which weights should I use for my analysis?
  • What are the steps of weighting data in a statistical software package?
  • How do I weight DHS data in Stata, SPSS or SAS?
  • How do I account for the complex sample design when weighting in Stata, SPSS or SAS?
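Accounting for the complex sample design mostly changes the standard errors, not the point estimates. As a rough illustration, the Python sketch below computes a weighted mean and its Taylor-linearized standard error for a stratified cluster sample, using the with-replacement approximation that design-based tools such as Stata's svy commands use by default. It is our own illustration of the textbook formula rather than the method shown in the videos; the (stratum, PSU, weight, value) tuple layout and the toy data are hypothetical. With DHS files, the PSU would typically be the cluster (for example v001) and the stratum a variable such as v023.

```python
import math
from collections import defaultdict


def weighted_mean(rows):
    """rows: list of (stratum, psu, weight, y) tuples."""
    total_w = sum(w for _, _, w, _ in rows)
    return sum(w * y for _, _, w, y in rows) / total_w


def linearized_se(rows):
    """Taylor-linearized SE of the weighted mean for a stratified cluster sample.

    With-replacement approximation: PSUs are treated as sampled with
    replacement within each stratum, and no finite population correction
    is applied. Every stratum needs at least two PSUs.
    """
    total_w = sum(w for _, _, w, _ in rows)
    ybar = weighted_mean(rows)

    # Linearized score of each record, w * (y - ybar) / W, summed within each PSU.
    psu_totals = defaultdict(float)
    for stratum, psu, w, y in rows:
        psu_totals[(stratum, psu)] += w * (y - ybar) / total_w

    # Collect the PSU totals of each stratum.
    by_stratum = defaultdict(list)
    for (stratum, _psu), z in psu_totals.items():
        by_stratum[stratum].append(z)

    # Between-PSU variance within strata, scaled by n_h / (n_h - 1), summed over strata.
    variance = 0.0
    for stratum, zs in by_stratum.items():
        n_h = len(zs)
        if n_h < 2:
            raise ValueError(f"stratum {stratum!r} has a single PSU")
        z_mean = sum(zs) / n_h
        variance += n_h / (n_h - 1) * sum((z - z_mean) ** 2 for z in zs)
    return math.sqrt(variance)


# Made-up data: 2 strata x 2 PSUs x 2 respondents, y = 1 if the respondent has the outcome.
rows = [
    ("A", 1, 1.0, 1), ("A", 1, 1.0, 0),
    ("A", 2, 1.0, 0), ("A", 2, 1.0, 0),
    ("B", 3, 2.0, 1), ("B", 3, 2.0, 1),
    ("B", 4, 2.0, 0), ("B", 4, 2.0, 1),
]

print(round(weighted_mean(rows), 4))   # 0.5833
print(round(linearized_se(rows), 4))   # 0.1863
```

Treating every respondent as its own PSU in a single stratum would give the "simple weighting" standard error instead, which typically understates the uncertainty of a clustered sample. Strata with a single PSU have no within-stratum variance estimate and are rejected here; design-based packages generally offer options for that case.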

If you have more questions, visit the user forum!

What did you learn from the sampling and weighting videos? What would you like to explore further? Comment below!

Author

  • Dr. Elkasabi is a Sampling Statistician at The DHS Program. He joined The DHS Program in 2013 after earning his Ph.D. in Survey Methodology from the University of Michigan at Ann Arbor, with a specialty in Survey Statistics and Sampling. Dr. Elkasabi is responsible for the sampling design of DHS surveys as well as for building sampling capacity in many countries, such as Ghana, Egypt, Nigeria, India, Malawi, Zambia, Bangladesh, and Afghanistan. He likes to work closely with sampling statisticians in different countries; in these win-win relationships, he shares his knowledge of sampling and gains new knowledge and experience.

12 thoughts on “Sampling and Weighting with DHS Data”

  1. Dear
    I am still confused about how to weight the DHS sample. I also have poor internet access in Ethiopia, so I could not follow the examples in the videos. How can you help me weight the 2011 Ethiopian DHS?

  2. I wish to calculate the mean height-for-age z-score (HAZ) and its standard error, taking the sampling design into account. I want to estimate the results for small areas such as sub-districts/districts. I am using the 2011 BDHS data and trying to calculate the mean HAZ and the mean of (HAZ < -2.00 SD), i.e. the proportion below -2 SD, by sub-district.
    I am using the Stata commands below. Fortunately, we get reasonable results for districts, but unrealistic results by sub-district, particularly for small sub-districts. For some sub-districts, the mean HAZ becomes zero with zero standard error, and a similar result is observed for HAZ= 601.

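    * HAZ is stored x100; values of 9996 and above are flag codes or missing
    gen HW70n = HW70/100 if HW70 < 9996
    gen COSUBDIST = CODIST*100 + COTHANA
    * Declare the PSU, strata and (re-scaled) sampling weight
    svyset V001 [pw=V005_rewtd], strata(V023)
    univar HW70n, by(COSUBDIST)
    * tabstat does not use the svyset design: these means and SEs are unweighted
    tabstat HW70n, by(COSUBDIST) stat(n mean semean sd)

    * HAZ < -2.0; left missing when HAZ is missing
    gen HW_2 = (HW70n < -2) if !missing(HW70n)
    tabstat HW_2, by(COSUBDIST) stat(n mean semean sd)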

    The results look like this:

    COSUBDIST |   N      mean  se(mean)        sd
    ----------+----------------------------------
          108 |  18  .0555556  .0555556  .2357023
          114 |  15  .0666667  .0666667  .2581989
          134 |  14  .0714286  .0714286  .2672612
          156 |  18  .1666667  .0903877  .3834825
          160 |  17  .1176471  .0805474  .3321056
          177 |  18  .0555556  .0555556  .2357023
          373 |  16      .125  .0853913   .341565
          409 |  25       .16  .0748331  .3741657
          428 |  32    .03125    .03125  .1767767
          447 |   8         0         0         0
          485 |  11  .0909091  .0909091  .3015113
          602 |   8         0         0         0
          603 |  11         0         0         0
          607 |  21  .1904762  .0878052  .4023739
          610 |  23  .0434783  .0434783  .2085144
          632 |  12  .0833333  .0833333  .2886751
          636 |  15  .1333333  .0908514  .3518658
          651 | 107  .0747664  .0255462  .2642517
          662 |  19  .1578947   .085947  .3746343

    Can you explain why I get these zero results? Can I do this kind of spatial analysis in this way?

    Regards,
    Sumon

  3. Hello DHS Program!
    How should non-response be handled when an entire cluster is dropped (for instance, because of security problems, inaccessibility, or bad data)? Should that cluster be included in the household response rate? Why, and how?

    Why is this not discussed on the internet?

  4. Hello!

    This question is not about DHS, but I’m hoping to get help from someone through this forum.

    How can we use weights on a data set that is originally representative at the provincial level to make it representative at the district level?
    Since the variables I’m using are at the district level, using a provincially representative data set will result in sampling bias. So I’m trying to weight the data in a way that makes it representative at the district level.

    It would be of immense help if I could get a response to my query.

    • This is a great question for The DHS Program User Forum. Visit the “Weighting Data” thread to post your question or look for help. https://userforum.dhsprogram.com/index.php?t=thread&frm_id=33&

  5. Is it possible to get a different total (of in-migrants, for instance) depending on whether or not the weights are used? I am using the HH weights and the PR dataset to find the number of in-migrants moving from their place of birth to their place of residence. But with the weights, the total number of in-migrants is different from the total without the weights.

  6. Hello !
    How can I weight three merged surveys from one country (ZDHS) to calculate under-five mortality (U5M) using Stata and R commands?

    Regards,
    Amanuel
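Two of the questions above can be checked with simple arithmetic. In comment 2, the table matches the unweighted formulas: tabstat's se(mean) is sd/sqrt(N), so sub-district 108, with 1 child below -2 SD out of 18, gives .0555556 for both the mean and the standard error, and a sub-district where none of the sampled children is below -2 SD gives a mean of 0 with a standard error of 0, because there is no variation to estimate from. The second helper relates to comment 5: a weighted count is a sum of rescaled weights, so for a subgroup it generally differs from the unweighted count. This is a hypothetical Python illustration; the record layout and the migrant field are made up.

```python
from collections import Counter, defaultdict
from math import sqrt


def mean_and_srs_se(values):
    """Mean and unweighted standard error sd/sqrt(n), as in tabstat's semean."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return mean, sd / sqrt(n)


def counts_by_group(records, group_key, weight_key):
    """Unweighted counts and weighted counts (sum of weight / 1,000,000) per group."""
    unweighted = Counter(r[group_key] for r in records)
    weighted = defaultdict(float)
    for r in records:
        weighted[r[group_key]] += r[weight_key] / 1_000_000
    return dict(unweighted), dict(weighted)


# Comment 2: 1 child below -2 SD out of 18, and 0 out of 8.
print(mean_and_srs_se([1] + [0] * 17))
print(mean_and_srs_se([0] * 8))   # (0.0, 0.0)

# Comment 5 (hypothetical records): equal unweighted counts, different weighted ones.
people = [
    {"hv005": 800_000, "migrant": 1},
    {"hv005": 800_000, "migrant": 1},
    {"hv005": 1_400_000, "migrant": 0},
    {"hv005": 1_000_000, "migrant": 0},
]
print(counts_by_group(people, "migrant", "hv005"))
```

In this made-up example the weights sum to the number of records overall, yet the two groups get different weighted counts, so a difference like the one described in comment 5 is expected rather than an error. A design-based estimator would generally also report a standard error of zero for a domain where every observed value is 0, so the zeros mainly reflect the very small sub-district samples.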

