Harnessing Technology to Streamline Data Collection
By Guillermo Rojas
The survey process at The DHS Program takes an average of 18-20 months and goes through several steps: survey preparation and questionnaire design, training and fieldwork, data processing, and finally, writing the final report and dissemination. But how do the data get from respondents’ households into the tables you see in the final report?
We employ field interviewers to ask respondents the questions included in the DHS questionnaires – household, woman’s, man’s, and biomarkers. But the way we record their answers changes based on the data collection methodology. At The DHS Program, we employ three types of methodologies to collect data: paper questionnaires, Computer Assisted Field Editing (CAFE), and Computer Assisted Personal Interviews (CAPI).
The vast majority of DHS surveys in the past 30 years have used paper questionnaires to collect data. With physical paper questionnaires in hand, field interviewers go from house to house, ask the questions of the respondents, and manually fill out the questionnaires. After interviewers visit all households within a cluster, supervisors ship the questionnaires to the survey central office. Upon arrival, the data processing begins for that particular cluster.
The Computer Assisted Field Editing (CAFE) system allows for editing to happen as interviews are taking place. With CAFE, interviewers still use paper questionnaires, but Field Editors enter the questionnaires into computers while the team is still in the cluster. Essentially, questionnaires are fully field edited by an intelligent data entry program. With this approach, Field Editors give interviewers feedback on any anomaly the program identifies, such as interviewers missing full sections of the questionnaire or incorrectly executing critical skip patterns. At this point in the survey process, it is relatively easy to send the interviewer back to the household to resolve any problems. There is also no need for main data entry, as the data entered in the field are sent via the internet to the central office. CAFE therefore speeds up the survey process: cluster data files are available for further processing as soon as the data arrive at the central office.
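To make the idea of field editing concrete, here is a minimal Python sketch of the two kinds of checks mentioned above: a section left entirely blank, and a skip pattern that was not followed. It is an illustration only, not the actual DHS data entry program. The question names, sections, and skip rule are all hypothetical.

```python
from typing import Any

# Hypothetical sections and the questions each one contains.
SECTIONS = {
    "background": ["age", "education"],
    "reproduction": ["ever_pregnant", "num_births"],
}

# Hypothetical skip rule: if the trigger question has this answer,
# the listed follow-up questions must be left blank.
SKIP_RULES = [
    {"trigger": "ever_pregnant", "answer": "no", "skip": ["num_births"]},
]


def check_questionnaire(answers: dict[str, Any]) -> list[str]:
    """Return a list of problems a field editor would raise with the interviewer."""
    problems = []

    # Check 1: a section where every question is blank was probably missed.
    for section, questions in SECTIONS.items():
        if all(answers.get(q) is None for q in questions):
            problems.append(f"section '{section}' is entirely missing")

    # Check 2: skip patterns must be respected in both directions.
    for rule in SKIP_RULES:
        given = answers.get(rule["trigger"])
        if given is None:
            continue  # trigger not answered; nothing to verify
        for q in rule["skip"]:
            answered = answers.get(q) is not None
            if given == rule["answer"] and answered:
                problems.append(f"'{q}' was answered but should have been skipped")
            elif given != rule["answer"] and not answered:
                problems.append(f"'{q}' was skipped but should have been answered")

    return problems


if __name__ == "__main__":
    record = {"age": 27, "education": "primary",
              "ever_pregnant": "no", "num_births": 2}
    for problem in check_questionnaire(record):
        print(problem)
```

Running checks like these while the team is still in the cluster is what makes it practical to send the interviewer back to the household.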
The 2005 Colombia DHS was the first DHS survey to utilize the Computer Assisted Personal Interview (CAPI) methodology. CAPI does not involve any type of paper questionnaire—it is entirely digital. Back in 2005, field interviewers used bulky laptops, though nowadays we use lighter tablets and notebook computers.
The DHS CAPI data collection system consists of three comprehensive subsystems:
1. A system for interviewers to facilitate the interview process
2. A system for supervisors to centralize the data collected by interviewers
3. A system for the central office to monitor the fieldwork operation and to further process the data
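As a rough illustration of the supervisor's role in the list above, the following Python sketch merges the records collected by several interviewers into one cluster-level file, flags households submitted more than once, and produces a simple per-cluster count of the kind a central office might monitor. This is not the DHS CAPI software. The record layout and the function names `centralize` and `progress_by_cluster` are assumptions made for this example.

```python
from collections import Counter


def centralize(batches: dict[str, list[dict]]) -> tuple[dict, list]:
    """Merge per-interviewer records, keyed by (cluster, household).

    Returns the merged records and a list of keys that were submitted
    more than once, so the supervisor can follow up.
    """
    merged = {}
    duplicates = []
    for interviewer, records in batches.items():
        for rec in records:
            key = (rec["cluster"], rec["household"])
            if key in merged:
                duplicates.append(key)  # keep the first copy, flag the repeat
                continue
            merged[key] = {**rec, "interviewer": interviewer}
    return merged, duplicates


def progress_by_cluster(merged: dict) -> dict[int, int]:
    """Count households received per cluster, for fieldwork monitoring."""
    return dict(Counter(cluster for cluster, _household in merged))


if __name__ == "__main__":
    # Hypothetical data from two interviewers working in cluster 12.
    batches = {
        "int01": [{"cluster": 12, "household": 1}, {"cluster": 12, "household": 2}],
        "int02": [{"cluster": 12, "household": 2}, {"cluster": 12, "household": 3}],
    }
    merged, duplicates = centralize(batches)
    print(progress_by_cluster(merged))
    print(duplicates)
```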
The DHS CAPI system uses Bluetooth technology to transfer and share data among members of the same fieldwork team. Supervisors then send data to the central office using the Internet File Streaming System (IFSS), a cloud-based electronic file delivery system developed by The DHS Program. The primary objective of the service is to deliver files from one user to another quickly and securely.
In the past 30 years, we’ve witnessed incredible changes in technology, in both hardware and software. When I first started at The DHS Program, running the program to impute the woman’s event dates could easily take more than six hours for a survey with a sample size of 2,000 to 3,000 households! Nowadays, with sample sizes of 20,000 to 30,000 households, this program takes just one to two minutes to run. CAFE and CAPI allow us to harness these innovations to make sure that we carry out DHS surveys as efficiently and accurately as possible.
Guillermo Rojas is Chief of Data Processing at The DHS Program. He has more than 35 years of experience in computer science and survey data processing, and has provided data processing technical assistance and training for more than 20 surveys. Since the early stages of The Demographic and Health Surveys (DHS) program, Mr. Rojas has been involved in the design and development of the data processing methodology currently being used to process and analyze DHS surveys. He is the primary writer of the master programs for implementing the evolving data processing methodology. Mr. Rojas coordinates all DHS data-processing activities and supervises personnel to ensure the accuracy and quality of the processes implemented.