(Complete toolkit available at http://emergency.cdc.gov/disasters/surveillance/pdf/CASPER_Toolkit_Version_2_0_508_Compliant.pdf)
Households selected in cluster sampling have an unequal probability of selection. To avoid biased estimates, all data analyses should include a mathematical weight for probability of selection. Once all data are merged into a single electronic dataset, a weight variable must be added to each surveyed household by use of the formula below:
Weight = (total number of housing units in sampling frame) / [(number of housing units interviewed within cluster) * (number of clusters selected)]
The sampling frame, referred to in the numerator, is defined as the entire assessment area in which CASPER is being conducted. The numerator is the total number of housing units in the sampling frame, and that number will be the same for every assessed household. To calculate the total number of housing units in the sampling frame, follow the steps outlined in Section 3.1.3 and sum the “housing units” column (e.g., 6,292 housing units in Caldwell County, Kentucky).
If sampling is 100% successful and interviews are completed in exactly seven households in each of exactly 30 clusters, the denominator will be 7 * 30 = 210 for every housing unit. The sample is then self-weighting because all housing units in the sample had an equal probability of being selected. In practice, completing seven interviews in every one of the 30 clusters is often not possible. When this occurs, the denominator will differ among surveyed households, depending on the cluster from which the housing unit was selected. Households from the same cluster will have the same weight, but weights will differ between clusters. For example, if only five interviews were completed in a cluster, the denominator of the weight for each of the five surveyed households would be 5 * 30 = 150.
The “number of clusters selected” will be 30, even if there are some clusters with zero interviews. The only exception is if the decision to oversample clusters was made a priori (see Section 3.2).
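The weight formula above can be sketched in a few lines of Python. This is a minimal illustration, not part of the toolkit: the function name `casper_weights`, the `cluster_ids` list, and the sample data are hypothetical, and the total of 19,370 housing units is borrowed from the Kentucky example only to make the arithmetic concrete.

```python
from collections import Counter

def casper_weights(cluster_ids, total_housing_units, clusters_selected=30):
    """Return one sampling weight per completed interview.

    cluster_ids: cluster number of each completed interview, in dataset order.
    clusters_selected stays at 30 even if some clusters have zero interviews.
    """
    # Count completed interviews within each cluster.
    interviews_per_cluster = Counter(cluster_ids)
    return [
        total_housing_units / (interviews_per_cluster[c] * clusters_selected)
        for c in cluster_ids
    ]

# Hypothetical data: cluster 1 has seven completed interviews, cluster 2 has five.
cluster_ids = [1] * 7 + [2] * 5
weights = casper_weights(cluster_ids, total_housing_units=19370)

print(round(weights[0], 2))   # cluster 1: 19370 / (7 * 30) -> 92.24
print(round(weights[-1], 2))  # cluster 2: 19370 / (5 * 30) -> 129.13
```

Every household in the same cluster receives the same weight, and the cluster with fewer completed interviews receives the larger weight.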
The table depicted in Figure 11 displays the sampling weights for a CASPER conducted in Kentucky following the major ice storms in 2009. In stage one of sampling, 30 clusters were selected representing 19,370 housing units. The goal was to conduct 210 interviews, but only 187 were completed. For the purpose of calculating the “weight” column (highlighted in yellow), an additional column was added, “# interviews,” to represent the number of housing units interviewed within a cluster (highlighted in blue).
Once weights are assigned, frequencies can be calculated for each of the interview questions. To calculate frequencies in Epi Info™ 7 “classic mode,” read (import) the data file that contains the newly created weight variable. Click on “Frequencies” in the left-hand column. In the “frequency of” box, select each variable for which you would like results and, in the “weight” box, select the variable “WEIGHT” that was just created. Finally, click “OK” (Figure 12) and a report will be generated providing the estimates.
Figure 13 displays the Epi Info™ output window with the selected variables, followed by a table for each selection. These output tables should be saved for use in the report.
To obtain unweighted estimates, follow the above instructions, but do not assign a variable in the “weight” box. Applying the weights provides projected estimates that can be generalized to every housing unit in the assessment area or sampling frame. Table 7 shows the unweighted and weighted frequencies for a specific question from the 2009 Kentucky Ice Storm CASPER.
| Characteristic | Unweighted frequency | Unweighted percent | Weighted frequency | Weighted percent | Weighted 95% CI |
|---|---|---|---|---|---|
| Source of electricity | | | | | |
| Power company | 137 | 74.1 | 14190 | 74.0 | 61.9-86.0 |
| Gasoline generator | 29 | 15.7 | 3200 | 16.7 | 7.6-25.7 |
| None | 19 | 10.3 | 1789 | 9.3 | 3.8-14.8 |
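The difference between the unweighted and weighted columns can be illustrated with a short Python sketch. It is not the Epi Info™ procedure itself, only the underlying arithmetic: each household's answer is counted once for the unweighted frequency and counted as its weight for the weighted frequency. The function name `weighted_frequencies` and the four records below are hypothetical and are not taken from the Kentucky dataset.

```python
from collections import defaultdict

def weighted_frequencies(responses, weights):
    """Return {category: (weighted frequency, weighted percent)}."""
    totals = defaultdict(float)
    for answer, w in zip(responses, weights):
        totals[answer] += w  # each household contributes its weight
    grand_total = sum(totals.values())
    return {k: (v, 100.0 * v / grand_total) for k, v in totals.items()}

# Hypothetical records: one answer and one weight per surveyed household.
responses = ["Power company", "Power company", "Gasoline generator", "None"]
weights = [100.0, 100.0, 150.0, 50.0]

weighted = weighted_frequencies(responses, weights)
# Unweighted estimates are the same calculation with every weight set to 1.
unweighted = weighted_frequencies(responses, [1.0] * len(responses))

for category, (freq, pct) in weighted.items():
    print(category, freq, round(pct, 1))
```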
Remember that weighted analysis does not account for the changes that may occur in the number of households between the time of the census and the time of the assessment (e.g., the number of households per cluster may have changed between 2000, when the census was conducted, and 2009, when the CASPER was conducted). Therefore, despite attempts to present unbiased estimates, the frequencies reported might lack precision.
The 95% confidence intervals (CIs) should be provided with the weighted estimates. These confidence intervals indicate the reliability of the weighted estimate. Follow these steps to calculate 95% confidence intervals in Epi Info™ 7:
1. Open Epi Info™ 7 in classic mode (Figure 14).
2. Read (import) the data file.
3. Select “Complex Sample Frequencies Command” under advanced statistics, and in the dialog box for Frequency, select the variable(s) in which you are interested (Figure 15).
4. Under the Weight drop-down menu, select the “weight” variable for calculating the weighted CI.
5. Under PSU, select the “Cluster Number” variable and click OK (Figure 16).
6. Right-click on the table and select “Export to Microsoft Excel”.
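For readers who want to see what a cluster-aware confidence interval involves, the sketch below computes a weighted proportion and a 95% CI by treating each cluster as a primary sampling unit (PSU). It makes several assumptions that are not stated in the toolkit: a Taylor-linearization variance estimate with clusters sampled with replacement, no stratification, and a normal critical value of 1.96. Epi Info™ may differ in its details, so the results are not guaranteed to match its output. The function name and data are hypothetical.

```python
import math
from collections import defaultdict

def weighted_proportion_ci(indicator, weights, clusters, z=1.96):
    """Weighted proportion with a cluster-based CI (assumed method, see text).

    indicator: 1 if the household has the characteristic, else 0.
    weights:   sampling weight of each household.
    clusters:  cluster (PSU) number of each household.
    """
    total_w = sum(weights)
    p = sum(w * y for w, y in zip(weights, indicator)) / total_w

    # Sum the linearized residuals w * (y - p) / total_w within each cluster.
    cluster_totals = defaultdict(float)
    for y, w, c in zip(indicator, weights, clusters):
        cluster_totals[c] += w * (y - p) / total_w

    # Between-cluster variance; the cluster totals sum to zero by construction.
    # n counts only clusters that contributed at least one interview.
    n = len(cluster_totals)
    variance = n / (n - 1) * sum(t * t for t in cluster_totals.values())

    half_width = z * math.sqrt(variance)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

# Hypothetical data: three clusters, two households each, equal weights.
indicator = [1, 1, 1, 0, 0, 0]
weights = [1.0] * 6
clusters = [1, 1, 2, 2, 3, 3]

estimate, lower, upper = weighted_proportion_ci(indicator, weights, clusters)
print(estimate, lower, upper)
```

With so few clusters the interval is very wide and is clipped to the range 0 to 1, which illustrates why the number of clusters, not just the number of interviews, drives the width of the interval.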
Author | Nicole Nakata |
File Created | 2021-01-22 |