Appendix L - Sampling and Weighting Plan

NATIONAL YOUTH PHYSICAL ACTIVITY AND NUTRITION STUDY

OMB: 0920-0832

Detailed Sampling and Weighting Plan for the Student Survey

SAMPLING AND WEIGHTING PLAN
NATIONAL YOUTH PHYSICAL ACTIVITY AND NUTRITION SURVEY
The objective of the sampling design is to support estimation of the physical activity and
nutrition characteristics of students in a nationally representative population of 9th through 12th
graders, by gender and by age or grade. National estimates of all high school students are
specifically required as well as estimates by grade, by gender, and by race/ethnicity for white,
black, and Hispanic youth. The survey will emulate the sampling design of the Youth Risk
Behavior Survey (YRBS).
The sampling universe for the national survey will consist of public, Catholic, and other private
school students in grades 9 through 12 in the 50 states and the District of Columbia.
L.1 ESTIMATION AND JUSTIFICATION OF SAMPLE SIZE

L.1.1 Overview
As in the regular YRBS, the study will be designed to produce most estimates accurate to within
±5 percent at 95 percent confidence. The YRBS estimates meet this standard for overall
estimates and estimates by grade or gender or race/ethnicity, and meet a looser design target of
±5 percent at 90 percent confidence for estimates by grade and race/ethnicity. For the NYPANS,
the tighter precision levels are required for the key racial group estimates overall but not by
grade. Because some of the precision requirements are not as tight for the NYPANS as for the
YRBS, we can consider sample sizes that are smaller than the typical YRBS. Nevertheless, we
propose to replicate in most respects the sampling parameters used in the 2009 YRBS because
they met the levels of precision required for CDC's purposes.
The proposed sample consists of 75 primary sampling units (PSUs), defined as a county or a
group of counties. In each PSU, two different schools will contribute classes at each grade in the
9 through 12 range. The actual number of sampled schools will be greater than 75 x 2 = 150
because (1) some schools contain only some of the targeted grades (e.g., schools with grades 7
through 9) and (2) small schools are selected in a subset of the 75 PSUs over and above those initially
selected. A small school has an enrollment that is insufficient to generate the equivalent of one
full class section at each targeted grade contained in the school. We expect that approximately
159 sample schools will be selected to generate about 8,000 respondents. We anticipate that
approximately 127 schools will participate in the study (for a projected 80% school participation
rate).
We will select one class per school per eligible grade. We expect that a final sample of
approximately 8,000 respondents will be obtained on the NYPANS.
L.1.2 Expected Confidence Intervals for the NYPANS
Confidence intervals vary depending upon whether the prevalence estimate is for the full
population or for a subset, such as a particular grade or gender. They also vary from one variable
to another. Within a grouping, they also vary depending on the level of the estimate and the
design effect associated with the measure.

Precision for Subgroup Estimates. For the design of the 2010 study, we considered two
scenarios defined according to whether or not we would implement the double sampling of
sections in high-minority schools. The selection of two sections per grade in these schools leads
to larger numbers of students in the two minority groups and overall. On the other hand, the
double sampling of sections is not only more costly but also leads to higher design effects due to
clustering.
We used the 2005 YRBS sample schools to simulate and compare the two scenarios, with and
without double sampling of sections in high-minority schools. Table L-2 presents a summary
comparison of the two scenarios in terms of expected yields in the two key minority groups,
blacks and Hispanics.
Table L-2. Expected number of participating students in the two minority groups for the
two sampling scenarios

Ethnic Minority Group    With double sampling    Without double sampling
Blacks                   3524                    2359
Hispanics                3278                    2420

To evaluate the precision expected for these groups under the two scenarios, we considered a
range of design effects. For the first scenario, the average design effect is expected to mirror the
DEFFs estimated from the YRBS for these key variables. Table L-3 provides empirical
estimates of standard errors and DEFFs computed for the 2005 YRBS survey. While we use the
average DEFF of 2.5 to estimate the precision expected for the first scenario (typical YRBS
design), we assume a lower DEFF=2.0 for the second scenario.
With these realistic parameters, the effective sample sizes for blacks and Hispanics are shown in
Table L-4. Effective sample sizes, computed as actual sample sizes divided by the design effect,
are not only simple summary sample sizes but also provide a direct link to the precision results
discussed next.
Table L-5 provides the precision expected for these different scenarios. The table shows the
expected standard errors and 95% confidence interval half-widths for estimated percentages (or
proportions) for subgroup estimates under the two scenarios: a) n=1400, the approximate
effective subgroup size anticipated under the first scenario, and b) n=1200, the approximate
effective subgroup size anticipated under the second scenario.
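
The effective sample sizes and the precision figures they imply can be checked with a few lines of arithmetic. The sketch below assumes the simple random sampling formula for the standard error of a proportion, applied to the effective sample size, and a normal multiplier of 1.96 for the 95% interval; the function names are illustrative and are not part of the study's software.

```python
import math

def effective_sample_size(n, deff):
    """Actual sample size divided by the design effect."""
    return n / deff

def standard_error(p, n_eff):
    """Standard error of an estimated proportion p for an effective sample size."""
    return math.sqrt(p * (1.0 - p) / n_eff)

def ci_half_width(p, n_eff, z=1.96):
    """Half-width of the normal-approximation confidence interval."""
    return z * standard_error(p, n_eff)

# Expected yields for blacks (Table L-2) and the DEFFs assumed in the text.
blacks_with = effective_sample_size(3524, 2.5)      # with double sampling
blacks_without = effective_sample_size(2359, 2.0)   # without double sampling

# Precision at the rounded effective sizes used for the two scenarios.
for n_eff in (1400, 1200):
    for p in (0.05, 0.10, 0.15, 0.20, 0.50):
        se = 100 * standard_error(p, n_eff)
        hw = 100 * ci_half_width(p, n_eff)
        print(f"n_eff={n_eff}  p={p:.2f}  SE={se:.2f}%  95% CI=+/-{hw:.2f}%")
```

The same functions apply to the Hispanic yields; only the input sample sizes change.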

Table L-3. Design Effect (DEFF) for Physical Activity and Nutrition Variables
for the 2005 YRBS

Variable    Estimated Proportion    Std Error    DEFF
qn83        0.828                   0.0229       4.8413
qn84        0.5622                  0.0121       2.7881
qn85        0.8823                  0.0094       3.2658
qn86        0.1693                  0.004        1.2058
qn87        0.1419                  0.0043       1.4127
qn88        0.103                   0.008        2.8481
qn89        0.0643                  0.0049       2.1743
qn90        0.0846                  0.0058       2.3834
qn76        0.8208                  0.0046       1.3611
qn77        0.15                    0.0108       3.6611
qn78        0.6371                  0.0075       1.7862
qn79        0.2626                  0.0078       2.0714
qn80        0.3565                  0.0113       2.7994
qn81        0.3757                  0.0133       3.1474

AVERAGE DEFF: 2.553293

Table L-4. Effective sample sizes expected for blacks and Hispanics
for the two sampling scenarios

Ethnic Minority Group    With double sampling    Without double sampling
Blacks                   1409.6                  1179.5
Hispanics                1311.2                  1210

Table L-5. Precision expected for racial subgroup estimates: standard error of estimated
percentages and associated 95% confidence intervals

a) First scenario: double sampling

Estimated Percentage    Standard Error    95% Confidence Interval
5%                      0.58%             1.14%
10%                     0.80%             1.57%
15%                     0.95%             1.87%
20%                     1.07%             2.10%
50%                     1.34%             2.62%

b) Second scenario: no double sampling

Estimated Percentage    Standard Error    95% Confidence Interval
5%                      0.63%             1.23%
10%                     0.87%             1.70%
15%                     1.03%             2.02%
20%                     1.15%             2.26%
50%                     1.44%             2.83%

The precision results shown in Table L-5 demonstrate that subgroup estimates will be within plus
or minus 3 percentage points (95% confidence intervals) under either scenario. This discussion
demonstrates that the planned design, which does not include double sampling of sections, will
ensure the precision for subgroup estimates as well as for overall study estimates.
In the 2005 YRBS, the school sample included 159 participating schools. Using the 2005 YRBS
school sample sizes (n=159 participants), the projected total number of participating students
goes down from 12,231 with double sampling to 10,255 without double sampling.
For the target 8,000 participating students, and under the no double sampling scenario, we
deflate the number of participating schools so that the projected number of participating schools
becomes 127.
It may be useful to examine also the precision expected for grade-level estimates under the
planned scenario without double sampling. For the planned sample of 8,000 students over 127
participating schools, the per-grade sample size would be approximately 2,000. Table L-6 shows
the expected precision for grade-level estimates, again assuming DEFF=2 for scenario#2 (no
double sampling).
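
As a worked check of the grade-level figures, assume the usual approximation in which the simple-random-sampling variance of an estimated proportion is inflated by the design effect. For an estimated percentage of 50%, n = 2,000 and DEFF = 2, this gives:

```latex
\[
\mathrm{SE}(\hat{p}) = \sqrt{\mathrm{DEFF}\cdot\frac{p(1-p)}{n}}
  = \sqrt{2 \cdot \frac{0.5 \times 0.5}{2000}} \approx 0.0158,
\qquad
1.96 \times \mathrm{SE}(\hat{p}) \approx 0.0310 .
\]
```

This is equivalent to applying the unadjusted formula to an effective sample size of 2,000 / 2 = 1,000 per grade.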

Table L-6. Precision expected for grade-level subgroup estimates: standard error of
estimated percentages and associated 95% confidence intervals

Estimated Percentage    Standard Error    95% Confidence Interval
5%                      0.69%             1.35%
10%                     0.95%             1.86%
15%                     1.13%             2.21%
20%                     1.26%             2.48%
50%                     1.58%             3.10%

For grade-level estimates, confidence intervals will likewise be within approximately +/- 3
percentage points (at most +/- 3.10 points, reached at an estimated percentage of 50%).
In summary, the planned sample sizes are the minimum necessary to ensure that all key estimates
will achieve the required levels of precision for the NYPANS. The anticipated n=160 school
selections will generate 127 participating schools (79% participation rate) and approximately
8,000 participating students.
L.1.3 School and Student Non-response
The school participation rate over the prior 10 YRBS studies has been between 75 and 80
percent; the average student participation rate has been 86 percent. To be conservative, we will
assume response rate values for the NYPANS similar to average YRBS values, subject to future
re-evaluation.
L.2 SAMPLING METHODS

L.2.1 Overview
The sampling universe for the NYPANS will consist of all public, Catholic and other private
school students in grades 9 through 12 in the 50 states and the District of Columbia. The sample
will be a three-stage cluster sample stratified by racial/ethnic composition and by urban versus
rural location. PSUs are classified as "urban" if they are in one of the 54 largest MSAs in the U.S.;
otherwise, they are classified as "rural". Additional implicit stratification will be imposed by
geography by sorting the PSU frame by state and by 5-digit Zip Code (within state). Within each
stratum, a primary sampling unit (PSU), defined as a county or a group of counties, will be
chosen without replacement at the first stage. In subsequent sampling stages, a probabilistic
selection of schools and students will be made from the sample PSUs. Table L-7 presents a
summary of the sampling design features.
Two strategies will be employed to achieve over-sampling of blacks and Hispanics: (1) larger
sampling rates will be used in high-Hispanic and high-black strata; and (2) a modified measure
of size will be employed that increases the probability of selection of schools with high minority
enrollments.

Table L-7. Key Sampling Design Features

Sampling Stage 1
  Sampling units:              Counties or groups of counties
  Sample size (approximate):   75 PSUs
  Stratification:              Urban vs. nonurban (2 strata); minority concentration (8 strata)
  Measure of size:             Aggregate school size in target grades

Sampling Stage 2
  Sampling units:              Schools
  Sample size (approximate):   159 selections (2 large schools per PSU plus 10 small schools)
  Stratification:              Small vs. other
  Measure of size:             Weighted enrollment (increased for black, Hispanic groups)

Sampling Stage 3
  Sampling units:              Classes/students
  Sample size (approximate):   1 class per grade per school: 8,000 participating students (expected)

L.2.2 Measure of Size
The sampling approach will utilize Probability Proportional to Size (PPS) sampling methods to
achieve over-sampling of blacks and Hispanics. In PPS sampling, when the measure of size is
defined as the count of final-stage sampling units, and a fixed number of units are selected in the
final stage, the result is an equal probability of selection for all members of the universe. For the
NYPANS, we approximate these conditions, and thus obtain a roughly self-weighting sample.
This section describes the type of measure of size to be employed for selecting PSUs and schools
with over-sampling of blacks and Hispanics.
A function of the form $r_h H + r_b B + r_o O$ is used, where the r's are the weighting factors for
the Hispanic, black, and other racial/ethnic groups and the corresponding high school per-grade
enrollment totals are denoted by H, B, and O, respectively. This function will increase the
chances of schools with relatively large minority enrollments entering the sample, and will also
increase the probability of selection for high-minority PSUs.
The effectiveness of a weighted measure of size in achieving oversampling is dependent upon
the distributions of blacks and Hispanics in schools. For example, if U.S. schools had identical
percentages of minorities in every school, then the sample of students from any sample of
schools would mirror the national percentages and use of a weighted measure of size would fail
to oversample blacks and Hispanics. We know this is not the case, however, as the distribution
of high school students with respect to race and ethnicity follows that of the general population,
and here we find a great deal of clustering by race and ethnicity. This observation is further borne
out by the success of the use of a weighted measure of size in prior studies as an effective means
of oversampling blacks and Hispanics.

In 1990, Macro conducted a series of simulation studies that investigated the relationship of
various weighting functions to the resulting numbers and percentages of minority students in the
obtained samples.¹ These simulation studies have been regularly re-examined, and the
parameters adjusted to fit the changing picture of minority concentrations. In the 2007 and 2009
YRBS cycles, the following weighting function was used for the measure of size:
2H+2B+O
We will perform a new simulation study during the NYPANS design for similar purposes, i.e.,
fine-tuning the measure of size coefficients.
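
To illustrate how the weighted measure of size shifts selection probabilities, the sketch below evaluates the function for two hypothetical schools with the same per-grade enrollment. The default coefficients reproduce the 2H+2B+O function cited for the 2007 and 2009 YRBS; the NYPANS coefficients may differ once the new simulation study is complete, and the enrollment figures are invented for illustration.

```python
def measure_of_size(hispanic, black, other, r_h=2.0, r_b=2.0, r_o=1.0):
    """Weighted measure of size r_h*H + r_b*B + r_o*O.

    hispanic, black, other: per-grade enrollment totals for the unit.
    Defaults correspond to 2H + 2B + O (2007 and 2009 YRBS cycles).
    """
    return r_h * hispanic + r_b * black + r_o * other

# Two hypothetical schools, each with 200 students per grade.
low_minority = measure_of_size(hispanic=10, black=10, other=180)
high_minority = measure_of_size(hispanic=80, black=70, other=50)

# Under PPS sampling, relative selection chances follow the ratio of sizes.
relative_chance = high_minority / low_minority
```

With these illustrative inputs the measures of size are 220 and 350, so the high-minority school is roughly 1.6 times as likely to be selected as the low-minority school of the same enrollment.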
The measure of size will be used to compute stratum and PSU sizes as well. This will have the
effect of increasing the allocation of the sample to the strata with higher concentrations of the
two key minority groups. At the same time, PSUs with high minority concentrations will have a
higher likelihood of being included in the sample.
L.2.3 Definition of Primary Sampling Units
In defining PSUs, several issues are considered:

- Each PSU should be large enough to contain the requisite numbers of schools and
  students by grade.
- Each PSU should be compact geographically so that field staff can go from school to
  school easily.
- There should be recent data available to characterize the PSUs.
- PSU definitions should be consistent with secondary sampling unit (school) definitions.

Generally, counties will be equivalent to PSUs, except where low population counties are
combined to provide sufficient numbers of schools and students. Also, very large counties are
divided into multiple PSUs so that no one county will be certain of selection. The variance
estimation process is more efficient without the need to account for certainty PSUs. The method
of dividing large PSUs will ensure that each sub-county PSU meets all of the criteria for a PSU.
County population figures will be aggregated from school enrollment data for the grades of
interest. Enrollment data are being obtained from the most recent Common Core of Data from
the National Center for Education Statistics, which are merged on a rolling basis into the current
school and school district data files of Quality Education Data, Inc.
Geographically, the 2010 NYPANS PSU sampling frame will be the same PSUs constructed
from counties for the 2009 YRBS. The schools constituting each PSU, as well as the PSU
measures of size and stratification, will be updated using current QED data.

¹ Errecart, M.T., Issues in Sampling African-Americans and Hispanics in School-Based Surveys. Centers for
Disease Control, October 5, 1990.

L.2.4 Stratification and Selection of PSUs
L.2.4.1 Definition of strata
The PSUs will be organized into 16 strata, based on urban/rural location (as defined above) and
minority enrollment. The approach involves the computation of optimum stratum boundaries
using the cumulative square root of “f” method developed by Dalenius and Hodges. The
boundaries or cutoffs change as the frequency distribution (“f”) for the racial groupings changes
from one survey cycle to the next. These rules are summarized below, and the boundaries
computed for the 2007 YRBS are shown in Table L-8.


- If the percentage of Hispanic students in the PSU exceeds the percentage of black students,
  then the PSU is classified as Hispanic. Otherwise it is classified as black (Table L-8,
  column (a)).
- If the PSU is within one of the 54 largest MSAs in the U.S., it is classified as 'Urban';
  otherwise it is classified as 'Rural' (Table L-8, column (b)).
- Hispanic Urban and Hispanic Rural PSUs are classified into four density groupings
  (Table L-8, column (c)) depending upon the percentage of Hispanics in the PSU (Table L-8,
  column (d)).
- Black Urban and Black Rural PSUs are also classified into four density groupings (Table L-8,
  column (c)) depending upon the percentage of blacks in the PSU (Table L-8, column (d)).

L.2.4.2 Allocation of the PSU sample
The 2010 NYPANS will be based on a larger number of sample PSUs than the typical YRBS
samples; specifically, 75 PSUs rather than a number between 55 and 60. A larger number of
PSUs has the greatest impact on variance reduction. In order to stay as close as possible to
maximum sample efficiency in terms of precision, the initial allocation will be made proportional
to student enrollment. Then, so as to meet design requirements in terms of minority student
yields, we will make adjustments to the initial allocation. These adjustments will be evaluated
and fine-tuned using sample simulations. Response rates from prior cycles will be used to
inform the yield computations in the simulations.

Table L-8. First-Stage Strata and Frame PSU Distribution

Predominant    Urban/Rural    Density Group    Boundaries     Stratum    Total Number
Minority (a)   (b)            Number (c)       (d)            Code (e)   PSU (f)
Black          Urban          1                0% - 22%       BU1        91
                              2                22% - 34%      BU2        25
                              3                34% - 56%      BU3        12
                              4                56% - 100%     BU4        8
               Rural          1                0% - 18%       BR1        373
                              2                18% - 34%      BR2        100
                              3                34% - 58%      BR3        94
                              4                58% - 100%     BR4        26
Hispanic       Urban          1                0% - 22%       HU1        60
                              2                22% - 34%      HU2        13
                              3                34% - 45%      HU3        10
                              4                45% - 100%     HU4        4
               Rural          1                0% - 22%       HR1        373
                              2                22% - 44%      HR2        44
                              3                44% - 66%      HR3        19
                              4                66% - 100%     HR4        13
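
The classification rules and the Table L-8 boundaries can be expressed as a small lookup. This is a sketch only: the cutoffs are the 2007 YRBS values shown in the table and will change when the boundaries are recomputed, the function name is illustrative, and the treatment of a percentage that falls exactly on a cutoff (assigned here to the higher density group) is an assumption, since the table lists shared endpoints.

```python
import bisect

# Cutoffs (percent) separating density groups 1-4, from Table L-8.
CUTOFFS = {
    ("B", "U"): [22, 34, 56],
    ("B", "R"): [18, 34, 58],
    ("H", "U"): [22, 34, 45],
    ("H", "R"): [22, 44, 66],
}

def stratum_code(pct_hispanic, pct_black, in_large_msa):
    """Return a first-stage stratum code such as 'BU3' or 'HR1'."""
    # Hispanic only if the Hispanic percentage strictly exceeds the black one.
    minority = "H" if pct_hispanic > pct_black else "B"
    # Urban means the PSU lies in one of the 54 largest MSAs.
    location = "U" if in_large_msa else "R"
    density_pct = pct_hispanic if minority == "H" else pct_black
    # Assumption: a value exactly on a cutoff goes to the higher group.
    group = bisect.bisect_right(CUTOFFS[(minority, location)], density_pct) + 1
    return f"{minority}{location}{group}"

example = stratum_code(pct_hispanic=5, pct_black=40, in_large_msa=True)
```

In this example the PSU is predominantly black, urban, and falls in the 34% - 56% band, so it is assigned to stratum BU3.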

L.2.4.3 Selection of PSUs
Primary sampling units (PSUs) will be selected with the following sequence of steps.

- Within each first-stage stratum, the PSUs will be sorted by five-digit zip code to attain a
  form of implicit geographic stratification. Implicit stratification, coupled with the
  probability proportional to size (PPS) sampling method described below, will ensure
  geographic sample representation. With PPS sampling, the selection probability for each
  PSU is proportional to the PSU’s measure of size. The following systematic sampling
  procedures, similar to those adopted in previous YRBS cycles, will be applied to the
  stratified frame to select a PPS sample of PSUs.
- Select a total of 75 PSUs across the strata, using systematic random sampling within each
  stratum. Within each stratum, the sampling interval is computed as the sum of the
  measures of size for the PSUs in the stratum divided by the number of PSUs to be
  selected in the stratum.
- Subsample at random 10 of the 75 sample PSUs for the small school sampling.
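
The systematic PPS step within a single stratum can be sketched as follows. This is an illustration under stated assumptions rather than the production procedure: the frame is assumed to be already sorted by zip code, every measure of size is assumed to be smaller than the sampling interval (so that no PSU can be selected more than once), and the function and argument names are hypothetical.

```python
import random

def pps_systematic_sample(sizes, n, rng=random):
    """Select n units with probability proportional to size (systematic PPS).

    sizes: measures of size, listed in the frame sort order.
    n:     number of units to select from this stratum.
    rng:   object with a random() method returning a value in [0, 1).
    Returns the indices of the selected units.
    """
    interval = sum(sizes) / n            # sampling interval for the stratum
    start = rng.random() * interval      # random start in [0, interval)
    targets = [start + i * interval for i in range(n)]

    selected = []
    cumulative = 0.0
    t = 0
    for index, size in enumerate(sizes):
        cumulative += size
        # A unit is selected when a target falls within its cumulative range.
        while t < n and targets[t] < cumulative:
            selected.append(index)
            t += 1
    return selected

# Hypothetical stratum of eight PSUs, three to be selected.
frame_sizes = [10, 20, 30, 40, 50, 60, 70, 80]
sample = pps_systematic_sample(frame_sizes, 3, rng=random.Random(2010))
```

Because the selection points are equally spaced along the cumulated measures of size, a PSU's chance of selection equals its measure of size divided by the sampling interval, and the sort order spreads the sample geographically.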

L.2.5 Selection of Schools
Schools in selected PSUs will be classified as “large” if they have 25 or more students per grade
in all eligible grades; otherwise they will be classified as “small”. The following procedures,
similar to those used in successive cycles of the YRBS, will be used to select large schools in
each stratum:


- Schools will be classified as "whole" if they have all high-school grades 9-12.
  Otherwise, they will be considered "fragment" schools. Fragment schools will be linked
  with other schools (fragment or whole) to form a cluster school that has all four grades.
  We will link schools before sampling using an algorithm that links geographically
  proximate schools. Cluster schools are treated as a single school during sampling with
  selection performed at the grade level as described below.
- The weighted high school per-grade average enrollment will be computed for each
  school, to be used as the measure of size. The estimate of enrollment will be developed
  by averaging the enrollment at each eligible grade in the school. When enrollment by
  grade is not available, we will divide total school enrollment by the number of grades
  taught in the school.
- Two large schools, or linked school clusters, will be selected in each PSU with
  probability proportional to their measures of size.

Small schools will also be selected with an approach that resembles the YRBS methods yet with
modified sample sizes. Specifically, ten small schools will be drawn in the 2010 NYPANS to
represent the small percent of students attending small schools (less than six percent nationwide).
The sample of small schools will be selected in subsample PSUs, with one school selected per
PSU (in this case, n=10 subsample PSUs). All students in eligible grades will be selected per
school. Within each sub-sampled PSU, small schools will be selected with PPS sampling using
the same weighted measure of size used in selecting large schools. This approach minimizes the
linking of schools to create linked sampling units that span all grades and have a required
minimum grade size for selection.
L.2.6 Grade Selection
Except for school clusters, all eligible grades (i.e., 9-12) are included in the class selection in
each school. In school clusters, grade samples are selected independently with one component
school being selected for each grade.

L.2.7 Selection of Classes
The method of selecting students will vary from school to school, depending upon the
organization of that school and whether a school cluster is involved. The key element of the
school sampling strategy is to identify a structure that partitions the students into mutually
exclusive, collectively exhaustive groupings that are of approximately equal sizes and that are
accessible. Beyond that basic requirement, we will do the partitioning to result in groups in
which both genders and students of all ability levels are represented. In selecting classes, we will
generally give preference to selecting from mandatory courses, such as English. Another option
is to select from all classes that meet during a particular time of day such as all second or third
period classes.
We will not use special procedures to sample for minorities at the school building level for two
reasons:

- Schools do not maintain student rosters that identify students by racial/ethnic affiliation.
- We feel this would be viewed by many schools as an offensive practice.

We plan to select one class at each grade level from each participating school. In the case of
school clusters, we will conduct our sampling on a grade by grade basis. At each grade we will
determine the identity of all schools in the cluster with students in that grade. If each school has
enough students in the grade, then we will pick randomly one of the schools with probability
proportional to grade enrollment and then select one class per grade from that school. If one of
the schools does not have enough students in a grade, then its students will be combined with a
class of another school in the cluster. If that class is picked, then students are surveyed in both
schools.
A "class" will be defined by our sampling team so that it meets size and composition
requirements before the sampling is done. For example, two small classes may be combined and
treated as one for sampling purposes. Or, boys' and girls' physical education classes may be
combined. This approach is an efficient method of data collection in schools that also has the
advantage of using the classroom teacher to distribute consent forms and to "leverage" student
participation; hence, it tends to yield higher student participation rates. The disadvantage of this
approach is its tendency to make the sampling design less efficient because students within a
class section tend to be more homogeneous than the student population at large within a school.
The effect of this inefficiency has been accounted for in our estimates of the design effect of the
study.
L.2.8 Replacement of Schools/School Systems
We will not replace refusing school districts, schools, classes, or students. We have allowed for
school and student non-response by inflating the number of selections to account for the
expected levels of non-response.

L.2.9 Selection of Students
All students in a selected classroom will be surveyed.
L.3 WEIGHTING AND VARIANCE ESTIMATION

This section describes the procedures used to weight the data, including:

- Sampling weights
- Non-response adjustments and weight trimming
- Post-stratification to national estimates by race and grade

This section also provides a brief discussion of the estimators and variance estimators
that may be computed from the NYPANS survey data.
L.3.1 Weighting
Although the sample was designed to be self-weighting under certain idealized conditions, it will
be necessary to compute weights to produce unbiased estimates. The basic weights, or sampling
weights, will be computed on a case-by-case basis as the reciprocal of the probability of selection
of that case. Below is a simple presentation of the basic steps in weighting including a) sampling
weight computation, b) non-response adjustments, and c) post-stratification adjustments.
L.3.1.1 Sampling Weights
If k is the number of PSUs to be selected from a stratum, $N_i$ is the size of stratum i, and $N_{ij}$ is
the size of PSU j in stratum i (in all cases "size" refers to our proposed measure of size), then the
probability of selection of PSU j is $k N_{ij}/N_i$. Assuming two large schools are to be selected in PSU
j in stratum i, and with the notation that $N_{ijk}$ is the size of school k in PSU j in stratum i (the
subscript k indexes schools and is distinct from the PSU count k), then the conditional probability
of selection of the school given the selection of the PSU is $2 N_{ijk}/N_{ij}$.
The derivation is similar for small schools, with an extra factor to account for the PSU subsampling
probability.
If $C_{ijk}$ is the number of classes in school ijk, then the conditional probability of selection of a class
is just $1/C_{ijk}$. Since all students are selected, the conditional probability of selection of a student
given the selection of the class is unity.
The overall probability of selection of a student in stratum i is the product of the conditional
probabilities of selection:

\[
\left(\frac{k N_{ij}}{N_i}\right)\left(\frac{2 N_{ijk}}{N_{ij}}\right)\left(\frac{1}{C_{ijk}}\right)
= 2k\left(\frac{N_{ijk}}{N_i C_{ijk}}\right) \qquad (1)
\]


The probabilities of selection will be the same for all students in a given school, regardless of
their ethnicity, but will vary among schools depending upon the racial/ethnic mix of the schools
and their surrounding regions.
Sampling weights assigned to each student record are the reciprocal of the overall probabilities
of selection for each student.
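
A minimal sketch of equation (1) and the resulting sampling weight, for a student in a large school, is shown below. The input values are hypothetical, the function names are illustrative, and the small-school case (which carries an additional PSU subsampling factor) is not covered.

```python
def selection_probability(k, n_i, n_ij, n_ijk, c_ijk, schools_per_psu=2):
    """Overall selection probability of a student in a large school.

    k      number of PSUs selected from the stratum
    n_i    measure of size of stratum i
    n_ij   measure of size of PSU j in stratum i
    n_ijk  measure of size of the school within PSU j
    c_ijk  number of classes in the school
    """
    p_psu = k * n_ij / n_i                       # first stage
    p_school = schools_per_psu * n_ijk / n_ij    # second stage, given the PSU
    p_class = 1.0 / c_ijk                        # third stage, given the school
    # All students in a selected class are surveyed (probability 1).
    return p_psu * p_school * p_class

def sampling_weight(k, n_i, n_ij, n_ijk, c_ijk, schools_per_psu=2):
    """Reciprocal of the overall selection probability."""
    return 1.0 / selection_probability(k, n_i, n_ij, n_ijk, c_ijk, schools_per_psu)

# Hypothetical values: 5 PSUs drawn from a stratum of size 100,000.
prob = selection_probability(k=5, n_i=100_000, n_ij=4_000, n_ijk=500, c_ijk=10)
weight = sampling_weight(k=5, n_i=100_000, n_ij=4_000, n_ijk=500, c_ijk=10)
```

Note that the PSU size cancels out of the product, as in the right-hand side of equation (1): with these inputs the probability is 2 x 5 x 500 / (100,000 x 10) = 0.005, giving a weight of 200.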
L.3.1.2 Non-response Adjustments and Weight Trimming
Several adjustments are planned to account for student and school non-response patterns. An
adjustment for student non-response will be made using gender and grade within school. With
this adjustment, the sum of the student weights over participating students within a school
matches the total enrollment by grade in the school. This adjustment factor will be capped in
extreme situations, such as when only one or two students respond in a school, to limit the
potential effects of extreme weights (i.e., unequal weighting effects on survey variances).
The weights of students in participating schools will be adjusted to account for nonparticipation
by other schools. The adjustment uses the ratio of the weighted sum of measures of size over all
selected schools in the stratum (numerator of adjustment factor), and over the subset of
participating schools in a stratum (denominator of adjustment factor). The adjustment factor will
be computed and applied to small and large schools separately.
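
The school-level adjustment factor described above can be sketched as a simple ratio. The tuple layout and the function name are assumptions made for illustration; the function would be called once per stratum, separately for small and large schools.

```python
def school_nonresponse_factor(schools):
    """Ratio adjustment for school nonparticipation within one stratum.

    schools: sequence of (weight, measure_of_size, participated) tuples,
             one per selected school in the stratum.
    Returns the weighted sum of measures of size over all selected schools
    divided by the same sum over participating schools only.
    """
    schools = list(schools)
    selected_total = sum(w * mos for w, mos, _ in schools)
    participating_total = sum(w * mos for w, mos, took_part in schools if took_part)
    if participating_total == 0:
        raise ValueError("no participating schools in this stratum")
    return selected_total / participating_total

# Hypothetical stratum: three selected schools, one of which refused.
factor = school_nonresponse_factor([
    (10.0, 100.0, True),
    (10.0, 200.0, True),
    (20.0, 50.0, False),
])
```

Here the selected schools carry a weighted size of 4,000 and the participating schools 3,000, so student weights in the participating schools would be multiplied by 4/3.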
Extreme variation in sampling weights can inflate sampling variances, and offset the precision
gained from a well-designed sampling plan. One strategy to compensate for these potential
effects is to trim extreme weights and distribute the trimmed weight among the untrimmed
weights. The trimming method that we will use, outlined in Potter²,³ for example, is based on
procedures first developed for the National Assessment of Educational Progress (NAEP).
The trimming is an iterative procedure. In each iteration, an optimal weight, $W_o$, is calculated
from the sum of the squared weights in the sample. Then, each weight $W_i$ is marked and
trimmed if it exceeds that optimal weight. The trimmed weight is summed within grade and
spread out proportionally over the unmarked cases in the grade. This process is repeated until
little or no weight is being trimmed. Weight trimming is done within stratum.
Typically, 3 to 4 percent of the total sample weight is trimmed and redistributed under the weight
trimming procedure.
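
The iteration can be sketched as below for the cases in one stratum. The plan does not give the formula for the optimal weight, so the cutoff used here, the square root of c times the mean squared weight, and the constant c = 10 are assumptions chosen for illustration; the stopping tolerance and all names are likewise hypothetical.

```python
import math

def trim_weights(weights, grades, c=10.0, tol=1e-6, max_iter=50):
    """Iteratively trim extreme weights within one stratum.

    weights: current weights of the cases in the stratum.
    grades:  grade label of each case (same order as weights).
    Assumed cutoff: W_o = sqrt(c * sum(w^2) / n), with c hypothetical.
    """
    w = [float(x) for x in weights]
    marked = [False] * len(w)
    for _ in range(max_iter):
        cutoff = math.sqrt(c * sum(x * x for x in w) / len(w))
        excess = {}                      # trimmed weight accumulated by grade
        for i, x in enumerate(w):
            if x > cutoff:
                excess[grades[i]] = excess.get(grades[i], 0.0) + (x - cutoff)
                w[i] = cutoff
                marked[i] = True
        for grade, extra in excess.items():
            # Spread the trimmed weight proportionally over unmarked cases
            # in the same grade (dropped if every case in the grade is marked).
            open_cases = [i for i in range(len(w))
                          if grades[i] == grade and not marked[i]]
            base = sum(w[i] for i in open_cases)
            if base > 0:
                for i in open_cases:
                    w[i] += extra * w[i] / base
        if sum(excess.values()) <= tol * sum(w):
            break                        # little or no weight was trimmed
    return w

# Hypothetical stratum: 29 cases with weight 1 and one extreme weight.
trimmed = trim_weights([1.0] * 29 + [100.0], [9] * 30)
```

In this example the extreme weight is pulled down over several iterations while the total weight in the grade is preserved, which is the behavior the procedure is meant to achieve.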

² Potter F. "Survey of Procedures to Control Extreme Sampling Weights," in Proceedings of the Section on
Survey Research Methods, American Statistical Association, pp 453-458, 1988.

³ Potter F. "A Study of Procedures to Identify and Trim Extreme Sampling Weights," in Proceedings of the
Section on Survey Research Methods of the American Statistical Association, pp 225-230, 1990.

L.3.1.3 Post-stratification to National Estimates of Racial Percentages and Student
Enrollment by Grade
National estimates of racial/ethnic percentages were obtained from two sources. Private school
enrollments by grade and five racial/ethnic groups were obtained from the Private School
Universe Survey (PSS), and public school enrollments by grade, gender, and five racial/ethnic
categories were obtained from the Common Core of Data (CCD), both produced by the National
Center for Education Statistics (NCES). These databases were combined to produce the
enrollments for all schools and to develop population percentages to use as controls in the
post-stratification step. For post-stratification purposes, a unique race/ethnicity is assigned to
respondents with missing data on race/ethnicity, those with an “Other” classification, and those
reporting multiple races.
Given a national estimate of $R_a$ and a weighted population estimate of $P_a$ for race category a in
some grade, the simple post-stratification factor would be the ratio of $R_a$ to $P_a$ for each race and
grade.
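
The post-stratification factor is a cell-by-cell ratio, sketched below. The cell keys and the totals are invented for illustration and are not NCES figures.

```python
def poststratification_factors(control_totals, weighted_totals):
    """Return R_a / P_a for each (grade, race) cell.

    control_totals:  national control figures R_a keyed by (grade, race).
    weighted_totals: weighted sample estimates P_a keyed the same way.
    """
    return {cell: control_totals[cell] / weighted_totals[cell]
            for cell in control_totals}

# Hypothetical grade 9 cells.
controls = {(9, "Hispanic"): 800_000.0, (9, "Black"): 600_000.0}
estimates = {(9, "Hispanic"): 1_000_000.0, (9, "Black"): 500_000.0}
factors = poststratification_factors(controls, estimates)
```

Each respondent's weight is then multiplied by the factor for his or her grade and race cell, so that the weighted totals reproduce the national controls.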
L.3.2 Estimators and Variance Estimators
If $w_i$ is the weight of case i (the inverse of the probability of selection adjusted for non-response
and post-stratification adjustments) and $x_i$ is a characteristic of case i (e.g., $x_i = 1$ if student i
smokes, but is zero otherwise), then the mean of characteristic x will be $(\sum w_i x_i)/(\sum w_i)$. A
population total would be computed similarly as $\sum w_i x_i$. The weighted population estimates
will be computed with the Statistical Analysis System (SAS) and SUDAAN software.
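
For illustration only, the two estimators can be written in a few lines; the production estimates will come from SAS and SUDAAN as stated above, and the weights and indicator values here are hypothetical.

```python
def weighted_total(weights, values):
    """Estimated population total: sum of w_i * x_i."""
    return sum(w * x for w, x in zip(weights, values))

def weighted_mean(weights, values):
    """Estimated mean (or prevalence): sum of w_i * x_i over sum of w_i."""
    return weighted_total(weights, values) / sum(weights)

# Four hypothetical respondents; x = 1 if the characteristic is present.
w = [100.0, 200.0, 50.0, 150.0]
x = [1, 0, 1, 0]
total = weighted_total(w, x)
prevalence = weighted_mean(w, x)
```

With these inputs the estimated total is 150 and the estimated prevalence is 150/500 = 0.30.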
These estimates will be accompanied by measures of sampling variability, or sampling error,
such as variances and standard errors, that account for the complex sampling design. These
measures will support the construction of confidence intervals and other statistical inference such
as statistical testing (e.g., subgroup comparisons or trends over successive YRBS cycles).
Sampling variances will be estimated using the method of general linearized estimators⁴ as
implemented in the SUDAAN⁵ or SAS survey procedures. These software packages will be
used because they permit estimation of sampling variances for multistage stratified sampling
designs and account for unequal weighting, sample clustering, and stratification.
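
To show the idea behind linearization, the sketch below computes the variance of a weighted mean from PSU-level linearized totals, treating PSUs as if sampled with replacement within strata. This is a common textbook approximation offered as an assumption; it is not a description of the exact computations in SUDAAN or SAS, and the record layout is hypothetical.

```python
from collections import defaultdict

def linearized_variance_of_mean(records):
    """Taylor-linearized variance of a weighted mean.

    records: list of (stratum, psu, weight, x) tuples.
    Returns (mean, variance). Strata with a single PSU contribute nothing.
    """
    total_w = sum(w for _, _, w, _ in records)
    mean = sum(w * x for _, _, w, x in records) / total_w

    # Linearized value summed to the PSU level: w * (x - mean) / sum of weights.
    psu_totals = defaultdict(float)
    for stratum, psu, w, x in records:
        psu_totals[(stratum, psu)] += w * (x - mean) / total_w

    by_stratum = defaultdict(list)
    for (stratum, _), z in psu_totals.items():
        by_stratum[stratum].append(z)

    variance = 0.0
    for z_values in by_stratum.values():
        n_h = len(z_values)
        if n_h < 2:
            continue
        z_bar = sum(z_values) / n_h
        variance += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in z_values)
    return mean, variance

# One hypothetical stratum with two PSUs of two students each.
data = [("S1", "A", 1.0, 1), ("S1", "A", 1.0, 1),
        ("S1", "B", 1.0, 0), ("S1", "B", 1.0, 0)]
est_mean, est_var = linearized_variance_of_mean(data)
```

Because the variance is built from between-PSU differences within strata, it reflects clustering, stratification, and unequal weights together, which is why design-based software is needed instead of simple random sampling formulas.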

⁴ Skinner CJ, Holt D, and Smith TMF, Analysis of Complex Surveys, John Wiley & Sons, New York, 1989, p. 50.

⁵ Shah BV, Barnwell GG, Bieler GS. SUDAAN: software for the statistical analysis of correlated data, release
7.5, 1997 [user’s manual]. Research Triangle Park, NC: Research Triangle Institute; 1997.

File Type	application/pdf
File Title	Microsoft Word - Appendix L - Sampling and Weighting Plan AMR.doc
Author	alice.m.roberts
File Modified	2009-03-09
File Created	2009-03-09