Appendix N. Sampling and Weighting Plan for the 2013 and 2015 National YRBS
The national Youth Risk Behavior Survey (YRBS) was developed to monitor priority health risk behaviors that contribute to the leading causes of mortality, morbidity, and social problems among youth and young adults in the United States. The YRBS monitors six categories of health risk behaviors: 1) behaviors that contribute to unintended injury and violence; 2) tobacco use; 3) alcohol and other drug use; 4) sexual behaviors that contribute to unintended pregnancy and sexually transmitted diseases, including HIV infection; 5) dietary behaviors; and 6) physical inactivity. The YRBS also monitors the prevalence of obesity and asthma.
The objective of the sampling design is to support estimation of the health risk behaviors in a nationally representative population of 9th through 12th grade students. Estimates will be generated among students overall and by sex, age or grade, and race/ethnicity (white, black, Hispanic). The 2013 YRBS will be the 13th fielding of this national survey.
Section N.1 presents the sampling design, including our plans for achieving the target number of participating students in the 2013 and 2015 national YRBS. Section N.2 describes the sampling methods planned for the surveys. Section N.3 presents the planned weighting and variance estimation procedures.
N.1 Estimation and Justification of Sample Size
N.1.1 Overview
The YRBS is designed to produce estimates with error margins of ±5 percent at:
95% confidence for domains defined by grade, sex, or race/ethnicity;
95% confidence for domains defined by crossing grade or race/ethnicity by sex; and
90% confidence for domains formed by crossing grade with race/ethnicity.
The sample design proposed for the 2013 YRBS and 2015 YRBS surveys is consistent with the sample design used in past cycles, which includes adjusting sampling parameters to reflect changing demographics of the in-school population of high school students.
The YRBS sample size calculations are premised on the following assumptions:
The main structure of the sampling design will be consistent with the design used to draw the sample for prior cycles of the YRBS.
Three Secondary Sampling Units (SSUs) will be selected within each sampled Primary Sampling Unit (PSU). A PSU is a county or a group of contiguous counties. An SSU is a “full” school that serves as a sampling unit able to supply a full complement of students in grades 9 through 12. SSUs with at least 28 students in every grade are considered “large”; otherwise, they are considered “small.” On average, each selected class will include 28 students (based on historical averages).
Within each school, one class will be selected from each grade to participate in the survey, except in “high minority” schools, where two classes per grade will be selected. Double class selection has been used in all previous YRBS surveys to support health risk behavior prevalence estimates by race/ethnicity. For the 2013 and 2015 YRBS cycles, we will implement double class selection in schools with greater than 55% African American student enrollment or greater than 50% Hispanic student enrollment.
A 78% school participation rate and an 86% student participation rate, based on the average participation rates over the 12 prior cycles of YRBS.
Based on these assumptions, 54 PSUs will be selected, and 3 large SSUs (“full” schools) will be selected from each PSU, for a total of 162 SSUs. (An SSU consists either of a single school, if the school includes each of grades 9-12, or of two or more linked schools that individually do not include all of grades 9-12.) Of these, approximately 20% (32 SSUs) will be classified as “high minority,” and 2 classes per grade will be selected in each of these schools. In the remaining 130 large schools, 1 class per grade will be selected. Therefore, we expect to select approximately 21,728 students from large SSUs [(32 SSUs * 8 classes * 28 students = 7,168 students) + (130 SSUs * 4 classes * 28 students = 14,560 students) = 21,728 students]. To provide adequate coverage of students in small schools (those with fewer than 28 students in any grade), we also will select small SSUs from a subsample of 15 PSUs. As in prior YRBS cycles, we will select one small SSU in each of the 15 subsample PSUs, adding 15 SSUs to the sample. Based on historical averages, each small SSU is expected to contribute approximately 63 students before nonresponse. Therefore, we expect to select approximately 945 students (15 SSUs * 63 students) from small SSUs.
Altogether, the proposed sample design is expected to yield 177 SSUs. Because some SSUs are formed by linking schools so that every unit covers all four grades, we expect approximately 200 physical schools in a sample of 177 SSUs. These schools are expected to yield approximately 22,673 selected students (21,728 from large SSUs plus 945 from small SSUs) and approximately 15,194 participating students (based on the historical average school and student response rates).
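The yield arithmetic above can be checked with a short Python sketch. All inputs are the planning assumptions stated in the text (54 PSUs, 3 large SSUs per PSU, 32 high-minority SSUs, 28 students per class, 15 small SSUs of about 63 students, and 78% and 86% response rates); the constant and function names are illustrative only.

```python
# Planning assumptions taken from the text above (names are illustrative).
N_PSU = 54                    # first-stage PSUs
LARGE_SSUS_PER_PSU = 3        # large SSUs selected in each PSU
HIGH_MINORITY_SSUS = 32       # about 20% of the 162 large SSUs
GRADES = 4                    # grades 9-12
STUDENTS_PER_CLASS = 28       # historical average class size
SMALL_SSUS = 15               # one small SSU in each of 15 subsample PSUs
STUDENTS_PER_SMALL_SSU = 63   # historical average per small SSU
SCHOOL_RESPONSE = 0.78
STUDENT_RESPONSE = 0.86


def expected_yield():
    """Return expected SSU count, selected students, and participating students."""
    large_ssus = N_PSU * LARGE_SSUS_PER_PSU                  # 162
    other_ssus = large_ssus - HIGH_MINORITY_SSUS             # 130
    # High-minority SSUs: 2 classes per grade (8 classes); others: 1 per grade (4 classes).
    from_high_minority = HIGH_MINORITY_SSUS * 2 * GRADES * STUDENTS_PER_CLASS  # 7,168
    from_other = other_ssus * GRADES * STUDENTS_PER_CLASS                       # 14,560
    from_small = SMALL_SSUS * STUDENTS_PER_SMALL_SSU                            # 945
    selected = from_high_minority + from_other + from_small                     # 22,673
    # Apply school and student response rates as a simple product.
    participating = selected * SCHOOL_RESPONSE * STUDENT_RESPONSE
    return {
        "ssus": large_ssus + SMALL_SSUS,                       # 177
        "selected": selected,
        "participating": round(participating),
    }
```

Applying both response rates as a simple product gives about 15,209 participating students, close to the approximately 15,194 stated in the text; the small difference may reflect rounding in the original calculation, which is not shown.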
N.1.2 Expected Confidence Intervals
Factors that influence the width of confidence intervals for prevalence estimates include 1) whether the estimate is for the full population or for a demographic subgroup (i.e., by sex, race/ethnicity, or grade) and 2) the prevalence of each risk behavior and its associated design effect (DEFF). The design effect is defined as the ratio of the variance attained under the actual design to the variance that would be obtained with a simple random sample of the same size. The DEFF, which equals 1.0 for simple random sampling, reflects the variance-increasing effects of unequal weighting and sample clustering.
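To illustrate how the DEFF enters the precision calculation, the sketch below computes an approximate confidence-interval half-width for a prevalence p as z * sqrt(DEFF * p(1 - p) / n), i.e., it treats n / DEFF as the effective sample size. The function name and any values passed to it are hypothetical, not YRBS design parameters.

```python
import math
from statistics import NormalDist


def ci_half_width(p, n, deff, confidence=0.95):
    """Approximate CI half-width for a prevalence p estimated from n respondents.

    The simple-random-sampling variance p(1 - p)/n is inflated by the design
    effect, which is equivalent to using an effective sample size of n / deff.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # e.g. about 1.96 for 95%
    return z * math.sqrt(deff * p * (1 - p) / n)
```

For instance, with p = 0.5, n = 1,000, and a hypothetical DEFF of 2, `ci_half_width(0.5, 1000, 2.0)` is roughly 0.044, about ±4.4 percentage points at 95% confidence.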
As a general guideline based on prior YRBS studies, which had similar designs and sample sizes, we can expect error margins of ±5 percent at:
95% confidence for domains defined by grade, sex, or race/ethnicity;
95% confidence for domains defined by crossing grade or race/ethnicity by sex; and
90% confidence for domains formed by crossing grade with race/ethnicity.
N.2 Sampling Methods
The sampling universe for the national YRBS will consist of all regular public, Catholic and other private school students in grades 9 through 12 in the 50 states and the District of Columbia.
The sample will be a stratified, three-stage cluster sample, with PSUs stratified by racial/ethnic composition and by urban versus rural location. PSUs are classified as "urban" if they are in one of the 54 largest metropolitan statistical areas (MSAs) in the U.S.; otherwise, they are classified as "rural". Within each stratum, PSUs, defined as a county, a portion of a county, or a group of counties, will be chosen without replacement at the first stage. Exhibit N-1 presents key sampling design features.
Exhibit N-1 Key Sampling Design Features
| Sampling Stage | Sampling Units | Sample Size (Approximate) | Stratification | Measure of Size |
| 1 | Counties or groups of counties | 54 PSUs | Urban vs. non-urban (2 strata); minority concentration (8 strata) | Aggregate school size in target grades |
| 2 | Schools | About 200 physical schools (≥3 per PSU) | Small vs. other | Enrollment |
| 3 | Classes/students | 1 or 2 classes per grade per school: 22,673 students (sampled); 15,194 students (participating) | | |
N.2.1 Measure of Size
The sampling approach will use probability proportional to size (PPS) sampling methods. In general, when the measure of size is defined as the count of final-stage sampling units, and a fixed number of units are selected at the final stage of a PPS sample, every member of the universe has an equal probability of selection. This is the case for the YRBS, where student counts are used as the measure of size and a roughly fixed number of students are selected from each school at the final stage. Thus, this design results in a roughly self-weighting sample.
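A minimal sketch of why PPS sampling with a fixed final-stage take is self-weighting, under idealized assumptions (every stage probability is below 1, and exactly m students are taken per school): the product of the three stage probabilities collapses to k * s * m / N for stratum size N, which does not depend on the sizes of the particular PSU or school. The function and parameter names are illustrative, and exact fractions are used so the cancellation is visible.

```python
from fractions import Fraction


def overall_probability(k, stratum_size, psu_size, s, school_size, m):
    """Overall selection probability of one student under 3-stage PPS.

    k: PSUs selected in the stratum; s: schools selected per PSU;
    m: students taken per school (assumed fixed). Sizes are student counts.
    """
    p_psu = Fraction(k * psu_size, stratum_size)      # stage 1: PPS on PSU size
    p_school = Fraction(s * school_size, psu_size)    # stage 2: PPS on school size
    p_student = Fraction(m, school_size)              # stage 3: fixed take of m
    return p_psu * p_school * p_student               # = k * s * m / stratum_size
```

Two students in PSUs and schools of very different sizes receive the same probability, for example `overall_probability(4, 100_000, 12_000, 3, 2_000, 112)` and `overall_probability(4, 100_000, 5_000, 3, 800, 112)` (hypothetical sizes; 112 = 4 classes of 28).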
Prior cycles of YRBS have included a weighted measure of size to increase the probability of selection of high minority (Hispanic and black) PSUs and schools. The effectiveness of a weighted measure of size in achieving oversampling is dependent upon the distributions of black and Hispanic students in schools. The need for a weighted measure of size is predicated on a relatively low prevalence of minority students in the population; however, this premise has become less tenable with the growth in the population of black and Hispanic students, in particular.
In 1990, Macro conducted a series of simulation studies that investigated the relationship of various weighting functions to the resulting numbers and percentages of minority students in the obtained samples.[1] We performed a new simulation study to inform the 2013 and 2015 YRBS sample design for similar purposes, i.e., ensuring that we are using the minimum amount of weighting necessary to achieve target yields of black and Hispanic students. In particular, we investigated whether we could move to an unweighted measure of enrollment size, which would increase the statistical efficiency of the design and therefore lead to more precise prevalence estimates. Based on the results of the simulation study, we concluded that target yields of black and Hispanic students will be achieved by using an unweighted measure of size. Therefore, an unweighted measure of size will be used for the 2013 and 2015 YRBS sampling designs.
N.2.2 First-stage Sampling
N.2.2.1 Definition of Primary Sampling Units
In defining PSUs, several issues are considered:
Each PSU should be large enough to contain the requisite numbers of schools and students by grade, and small enough so as not to be selected with near certainty.
Each PSU should be compact geographically so that field staff can go from school to school easily.
PSU definitions should be consistent with secondary sampling unit (school) definitions.
PSUs are defined to contain at least five large high schools.
Generally, counties will be equivalent to PSUs, with two exceptions:
Low population counties are combined to provide sufficient numbers of schools and students.
High population counties are divided into multiple PSUs so that no resulting PSU will be selected with certainty.[2]
The basic county-to-PSU assignments have remained relatively stable from one YRBS cycle to the next. As we obtain new frame data each YRBS cycle, school and student counts for each PSU are updated to account for school openings and closings.
County population figures were aggregated from school enrollment data for the grades of interest. Enrollment data were obtained from the most recent Common Core of Data from the National Center for Education Statistics,[3] which are merged on a rolling basis into the current school and school district data files obtained from MDR[4] and used as the basis for the sampling frame.
The PSU frame is then screened for PSUs that no longer meet the criteria given above. We adjust the frame by recombining small counties/PSUs as necessary to ensure sufficient size while maintaining compactness. Near-certainty PSUs are split using an automated procedure built into the sampling program.
N.2.2.2 Stratification of PSUs
The PSUs will be organized into 16 strata based on the urban/rural location of the PSU and its minority enrollment. The approach computes optimum stratum boundaries using the cumulative square root of frequency (“f”) method developed by Dalenius and Hodges. This method is useful when there are many PSUs at the lower levels of minority concentration and they become sparse as the percentage increases, which is the case here. The boundaries, or cutoffs, change as the frequency distribution (“f”) for the racial/ethnic groupings changes from one survey cycle to the next.
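The text names the cumulative square-root-of-f rule without giving implementation details. The sketch below is one plausible implementation under stated assumptions: one-percentage-point histogram bins, and each cutoff placed at the upper edge of the first bin where the cumulative sqrt(f) reaches the next equal share of its total. The function name, bin width, and tie handling are assumptions, not the production sampling program.

```python
import math


def dalenius_hodges_cutoffs(values, n_groups, bin_width=1.0, lo=0.0, hi=100.0):
    """Cumulative square-root-of-frequency (Dalenius-Hodges) stratum cutoffs.

    values: minority percentages of the PSUs in one urban/rural x
    Hispanic/black group. Returns up to n_groups - 1 interior cutoffs,
    each the upper edge of a histogram bin.
    """
    n_bins = int(math.ceil((hi - lo) / bin_width))
    freq = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) // bin_width), n_bins - 1)
        freq[idx] += 1

    # Running total of sqrt(f) across bins.
    cum, total = [], 0.0
    for f in freq:
        total += math.sqrt(f)
        cum.append(total)

    cutoffs = []
    for j in range(1, n_groups):
        target = j * total / n_groups
        # First bin at which the cumulative sqrt(f) reaches the j-th equal share.
        i = next(i for i, c in enumerate(cum) if c >= target)
        edge = lo + (i + 1) * bin_width
        if not cutoffs or edge > cutoffs[-1]:  # drop duplicates from dense bins
            cutoffs.append(edge)
    return cutoffs
```

In this setting the function would be applied separately to each of the four urban/rural by Hispanic/black groups with `n_groups=4`. Taking the square root of the frequencies is what lets the boundaries spread out where PSUs are sparse, which matches the skewed distribution described above.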
The specific definitions of primary strata are as follows:
If the percentage of Hispanic students in the PSU exceeds the percentage of black students, the PSU is classified as Hispanic; otherwise, it is classified as black.
If the PSU is within one of the 54 largest MSAs in the U.S., it is classified as 'Urban'; otherwise, it is classified as 'Rural.'
Hispanic Urban and Hispanic Rural PSUs are classified into four density groupings depending upon the percentages of Hispanic students in the PSU.
Black Urban and Black Rural PSUs are also classified into four density groupings depending upon the percentage of black students in the PSU.
Exhibit N-2 illustrates the process with preliminary boundaries. It is worth stressing that the boundaries are re-computed for each cycle of the YRBS as we employ the Dalenius-Hodges method (described above) to allow the boundaries to adapt to the changing race/ethnic distribution of the student population.
Exhibit N-2 Minority Percentage Bounds for PSU stratification
| Minority Concentration | Density Group | Bounds (Urban) | Bounds (Rural) |
| Hispanic | 1 | 0%-22% | 0%-22% |
| Hispanic | 2 | >22%-34% | >22%-44% |
| Hispanic | 3 | >34%-45% | >44%-66% |
| Hispanic | 4 | >45%-100% | >66%-100% |
| Black | 1 | 0%-22% | 0%-18% |
| Black | 2 | >22%-34% | >18%-34% |
| Black | 3 | >34%-56% | >34%-58% |
| Black | 4 | >56%-100% | >58%-100% |
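The stratum-assignment rules listed above can be expressed as a small function. This sketch hard-codes the preliminary bounds from Exhibit N-2, which the text says are recomputed every cycle, so the numbers are illustrative. The rural Hispanic group 4 bound is taken as everything above the group 3 upper bound of 66%. The function name and argument names are hypothetical.

```python
# Inclusive upper bounds (in percent) of density groups 1-3 from Exhibit N-2;
# group 4 covers everything above the group 3 bound, up to 100%.
BOUNDS = {
    ("Hispanic", "Urban"): (22, 34, 45),
    ("Hispanic", "Rural"): (22, 44, 66),
    ("Black", "Urban"): (22, 34, 56),
    ("Black", "Rural"): (18, 34, 58),
}


def psu_stratum(pct_hispanic, pct_black, in_top54_msa):
    """Return (minority group, location, density group 1-4) for one PSU."""
    # Hispanic only if the Hispanic share strictly exceeds the black share.
    group = "Hispanic" if pct_hispanic > pct_black else "Black"
    location = "Urban" if in_top54_msa else "Rural"
    pct = pct_hispanic if group == "Hispanic" else pct_black
    for density, upper in enumerate(BOUNDS[(group, location)], start=1):
        if pct <= upper:
            return group, location, density
    return group, location, 4
```

Crossing the 2 locations with the 8 group-by-density cells gives the 16 strata described in the text.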
N.2.2.3 Allocation of the PSU sample
The PSUs will be allocated to strata proportional to student enrollment.
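The text does not say how fractional stratum allocations are rounded. The sketch below assumes largest-remainder (Hamilton) rounding, one common choice, so that the stratum allocations always sum to exactly the target (54 PSUs here). The function name and the rounding rule are assumptions.

```python
import math


def allocate_proportional(total, sizes):
    """Allocate `total` units to strata in proportion to `sizes`.

    Uses largest-remainder rounding: take the floor of each exact quota, then
    give the leftover units to the strata with the largest fractional parts.
    """
    grand = sum(sizes)
    quotas = [total * s / grand for s in sizes]
    alloc = [math.floor(q) for q in quotas]
    shortfall = total - sum(alloc)
    # Stable sort, so ties go to the stratum that appears first.
    order = sorted(range(len(sizes)), key=lambda i: quotas[i] - alloc[i], reverse=True)
    for i in order[:shortfall]:
        alloc[i] += 1
    return alloc
```

For example, `allocate_proportional(54, enrollments)` would split the 54 PSUs across the 16 strata, given a hypothetical list `enrollments` of stratum student totals.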
N.2.2.4 Selection of PSUs
We will select a sample of 54 PSUs for the YRBS with probability proportional to size (PPS). The size measure used will be the sum of total school enrollment across schools in the PSU. With PPS sampling, the selection probability for each PSU is proportional to the PSU’s measure of size.
N.2.3 Second-stage sampling
N.2.3.1 Second-stage units (SSUs)
Secondary Sampling Units (SSUs), or “full” schools, are formed from either a single school or a group of linked schools. A single school forms its own SSU if it has students in each of grades 9-12. Schools that do not have all four grades are grouped together to form an SSU. Most commonly, students from a grade 10-12 school are grouped with the 9th grade students from a nearby grade 7-9 school to form an SSU. Because every SSU contains all four grades, one or more classes can be selected from each grade in the third stage.
N.2.3.2 Stratification
SSUs are stratified into two size strata consisting of small and large schools. Small schools are defined as those that cannot support the selection of an entire class at all grade levels. That is, a school is considered small if it has fewer than 28 students in any grade; all other schools are considered large.
N.2.3.3 SSU selection
Three large high schools are selected from each PSU. In addition, one small school is selected from each of 15 sub-sample PSUs. SSUs will be selected using a systematic probability proportional to size (PPS) method, with the enrollment described earlier as the measure of size.
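Systematic PPS selection can be sketched as follows: units are laid end to end on a line of cumulative size, and a random start within the first sampling interval is followed by equally spaced selection points. This is a generic sketch of the technique, not the YRBS sampling program; the function name is hypothetical, the frame order (which provides implicit stratification) is whatever order `sizes` is given in, and units larger than the sampling interval (certainty units) are assumed absent.

```python
import random


def systematic_pps(sizes, n, rng=None):
    """Systematic PPS selection of n units from an ordered frame.

    Units are laid end to end on a line of cumulative size; a random start
    in [0, interval) is followed by n equally spaced selection points.
    Returns the indices of selected units. Assumes no unit is larger than
    the sampling interval, so no unit can be hit twice.
    """
    rng = rng or random.Random()
    total = float(sum(sizes))
    interval = total / n
    start = rng.random() * interval
    points = [start + j * interval for j in range(n)]

    selected, cum, j = [], 0.0, 0
    for idx, size in enumerate(sizes):
        cum += size
        is_last = idx == len(sizes) - 1
        # Every point falling inside this unit's segment selects the unit;
        # the last unit absorbs any point lost to floating-point rounding.
        while j < n and (points[j] < cum or is_last):
            selected.append(idx)
            j += 1
    return selected
```

Each unit's inclusion probability is n * size / total; for instance, with sizes [100, 200, 300, 400] and n = 2, the last unit is selected with probability 0.8.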
N.2.4 Third-stage sampling
N.2.4.1 Selection of grades
Within large SSUs, one component school is sampled to represent the SSU at each grade level. For the vast majority of SSUs, which consist of a single physical school, this means that all eligible grades in that school are included in class selection; there is a one-to-one correspondence between SSU and school.
Within each SSU formed by linking, or combining, physical schools, grade samples are drawn independently, with one component school selected to supply each grade with probability proportional to its grade-level enrollment.
For small schools, no grade level sampling is performed. All students in the eligible grades that make up the school will be selected. From historical averages, each small school supplies an expected draw of 63 students per school.
N.2.4.2 Selection of classes
In schools not designated as “high minority,” one class per grade will be selected to participate in the survey.
Two classes per grade instead of one will be selected in “high minority” schools that have sufficient enrollment to support a sample of 56 students in a given grade. A school is defined as “high minority” if over 55% of its enrolled students are black or over 50% are Hispanic. We double sample classes in order to supply sufficient sample size to meet precision requirements for race/ethnicity-specific prevalence estimates.
The selection of two classes in schools with over 55% black enrollment or over 50% Hispanic enrollment is expected to result in double sampling in about 20% of large schools (n=32 schools) on average. The estimated 128 extra class selections (32 SSUs * 4 classes) will add 3,584 students to the sample (assuming 28 students per class). The percentage of sampled students is expected to increase from 16.5 to roughly 19.6 percent for black students and from 19.3 to 24.6 percent for Hispanic students.
A class will be defined by our sampling team so that it meets size and composition requirements before the sampling is done. For example, two small classes may be combined and treated as one for sampling purposes, or boys' and girls' physical education classes may be combined. This approach is an efficient method of data collection in schools and also has the advantage of using the classroom teacher to distribute consent forms and to "leverage" student participation; hence, it tends to yield higher student participation rates. The disadvantage of this approach is that it tends to make the sampling design less efficient, because students within a class section tend to be more homogeneous than the student population at large within a school.
The method of selecting classes will vary from school to school, depending upon the organization of that school and whether a cluster of schools is involved. The key element of the class sampling strategy is to identify a structure that partitions the students into mutually exclusive, collectively exhaustive groupings that are of approximately equal sizes and that are accessible. Beyond that basic requirement, we will partition students so that both sexes, and indeed all students, have a chance of being selected. In selecting classes, we will generally give preference to selecting from mandatory courses such as English. Another option is to select from all classes that meet during a particular time of day, such as all second or third period classes.
N.2.4.3 Selection of students
All students in a selected classroom will be eligible for the survey.
N.2.4.4 Replacement of schools/school systems
We will not replace refusing school districts, schools, classes, or students. Instead, the sampling design allows for school and student nonresponse: the numbers of selections are inflated to account for expected levels of nonresponse, as discussed earlier.
N.3 Weighting and Variance Estimation
This section describes the procedures used to weight the data. From a sampling perspective, these include:
Sampling Weights
Nonresponse Adjustments and Weight Trimming
Post-stratification to National Estimates of Racial Percentages and Student Enrollment by Grade
Estimators and Variance Estimators
Although the sample was designed to be self-weighting under certain idealized conditions, it will be necessary to compute weights to produce unbiased estimates. The basic weights, or sampling weights, will be computed on a case-by-case basis as the reciprocal of the probability of selection of each case. The basic steps in weighting are outlined below: a) sampling weight computation, b) nonresponse adjustments, and c) post-stratification adjustments.
N.3.1 Sampling Weights
If $k$ is the number of PSUs to be selected from stratum $i$, $N_i$ is the size of stratum $i$, and $N_{ij}$ is the size of PSU $j$ in stratum $i$ (in all cases "size" refers to our proposed measure of size), then the probability of selection of PSU $j$ is $k N_{ij} / N_i$.
Assuming three large schools are to be selected from PSU $j$, and $N_{ijk}$ is the size of school $k$ in PSU $j$ in stratum $i$, the conditional probability of selection of the school, given the selection of the PSU, is $3 N_{ijk} / N_{ij}$ for YRBS large schools.
The derivation is similar for small schools, with an extra factor to account for the PSU subsampling probability.
If $C_{ijkg}$ is the number of classes in grade $g$ of school $ijk$, then the conditional probability of selection of a class is $1 / C_{ijkg}$ (or $2 / C_{ijkg}$ if two classes are taken). Since all students in a selected class are included, the conditional probability of selection of a student, given the selection of the class, is 1.
The overall probability of selection of a student in stratum $i$ is the product of these conditional probabilities; for a student in grade $g$ of a large school with one class selected per grade, $\pi_{ijkg} = (k N_{ij} / N_i)(3 N_{ijk} / N_{ij})(1 / C_{ijkg})$. The probabilities of selection will be the same for all students in a given grade of a given school, regardless of their race/ethnicity, but will vary among schools depending upon the racial/ethnic mix of the schools and their surrounding regions.
Sampling weights assigned to each student record are the reciprocal of the overall probabilities of selection for each student.
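A minimal sketch of the base-weight calculation for a student in a large school, as the reciprocal of the product of the stage-wise selection probabilities described above. It assumes the class-stage denominator is the number of classes in the student's grade (one or two classes are selected per grade) and ignores the extra factors for linked SSUs and small schools; the function name and argument names are hypothetical.

```python
def base_weight(k, n_stratum, n_psu, n_school, classes_in_grade,
                classes_taken=1, schools_per_psu=3):
    """Base weight of a student in a large school.

    k: PSUs selected in the stratum; n_*: measures of size of the stratum,
    PSU, and school; classes_in_grade: classes in the student's grade;
    classes_taken: 1 normally, 2 in high-minority schools.
    """
    p_psu = k * n_psu / n_stratum                    # first stage
    p_school = schools_per_psu * n_school / n_psu    # second stage, given PSU
    p_class = classes_taken / classes_in_grade       # third stage, given school
    p_student = 1.0                                  # every student in the class is taken
    return 1.0 / (p_psu * p_school * p_class * p_student)
```

Selecting two classes per grade halves the weight relative to one class, which is how double class selection raises the representation of students in high-minority schools.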
N.3.2 Non-response Adjustments and Weight Trimming
Several adjustments are planned to account for student and school nonresponse patterns. An adjustment for student nonresponse will be made using sex and grade within school. With this adjustment, the sum of the student weights over participating students within a school matches the total enrollment by grade in the school. This adjustment factor will be capped in extreme situations, such as when only one or two students respond in a school, to limit the potential effects of extreme weights (i.e., unequal weighting effects on survey variances).
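The student nonresponse adjustment can be sketched as a weighting-class adjustment within school by grade by sex cells: respondents' weights are scaled up by the ratio of the weight of all sampled students to the weight of respondents in the cell, and the factor is capped. The cap value of 3 and the record layout are illustrative assumptions; the text does not state the cap. Cells with no respondents would need to be collapsed with neighboring cells, which this sketch does not do.

```python
from collections import defaultdict


def adjust_student_nonresponse(students, cap=3.0):
    """Return respondent records with nonresponse-adjusted weights.

    students: dicts with keys school, grade, sex, weight, responded.
    """
    sampled = defaultdict(float)
    responded = defaultdict(float)
    for s in students:
        cell = (s["school"], s["grade"], s["sex"])
        sampled[cell] += s["weight"]
        if s["responded"]:
            responded[cell] += s["weight"]

    adjusted = []
    for s in students:
        if not s["responded"]:
            continue
        cell = (s["school"], s["grade"], s["sex"])
        # Capped adjustment factor limits extreme weights in low-response cells.
        factor = min(sampled[cell] / responded[cell], cap)
        adjusted.append({**s, "weight": s["weight"] * factor})
    return adjusted
```

Without the cap, a cell where one of five sampled students responds would receive a factor of 5; the cap trades a small amount of bias for lower variance.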
The weights of students in participating schools will be adjusted to account for nonparticipation by other schools. The adjustment uses the ratio of the weighted sum of measures of size over all selected schools in the stratum (numerator of adjustment factor), and over the subset of participating schools in a stratum (denominator of adjustment factor). The adjustment factor will be computed and applied to small and large schools separately.
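The school nonresponse factor described above can be sketched as a ratio of weighted measures of size, computed separately for each stratum and school size class (small or large). The record layout and function name are illustrative assumptions.

```python
from collections import defaultdict


def school_nonresponse_factors(schools):
    """Adjustment factor per (stratum, size_class).

    schools: dicts with stratum, size_class ('small' or 'large'), weight,
    mos (measure of size), and participated (bool).
    """
    num = defaultdict(float)  # weighted MOS over all selected schools
    den = defaultdict(float)  # weighted MOS over participating schools
    for sc in schools:
        key = (sc["stratum"], sc["size_class"])
        num[key] += sc["weight"] * sc["mos"]
        if sc["participated"]:
            den[key] += sc["weight"] * sc["mos"]
    # Cells with no participating school cannot be adjusted and are omitted.
    return {key: num[key] / den[key] for key in num if den[key] > 0}
```

The student weights in each participating school are then multiplied by the factor for that school's stratum and size class.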
Extreme variation in sampling weights can inflate sampling variances and offset the precision gained from a well-designed sampling plan. One strategy to compensate for these potential effects is to trim extreme weights and redistribute the trimmed weight among the untrimmed weights. The trimming method that we will use, outlined in Potter,[5,6] is based on procedures first developed for the National Assessment of Educational Progress (NAEP).
The trimming is an iterative procedure. In each iteration, an optimal weight, $W_o$, is calculated from the sum of the squared weights in the sample. Then, each weight $W_i$ that exceeds $W_o$ is marked and trimmed to $W_o$. The trimmed weight is summed within grade and spread proportionally over the unmarked cases in the grade. This process is repeated until little or no weight is being trimmed. Weight trimming is done within stratum. Typically, 3 to 4 percent of the total sample weight is trimmed and redistributed under the weight trimming procedure.
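The iterative procedure above can be sketched as follows. The text says only that the optimal weight is calculated from the sum of squared weights; this sketch assumes a NAEP-style form W_o = sqrt(c * sum(W_i^2) / n), with the constant c = 10 chosen for illustration. Both the formula and the constant are assumptions. The function would be called once per stratum, with `groups` holding each case's grade.

```python
import math
from collections import defaultdict


def trim_weights(weights, groups, c=10.0, tol=1e-6, max_iter=50):
    """Iteratively trim weights above W_o = sqrt(c * sum(w^2) / n).

    The trimmed excess is spread proportionally over the unmarked cases in
    the same group (e.g. grade), so each group's weight total is preserved
    as long as the group has at least one unmarked case.
    """
    w = list(weights)
    n = len(w)
    for _ in range(max_iter):
        w_o = math.sqrt(c * sum(x * x for x in w) / n)
        marked = [x > w_o for x in w]
        excess = defaultdict(float)
        for i, x in enumerate(w):
            if marked[i]:
                excess[groups[i]] += x - w_o
                w[i] = w_o
        if sum(excess.values()) <= tol:
            break  # little or no weight trimmed in this pass
        untrimmed_total = defaultdict(float)
        for i, x in enumerate(w):
            if not marked[i]:
                untrimmed_total[groups[i]] += x
        for i, x in enumerate(w):
            g = groups[i]
            if not marked[i] and untrimmed_total[g] > 0:
                w[i] = x * (1 + excess[g] / untrimmed_total[g])
    return w
```

Because the excess is redistributed rather than discarded, the total weight (and hence estimated population size) is unchanged while the largest weights shrink.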
N.3.3 Post-stratification
Post-stratification approaches capitalize on known population totals and percentages available for groups of schools and students. National estimates of racial/ethnic percentages for post-stratification are obtained from two sources described next.
Private school enrollments by grade and five racial/ethnic groups are obtained from the Private School Universe Survey (PSS). Public school enrollments by grade, sex, and five racial/ethnic categories are obtained from the Common Core of Data (CCD); both are produced by the National Center for Education Statistics (NCES). These databases are combined to produce the enrollments for all schools and to develop population percentages to use as controls in the post-stratification step. For post-stratification purposes, a single race/ethnicity category is assigned to respondents with missing data on race/ethnicity, those with an “Other” classification, and those reporting multiple races. For private schools, we use two race/ethnic classifications: white and non-white. For public schools, we use the full five categories.
Given a national estimate $R_a$ and a weighted population estimate $P_a$ for race category $a$ in some grade, the simple post-stratification factor for that grade is the ratio $R_a / P_a$.
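A minimal sketch of the simple post-stratification step: within each grade by race/ethnicity cell, every weight is multiplied by R_a / P_a, so the adjusted weights sum to the control total. It assumes R_a is expressed as a total (if percentages are used as controls, P_a must be the corresponding weighted percentage). The record layout and function name are illustrative.

```python
from collections import defaultdict


def poststratify(records, controls):
    """Scale weights so each (grade, race) cell sums to its control total.

    records: dicts with grade, race, weight.
    controls: {(grade, race): R_a}, the national total for each cell.
    """
    p = defaultdict(float)  # P_a: current weighted total per cell
    for r in records:
        p[(r["grade"], r["race"])] += r["weight"]
    return [
        {**r, "weight": r["weight"] * controls[(r["grade"], r["race"])] / p[(r["grade"], r["race"])]}
        for r in records
    ]
```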
N.3.4 Estimators and Variance Estimation
If $w_i$ is the weight of case $i$ (the inverse of the probability of selection, adjusted for nonresponse and post-stratification) and $x_i$ is a characteristic of case $i$ (e.g., $x_i = 1$ if student $i$ smokes and $x_i = 0$ otherwise), then the estimated mean of characteristic $x$ is $\bar{x} = \sum_i w_i x_i / \sum_i w_i$. A population total is computed similarly as $\hat{X} = \sum_i w_i x_i$. The weighted population estimates will be computed with the Statistical Analysis System (SAS) and SUDAAN software.
These estimates will be accompanied by measures of sampling variability (sampling error), such as variances and standard errors, that account for the complex sampling design. These measures will support the construction of confidence intervals and other statistical inference, such as statistical testing (e.g., subgroup comparisons or trends over successive YRBS cycles). Sampling variances will be estimated using the method of general linearized estimators[7] as implemented in SUDAAN[8] or the SAS survey procedures. These packages are used because they estimate sampling variances for multistage stratified sampling designs and account for unequal weighting, sample clustering, and stratification.
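To show what the linearization approach computes, the sketch below estimates the weighted mean and its Taylor-linearized standard error, treating PSUs as sampled with replacement within strata (a common approximation in survey software). It is an illustration under that assumption, not a substitute for SUDAAN or the SAS survey procedures; the record layout and function name are hypothetical, and single-PSU strata are simply skipped rather than collapsed.

```python
import math
from collections import defaultdict


def weighted_mean_and_se(records):
    """Weighted mean sum(w*x)/sum(w) and its linearized standard error.

    records: dicts with stratum, psu, weight, x.
    """
    W = sum(r["weight"] for r in records)
    mean = sum(r["weight"] * r["x"] for r in records) / W
    # PSU totals of the linearized variable z_i = w_i * (x_i - mean) / W
    psu_totals = defaultdict(float)
    for r in records:
        psu_totals[(r["stratum"], r["psu"])] += r["weight"] * (r["x"] - mean) / W
    by_stratum = defaultdict(list)
    for (h, _), z in psu_totals.items():
        by_stratum[h].append(z)
    var = 0.0
    for zs in by_stratum.values():
        n_h = len(zs)
        if n_h < 2:
            continue  # single-PSU strata need collapsing in practice
        zbar = sum(zs) / n_h
        # Between-PSU variability within the stratum, with-replacement form.
        var += n_h / (n_h - 1) * sum((z - zbar) ** 2 for z in zs)
    return mean, math.sqrt(var)
```

Because the variance is built from PSU totals within strata, clustering and stratification enter the standard error directly, and unequal weights enter through the linearized variable.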
[1] Errecart MT. Issues in Sampling African-Americans and Hispanics in School-Based Surveys. Centers for Disease Control, October 5, 1990.
[2] The variance estimation process is more efficient without the need to account for certainty PSUs. The method of dividing large PSUs ensures that each sub-county PSU mirrors the distribution of schools in the county as a whole.
[3] US Department of Education, National Center for Education Statistics. Common Core of Data Public Elementary/Secondary School Universe Survey. Washington, DC: US Department of Education, National Center for Education Statistics. Available at http://nces.ed.gov/ccd.
[4] MDR National Education Database Master Extract. Shelton, CT: Market Data Retrieval, Inc.; May 3, 2010.
[5] Potter F. "Survey of Procedures to Control Extreme Sampling Weights." In Proceedings of the Section on Survey Research Methods, American Statistical Association, 1988, pp. 453-458.
[6] Potter F. "A Study of Procedures to Identify and Trim Extreme Sampling Weights." In Proceedings of the Section on Survey Research Methods, American Statistical Association, 1990, pp. 225-230.
[7] Skinner CJ, Holt D, Smith TMF. Analysis of Complex Surveys. New York: John Wiley & Sons, 1989, p. 50.
[8] Shah BV, Barnwell GG, Bieler GS. SUDAAN: Software for the Statistical Analysis of Correlated Data, Release 7.5 [user's manual]. Research Triangle Park, NC: Research Triangle Institute, 1997.