Sampling Design

0621 NYTS Att L_2021 Sampling Design FINAL.docx

2022 National Youth Tobacco Survey (NYTS)


OMB: 0920-0621


2021 NYTS Sampling Plan


This document provides a draft sampling plan for the 2021 NYTS. Sampling procedures follow closely those developed by the ICF statistical team and adopted in repeated cycles over the past decade.


The NYTS employs a stratified, three-stage cluster sample design to produce a nationally representative sample of middle school and high school students in the United States. Sampling is probabilistic at every stage and entails selection of (1) Primary Sampling Units (PSUs), each defined as a county, a group of small counties, or part of a very large county, within each stratum; (2) Secondary Sampling Units (SSUs), defined as schools or linked schools, within each selected PSU; and (3) students within each selected school.¹ Participating students complete the anonymous, voluntary survey using a self-administered questionnaire.


The objective of the NYTS sampling design is to support estimation of tobacco-related knowledge, attitudes, and behaviors in a national population of public and private school students enrolled in grades 6 through 12 in the United States. More specifically, the study is designed to produce national estimates at a 95% confidence level by school level (middle and high school), by grade (6, 7, 8, 9, 10, 11, and 12), by sex (male and female), and by race-ethnicity (non-Hispanic White, non-Hispanic Black, and Hispanic). Additional estimates also are supported for subgroups defined by grade, by sex, and by race-ethnicity, each within school-level domains; however, precision levels vary considerably according to differences in subpopulation sizes. The NYTS employs a repeat cross-sectional design.


The universe for the study consists of all public and private school students enrolled in regular middle schools and high schools in grades 6 through 12 in the 50 U.S. states and the District of Columbia. Alternative schools, special education schools, Department of Defense-operated schools, Bureau of Indian Affairs schools, and vocational schools that serve only pull-out populations are excluded, as are students enrolled in regular schools who are unable to complete the questionnaire without special assistance.


The 2021 NYTS will be a continuation of the NYTS cycles that have taken place since 1999, employing the general sampling design framework used in the previous cycles. The number of participating students, in excess of 24,000 as required, is larger than the numbers in recent cycles to generate approximately equivalent effective sample sizes and precision levels overall.


1. Frame Construction

The frame will be constructed from separate sources obtained from the National Center for Education Statistics (NCES) and from a commercial vendor, Market Data Retrieval, Inc. (MDR). The NCES files will be the Common Core of Data (CCD) for public schools and Private School Survey (PSS) for private schools.


The reason for moving to a frame built from multiple data sources is to increase the coverage of schools nationally. This dual-source frame build method was implemented for the 2014 NYTS survey for the first time. Including schools sourced from the two NCES files resulted in a coverage increase among all public and non-public schools of 11.3%.




A cut-off in school size was also added in the 2014 survey to ensure anonymity and the presence of all grades. Eligible schools need an enrollment of at least 40 students across the eligible grades. To improve coverage, we will investigate the impact of lowering the threshold to 35 students.


To illustrate the population sizes, Exhibit 1 presents the number of schools and students in the 2021 NYTS frame by school level.



Exhibit 1. Number of Schools and Students by School Level in the School Frame (NYTS 2021 Frame)


| School Level | Schools | Students |
| --- | --- | --- |
| High Schools | 28,342 | 16,560,926 |
| Middle Schools | 42,832 | 12,320,865 |


  2. Sampling Stages and Measure of Size

The three-stage cluster sample will be stratified by racial/ethnic composition and urban versus nonurban status at the first (primary) stage. PSUs will be classified as “urban” if they are in one of the 54 largest metropolitan statistical areas (MSAs) in the U.S. using 2018 American Community Survey (ACS) data from the U.S. Census Bureau. Otherwise, they will be classified as “nonurban.” Additionally, implicit stratification will be imposed by geography by sorting the PSU frame by state and by five-digit ZIP Code (within state). The implicit stratification will be extended to ensure regional stratification using the four U.S. Census regions (i.e., we will sort by region first, then by state and ZIP Code). Within each stratum, PSUs will be randomly sampled without replacement at the first stage.


In subsequent sampling stages, a probabilistic selection of schools and students will be made from the sample PSUs. The NYTS is designed to balance the yields across grades; therefore, the PSU subsampling is simplified to vary across school sizes but not between school-level categories.


The sampling stages may be summarized as follows, with additional details provided below:


  • Selection of PSUs: One hundred PSUs will be selected from 16 strata, with probability proportional to the total number of eligible students enrolled in all eligible schools located within a PSU.

  • Selection of Schools: At the second sampling stage, a total of 240 large schools, or SSUs, will be selected from the sample PSUs. Two large schools will be selected per sample PSU, one per level (middle or high). An additional large school for each level will be selected in a subsample of 20 PSUs. An additional 50 medium SSUs and 30 small SSUs will be selected from subsample PSUs, for a total of 320 sample SSUs (320 = 240 + 50 + 30). The PSU subsample will be drawn as a simple random sample, and the schools drawn with probability proportional to the total number of eligible students enrolled in a school.

  • Selection of Students: Students will be selected via whole classes, whereby all students enrolled in any one selected class will be by default chosen for participation. Classes will be selected from course schedules provided by each school that agreed to participate. Schedules will be constructed such that all eligible students are represented one time only.
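The plan states that PSUs and schools are drawn with probability proportional to size from a geographically sorted frame, but it does not name the selection mechanism. The sketch below assumes systematic PPS selection, one common way to draw a PPS sample without replacement while preserving the implicit stratification of a sorted list. The function name `systematic_pps` and the toy frame are hypothetical.

```python
import random

def systematic_pps(frame, n, rng=None):
    """Draw n units from a sorted frame with probability proportional to size.

    frame: list of (unit_id, measure_of_size) pairs, already sorted
           (for example by region, state, and ZIP Code).
    Assumes no unit is larger than the sampling interval; larger units
    (certainty units) would be hit more than once and need separate handling.
    """
    rng = rng or random.Random()
    total = sum(mos for _, mos in frame)
    interval = total / n
    start = rng.random() * interval          # random start in [0, interval)
    points = [start + k * interval for k in range(n)]

    selected = []
    cumulative = 0
    i = 0
    for unit_id, mos in frame:
        cumulative += mos
        # A unit is selected each time a selection point falls in its range.
        while i < n and points[i] < cumulative:
            selected.append(unit_id)
            i += 1
    return selected

# Toy frame: ten equally sized units, five selections.
toy_frame = [(f"psu{i}", 100) for i in range(10)]
sample = systematic_pps(toy_frame, 5, random.Random(2021))
```

Because the frame order is kept, consecutive selections are spread across the sort order, which is what makes sorting act as implicit stratification.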


Schools will be stratified into large, medium, and small schools based on their ability to support two, one, or less than one class selection per grade. Double class sampling will take place in a subset of half of the large schools.


The sampling approach uses probability proportional to size (PPS) sampling methods. In PPS sampling, when the measure of size (MOS) is defined as the count of final-stage sampling units, and a fixed number of units is selected in the final stage, the result is an equal probability of selection for all members of the universe (an “epsem” design). For the NYTS, we approximate these conditions and thus obtain a roughly self-weighting sample. Self-weighting samples, and “epsem” designs, are more efficient statistically in the sense of minimizing variances.
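The equal-probability property can be checked with a small calculation. The sketch below uses a simplified two-stage setting (PSUs, then a fixed take of students) rather than the full three-stage NYTS design, and all numbers are hypothetical. A student's overall selection probability is the PSU's PPS probability times the within-PSU probability, and the PSU size cancels.

```python
from fractions import Fraction

def overall_probability(n_psus, psu_mos, total_mos, take):
    """Overall selection probability of one student in a two-stage PPS design.

    n_psus:    number of PSUs selected
    psu_mos:   number of students in this student's PSU (the measure of size)
    total_mos: number of students in the whole frame
    take:      fixed number of students selected in each sampled PSU
    """
    p_psu = Fraction(n_psus * psu_mos, total_mos)   # PPS first stage
    p_within = Fraction(take, psu_mos)              # fixed take at final stage
    return p_psu * p_within                         # psu_mos cancels out

# Students in a small PSU and in a large PSU get the same probability.
small = overall_probability(n_psus=2, psu_mos=1000, total_mos=10000, take=50)
large = overall_probability(n_psus=2, psu_mos=4000, total_mos=10000, take=50)
```

In practice class sizes vary and takes are not exactly fixed, which is why the plan describes the sample as only roughly self-weighting.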


The MOS also is used to compute stratum sizes and PSU sizes. By assigning an aggregate measure of size to each PSU, the design allocates the PSU sample in proportion to the student population. Exhibit 2 presents a high-level summary of the key sampling design features that will be described in detail in the next sections.


Exhibit 2. Key Sampling Design Features



| Sampling Stage | Sampling Units | Stratification | Measure of Size | Designed Sample Size | Projected Sample Size |
| --- | --- | --- | --- | --- | --- |
| 1 | Counties, portions of a county, or groups of counties | Urban versus nonurban (two strata); minority concentration (eight strata) | Aggregate school size in target grades | 100 counties, portions of a county, or groups of counties | 100 counties, portions of a county, or groups of counties |
| 2 | Schools | Small, medium, and large; high school versus middle school | Eligible enrollment | 320 SSU (school) selections*: 240 large schools, 50 medium schools, and 30 small schools | 320 SSUs |
| 3 | Classes/students | | | 1 or 2 classes per grade (two per grade in large, high-minority schools); 37,527 students² sampled | 24,000 student participants |


*Denotes virtual schools containing all grades of interest at a given school level. Note that the actual number of physical schools will be closer to 345-375 after the disaggregation of SSUs into physical buildings.

  3. Oversampling of Racial/Ethnic Minorities

To facilitate accurate prevalence estimates among racial/ethnic minority groups, prior cycles of the NYTS have employed multiple strategies to increase the number of non-Hispanic Black and Hispanic students included in the sample. The sampling design always seeks to balance increasing yields for minority students with overall precision, as oversampling leads to larger variances for overall estimates. These approaches have included over-sampling PSUs in strata with a high proportion of racial/ethnic minority students, the use of a weighted MOS, and double class selection in large schools that contained a sufficient proportion of minority students.


The only oversampling that remains in the more efficient design of the last couple of cycles of the NYTS is double class sampling. Double class sampling is focused on a subset of large schools.

The new design has been shown to reduce design effects for survey estimates. The design effect is defined as the variance of a survey estimate under the actual design divided by the variance of the same estimate under a simple random sample of the same size; it is a common and useful measure of the precision of survey estimates. The ICF team has developed a simulation program that calibrates the coefficients of the MOS to ensure the required yields while balancing the precision for minority group estimates and for overall estimates. While the allocation to strata will continue to be proportional, we will continue to implement double class selection in a subset of large schools.


ICF has historically conducted simulation studies to investigate the impact of various weighting functions on the numbers and percentages of racial/ethnic minority students. These simulations have been updated with each cycle of the NYTS to ensure that the minimum amount of oversampling is used while still achieving adequate representation of non-Hispanic Black and Hispanic students.


  4. Stratification and Linking

This section describes the steps necessary for the selection of the primary and secondary sampling units (PSUs and schools): organizing PSUs; linking schools into SSUs; and implementing the stratification and allocation methods at each of these stages.


Primary Sampling Unit (PSU)

Defining a PSU

In general, PSUs are geographic areas defined as counties or groupings of counties. In defining a PSU, several issues are considered:


      1. Each PSU should be large enough to contain the requisite numbers of schools and students by grade, yet not so large as to be selected with near certainty.

      2. Each PSU should be compact geographically so that field staff can go from school to school easily.

      3. Recent data should be available to characterize each PSU.

      4. Each PSU should contain at least four middle and five high schools.


Generally, counties are equivalent to PSUs with two exceptions:

  1. Low population counties are combined to provide sufficient numbers of schools and of students; and

  2. Counties that are very large may be split to avoid becoming certainty or near-certainty PSUs.


Certainty PSUs are those whose size is large enough to ensure selection with probability of one (1.0) with a PPS sampling design that selects larger PSUs with larger probabilities. As certainty PSUs lead to inefficiencies in the design, they are split so that the new smaller units are selected with a probability smaller than one. Near-certainty units also are split to build in a safety buffer in the PSU sizes. County population figures are aggregated from school enrollment data for the grades of interest.
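As a rough illustration of the certainty check described above, the sketch below computes each PSU's expected number of selections under PPS, `n * MOS / total MOS`, and flags units at or above 1 as certainty PSUs. The near-certainty cutoff of 0.8 is a hypothetical value chosen for illustration; the plan does not state the buffer it uses. In a stratified design the check would be applied within each stratum.

```python
def flag_large_psus(psu_mos, n_sample, buffer=0.8):
    """Flag PSUs whose PPS selection probability is at or near one.

    psu_mos:  dict mapping PSU id to its measure of size (eligible students)
    n_sample: number of PSUs to be selected from this set
    buffer:   hypothetical near-certainty cutoff (an assumption, not from the plan)
    """
    total = sum(psu_mos.values())
    flags = {}
    for psu, mos in psu_mos.items():
        expected_hits = n_sample * mos / total
        if expected_hits >= 1:
            flags[psu] = "certainty"        # would be selected with probability 1
        elif expected_hits >= buffer:
            flags[psu] = "near-certainty"   # candidate for splitting as a safety margin
        else:
            flags[psu] = "ok"
    return flags

# Hypothetical stratum with one very large county.
flags = flag_large_psus({"A": 700, "B": 350, "C": 100, "D": 50}, n_sample=3)
```

PSUs flagged this way would be split into smaller units and the check repeated on the revised frame.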



Stratification of PSUs

The PSUs will be organized into 16 strata, on the basis of urban/nonurban location (as defined above) and racial/ethnic minority enrollment of non-Hispanic Blacks and Hispanics. In the traditional stratification used by the NYTS, the classification of PSUs into the two racial/ethnic minority strata, non-Hispanic Black and Hispanic, is based on the predominant minority in the PSU. This classification is coupled with the density distribution of non-Hispanic Blacks and Hispanics to subdivide each of the four primary strata into four substrata, indexed by one through four according to this density. The approach for computing stratum boundaries follows the cumulative square root of “f” method developed by Dalenius and Hodges.³ The boundaries or cutoffs change as the frequency distribution (“f”) for the racial groupings changes from one survey cycle to the next. These rules are summarized below.


  • If the PSU is within one of the 54 largest MSAs in the U.S., it is classified as “urban”; otherwise it is classified as “nonurban.”

  • If the percentage of Hispanic students in the PSU exceeds the percentage of non-Hispanic Black students, then the PSU is classified as Hispanic. Otherwise it is classified as non-Hispanic Black.

  • Hispanic urban and Hispanic nonurban PSUs are classified into four density groupings, depending upon the percentage of Hispanics in the PSU.

  • Non-Hispanic Black urban and non-Hispanic Black nonurban PSUs also are classified into four groupings, depending upon the percentage of non-Hispanic Blacks in the PSU.


We will develop the cutoffs used in defining the substrata by concentrations of Black and Hispanic students in each of the four primary strata using the most recent frame data.
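A minimal sketch of the cumulative square root of “f” rule is given below, assuming the stratification variable is the minority percentage of each PSU. The values are binned, the square roots of the bin frequencies are accumulated, and boundaries are placed where the cumulative total crosses equal steps. The function name, the number of bins, and the toy data are assumptions made for illustration; the production cutoffs come from the actual frame.

```python
import math

def cum_sqrt_f_boundaries(values, n_strata, n_bins=50):
    """Approximate Dalenius-Hodges boundaries for one stratification variable.

    values:   stratification variable per PSU (e.g., percent Hispanic)
    n_strata: number of substrata to form (four in the plan)
    n_bins:   number of equal-width bins used for the frequency distribution
    Returns n_strata - 1 upper boundaries, in increasing order.
    """
    lo, hi = min(values), max(values)
    if hi <= lo:
        raise ValueError("values must not all be equal")
    width = (hi - lo) / n_bins

    # Frequency distribution f over equal-width bins.
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1

    # Cumulative sum of sqrt(f).
    cum, running = [], 0.0
    for c in counts:
        running += math.sqrt(c)
        cum.append(running)

    # Boundaries at equal steps of the cumulative sqrt(f) scale.
    step = running / n_strata
    boundaries = []
    for k in range(1, n_strata):
        idx = next(i for i, c in enumerate(cum) if c >= k * step)
        boundaries.append(lo + (idx + 1) * width)
    return boundaries

# Toy data: 100 PSUs with minority percentages 0, 1, ..., 99.
cuts = cum_sqrt_f_boundaries(list(range(100)), n_strata=3, n_bins=10)
```

With skewed distributions, which are typical for minority concentration, the rule places narrower strata where PSUs are dense and wider strata in the tail.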


Allocation of the PSU Sample

We will select a sample of 100 PSUs allocated in proportion to student enrollment to maximize overall precision. The initial proportional allocation will be adjusted to ensure that racial/ethnic minority targets will be met. The adjustments will ensure that each stratum has at least two sampled PSUs and add balance to the distribution across strata. Exhibit 3 displays the allocation planned for the 2021 sample.


Exhibit 3: PSU Allocation


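| Stratum | # of Schools | # of Students | # of PSUs Allocated |
| --- | --- | --- | --- |
| BR1 | 3,121 | 1,511,813 | 9 |
| BR2 | 1,281 | 768,040 | 5 |
| BR3 | 1,111 | 545,667 | 4 |
| BR4 | 608 | 273,748 | 2 |
| BU1 | 1,815 | 1,270,183 | 8 |
| BU2 | 1,374 | 890,576 | 5 |
| BU3 | 415 | 270,683 | 2 |
| BU4 | 514 | 290,375 | 2 |
| HR1 | 7,128 | 2,971,743 | 16 |
| HR2 | 1,641 | 821,571 | 5 |
| HR3 | 1,081 | 577,091 | 4 |
| HR4 | 575 | 378,629 | 3 |
| HU1 | 2,577 | 1,946,984 | 11 |
| HU2 | 1,920 | 1,475,655 | 9 |
| HU3 | 1,785 | 1,413,611 | 8 |
| HU4 | 1,386 | 1,114,118 | 7 |
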
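The allocation step, proportional to enrollment with a floor of two PSUs per stratum, can be sketched as follows. This is an illustrative routine only: it is not the procedure used to produce Exhibit 3, which also reflects adjustments for racial/ethnic minority targets, and the function name and toy strata are hypothetical.

```python
def allocate_psus(sizes, n, minimum=2):
    """Allocate n PSUs to strata in proportion to size, with a per-stratum floor.

    sizes:   dict mapping stratum label to student enrollment
    n:       total number of PSUs to allocate
    minimum: smallest allocation allowed in any stratum
    """
    if n < minimum * len(sizes):
        raise ValueError("n is too small to give every stratum the minimum")
    total = sum(sizes.values())
    quotas = {s: n * v / total for s, v in sizes.items()}
    alloc = {s: max(minimum, int(q)) for s, q in quotas.items()}

    # Add units to the most under-allocated stratum until the total is n.
    while sum(alloc.values()) < n:
        s = max(alloc, key=lambda k: quotas[k] - alloc[k])
        alloc[s] += 1
    # Remove units from the most over-allocated stratum above the floor.
    while sum(alloc.values()) > n:
        above_floor = [k for k in alloc if alloc[k] > minimum]
        s = min(above_floor, key=lambda k: quotas[k] - alloc[k])
        alloc[s] -= 1
    return alloc

# Hypothetical four-stratum example; the two small strata are lifted to the floor.
example = allocate_psus({"A": 600, "B": 300, "C": 90, "D": 10}, n=20)
```

The floor of two PSUs per stratum matters for variance estimation, since at least two sampled units per stratum are needed to estimate within-stratum variability.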

Schools

Linking into Secondary Sampling Units

Schools will be classified as “whole” high schools if they have all high school grades (9th through 12th), and as “whole” middle schools if they have all of grades 6 through 8. Otherwise, they will be considered “fragment” schools. Fragment schools will be linked with other schools (fragment or whole) to form a linked school that has all grades present for a given level. We will link schools before sampling using an algorithm that links geographically proximate schools. Linked schools are treated as SSUs with selection performed at the grade level, as described below.
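The whole-versus-fragment rule can be expressed as a simple set check. The sketch below covers only that classification; the geographic linking algorithm itself is not specified in this plan and is not reproduced here. The names `LEVEL_GRADES` and `school_status` are hypothetical.

```python
# Grades required for a school to count as "whole" at each level.
LEVEL_GRADES = {
    "middle": {6, 7, 8},
    "high": {9, 10, 11, 12},
}

def school_status(grades_offered, level):
    """Return "whole" if the school offers every grade of the level, else "fragment"."""
    required = LEVEL_GRADES[level]
    return "whole" if required <= set(grades_offered) else "fragment"

# A 7-12 school is whole at the high school level but a fragment at the middle level.
status_high = school_status([7, 8, 9, 10, 11, 12], "high")
status_middle = school_status([7, 8, 9, 10, 11, 12], "middle")
```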


Stratification

SSUs will be stratified by school level (middle and high) and by size. Middle schools are those that contain any of grades 6-8 and high schools are those that contain any of grades 9-12. Schools that contain a mix of high and middle school grades will be split into two sampling units, one for each level.


SSUs also will be stratified by school size into small, medium, and large strata on the basis of their ability to support less than one, one, or two class selections per grade. Operationally, large SSUs contain at least 56 students at each grade level, medium SSUs contain between 28 and 55 students per grade, and small SSUs contain fewer than 28 students at any grade level.
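One way to read the size rule is that an SSU is classified by its smallest grade-level enrollment. The sketch below makes that assumption explicit; an SSU with some grades above 55 and others between 28 and 55 is treated as medium here, which the plan does not spell out. The function name is hypothetical.

```python
def classify_ssu(grade_enrollment):
    """Classify an SSU as large, medium, or small from per-grade enrollment.

    grade_enrollment: dict mapping grade to number of eligible students.
    Assumes the smallest grade drives the classification.
    """
    smallest = min(grade_enrollment.values())
    if smallest >= 56:
        return "large"    # can support two class selections per grade
    if smallest >= 28:
        return "medium"   # can support one class selection per grade
    return "small"        # at least one grade has fewer than 28 students

# Hypothetical SSUs.
size_a = classify_ssu({9: 120, 10: 110, 11: 90, 12: 60})
size_b = classify_ssu({6: 40, 7: 30, 8: 55})
size_c = classify_ssu({6: 60, 7: 20, 8: 70})
```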


  5. Sample Sizes

This section provides the derivation of the NYTS sample sizes driven by target precision requirements overall and in key subgroups. The required student yields, or numbers of participating students, are translated into the necessary numbers of sample schools, and sample PSUs, using historical participation rates.


The NYTS is designed to produce estimates within a margin of error (MOE) of ±5% at a 95% confidence level for the following key subgroups:


  • Middle and high school (school level): middle school students in total (grades 6–8 combined) and high school students in total (grades 9–12 combined);

  • Grade: individual grades 6, 7, 8, 9, 10, 11, and 12;

  • Sex: males and females in total, by school level (e.g., male middle school students, female high school students), and by individual grade (e.g., sixth-grade males, sixth-grade females);

  • Race-Ethnicity: in total and by school level (e.g., Hispanic middle school students).


The sample sizes are developed to support analysis by individual grade and by sex without any special considerations in the sampling plan. Design effects will be relatively small for subgroups that cut across schools; therefore, estimates by sex have better precision than other subgroups, with confidence intervals within ± 3%. Because the design is expected to yield a greater number of completed surveys from high school students than from middle school students, overall estimates are anticipated to be more precise at the high school level than those at the middle school level.


The 2021 NYTS sampling design aims to balance student yields by grade, with target sample sizes of approximately 3,428 participating students per grade (roughly 24,000 participants spread over seven grades). These targets also support the precision of estimates within individual grades (e.g., sex-by-grade subgroup estimates based on about 1,700 students).
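To show how a per-grade yield relates to the ±5% target, the sketch below computes the margin of error for a proportion under a complex design using the usual normal approximation, `z * sqrt(deff * p * (1 - p) / n)`. The design effect of 2.0 in the example is a hypothetical value for illustration only; the plan does not report the design effects it assumes.

```python
import math

def margin_of_error(n, deff=1.0, p=0.5, z=1.96):
    """Half-width of a 95% confidence interval for a proportion.

    n:    number of participating students in the subgroup
    deff: design effect (variance inflation relative to simple random sampling)
    p:    assumed proportion; 0.5 gives the widest interval
    z:    normal quantile for the confidence level (1.96 for 95%)
    """
    return z * math.sqrt(deff * p * (1 - p) / n)

# Per-grade yield of 3,428 with a hypothetical design effect of 2.0.
moe_grade = margin_of_error(3428, deff=2.0)
# Sex-by-grade subgroup of about 1,700 with the same hypothetical design effect.
moe_sex_by_grade = margin_of_error(1700, deff=2.0)
```

Dividing `n` by the design effect gives the effective sample size, which is the quantity the plan refers to when comparing precision across cycles.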




Across the 14 previous cycles of the NYTS, school participation has averaged 83.3%, and student participation has averaged 89.9%. Overall response rates have averaged 75.5%. Historical participation rates at both school and student levels, which guide the sampling design and sample sizes, are summarized in Exhibit 4.



Exhibit 4. Historical Summary of NYTS Participation Rates


| Year | School Participation | Student Participation | Overall |
| --- | --- | --- | --- |
| 1999 | 90.30% | 93.20% | 84.20% |
| 2000 | 90.00% | 93.40% | 84.10% |
| 2002 | 83.10% | 90.08% | 74.85% |
| 2004 | 92.70% | 87.90% | 81.50% |
| 2006 | 91.60% | 87.60% | 80.20% |
| 2009 | 92.30% | 91.90% | 84.80% |
| 2011 | 83.20% | 88.00% | 73.20% |
| 2012 | 80.30% | 91.70% | 73.60% |
| 2013 | 75.40% | 90.70% | 68.40% |
| 2014 | 80.20% | 91.40% | 73.30% |
| 2015 | 72.50% | 87.40% | 63.40% |
| 2016 | 81.00% | 88.00% | 71.00% |
| 2017 | 76.76% | 88.73% | 68.10% |
| 2018 | 76.77% | 88.82% | 68.19% |
| 2019 | 77.23% | 85.85% | 66.30% |
| Average over all previous cycles | 83.30% | 89.92% | 75.43% |



In calculating the sample sizes for the 2021 NYTS, we made our approach more robust by assuming a conservative combined rate (school × student) of 63.75%, substantially lower than the historical overall response rate. The assumed rates are closer to the more recent experience at both levels. The main reason, discussed below, is that the student participation rate needs to be adjusted to account for a growing number of ineligible students. This number needs to be subtracted from the net number of students available for selection in the participating schools.


Schools will be classified by size on the basis of grade-level enrollments. This ensures that a sampled school of a given size classification is able to support the student sample sizes summarized in Exhibit 5 below.


For this sampling plan, the NYTS sample size calculations were based on the following assumptions:


  • The main structure of the sampling design is consistent with the design used to draw the sample for prior cycles of the NYTS.

  • A minimum of one SSU at the high school level and one SSU at the middle school level will be selected within each PSU. In addition, we will select 20 additional large schools per level: one middle school and one high school in each PSU of a 20-PSU subsample, as described below.

  • SSUs with at least 56 students per grade are considered large, and those among the others with at least 28 students per grade are considered medium; otherwise, they are considered small.

  • On average, each selected class includes 25 students (on the basis of historical averages) pre-attrition.

  • For half (50%) of the SSUs classified as large, we will sample double the number of students by sampling eight classes instead of four in high schools, and six classes instead of three in middle schools.

  • A 63.75% overall response rate (based on historical averages) calculated as the product of the school response rate (75%) and student response rate (85%).


Note that the double sampling will occur for half of the large schools. Note also that the assumed student response rate is lower than in previous cycles to reflect a growing number of ineligible students that also need to be subtracted from the net numbers. On the other hand, the school response rate is expected to remain high, especially as we have retained a relatively large number of large sample schools, which tend to participate at higher rates than smaller (often non-public) schools.
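The response-rate arithmetic in the assumptions above can be checked directly. The sketch below multiplies the assumed school and student rates and inflates the 24,000-participant target accordingly. The result is close to, but not identical to, the simulation-derived figure of 37,527 cited elsewhere in the plan, which is described as only approximately equal to this ratio.

```python
import math

SCHOOL_RATE = 0.75    # assumed school response rate
STUDENT_RATE = 0.85   # assumed student response rate
TARGET = 24_000       # required number of participating students

# Combined rate is the product of the two stage-level rates.
combined_rate = SCHOOL_RATE * STUDENT_RATE

def required_sample(target, rate):
    """Number of students to sample so that target * 1/rate participants are expected."""
    return math.ceil(target / rate)

students_to_sample = required_sample(TARGET, combined_rate)
```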


Based on these assumptions, 100 PSUs will be selected at the first stage. Subsamples of these sample PSUs will be selected to supply additional SSUs as described next: a) small (30 SSUs from 15 subsample PSUs), b) medium (50 SSUs from 25 subsample PSUs), and c) large (40 SSUs in 20 subsample PSUs). The PSU subsamples will be drawn as simple random samples, and the schools drawn with probability proportional to the total number of eligible students enrolled in a school. The sampling and subsampling of PSUs are described in more detail next.


Within each of the 100 sample PSUs, two large schools will be drawn, one at the middle school level to supply students in grades 6 through 8, and one at the high school level to supply students in grades 9 through 12. An additional large school for each level will be selected in a subsample of 20 PSUs.


An additional 50 medium SSUs and 30 small SSUs will be selected from subsample PSUs, for a total of 320 sample SSUs (320 = 240 + 50 + 30).


In addition, 25 PSUs will be independently subsampled to supply medium SSUs (two selected in each subsample PSU, one per level), and 15 PSUs will be subsampled to supply small SSUs (two selected in each subsample PSU, one per level). This yields the 50 medium and 30 small SSUs shown in Exhibit 5.


Exhibit 5 provides a detailed calculation of designed sample sizes across school level and school size categories. A larger school sample is selected from a larger number of PSUs to limit clustering effects. These sample sizes are larger than recent cycles of the NYTS to accommodate the larger student sample size desired by CDC. These sample sizes are designed to support estimates for smaller subgroups formed by cross classifications of race/ethnicity by grade or sex.





Exhibit 5. Summary of Expected Sample Sizes for the 2021 NYTS


| PSUs | Size | # of SSUs | Number of Schools Sampled | Number of Classes per School | Number of Students per Class | Number of Sampled Students Prior to Attrition | Participants at Combined School and Student Response Rate (63.75%) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 100 | Large High School | 120 | Double classes: 60 | 8 | 25 | 12,000 | 7,650 |
| | | | Single classes: 60 | 4 | 25 | 6,000 | 3,825 |
| | Large Middle School | 120 | Double classes: 60 | 6 | 25 | 9,000 | 5,738 |
| | | | Single classes: 60 | 3 | 25 | 4,500 | 2,869 |
| | Large Total | 240 | | | | 31,500 | 20,081 |
| 25 (subsample) | Medium High School | 25 | | 4 | 25 | 2,500 | 1,594 |
| | Medium Middle School | 25 | | 3 | 25 | 1,875 | 1,195 |
| | Medium Total | 50 | | | | 4,375 | 2,789 |
| 15 (subsample) | Small High School | 15 | | 4 | 25 | 1,500 | 956 |
| | Small Middle School | 15 | | 3 | 25 | 1,125 | 717 |
| | Small Total | 30 | | | | 2,625 | 1,673 |
| | Overall Total | 320 | | | | 38,500 | 24,544 |


* In this exhibit, the schools are SSUs or “virtual schools” created by combining actual, physical schools so that each virtual school unit has a complete set of grades for the level. The virtual schools are expanded to physical schools. The number of physical schools in the sample is expected to range from 345 to 375.
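The totals in Exhibit 5 follow from multiplying schools by classes per school by students per class, then applying the combined response rate. The sketch below recomputes the overall totals from the row-level inputs in the exhibit; the variable names are illustrative.

```python
CLASS_SIZE = 25          # assumed average students per selected class
COMBINED_RATE = 0.6375   # assumed school x student response rate

# (row label, number of SSUs, classes per SSU), taken from Exhibit 5.
rows = [
    ("Large high school, double classes", 60, 8),
    ("Large high school, single classes", 60, 4),
    ("Large middle school, double classes", 60, 6),
    ("Large middle school, single classes", 60, 3),
    ("Medium high school", 25, 4),
    ("Medium middle school", 25, 3),
    ("Small high school", 15, 4),
    ("Small middle school", 15, 3),
]

def sampled_students(n_ssus, classes_per_ssu, class_size=CLASS_SIZE):
    """Students sampled before attrition for one row of the exhibit."""
    return n_ssus * classes_per_ssu * class_size

total_ssus = sum(n for _, n, _ in rows)
total_sampled = sum(sampled_students(n, c) for _, n, c in rows)
expected_participants = round(total_sampled * COMBINED_RATE)
```

Row-level participant counts in the exhibit are rounded individually, so their sum can differ by one from the rounded product of a subtotal and the response rate.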

1 Sampling is conducted without replacement at all stages in the sense that sampling units are not thrown back into the pool after being selected. (Note that the sampling design concept of “replacement” is not related to substitution of ineligible schools, a step that is undertaken prior to the data collection process.)

2 The total sample size of 37,527 was derived in our simulations; it is also approximately the target number of participants, 24,000, divided by the anticipated overall response rate, 63.75%.


3 Dalenius, T., & Hodges, J. L. (1959). Minimum variance stratification. Journal of the American Statistical Association, 54, 88–101.



File Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document
Author: Iachan, Ronaldo
File Modified: 0000-00-00
File Created: 2021-04-29
