NAEP 2024 Appendix B Weighting Procedures

NAEP 2024 Appendix B v30.docx

National Assessment of Educational Progress (NAEP) 2024 Amendment #2

NAEP 2024 Appendix B Weighting Procedures

OMB: 1850-0928

⚠️ Notice: This form may be outdated. More recent filings and information on OMB 1850-0928 can be found here:

Document [docx]

Download: docx | pdf

NATIONAL CENTER FOR EDUCATION STATISTICS NATIONAL ASSESSMENT OF EDUCATIONAL PROGRESS

National Assessment of Educational Progress (NAEP) 2024

Appendix B

NAEP 2018 Weighting Procedures

OMB# 1850-0928 v.30

June 2023

The 2018 Weighting Procedures documentation is the most current version available to the public. At this time, there is not a timeline for when the details for later assessment years will be publicly available.

NAEP Technical Documentation Website

NAEP Technical Documentation Weighting Procedures for the 2018 Assessment

NAEP assessments use complex sample designs to create student samples that generate population and subpopulation estimates with reasonably high precision. School and student sampling weights ensure valid inferences from the school and student samples to their respective populations. In 2018, weights were developed for schools and students sampled at grade 8 for assessments in civics, geography, U.S. history, and technology and engineering literacy (TEL).

Student Weights

The assessments for civics, geography, and U.S. history at grade 8 in both public and private schools were administered in two modes: paper and pencil mode, and a digital mode using tablets. As described in the NAEP 2018 sample design section, the random sample of students assigned to the paper-based assessment (PBA) group were assessed using paper and pencil, and the random sample of students assigned to the digitally based assessment (DBA) group were assessed using tablets. Separate sampling weights were computed for the DBA-only sample, PBA-only sample, and DBA and PBA samples combined. The weighting procedures for the civics, geography, and U.S. history assessments described in this report are

Computation of Full-Sample Student Weights

Computation of Replicate Student Weights for Variance Estimation

Computation of Full-Sample School Weights

Computation of Replicate School Weights for Variance Estimation

Quality Control on Weighting Procedures

based on combined DBA/PBA samples. The TEL assessment at grade 8 for public and private schools was only administered to students via tablets and therefore involved the computation of only one set of weights.

Each student was assigned a weight to be used for making inferences about students in the target population. This weight is known as the final full- sample student weight, and it contains six major components:

Shape1 the student base weight,

Shape2 Shape3 Shape4 school nonresponse adjustment, student nonresponse adjustment, school weight trimming adjustment,

Shape5 Shape6 student weight trimming adjustment, and student raking adjustment.

The student base weight is the inverse of the overall probability of selecting a student and assigning that student to a particular assessment. The sample design that determines the base weights is discussed in the NAEP 2018 sample design section.

The student base weight is adjusted for two sources of nonparticipation: school level and student level. These weighting adjustments seek to reduce the potential for bias from such nonparticipation by

Shape7 increasing the weights of students from schools similar to those schools not participating, and

Shape8 increasing the weights of participating students similar to those students from within participating schools who did not attend the assessment session (or makeup session) as scheduled.

Furthermore, the final student weights reflect the trimming of extremely large weights at both the school and student levels. These weighting adjustments seek to reduce variances of survey estimates.

An additional weighting adjustment was implemented in the geography and U.S. history assessments so that estimates for key student-level characteristics were in agreement across those assessments. This adjustment was implemented using a raking procedure.

In addition to the final full-sample student weight, a set of replicate weights was provided for each student. These replicate weights are used to calculate the variances of survey estimates using the jackknife repeated replication method.

The methods used to derive these weights were aimed at reflecting the features of the sample design. When the jackknife variance estimation procedure is implemented, approximate unbiased estimates of sampling variance are obtained. In addition, the various weighting procedures were repeated on each set of replicate weights to appropriately reflect the impact of the weighting adjustments on the sampling variance of a survey estimate.

School Weights

In addition to student weights, school weights were calculated to provide secondary users means to analyze data at the school level. The school weights are subject specific and are computed for the schools that contained at least one student that participated in the NAEP assessment for that subject.

Each school was assigned a weight to be used for making inferences about schools in the target population. This weight, known as the final full-sample school weight, contains five major components:

Shape9 the school base weight,

Shape10 Shape11 Shape12 Shape13 school nonresponse adjustment, school weight trimming adjustment, school-session assignment weight, and small-school subject adjustment.

The school base weight is the inverse of the overall probability of selecting a school for a particular assessment.

The school nonresponse adjustment increases the weights of participating schools to account for similar schools that did not participate, and the school trimming adjustment reduces extremely large weights to decrease variances of survey estimates. These two adjustments are the same school-level adjustments used in the final full-sample student weight described above.

The school-session assignment weight reflects the probability that the particular session type was assigned to the school.

The small-school subject adjustment accounts for very small schools that did not have enough participating students for every subject intended for the school. School weights for subjects that had at least one eligible student are inflated by this factor to compensate for schools of the same size that did not have any eligible students for those subjects and would not be represented otherwise.

In addition to the final full-sample weight, a set of replicate weights was provided for each school. These replicate weights are used to calculate the variances of school-level estimates using the jackknife repeated replication method.

Shape14 Quality Control Procedures

Quality control checks were implemented throughout the weighting process to ensure the accuracy of the full-sample and replicate school and student weights. See the Quality Control on Weighting Procedures link above for the various checks implemented and main findings of interest.

http://nces.ed.gov/nationsreportcard/tdw/weighting/2018/weighting_procedures_for_the_2018_assessment.aspx

Shape15

NAEP Technical Documentation Computation of Full-Sample School Weights for the 2018 Assessment

The full-sample or final school weight is the sampling weight used to derive NAEP school estimates of population and subpopulation characteristics for a specified grade (8) and assessment subject (civics, geography, U.S. history, or technology and engineering literacy (TEL)). The full-sample school weight reflects the number of schools that the sampled school represents in the population for purposes of estimation.

The full-sample weight, which is used to produce survey estimates, is distinct from a replicate weight that is used to estimate variances of survey estimates. The full-sample weight is assigned to participating schools and reflects the school base weight after the application of the various weighting adjustments. The full-sample weight SCH_WGT_jsfor school s in stratum j can be expressed as follows:

where

Shape16

SCH_BWT_jsis the school base weight;

Shape17 Shape18 Shape19 SCH_NRAF_jsis the school-level nonresponse adjustment factor; SCH_TRIM_jsis the school-level weight trimming adjustment factor; and SCH_SUBJ_AF_sis the small-school subject adjustment factor.

The small-school subject adjustment accounts for 1) very small schools that did not have enough participating students for every subject assigned to the school and 2) small schools assigned only one paper-based assessment (PBA) session type. School weights for subjects that had at least one eligible student are inflated by this factor to compensate for schools of the same size that did not have any eligible students for those subjects and would not be represented otherwise.

For the digitally based assessments (DBA), the factor is equal to the inverse of the probability that a school of a given size had at least one eligible sampled student in the given subject:

where

Shape20

Shape21

SF is the spiraling factor for the given subject and assessment mode; and

n_sis the within-school student sample size.

For the 2018 operational assessments, schools could be assigned to four session types:

DBA civics/geography/U.S. history;
PBA geography/U.S. history;
PBA civics; or
DBA technology and engineering literacy (TEL).

Students in schools assigned to the DBA civics/geography/U.S. history sessions are assigned to subjects at rates of 7 in 23 for civics, 7 in 23 for geography, and 9 in 23 for U.S. history. These rates result in spiraling factors of 3.29, 3.29, and 2.56 respectively. If, for example, a school had only 1 (or 2) eligible students in civics, the small-school adjustment factor would be 3.29 (or 1.64). Students in schools assigned to the DBA TEL session had only one subject.

For PBA, the small-school subject adjustment factor formula includes an extra multiplier that reflects the probability that the particular PBA session containing the particular subject was assigned to the school. Students in schools assigned to the PBA geography/U.S. history sessions are assigned to subjects at rates of 4 in 9 for geography and 5 in 9 for U.S. history. Students in schools assigned to the PBA civics session had only one subject. The additional multipliers for the PBA civics and PBA geography/U.S. history session types were 9/4 and 9/5 respectively.

In summary, 46 of the 778 schools assigned to the civics assessment had their weights adjusted to compensate for schools that were too small to do civics. The small-school adjustment factors for civics ranged from 1.10 to 3.29. For the geography assessment, 44 of 777 schools had their school weights adjusted to compensate for schools that were too small to do geography. The small-school adjustment factors for geography ranged from 1.10 to

3.29. For the U.S. history assessment, 50 of 784 schools had their school weights adjusted to compensate for schools that were too small to do U.S. history. The small-school adjustment factors for U.S. history ranged from 1.28 to 2.56.

http://nces.ed.gov/nationsreportcard/tdw/weighting/2018/computation_of_full_sample_school_weights.aspx

Shape22

NAEP Technical Documentation Computation of Full-Sample Student Weights for the 2018 Assessment

The full-sample or final student weight is the sampling weight used to derive NAEP student estimates of population and subpopulation characteristics for a specified grade (8) and assessment subject (civics, geography, U.S. history, or technology and engineering literacy (TEL)). The full-sample student weight reflects the number of students that the sampled student represents in the population for purposes of estimation. The summation of the final student weights over a particular student group provides an estimate of the total number of students in that group within the population.

The full-sample weight, which is used to produce survey estimates, is distinct from a replicate weight that is used to estimate variances of survey estimates. The full-sample weight is assigned to participating students and reflects the student base weight after the application of the various weighting adjustments. The full-sample weight for student k from school s in stratum j (FSTUWGT_jsk) can be expressed as follows:

where

Computation of Base Weights

School and Student Nonresponse Weight Adjustments

School and Student Weight Trimming Adjustments

Student Weight Raking Adjustment

Shape23 STU_BWT_jskis the student base weight;

Shape24 Shape25 Shape26 Shape27 Shape28 SCH_NRAF_jsis the school-level nonresponse adjustment factor; STU_NRAF_jskis the student-level nonresponse adjustment factor; SCH_TRIM_jsis the school-level weight trimming adjustment factor; STU_TRIM_jskis the student-level weight trimming adjustment factor; and STU_RAKE_jskis the student-level raking adjustment factor.

School sampling strata for a given assessment vary by school type. See the links below for descriptions of the school strata for the various assessments.

Shape29 Shape30 Shape31 Public school sample for social sciences Private school sample for social sciences Public school sample for TEL

Shape32 Private school sample for TEL

http://nces.ed.gov/nationsreportcard/tdw/weighting/2018/computation_of_full_sample_weights_for_the_2018_assessment.aspx

Shape33

NAEP Technical Documentation Computation of Base Weights for the 2018 Assessment

Every sampled school and student received a base weight equal to the reciprocal of its probability of selection. Computation of a school base weight varies by

Shape34 Shape35 type of sampled school (original or substitute), and sampling frame (new school frame or not).

Computation of a student base weight reflects

Shape36 Shape37 the student's overall probability of selection accounting for school and student sampling, assignment to session type at the school- and student-level, and

Shape38 the student's assignment to the civics, geography, U.S. history, or technology and engineering literacy (TEL) assessment.

School Base Weights Student Base Weights

http://nces.ed.gov/nationsreportcard/tdw/weighting/2018/computation_of_base_weights_for_the_2018_assessment.aspx

Shape39

NAEP Technical Documentation School Base Weights for the 2018 Assessment

The school base weight for a sampled school is equal to the inverse of its overall probability of selection.

The overall selection probability of a sampled school differs by

Shape40 Shape41 type of sampled school (original or substitute), and sampling frame (new school frame or not).

Shape42 The overall probability of selection of an originally selected school reflects two components: the probability of selection of the primary sampling unit (PSU), and

Shape43 the probability of selection of the school within the selected PSU from either the NAEP public school frame or the private school frame.

Shape44 The overall selection probability of a school from the new school frame is the product of two quantities: the probability of selection of the school's district into the new-school district sample, and

Shape45 the probability of selection of the school into the new school sample.

Substitute public schools for the 2018 Social Sciences assessment

Substitute private schools for the 2018 Social Sciences assessment

Substitute public schools for the 2018 Technology and Engineering Literacy (TEL) assessment

Substitute private schools for the 2018 Technology and Engineering Literacy (TEL) assessment

The new-school district sampling procedure for the 2018 public school assessment in social sciences is very similar to the new-school district sampling procedure for the 2018 public school assessment in technology and engineering literacy (TEL).

Substitute schools are preassigned to original schools and take the place of original schools that refuse to participate. For weighting purposes, substitute schools are treated as if they were the original schools that they replaced, so they are assigned the school base weight of their original schools.

http://nces.ed.gov/nationsreportcard/tdw/weighting/2018/school_base_weights_for_the_2018_assessment.aspx

Shape46

NAEP Technical Documentation Student Base Weights for the 2018 Assessment

Every sampled student received a student base weight, whether or not the student participated in the assessment. The student base weight is the reciprocal of the probability that the student was sampled to participate in the assessment for a specified subject. The student base weight for student k from school s in stratum j (STU_BWT_jsk) is the product of seven weighting components and can be expressed as follows:

where

Shape47

Shape48

Shape49

Shape50

Shape51

Shape52

Shape53

SCH_BWT_jsis the school base weight;

SCHSsessionassignmentESWT_jsis the school-level session assignment weight that reflects the conditional probability, given the school, that the particular session type was assigned to the school;

WINSCHWT_jsis the within-school student weight that reflects the conditional probability, given the school, that the student was selected for the NAEP assessment;

STUSESWT_jskis Stu_bookmark the student-level session assignment weight that reflects the conditional probability, given the particular session type was assigned to the school, that the student was assigned to that session type;

SUBJFAC_jskis the subject spiral adjustment factor that reflects the conditional probability, given the student was assigned to a particular session type, that the student was assigned the specified subject;

SUBADJ_jsis the substitution adjustment factor to account for the difference in enrollment size between the substitute and original school; and

YRRND_AF_jsis the year-round adjustment factor to account for students in year-round schools on scheduled break at the time of the NAEP assessment and thus not available for sample.

The within-school student weight (WINSCHWT_js) is the inverse of the student sampling rate in the school.

The subject spiral adjustment factor (SUBJFAC_jsk) adjusts the student weight to account for the spiral pattern used in distributing civics, geography, U.S. history, or technology and engineering literacy (TEL) booklets to students. The subject factor varies by grade, subject, assessment mode (paper-based or digitally-based) and school type (public/private). It is equal to the inverse of the booklet proportions (civics, geography, U.S. history, or TEL) in the overall spiral for a specific sample.

For cooperating substitutes of nonresponding sampled original schools, the substitution adjustment factor (SUBADJ_js) is equal to the ratio of the estimated grade enrollment for the originally sampled school to the estimated grade enrollment for the substitute school. The student sample from the substitute school then "represents" the set of grade-eligible students from the originally sampled school.

The year-round adjustment factor (YRRND_AF_js) adjusts the student weight for students in year-round schools who do not attend school during the time of the assessment. This situation typically arises in overcrowded schools. School administrators in year-round schools randomly assign students to portions of the year in which they attend school and portions of the year in which they do not attend. At the time of assessment, a certain percentage of students (designated as OFF_js) do not attend school and thus cannot be assessed. The YRRND_AF_jsfor a school is calculated as 1 / (1 - OFF_js/ 100).

http://nces.ed.gov/nationsreportcard/tdw/weighting/2018/student_base_weights_for_the_2018_assessment.aspx

Shape54

NAEP Technical Documentation School and Student Nonresponse Weight Adjustments for the 2018 Assessment

Nonresponse is unavoidable in any voluntary survey of a human population. Nonresponse leads to the loss of sample data that must be compensated for in the weights of the responding sample members. This differs from ineligibility, for which no adjustments are necessary. The purpose of the nonresponse adjustments is to reduce the mean square error of survey estimates. While the nonresponse adjustment reduces the bias from the loss of sample, it also increases variability among the survey weights leading to increased variances. However, it is presumed that the reduction in bias more than compensates for the increase in the variance, thereby reducing the mean square error and thus improving the accuracy of survey estimates. Nonresponse adjustments are made in the NAEP surveys at both the

School Nonresponse Weight Adjustment

Student Nonresponse Weight Adjustment

school and the student levels: the responding (original and substitute) schools receive a weighting adjustment to compensate for nonresponding schools, and responding students receive a weighting adjustment to compensate for nonresponding students.

The paradigm used for nonresponse adjustment in NAEP is the quasi-randomization approach (Oh and Scheuren 1983). In this approach, school response cells are based on characteristics of schools known to be related to both response propensity and achievement level, such as the urbanization classification (e.g., city of a metropolitan area) of the school. Likewise, student response cells are based on characteristics of the schools containing the

students and student characteristics that are known to be related to both response propensity and achievement level, such as student race/ethnicity, gender, and age.

Under this approach, sample members are assigned to mutually exclusive and exhaustive response cells based on predetermined characteristics. A nonresponse adjustment factor is calculated for each cell as the ratio of the sum of adjusted base weights for all eligible units to the sum of adjusted base weights for all responding units. The nonresponse adjustment factor is then applied to the base weight of each responding unit. In this way, the weights of responding units in the cell are "weighted up" to represent the full set of responding and nonresponding units in the response cell.

The quasi-randomization paradigm views nonresponse as another stage of sampling. Within each nonresponse cell, the paradigm assumes that the responding sample units are a simple random sample from the total set of all sample units. If this model is valid, then the use of the quasi-randomization weighting adjustment will eliminate any nonresponse bias. Even if this model is not valid, the weighting adjustments will eliminate bias if the achievement scores are homogeneous within the response cells. That is, bias is eliminated if there is homogeneity either in response propensity or in achievement levels. If neither the response propensity nor the achievement scores are perfectly homogeneous, nonresponse weight adjustments will reduce bias if the characteristics that define nonresponse cells are related to both achievement and response propensity. See, for example, chapter 4 of Little and Rubin (1987).

http://nces.ed.gov/nationsreportcard/tdw/weighting/2018/school_and_student_nonresponse_weight_adjustments_for_the_2018_assessment.aspx

Shape55

NAEP Technical Documentation School Nonresponse Weight Adjustment for the 2018 Assessment

The school nonresponse adjustment procedure inflates the weights of participating schools to account for eligible nonparticipating schools for which no substitute schools participated. The adjustments are computed within nonresponse cells and are based on the assumption that the participating and nonparticipating schools within the same cell are more similar to one another than to schools from different cells. Nonresponse cell definitions varied for public and private schools.

Development of Initial School Nonresponse Cells

Development of Final School Nonresponse Cells

School Nonresponse Adjustment Factor Calculation

http://nces.ed.gov/nationsreportcard/tdw/weighting/2018/school_nonresponse_weight_adjustment_for_the_2018_assessment.aspx

Shape56

NAEP Technical Documentation Development of Final School Nonresponse Cells for the 2018 Assessment

Limits were placed on the magnitude of cell sizes and adjustment factors to prevent unstable nonresponse adjustments and unacceptably large nonresponse factors. All initial weighting cells with fewer than six cooperating schools or adjustment factors greater than three for the full sample weight were collapsed with suitable adjacent cells. Simultaneously, all initial weighting cells for any replicate with fewer than four cooperating schools or adjustment factors greater than the maximum of three (or two times the full sample nonresponse adjustment factor) were collapsed with suitable adjacent cells. Initial weighting cells were generally collapsed in reverse order of the cell structure; that is, starting at the bottom of the nesting structure and working up toward the top level of the nesting structure.

Public Schools

For the public schools, race/ethnicity classification cells (categories based on the total percentage of Black, Hispanic, and American Indian/Alaska Native students) within an urbanization classification (four categories based on urban-centric locale) stratum and census region were collapsed first. If further collapsing was required after all levels of race/ethnicity cells were collapsed, urbanization strata within census region were combined next. Cells were never collapsed across census region.

Private Schools

For the private schools, urbanization classification strata within a census region and affiliation type were collapsed first. If further collapsing was required, census region cells within an affiliation type were collapsed. Cells were never collapsed across affiliation.

http://nces.ed.gov/nationsreportcard/tdw/weighting/2018/development_of_final_school_nonresponse_cells_for_the_2018_assessment.aspx

Shape57

NAEP Technical Documentation Development of Initial School Nonresponse Cells for the 2018 Assessment

The cells for nonresponse adjustments are generally functions of the school sampling strata. Sampling strata definitions differ for public and private schools.

Public Schools

Shape58 For public school samples, initial weighting cells were formed using the following nesting cell structure: census region,

Shape59 urbanization classification (four categories based on urban-centric locale) stratum, and

Shape60 race/ethnicity classification (categories based on the total percentage of Black, Hispanic, and American Indian/Alaska Native students).

Private Schools

Shape61 For private school samples, initial weighting cells were formed using the following nesting cell structure: affiliation (Catholic or non-Catholic),

Shape62 census region, and

Shape63 urbanization classification (four categories based on urban-centric locale) stratum.

http://nces.ed.gov/nationsreportcard/tdw/weighting/2018/development_of_initial_school_nonresponse_cells_for_the_2018_assessment.aspx

Shape64

NAEP Technical Documentation School Nonresponse Adjustment Factor Calculation for the 2018 Assessment

In each final school nonresponse adjustment cell c, the school nonresponse adjustment factor SCH_NRAF_cwas computed as follows:

where

Shape65 S_cis the set of all eligible sampled schools (cooperating original and substitute schools and refusing original schools with noncooperating or no assigned substitute) in cell c,

Shape66 Shape67 R_cis the set of all cooperating schools within S_c, SCH_BWT_sis the school base weight,

Shape68 SCH_TRIM_sis the school-level weight trimming factor,

Shape69 SCHSESWT_sis the school-level session assignment weight, and

Shape70 X_sis the estimated grade enrollment corresponding to the original sampled school.

http://nces.ed.gov/nationsreportcard/tdw/weighting/2018/school_nonresponse_adjustment_factor_calculation_for_the_2018_assessment.aspx

Shape71

NAEP Technical Documentation Student Nonresponse Weight Adjustment for the 2018 Assessment

The student nonresponse adjustment procedure inflates the weights of assessed students to account for eligible sampled students who did not participate in the assessment. These inflation factors offset the loss of data associated with absent students. The adjustments are computed within nonresponse cells and are based on the assumption that the assessed and absent students within the same cell are more similar to one another than to students from different cells. Like its counterpart at the school level, the student nonresponse adjustment is intended to reduce the mean square error and thus improve the accuracy of NAEP assessment estimates.

Development of Initial Student Nonresponse Cells

Development of Final Student Nonresponse Cells

Student Nonresponse Adjustment Factor Calculation

http://nces.ed.gov/nationsreportcard/tdw/weighting/2018/student_nonresponse_weight_adjustment_for_the_2018_assessment.aspx

Shape72

NAEP Technical Documentation Development of Final Student Nonresponse Cells for the 2018 Assessment

Similar to the school nonresponse adjustment, cell and adjustment factor size constraints are in place to prevent unstable nonresponse adjustments or unacceptably large adjustment factors. All initial weighting cells with either fewer than 20 participating students, or adjustment factors greater than two for the full sample weight, were collapsed with suitable adjacent cells. Simultaneously, all initial weighting cells for any replicate with either fewer than 15 participating students or an adjustment factor greater than the maximum of two or one and a half times the full sample nonresponse adjustment factor were collapsed with suitable adjacent cells. Initial weighting cells were generally collapsed in reverse order of the cell structure; that is, starting at the bottom of the nesting structure and working up toward the top level of the nesting structure.

Public Schools

For public schools, race/ethnicity cells within students with disabilities/English learners (SD/EL) status and subject, school nonresponse cell, age, and gender classes were collapsed first. If further collapsing was required, cells were next combined across gender, then age, and finally school nonresponse cells. Cells for public schools are never collapsed across SD/EL status and subject.

Private Schools

For private schools, race/ethnicity cells within school nonresponse cell, age, and gender classes were collapsed first. If further collapsing was required, cells were next combined across gender, then age, and finally school nonresponse cells.

http://nces.ed.gov/nationsreportcard/tdw/weighting/2018/development_of_final_student_nonresponse_cells_for_the_2018_assessment.aspx

Shape73

NAEP Technical Documentation Development of Initial Student Nonresponse Cells for the 2018 Assessment

Cell definitions for the student nonresponse adjustment vary by public and private schools.

Public Schools

Shape74 The public school samples formed initial student nonresponse cells using the following nesting cell structure: students with disabilities (SD)/English learners (EL) by subject,

Shape75 school nonresponse cell,

Shape76 Shape77 relative age¹(classified into "older" student and "modal age or younger" student), gender, and

Shape78 race/ethnicity.

The highest level variable in the cell structure separates students who were classified either as with disabilities (SD) or as English learners (EL) from those who are neither, since SD or EL students tend to score lower on assessment tests than those without these limitations. In addition, the SD/EL categories are further broken down by subject since rules for excluding students from the assessment differ by subject. Non-SD/EL students are not broken down by subject because the exclusion rules do not apply to them.

Private Schools

Shape79 The private school samples formed initial student nonresponse cells using the following nesting cell structure: school nonresponse cell,

Shape80 Shape81 relative age¹(classified into "older" student and "modal age or younger" student), gender, and

Shape82 race/ethnicity.

The structure for private school cells for student nonresponse is slightly different from public school cells. SD/EL status is not included in the private school cell structure because very few students (between 5-8 percent) in private schools are classified as SD/EL.

¹Older students in the grade 8 assessment are those born before October 1, 2003. Students born after that date are classified as modal age or younger.

http://nces.ed.gov/nationsreportcard/tdw/weighting/2018/development_of_initial_student_nonresponse_cells_for_the_2018_assessment.aspx

Shape83

NAEP Technical Documentation Student Nonresponse Adjustment Factor Calculation for the 2018 Assessment

In each final student nonresponse adjustment cell c for a given sample, the student nonresponse adjustment factor STU_NRAF_cwas computed as follows:

where

Shape84

Shape85

Shape86

Shape87

Shape88

Shape89

S_cis the set of all eligible sampled students in cell c for a given sample,

R_cis the set of all assessed students within S_c,

STU_BWT_kis the student base weight for a given student k,

SCH_TRIM_kis the school-level weight trimming factor for the school associated with student k, SCH_NRAF_kis the school-level nonresponse adjustment factor for the school associated with student k, and SUBJFAC_kis the subject factor for a given student k.

The student weight used in the calculation above is the adjusted student base weight, without regard to subject, adjusted for school weight trimming and school nonresponse.

Nonresponse adjustment procedures are not applied to excluded students because they are not required to complete an assessment. In effect, excluded students were placed in a separate nonresponse cell by themselves and all received an adjustment factor of one. While excluded students are not included in the analysis of the NAEP scores, weights are provided for excluded students in order to estimate the size of this group and its population characteristics.

http://nces.ed.gov/nationsreportcard/tdw/weighting/2018/student_nonresponse_adjustment_factor_calculation_for_the_2018_assessment.aspx

Shape90

NAEP Technical Documentation School and Student Weight Trimming Adjustments for the 2018 Assessment

Weight trimming is an adjustment procedure that involves detecting and reducing extremely large weights. "Extremely large weights" generally refer to large sampling weights that were not anticipated in the design of the sample. Unusually large weights are likely to produce large sampling variances for statistics of interest, especially when the large weights are associated with sample cases reflective of rare or atypical characteristics. To reduce the impact of these large weights on variances, weight reduction methods are typically employed. The goal of weight reduction methods is to reduce the mean square error of survey estimates. While the trimming of large weights

Trimming of School Base Weights

Trimming of Student Weights

reduces variances, it also introduces some bias. However, it is presumed that the reduction in the variances more than compensates for the increase in the bias, thereby reducing the mean square error and thus improving the accuracy of survey estimates (Potter 1988). NAEP employs weight trimming at both the school and student levels.

http://nces.ed.gov/nationsreportcard/tdw/weighting/2018/school_and_student_weight_trimming_adjustments_for_the_2018_assessment.aspx

Shape91

NAEP Technical Documentation Trimming of School Base Weights for the 2018 Assessment

Large school weights can occur for public schools selected from the NAEP new-school sampling frame and for private schools. New schools that are eligible for weight trimming are public schools with a disproportionately large student enrollment in a particular grade from a school district that was selected with a low probability of selection. The school base weights for such schools may be large relative to what they would have been if they had been selected as part of the original sample.

To detect extremely large weights among new schools, a comparison was made between a new school's school base weight and its ideal weight (i.e., the weight that would have resulted had the school been selected from the original school sampling frame). If the school base weight was more than three times the ideal weight, a trimming factor was calculated for that school that scaled the base weight back to three times the ideal weight. The calculation of the school-level trimming factor for a new school s is expressed in the following formula:

where

Shape92

Shape93

EXP_WT_sis the ideal base weight the school would have received if it had been on the NAEP public school sampling frame, and

SCH_BWT_sis the actual school base weight the school received as a sampled school from the new school frame.

No new schools in the NAEP 2018 sample had their weights trimmed.

Private schools eligible for weight trimming were Private School Universe Survey (PSS) nonrespondents who were found subsequently to have either larger enrollments than assumed at the time of sampling, or an atypical probability of selection given their affiliation, the latter being unknown at the time of sampling. For private school s, the formula for computing the school-level weight trimming factor SCH_TRIM_sis identical to that used for new schools. For private schools,

Shape94 EXP_WT_sis the ideal base weight the school would have received if it had been on the NAEP private school sampling frame with accurate enrollment and known affiliation, and

Shape95 SCH_BWT_sis the actual school base weight the school received as a sampled private school.

No private schools in the NAEP 2018 sample had their weights trimmed.

http://nces.ed.gov/nationsreportcard/tdw/weighting/2018/trimming_of_school_base_weights_for_the_2018_assessment.aspx

Shape96

NAEP Technical Documentation Trimming of Student Weights for the 2018 Assessment

Large student weights generally come from compounding nonresponse adjustments at the school and student levels with artificially low first-stage selection probabilities, which can result from inaccurate enrollment data on the school frame used to define the school size measure. Even though

measures are in place to limit the number and size of excessively large weights—such as the implementation of adjustment factor size constraints in both the school and student nonresponse procedures and the use of the school trimming procedure—large student weights can still occur due to compounding effects of various weighting components.

The student weight trimming procedure uses a multiple median rule to detect excessively large student weights. Any student weight within a given trimming group greater than a specified multiple of the median weight value of the given trimming group has its weight scaled back to that threshold. Trimming groups were defined by region and high minority (American Indian/Alaska Native, or Black/Hispanic) strata for public schools and affiliation (Catholic/non-Catholic) for private schools.

The procedure computes the median of the nonresponse-adjusted student weights in the trimming group g for a given grade and subject sample. Any student k with a weight more than M times the median (where M = 3.5 for public and private schools) received a trimming factor calculated as follows:

where

Shape97
Shape98
Shape99

M is the trimming multiple,

MEDIAN_gis the median of nonresponse-adjusted student weights in trimming group g, and

STUWGT_gkis the weight after student nonresponse adjustment for student k in trimming group g.

In the 2018 NAEP assessment, relatively few students had weights considered excessively large. Out of the approximately 59,100 students in the combined 2018 assessment samples, approximately 200 students had their weights trimmed.

http://nces.ed.gov/nationsreportcard/tdw/weighting/2018/trimming_of_student_weights_for_the_2018_assessment.aspx

Shape100

NAEP Technical Documentation Student Weight Raking Adjustment for the 2018 Assessment

Development of Final Raking Dimensions

Weighted estimates of population totals for student-level subgroups for a given grade will vary across subjects even though the student samples for each subject generally come from the same schools. These differences are the result of sampling error associated with the random assignment of subjects to students through a process known as spiraling. Any difference in demographic estimates between subjects, no matter how small, may raise concerns about data quality. To remove

Raking Adjustment Control Totals Raking Adjustment Factor Calculation

these random differences and potential data quality concerns, a step was added to the NAEP weighting procedure in 2009. This step adjusts the student weights in such a way that the weighted sums of population totals for specific student groups are the same across all subjects. It was implemented using a raking procedure and applied only to public school assessments.

Raking is a weighting procedure based on the iterative proportional fitting process developed by Deming and Stephan (1940) and involves simultaneous ratio adjustments to two or more marginal distributions of population totals. Each set of marginal population totals is known as a dimension, and each population total in a dimension is referred to as a control total. Raking is carried out in a sequence of adjustments. Sampling weights are adjusted to one marginal distribution and then to the second marginal distribution, and so on. One cycle of sequential adjustments to the marginal distributions is called an iteration. The procedure is repeated until convergence is achieved. The criterion for convergence can be specified either as the maximum number of iterations or an absolute difference (or relative absolute difference) from the marginal population totals. More discussion on raking can be found in Oh and Scheuren (1987).

For NAEP 2018, the student raking adjustment was carried out separately for the geography and U.S. history public school samples at grade 8. The dimensions used in the raking process were race/ethnicity, SD/EL status, and gender. The control totals for the raking dimensions for both subject-based samples (i.e., geography and U.S. history) for grade 8 were obtained from summing the combined DBA/PBA student sample weights of the geography and U.S. history public school samples combined.

http://nces.ed.gov/nationsreportcard/tdw/weighting/2018/student_weight_raking_adjustment_for_the_2018_assessment.aspx

Shape101

NAEP Technical Documentation Development of Final Raking Dimensions for the 2018 Assessment

The raking procedure involved three dimensions. The variables used to define the dimensions are listed below along with the categories making up the initial raking cells for each dimension.

Race/Ethnicity

White, not Hispanic
Black, not Hispanic

Hispanic
Asian
American Indian/Alaska Native
Native Hawaiian/Pacific Islander
Two or More Races

SD/EL status

SD, but not EL
EL, but not SD
SD and EL
Neither SD nor EL

Gender

Male
Female

The initial cells were created at the national level. Similar to the procedure used for school and student nonresponse adjustments, limits were placed on the magnitude of the cell sizes and adjustment factors to prevent unstable raking adjustments that could have resulted in unacceptably large or small adjustment factors. Levels of a dimension were combined whenever there were fewer than 30 assessed or excluded students (20 for any of the replicates) in a category, if the smallest adjustment was less than 0.5, or if the largest adjustment was greater than 2 for the full sample or for any replicate.

If collapsing was necessary for the race/ethnicity dimension, individual groups with similar student achievement levels were combined first. If further collapsing was necessary, the next closest race/ethnicity group was combined as well, and so on until all collapsing rules were satisfied. In some instances, all seven categories had to be collapsed.

If collapsing was necessary for the SD/EL dimension, the SD/not EL and SD/EL categories were combined first, followed by EL/not SD if further collapsing was necessary. In some instances, all four categories had to be collapsed.

Collapsing gender is generally not expected. However, in the rare event that it is necessary, male and female categories would be collapsed.

http://nces.ed.gov/nationsreportcard/tdw/weighting/2018/development_of_final_raking_dimensions_for_the_2018_assessment.aspx

Shape102

NAEP Technical Documentation Raking Adjustment Control Totals for the 2018 Assessment

The control totals used in the raking procedure for NAEP 2018 at grade 8 were estimates of the student population derived from the set of assessed and excluded students pooled across subjects (geography and U.S. history) and assessment modes (PBA and DBA). The control totals for category c within dimension d were computed as follows:

where

Shape103

Shape104

Shape105
Shape106
Shape107
Shape108
Shape109

R_c(d)is the set of all assessed students in category c of dimension d,

E_c(d)is the set of all excluded students in category c of dimension d, STU_BWT_kis the student base weight for a given student k,

SCH_TRIM_kis the school-level weight trimming factor for the school associated with student k, SCH_NRAF_kis the school-level nonresponse adjustment factor for the school associated with student k, STU_NRAF_kis the student-level nonresponse adjustment factor for student k, and

SUBJFAC_kis the subject factor for student k.

The student weight used in the calculation of the control totals above is the adjusted student base weight, without regard to subject, adjusted for school weight trimming, school nonresponse, and student nonresponse. Control totals were computed for the full sample and for each replicate independently.

http://nces.ed.gov/nationsreportcard/tdw/weighting/2018/raking_adjustment_control_totals_for_the_2018_assessment.aspx

Shape110

NAEP Technical Documentation Raking Adjustment Factor Calculation for the 2018 Assessment

For assessed and excluded students in a given subject, the raking adjustment factor STU_RAKE_kwas computed as below. First, the weight for student k was initialized as follows:

where

Shape111

Shape112

Shape113

Shape114

Shape115

STU_BWT_kis the student base weight for a given student k

SCH_TRIM_kis the school-level weight trimming factor for the school associated with student k SCH_NRAF_kis the school-level nonresponse adjustment factor for the school associated with student k, STU_NRAF_kis the student-level nonresponse adjustment factor for student k, and

SUBJFAC_kis the subject factor for student k.

Then, the sequence of weights for the first iteration was calculated as follows for student k in category c of dimension d: For dimension 1:

For dimension 2:

For dimension 3:

where

Shape116
Shape117
Shape118

R_c(d)is the set of all assessed students in category c of dimension d, E_c(d)is the set of all excluded students in category c of dimension d, and Total_c(d)is the control total for category c of dimension d.

The process is said to converge if the maximum difference between the sum of adjusted weights and the control totals is 1.0 for each category in each dimension. If after the sequence of adjustments the maximum difference was greater than 1.0, the process continues to the next iteration, cycling back to the first dimension with the initial weight for student k equaling STUSAWT_k^adj(3)from the previous iteration. The process continued until convergence was reached.

Once the process converged, the adjustment factor was computed as follows:

where

Shape119

STUSAWT_kis the weight for student k after convergence.

The process was done independently for the full sample and for each replicate.

http://nces.ed.gov/nationsreportcard/tdw/weighting/2018/raking_adjustment_factor_calculation_for_the_2018_assessment.aspx

Shape120

NAEP Technical Documentation Computation of Replicate School Weights for Variance Estimation for the 2018 Assessment

In addition to the full-sample weight, a set of 62 replicate weights was provided for each school. These replicate weights are used in calculating the sampling variance of estimates obtained from the data, using the jackknife repeated replication method. The method of deriving these weights was aimed at reflecting the features of the sample design appropriately for each sample, so that when the jackknife variance estimation procedure is implemented, approximately unbiased estimates of sampling variance are obtained. This section gives the specifics for generating the replicate weights for the 2018 assessment samples.

For each sample, replicates were formed in two steps. First, each school was assigned to one or more of 62 replicate strata. In the next step, a random subset of schools in each replicate stratum was excluded. The remaining subset and all schools in the other replicate strata then constituted one of the 62 replicates.

Defining Replicate Strata and Forming Replicates

A replicate weight was calculated for each of the 62 replicates using weighting procedures similar to those used for the full-sample weight. Each replicate base weight contains an additional component, known as a replicate factor, to account for the subsetting of the sample to form the replicate. By repeating the various weighting procedures on each set of replicate base weights, the impact of these procedures on the sampling variance of an estimate is appropriately reflected in the variance estimate.

Each of the 62 replicate weights for school s in stratum j can be expressed as follows:

where

Shape121

Shape122

Shape123

Shape124

SCH_BWT_js(r) is the replicate school base weight for replicate r SCH_NRAF_js(r) is the school-level nonresponse adjustment factor for replicate r SCH_TRIM_jsis the school-level weight trimming adjustment factor; and SCH_SUBJ_AF_sis the small-school subject adjustment factor.

Specific school nonresponse adjustment factors were calculated separately for each replicate, as indicated by the index (r) in the formula, and applied to the replicate school base weights. Computing separate nonresponse adjustment factors for each replicate allows resulting variances from the use of the final school replicate weights to reflect components of variance due to this weight adjustment.

School weight trimming adjustments were not replicated, that is, not calculated separately for each replicate. Instead, each replicate used the school trimming adjustment factors derived for the full sample. Statistical theory for replicating trimming adjustments under the jackknife approach has not been developed in the literature. Due to the absence of a statistical framework, and since relatively few school weights in NAEP require trimming, the weight trimming adjustments were not replicated.

In addition, the small-school subject adjustment factor also used the same factor derived for the full sample.

http://nces.ed.gov/nationsreportcard/tdw/weighting/2018/computation_of_replicate_school_weights_for_variance_estimation_for_the_2018_assessment.aspx

Shape125

NAEP Technical Documentation Computation of Replicate Student Weights for Variance Estimation for the 2018 Assessment

In addition to the full-sample weight, a set of 62 replicate weights was provided for each student. These replicate weights are used in calculating the sampling variance of estimates obtained from the data, using the jackknife repeated replication method. The method of deriving these weights was aimed at reflecting the features of the sample design appropriately for each sample, so that when the jackknife variance estimation procedure is implemented, approximately unbiased estimates of sampling variance are obtained. This section gives the specifics for generating the replicate weights for the 2018 assessment samples. The theory that underlies the jackknife variance estimators used in NAEP studies is discussed in the section Replicate Variance Estimation.

For each sample, replicates were formed in two steps. First, each school was assigned to one or more of 62 replicate strata. In the next step, a random subset of schools (or, in some cases, students within

Defining Replicate Strata and Forming Replicates

Computing School-Level Replicate Base Weights

Computing Student-Level Replicate Base Weights

Replicate Variance Estimation

schools) in each replicate stratum was excluded. The remaining subset and all schools in the other replicate strata then constituted one of the 62 replicates.

Each of the 62 replicate weights for student k in school s and stratum j can be expressed as follows:

where

Shape126
Shape127
Shape128
Shape129
Shape130
Shape131

STU_BWT_jsk(r) is the student base weight for replicate r;

SCH_NRAF_js(r) is the school-level nonresponse adjustment factor for replicate r; STU_NRAF_jsk(r) is the student-level nonresponse adjustment factor for replicate r; SCH_TRIM_jsis the school-level weight trimming adjustment factor;

STU_TRIM_jskis the student-level weight trimming adjustment factor; and

STU_RAKEjsk(r) is the student-level raking adjustment factor for replicate r.

Specific school and student nonresponse and student-level raking adjustment factors were calculated separately for each replicate, as indicated by the index (r) in the formula, and applied to the replicate student base weights. Computing separate nonresponse and raking adjustment factors for each replicate allows resulting variances from the use of the final student replicate weights to reflect components of variance due to these various weight adjustments.

School and student weight trimming adjustments were not replicated, that is, not calculated separately for each replicate. Instead, each replicate used the school and student trimming adjustment factors derived for the full sample. Statistical theory for replicating trimming adjustments under the jackknife approach has not been developed in the literature. Due to the absence of a statistical framework, and since relatively few school and student weights in NAEP require trimming, the weight trimming adjustments were not replicated.

http://nces.ed.gov/nationsreportcard/tdw/weighting/2018/computation_of_replicate_weights_for_variance_estimation_for_the_2018_assessment.aspx

Shape132

NAEP Technical Documentation Computing School-Level Replicate Base Weights for the 2018 Assessment

For the NAEP 2018 assessment, school-level replicate base weights for school s in primary stratum j (SCH_BWT_js(r), r = 1,..., 62) were calculated as follows:

where

Shape133
Shape134
Shape135

SCH_BWT_jsis the school base weight for school s in primary stratum j,

R_jris the set of schools within the r-th replicate stratum for primary stratum j, and

U_jsis the variance unit (1 or 2) for school s in primary stratum j.

For schools in replicate strata comprising three variance units, two sets of school-level replicate base weights were computed (see replicate variance estimation for details): one for the first replicate r₁and another for the second replicate r₂. The two sets of school-level replicate base weights SCH_BWT_js(r₁), r₁= 1,..., 62 and SCH_BWT_js(r₂), r₂= 1,..., 62 were calculated as described below.

where

Shape136
Shape137
Shape138
Shape139

SCH_BWT_jsis the school base weight for school s in primary stratum j,

R_jr1is the set of schools within the r₁-th replicate stratum for primary stratum j, R_jr2is the set of schools within the r₂-th replicate stratum for primary stratum j, and U_jsis the variance unit (1, 2, or 3) for school s in primary stratum j.

http://nces.ed.gov/nationsreportcard/tdw/weighting/2018/computing_school_level_replicate_base_weights_for_the_2018_assessment.aspx

Shape140

NAEP Technical Documentation Computing Student-Level Replicate Base Weights for the 2018 Assessment

For the 2018 assessment, the calculation of the student-level replicate base weights for student k from school s in stratum j for each of the 62 replicates,

STU_BWT_jsk(r), where r = 1 to 62, were calculated as follows:

where

Shape141
Shape142
Shape143
Shape144
Shape145
Shape146
Shape147

SCH_BWT_js(r) is the replicate school base weight;

SCHSESWT_jsis the school-level session assignment weight used in the full-sample weight; WINSCHWT_jsis the within-school student sampling weight used in the full-sample weight; STUSESWT_jskis the student-level session assignment weight used in the full-sample weight; SUBJFAC_jsis the subject factor used in the full-sample weight;

SUBADJ_jsis the substitute adjustment factor used in the full-sample weight; and

YRRND_AF_jsis the year-round adjustment factor used in the full-sample weight.

These components are described on the Student Base Weights page.

http://nces.ed.gov/nationsreportcard/tdw/weighting/2018/computing_student_level_replicate_base_weights_for_the_2018_assessment.aspx

Shape148

NAEP Technical Documentation Defining Replicate Strata and Forming Replicates for the 2018 Assessment

For the NAEP 2018 assessment, replicates were formed separately for each sample indicated by grade (8) and subject (civics/geography/U.S. history or TEL). The first step in forming replicates was to assign each first-stage sampling unit in a primary stratum to a replicate stratum. The formation of replicate strata varied by noncertainty and certainty primary sampling units (PSUs). For noncertainty PSUs, the first-stage units were PSUs, and the

primary stratum was the combination of region and metropolitan status. For certainty PSUs, the first-stage units were schools, and the primary stratum was school type (public or private).

Civics/Geography/U.S. History Assessments

For noncertainty PSUs, where only one PSU was selected per PSU stratum, replicate strata were formed by pairing sampled PSUs with similar stratum characteristics within the same primary stratum (region by metropolitan status). This was accomplished by first sorting the 76 sampled PSUs by PSU stratum number and then grouping adjacent PSUs into 38 pairs. The values for a PSU stratum number reflect region and metropolitan status, as well as socioeconomic characteristics such as percent of renters and percentage of children below the poverty line. The formation of these replicate strata in this manner models a design of selecting two PSUs with probability proportional to size with replacement from each of 38 strata.

For certainty PSUs, the first stage of sampling is at the school level, and the formation of replicate strata must reflect the sampling of schools within the certainty PSUs. Replicate strata were formed by sorting the sampled schools in the 29 certainty PSUs by their order of selection within a primary stratum (school type) so that the sort order reflected the implicit stratification (region, urbanization classification, race/ethnicity classification, and student enrollment for public schools; and region, private school type, and student enrollment size for private schools) and systematic sampling features of the sample design.

The first-stage units were then paired off into 24 preliminary replicate strata. Within each primary stratum with an even number of first-stage units, all of the preliminary replicate strata were pairs, and within primary strata with an odd number of first-stage units, one of the replicate strata was a triplet (the last one), and all others were pairs.

If there were more than 24 preliminary replicate strata within a primary stratum, the preliminary replicate strata were grouped to form 24 replicate strata. This grouping effectively maximized the distance in the sort order between grouped preliminary replicate strata. The first 24 preliminary replicate strata, for example, were assigned to 24 different final replicate strata in order (1 through 24), with the next 24 preliminary replicate strata assigned to final replicate strata 1 through 24, so that, for example, preliminary replicate stratum 1, preliminary replicate stratum 25, preliminary replicate stratum 49 (if there were that many), etc., were all assigned to the first final replicate stratum. The final replicate strata for the schools in the certainty PSUs were 1 through 24.

Within each pair of preliminary replicate stratum, the first first-stage unit was assigned as the first variance unit and the second first-stage unit as the second variance unit. Within each triplet preliminary replicate stratum, the three schools were assigned variance units 1 through 3.

TEL Assessment

For noncertainty PSUs, where only one PSU was selected per PSU stratum, replicate strata were formed by pairing sampled PSUs with similar stratum characteristics within the same primary stratum (region by metropolitan status). This was accomplished by first sorting the 38 sampled PSUs by PSU stratum number and then grouping adjuacent PSUs into 19 pairs. The values for a PSU stratum number reflect region and metropolitan status, as well as socioeconomic characteristics such as percent of renters and percentage of children below the poverty line. The formation of these replicate strata in this manner models a design of selecting two PSUs with probability proportional to size with replacement from each of 19 strata.

enrollment for public schools; and region, private school type, and student enrollment size for private schools) and systematic sampling features of the sample design.

The first-stage units were then paired off into 43 preliminary replicate strata. Within each primary stratum with an even number of first-stage units, all of the preliminary replicate strata were pairs, and within primary strata with an odd number of first-stage units, one of the replicate strata was a triplet (the last one), and all others were pairs.

If there were more than 43 preliminary replicate strata within a primary stratum, the preliminary replicate strata were grouped to form 43 replicate strata. This grouping effectively maximized the distance in the sort order between grouped preliminary replicate strata. The first 43 preliminary replicate strata, for example, were assigned to 43 different final replicate strata in order (1 through 43), with the next 43 preliminary replicate strata assigned to final replicate strata 1 through 43, so that, for example, preliminary replicate stratum 1, preliminary replicate stratum 44, preliminary replicate stratum 87 (if there were that many), etc., were all assigned to the first final replicate stratum. The final replicate strata for the schools in the certainty PSUs were 1 through 43.

http://nces.ed.gov/nationsreportcard/tdw/weighting/2018/defining_replicate_strata_and_forming_replicates_for_the_2018_assessment.aspx

Shape149

NAEP Technical Documentation Replicate Variance Estimation for the 2018 Assessment

Variances for NAEP assessment estimates are computed using the paired jackknife replicate variance procedure. This technique is applicable for common statistics, such as means and ratios, and differences between these for different subgroups, as well as for more complex statistics such as linear or logistic regression coefficients.

In general, the paired jackknife replicate variance procedure involves initially pairing clusters of first-stage sampling units to form H variance strata (h = 1, 2, 3, ...,H) with two units per stratum. The first replicate is formed by assigning, to one unit at random from the first variance stratum, a replicate weighting factor of less than 1.0, while assigning the remaining unit a complementary replicate factor greater than 1.0, and assigning all other units from the other (H - 1) strata a replicate factor of 1.0. This procedure is carried out for each variance stratum resulting in H replicates, each of which provides an estimate of the population total.

In general, this process is repeated for subsequent levels of sampling. In practice, this is not practicable for a design with three or more stages of sampling, and the marginal improvement in precision of the variance estimates would be negligible in all such cases in the NAEP setting. Thus in NAEP,

when a two-stage design is used – sampling schools and then students – beginning in 2011 replication is carried out at both stages for the purpose of computing replicate student weights. The change implemented in 2011 permitted the introduction of a finite population correction factor at the school sampling stage. Prior to 2011, replication was only carried out at the first stage of selection for the purpose of computing replicate student weights. See Rizzo and Rust (2011) for a description of the methodology.

When a three-stage design is used, involving the selection of geographic primary sampling units (PSUs), then schools, and then students, the replication procedure for the purpose of computing replicate student weights is only carried out at the first stage of sampling (the PSU stage for noncertainty PSUs, and the school stage within certainty PSUs). In this situation, the school and student variance components are correctly estimated, and the overstatement of the between-PSU variance component is relatively very small.

The jackknife estimate of the variance for any given statistic is given by the following formula:

where

Shape150 Shape151 represents the full sample estimate of the given statistic, and represents the corresponding estimate for replicate h.

Each replicate undergoes the same weighting procedure as the full sample so that the jackknife variance estimator reflects the contributions to or reductions in variance resulting from the various weighting adjustments.

The NAEP jackknife variance estimator is based on 62 variance strata resulting in a set of 62 replicate weights assigned to each school and student.

The basic idea of the paired jackknife variance estimator is to create the replicate weights so that use of the jackknife procedure results in an unbiased variance estimator for sample totals and means, which is also reasonably efficient (i.e., has a low variance as a variance estimator). The jackknife variance estimator will then produce a consistent (but not fully unbiased) estimate of variance for (sufficiently smooth) nonlinear functions of total and mean estimates such as ratios, regression coefficients, and so forth (Shao and Tu, 1995).

The development below shows why the NAEP jackknife variance estimator returns an unbiased variance estimator for totals and means, which is the cornerstone to the asymptotic results for nonlinear estimators. For example, see Rust (1985). This paper also discusses why this variance estimator is generally efficient (i.e., more reliable than alternative approaches requiring similar computational resources).

The development is done for an estimate of a mean based on a simplified sample design that closely approximates the sample design for first-stage units used in the NAEP studies. The sample design is a stratified random sample with H strata with population weights W_h, stratum sample sizes n_h, and

stratum sample means . The population estimator and standard unbiased variance estimator are:

with

The paired jackknife replicate variance estimator assigns one replicate h=1,…, H to each stratum, so that the number of replicates equals H. In NAEP, the replicates correspond generally to pairs and triplets (with the latter only being used if there are an odd number of sample units within a particular primary stratum generating replicate strata). For pairs, the process of generating replicates can be viewed as taking a simple random sample (J) of size n_h/2 within the replicate stratum, and assigning an increased weight to the sampled elements, and a decreased weight to the unsampled elements. In certain applications, the increased weight is double the full sample weight, while the decreased weight is in fact equal to zero. In this simplified case, this

assignment reduces to replacing with , the latter being the sample mean of the sampled n_h/2 units. Then the replicate estimator corresponding to stratum r is

The r-th term in the sum of squares for is thus:

In stratified random sampling, when a sample of size n_r/2 is drawn without replacement from a population of size n_r, the sampling variance is

For example, see Cochran (1977), Theorem 5.3, using n_ras the “population size,” n_r/2 as the “sample size,” and s ²as the “population variance” in the given formula. Thus,

Taking the expectation over all of these stratified samples of size n_r/2, it is found that

In this sense, the jackknife variance estimator “gives back” the sample variance estimator for means and totals as desired under the theory.

In cases where, rather than doubling the weight of one half of one variance stratum and assigning a zero weight to the other, the weight of one unit is multiplied by a replicate factor of (1+δ), while the other is multiplied by (1- δ), the result is that

In this way, by setting δ equal to the square root of the finite population correction factor, the jackknife variance estimator is able to incorporate a finite population correction factor into the variance estimator.

In practice, variance strata are also grouped to make sure that the number of replicates is not too large (the total number of variance strata is usually 62 for NAEP). The randomization from the original sample distribution guarantees that the sum of squares contributed by each replicate will be close to the target expected value.

For triples, the replicate factors are perturbed to something other than 1.0 for two different replicate factors, rather than just one as in the case of pairs. Again, in the simple case where replicate factors that are less than one are all set to zero, the replicate weights are calculated as follows:

For unit i in variance stratum r

where weight w_iis the full sample base weight. Furthermore, for r' = r + 31 (mod 62):

And for all other values r*, other than r and r´,w_i(r*) = 1.

In the case of stratified random sampling, this formula reduces to replacing with for replicate r, where is the sample mean from a

“2/3” sample of 2n_r/3 units from the n_rsample units in the replicate stratum, and replacing with for replicate r', where is the sample mean from another overlapping “2/3” sample of 2n_r/3 units from the n_rsample units in the replicate stratum.

The r-th and r´-th replicates can be written as:

From these formulas, expressions for the r-th and r´-th components of the jackknife variance estimator are obtained (ignoring other sums of squares from other grouped components attached to those replicates):

These sums of squares have expectations as follows, using the general formula for sampling variances:

Thus,

as desired again.

http://nces.ed.gov/nationsreportcard/tdw/weighting/2018/replicate_variance_estimation_for_the_2018_assessment.aspx

Shape152

NAEP Technical Documentation Quality Control on Weighting Procedures for the 2018 Assessment

Given the complexity of the weighting procedures utilized in NAEP, a range of quality control (QC) checks was conducted throughout the weighting process to identify potential problems with collected student-level demographic data or with specific weighting procedures. The QC processes included

Shape153 Shape154 Shape155 Shape156 checks performed within each step of the weighting process; checks performed across adjacent steps of the weighting process; review of response, exclusion, and accommodation rates; checking demographic data of individual schools;

Shape157 Shape158 comparisons with 2014 demographic data; and nonresponse bias analyses.

Participation, Exclusion, and Accommodation Rates

Nonresponse Bias Analyses

To validate the weighting process, extensive tabulations of various school and student characteristics at different stages of the process were conducted. The school-level characteristics included in the tabulations were enrollment by race/ethnicity and urbanization classification (four categories based on urban-centric locale). At the student level, the tabulations included race/ethnicity, gender, relative age, students with disability (SD) status, English learners (EL) status, and participation status in National School Lunch Program (NSLP).

http://nces.ed.gov/nationsreportcard/tdw/weighting/2018/quality_control_on_weighting_procedures_for_the_2018_assessment.aspx

Shape159

NAEP Technical Documentation Final Participation, Exclusion, and Accommodation Rates for the 2018 Assessment

Final participation, exclusion, and accommodation rates were presented in quality control tables for the grade 8 assessments by reporting group. School-level participation rates were calculated as they had been calculated for previous assessments and according to National Center for Education Statistics (NCES) standards.

Grade 8 - Civics

Grade 8 - Geography

School-level participation rates were below 85 percent for public schools in grade 8 civics/geography/U.S. history and for private schools in both grade 8 civics/geography/U.S. history and grade 8 TEL. Student-level participation rates were all above 85 percent. As required by NCES standards, nonresponse bias analyses were conducted on each reporting group falling below the 85 percent participation threshold.

Grade 8 - U.S. History

Grade 8 - Technology and Engineering Literacy (TEL)

http://nces.ed.gov/nationsreportcard/tdw/weighting/2018/final_participation_exclusion_and_accommodation_rates_for_the_2018_assessment.aspx

Shape160

NAEP Technical Documentation Participation, Exclusion, and Accommodation Rates for Grade 8 Civics

The following table displays the school-level participation rates and student-level participation, exclusion, and accommodation rates for the grade 8 civics assessment. Various weights were used in the calculation of the rates, as indicated in the column headings of the table.

The participation rates reflect the participation of the original sampled schools only and do not reflect any effect of substitution. The rates weighted by the school base weight and enrollment show the approximate proportion of the student population in the domain that is represented by the responding schools in the sample. These rates differ because schools differ in size.

Participation, exclusion, and accommodation rates for grade 8 civics national assessment, by geographic region and school type: 2018

		School	School
		participation	participation
		rates (percent)	rates (percent)
		before	before
		substitution	substitution			Weighted
	Number	(weighted by	(weighted by		Weighted	student	Weighted
	of schools	school base	school base	Number of	percent of	participation	percent of
Geographic region	in original	weight	weight	students	students	rates (percent)	students
and school type	sample	and enrollment)	only)	sampled	excluded	after makeups	accommodated

		School	School
		participation	participation
		rates (percent)	rates (percent)
		before	before
		substitution	substitution			Weighted
	Number	(weighted by	(weighted by		Weighted	student	Weighted
	of schools	school base	school base	Number of	percent of	participation	percent of
Geographic region	in original	weight	weight	students	students	rates (percent)	students
and school type	sample	and enrollment)	only)	sampled	excluded	after makeups	accommodated

_National_all¹1,130 81.82 73.92 15,500 1.84 92.01 12.45

Northeast all	190	90.33	77.75	2,400	2.07	89.11	15.53
Midwest all	220	71.02	71.86	2,500	1.21	92.65	11.42
South all	440	83.97	77.05	6,200	2.32	92.70	14.59
West all	270	81.88	68.41	4,300	1.47	92.57	7.99
National public	790	84.44	85.88	14,000	1.99	91.96	13.04
National private	330	48.66	51.93	1,500	0.05	92.93	5.29
Catholic	80	85.85	89.89	700	0.16	94.64	4.12
Non-Catholic	250	27.28	36.80	800	0.00	90.60	5.91

Shape161

http://nces.ed.gov/nationsreportcard/tdw/weighting/2018/participation_exclusion_and_accommodation_rates_for_grade_8_civics.aspx

Shape162

NAEP Technical Documentation Participation, Exclusion, and Accommodation Rates for Grade 8 Geography

The following table displays the school-level participation rates and student-level participation, exclusion, and accommodation rates for the grade 8 geography assessment. Various weights were used in the calculation of the rates, as indicated in the column headings of the table.

Participation, exclusion, and accommodation rates for grade 8 geography national assessment, by geographic region and school type: 2018

		School	School
		participation	participation
		rates (percent)	rates (percent)
		before	before
		substitution	substitution			Weighted
	Number	(weighted by	(weighted by		Weighted	student	Weighted
	of schools	school base	school base	Number of	percent of	participation	percent of
Geographic region	in original	weight	weight	students	students	rates (percent)	students
and school type	sample	and enrollment)	only)	sampled	excluded	after makeups	accommodated

_National_all¹1,130 81.82 73.92 14,900 1.75 92.03 12.30

Northeast all	190	90.33	77.75	2,500	2.42	90.10	15.26
Midwest all	220	71.02	71.86	2,300	1.48	92.88	11.04
South all	440	83.97	77.05	6,100	1.85	92.60	13.59
West all	270	81.88	68.41	3,900	1.38	91.91	9.31
National public	790	84.44	85.88	13,800	1.87	91.95	12.86
National private	330	48.66	51.93	1,100	0.32	93.41	5.02
Catholic	80	85.85	89.89	600	0.00	94.22	3.74
¹National all includes national public, national private, Bureau of Indian Education (BIE), and Department of Defense Education Activity (DoDEA) schools that are located in the United States. School counts are rounded to nearest ten and student counts are rounded to nearest hundred. Detail may not sum to totals because of rounding. SOURCE: U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics, National Assessment of Educational Progress (NAEP), 2018 Geography Assessment.

		School	School
		participation	participation
		rates (percent)	rates (percent)
		before	before
		substitution	substitution			Weighted
	Number	(weighted by	(weighted by		Weighted	student	Weighted
	of schools	school base	school base	Number of	percent of	participation	percent of
Geographic region	in original	weight	weight	students	students	rates (percent)	students
and school type	sample	and enrollment)	only)	sampled	excluded	after makeups	accommodated

Shape163
Non-Catholic 250 27.28 36.80 500 0.52 92.08 5.82

http://nces.ed.gov/nationsreportcard/tdw/weighting/2018/participation_exclusion_and_accommodation_rates_for_grade_8_geography.aspx

Shape164

NAEP Technical Documentation Participation, Exclusion, and Accommodation Rates for Grade 8 Technology and Engineering Literacy (TEL)

The following table displays the school-level participation rates and student-level participation, exclusion, and accommodation rates for the grade 8 TEL assessment. Various weights were used in the calculation of the rates, as indicated in the column headings of the table.

Participation, exclusion, and accommodation rates for grade 8 TEL national assessment, by geographic region and school type: 2018

		School	School
		participation	participation
		rates (percent)	rates (percent)
		before	before
		substitution	substitution			Weighted
	Number	(weighted by	(weighted by		Weighted	student	Weighted
	of schools	school base	school base	Number of	percent of	participation	percent of
Geographic region	in original	weight	weight	students	students	rates (percent)	students
and school type	sample	and enrollment)	only)	sampled	excluded	after makeups	accommodated

_National_all¹760 86.42 68.80 17,500 1.50 93.16 10.84

Northeast all	130	89.91	57.05	2,800	2.22	91.41	16.89
Midwest all	140	81.23	75.27	3,200	1.10	93.16	9.53
South all	290	88.51	70.21	7,100	1.42	93.63	11.17
West all	190	85.09	69.14	4,400	1.52	93.67	7.36
National public	620	88.79	85.31	16,100	1.63	93.15	11.48
National private	140	58.38	44.72	1,400	0.00	93.38	3.45
Catholic	30	89.82	87.10	700	0.00	94.54	4.50
Non-Catholic	110	36.19	29.89	700	0.00	91.70	2.71
¹National all includes national public, national private, Bureau of Indian Education (BIE), and Department of Defense Education Activity (DoDEA) schools that are located in the United States. School counts are rounded to nearest ten and student counts are rounded to nearest hundred. Detail may not sum to totals because of rounding. SOURCE: U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics, National Assessment of Educational Progress (NAEP), 2018 TEL Assessment.

http://nces.ed.gov/nationsreportcard/tdw/weighting/2018/participation_exclusion_and_accommodation_rates_for_grade_8_technology_and_engineering_literacy_tel.aspx

Shape165

NAEP Technical Documentation Participation, Exclusion, and Accommodation Rates for Grade 8 U.S. History

The following table displays the school-level participation rates and student-level participation, exclusion, and accommodation rates for the grade 8 U.S. history assessment. Various weights were used in the calculation of the rates, as indicated in the column headings of the table.

Participation, exclusion, and accommodation rates for grade 8 U.S. history national assessment, by geographic region and school type: 2018

		School	School
		participation	participation
		rates (percent)	rates (percent)
		before	before
		substitution	substitution			Weighted
	Number	(weighted by	(weighted by		Weighted	student	Weighted
	of schools	school base	school base	Number of	percent of	participation	percent of
Geographic region	in original	weight	weight	students	students	rates (percent)	students
and school type	sample	and enrollment)	only)	sampled	excluded	after makeups	accommodated

_National_all¹1,130 81.82 73.92 18,800 1.72 92.20 12.05

Northeast all	190	90.33	77.75	3,100	1.98	89.60	14.61
Midwest all	220	71.02	71.86	3,000	1.04	92.92	11.13
South all	440	83.97	77.05	7,800	1.71	92.74	13.43
West all	270	81.88	68.41	5,000	2.14	92.75	8.92
National public	790	84.44	85.88	17,400	1.84	92.21	12.51
National private	330	48.66	51.93	1,400	0.19	92.02	6.15
Catholic	80	85.85	89.89	800	0.09	92.60	4.91
Non-Catholic	250	27.28	36.80	600	0.25	91.09	6.93
¹National all includes national public, national private, Bureau of Indian Education (BIE), and Department of Defense Education Activity (DoDEA) schools that are located in the United States. School counts are rounded to nearest ten and student counts are rounded to nearest hundred. Detail may not sum to totals because of rounding. SOURCE: U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics, National Assessment of Educational Progress (NAEP), 2018 U.S. History Assessment.

http://nces.ed.gov/nationsreportcard/tdw/weighting/2018/participation_exclusion_and_accommodation_rates_for_grade_8_us_history.aspx

Shape166

NAEP Technical Documentation Nonresponse Bias Analyses for the 2018 Assessment

NCES Statistical Standards call for a nonresponse bias analysis to be conducted for a sample with a response rate below 85 percent at any stage of sampling. Weighted school response rates for the 2018 assessment indicated that there was a need for school nonresponse bias analyses for public school and private school samples for grade 8. At the student level no nonresponse bias analyses were necessary since the student-level participation rates for all reporting groups were above the 85 percent participation threshold.

Analyses were conducted for the Social Sciences assessments and Technology and Engineering Literacy (TEL) assessment. For the Social Sciences assessments, the analysis was conducted for both public and private schools with Civics, Geography, and U.S. History combined into one analysis because each sampled school was assessed in all subjects. The analysis for the TEL assessment was conducted for private schools only.

The procedures and results from these analyses are summarized briefly below. The analyses conducted consider only certain characteristics of schools and students. They do not directly consider the effects of the nonresponse on student achievement, the primary focus of NAEP. Thus, these analyses are not conclusive of either the existence or absence of nonresponse bias on student achievement.

Each school-level analysis was conducted in three parts. The first part of the analysis looked for potential nonresponse bias that was introduced through school nonresponse. The second part of the analysis examined the remaining potential for nonresponse bias after accounting for the mitigating effects of substitution. The third part of the analysis examined the remaining potential for nonresponse bias after accounting for the mitigating effects of both school substitution and school-level nonresponse weight adjustments. The characteristics examined were census region, private school type, urbanization classification (four categories based on urban-centric locale), size of school (categorical), size of school (continuous), and race/ethnicity enrollment percentages.

Based on the school characteristics available for the private school samples, there does not appear to be evidence of substantial potential bias resulting from school nonresponse after substitution and non-response adjustment. Please see the NAEP 2018 Non Response Bias Analysis Report for more details.

http://nces.ed.gov/nationsreportcard/tdw/weighting/2018/nonresponse_bias_analyses_for_the_2018_assessment.aspx

Shape167

File Type	application/vnd.openxmlformats-officedocument.wordprocessingml.document
File Modified	0000-00-00
File Created	0000-00-00