APPENDIX J
2021 NSCG Methodological Experiments – Minimum Detectable Differences

National Survey of College Graduates (NSCG)
OMB: 3145-0141

Minimum Detectable Differences for the
2021 NSCG Methodological Experiments
I. Background
This appendix provides minimum detectable differences for the proposed sample sizes in each of
the 2021 NSCG methodological experiments.
Adaptive Design Experiment:
• New sample treatment group – 8,000 cases
• Returning sample treatment group – 10,000 cases
Prenotice Experiment:
• New sample treatment group – 10,000 cases
• Returning sample treatment group – 10,000 cases
II. Minimum Detectable Difference Equation and Definitions
To calculate the minimum detectable difference between two response rates with fixed sample
sizes, we used the formula from Snedecor and Cochran (1989) for determining the sample size
when comparing two proportions.


$$\delta \ge \left[\, (Z_{\alpha^*/2} + Z_{\beta})^{2}\, D \left( \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2} \right) \right]^{1/2}$$

where:

δ = minimum detectable difference
α* = alpha level adjusted for multiple comparisons
Zα*/2 = critical value for the set alpha level assuming a two-sided test
Zβ = critical value for the set beta level
p1 = proportion for group 1
p2 = proportion for group 2
D = design effect due to unequal weighting
n1 = sample size for a single treatment group or control
n2 = sample size for a second treatment group or control

The alpha level was set to 0.10 in the calculations. The beta level, set to 0.10, was included in the formula to inflate the sample size and thereby decrease the probability of committing a Type II error.
The estimated proportion for both groups was set to 0.50 for the sample size calculations. This is a conservative approach: because p(1 − p) is largest at p = 0.50, it yields the largest variance term and therefore the largest minimum detectable difference.
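A minimal Python sketch of this calculation follows, assuming SciPy for the normal quantiles; the function name and defaults are illustrative and not part of the appendix.

```python
# Minimal sketch of the Snedecor and Cochran (1989) minimum detectable
# difference calculation above; function name and defaults are illustrative.
from scipy.stats import norm

def min_detectable_difference(n1, n2, alpha=0.10, beta=0.10,
                              p1=0.5, p2=0.5, deff=1.0):
    """Smallest detectable difference between two proportions."""
    z_alpha = norm.ppf(1 - alpha / 2)  # two-sided critical value, 1.645 at alpha = 0.10
    z_beta = norm.ppf(1 - beta)        # critical value for power, 1.282 at beta = 0.10
    variance = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2
    return ((z_alpha + z_beta) ** 2 * deff * variance) ** 0.5

# Two groups of 10,000 cases each, p = 0.5, no design effect:
print(round(min_detectable_difference(10_000, 10_000), 4))  # 0.0207
```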

Design effects represent a variance inflation factor due to the sample design when compared to a simple random sample. Because all experiment samples and the control will be representative, the weight distributions should be similar across all samples, negating the need to include a design effect. We do not expect to see a weight-based or sampling-based effect on response in any of the samples. However, for the sake of completeness, minimum detectable differences were calculated both ways, with and without the design effect.¹
III. Pairwise Comparisons and the Bonferroni Adjustment
The number of pairwise comparisons included in the adaptive design experiment evaluation is
one (treatment vs. control). For the contact strategies experiment, the number of pairwise
comparisons increases because numerous treatment groups can be compared as discussed in the
research questions listed in Appendix I. In these instances, α* is adjusted to account for the
multiple comparisons.
The Bonferroni adjustment divides the overall target α by the number of pairwise comparisons, so that conducting multiple pairwise comparisons does not inflate the overall α beyond its target. The formula is:

$$\alpha^* = \frac{\alpha}{n_c}$$
The adjusted alpha α* is calculated by dividing the overall target α by the number of pairwise
comparisons, nc. It is worth noting that, despite being commonly used, the Bonferroni
adjustment is very conservative, actually reducing the overall α below initial targets. An
example showing how the overall α is calculated using an alpha level of 0.10, the Bonferroni
adjustment, and 11 pairwise comparisons follows:

$$\alpha_{\text{overall}} = 1 - (1 - \alpha^*)^{n_c}$$

$$\alpha_{\text{overall}} = 1 - (1 - 0.009)^{11} = 0.095 < 0.100$$

where:

α_overall is the resulting overall α after the Bonferroni correction is applied;
α_target = 0.100 is the original target α level;
n_c = 11 is the number of comparisons (i.e., the Appendix I research questions); and
α* = α_target / n_c ≈ 0.009 is the Bonferroni-adjusted α.
In this example, the Bonferroni adjustment overcompensates for multiple comparisons, making it
more likely that a truly significant effect will be overlooked.
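For completeness, the arithmetic above can be verified in a few lines of Python (a sketch; the variable names are ours):

```python
# Checking the Bonferroni arithmetic from the example above.
alpha_target = 0.10
n_c = 11                                    # pairwise comparisons
alpha_star = round(alpha_target / n_c, 3)   # 0.009, as rounded in the text
alpha_overall = 1 - (1 - alpha_star) ** n_c
print(alpha_star, round(alpha_overall, 3))  # 0.009 0.095
```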
¹ Design effects were calculated by examining the weight variation present in all cases in the 2017 NSCG new sample (5.42) and the returning sample (4.65).


Sample sizes were provided in section I of this appendix and are used in the formula. All
minimum detectable differences using the Bonferroni adjustment were calculated and are
summarized at the end of this appendix in table form.
IV. A Model-Based Alternative to Multiple Comparisons
Rather than relying on the Bonferroni adjustment for multiple comparisons, effects on response,
cost per case or other outcome variables could be modeled simultaneously to determine which
treatments have a significant effect on response.
All sample cases, auxiliary sample data, and treatments are included in the model below, which predicts a given treatment’s effect on response rate (or other outcome variable):

$$\bar{y} = \beta_0 + \vec{\beta}_1 \cdot \vec{I} + \vec{\alpha} X + \varepsilon$$
Assuming response rate is the outcome variable of interest:
ȳ is the average response propensity (response rate) for the entire sample;
β0 is the intercept for the model;
β1 is a vector of effects, one for each treatment;
I is a vector of indicators identifying the treatments in β1;
α is a vector of scalar coefficients;
X is a matrix of auxiliary frame or sample data; and
ε is an error term.
Once data collection is complete, the average response propensity is equal to the response rate.
In the simplest case, no treatment has any effect (the 2nd term would drop out), and no auxiliary
variables explain any of the variation in response propensities (the 3rd term would drop out). In
that case, the average of the response propensities, and thus the response rate, would just equal:

$$\bar{y} = \beta_0 + \varepsilon$$
However, a more complicated model gives information about each treatment’s effect (2nd term)
while taking into account sample characteristics (3rd term) that might augment or reduce the
effect of a given treatment.
As a simple example, ignore the error term and assume the overall mean response propensity was 72%, while the mean response propensity for a given treatment group was 83%. If only terms 1 and 2 were included in the model (no sample characteristics accounted for), the treatment appears to have increased the response propensity by 11 percentage points. However, if the sample was poorly designed, or if a variable not included in the sample design turned out to be a good predictor of response, there is value in adding the 3rd term. If auxiliary information added by the 3rd term shows that the cases in a particular sample group are 5 percentage points more likely to respond than the average sample case (because of income, internet penetration, age, etc.), this would suggest that while the treatment group’s response propensity was 11 points above the average, 5 points came from sample person characteristics and only 6 points of the increase were really due to the treatment.
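A small simulation makes this decomposition concrete. Everything below is hypothetical: the covariate, effect sizes, and noise levels are chosen only to mirror the 11-point example above, and ordinary least squares stands in for whatever propensity model would actually be used.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
treated = rng.integers(0, 2, n)              # I: treatment indicator
# Hypothetical auxiliary covariate: treated cases average 0.05 higher
# baseline response propensity (income, internet penetration, age, etc.)
x = 0.05 * treated + rng.normal(0.0, 0.03, n)
# True model: intercept 0.72, treatment effect 0.06, covariate effect 1.0
y = 0.72 + 0.06 * treated + 1.0 * x + rng.normal(0.0, 0.02, n)

ones = np.ones(n)
# Terms 1 and 2 only: the treatment coefficient absorbs the covariate
# imbalance and comes out near 0.11
naive, *_ = np.linalg.lstsq(np.column_stack([ones, treated]), y, rcond=None)
# Terms 1 through 3: the covariate soaks up the 5-point composition gap
# and the treatment coefficient falls to roughly 0.06
full, *_ = np.linalg.lstsq(np.column_stack([ones, treated, x]), y, rcond=None)
print(naive.round(3), full.round(3))
```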
This method has several benefits over the multiple comparisons method. First, the number of degrees of freedom taken up by the model is the number of treatment groups plus one for the intercept, which is fewer than the number of pairwise comparisons that might be conducted. Second, because confidence intervals are calculated around the β1 values, it is easier to observe a treatment’s effect on the outcome measures. Third, variables can be controlled for in the model, making significant results more meaningful. While we are striving to ensure the experimental samples are as representative (and as similar) as possible, the ability to add other variables to the model helps control for unintended effects.
The method uses response propensities, not the actual response rate. While the mean response propensity after the last day of data collection equals the overall response rate, it is important to note how the propensity models are built. If they are weighted models, weighted response propensities should be used in this model. The weights could be added as one of the auxiliary variables included in the X matrix.
V. Comments
It is worth noting from the calculations below that, even using the Bonferroni adjustment and conducting all pairwise comparisons, a difference of 6 to 8 percentage points in outcome measures should be large enough to appear significant when the design effect is excluded from the calculations. Because the experimental samples are all systematic random samples, and should have similar sample characteristics and weight distributions, excluding the design effect seems appropriate.


Minimum Detectable Differences for the 2021 NSCG Methodological Experiments

Minimum Detectable Difference Equation for Response Rates:

$$\delta \ge \left[\, (Z_{\alpha^*/2} + Z_{\beta})^{2}\, \text{deff} \left( \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2} \right) \right]^{1/2}$$

where:

δ     = minimum detectable difference
*δ    = minimum detectable difference without using the design effect
α*    = alpha level adjusted for multiple comparisons (Bonferroni)
Zα*/2 = critical value for the set alpha level assuming a two-sided test
Zβ    = critical value for the set beta level
p1    = proportion for group 1
p2    = proportion for group 2
deff  = design effect due to unequal weighting
n1    = sample size for group 1
n2    = sample size for group 2

Adaptive Design Experiment (new sample) – 8,000 cases in experimental group

α*    = 0.100      p1   = 0.5     n1 = 8,000
Zα*/2 = 1.645      p2   = 0.5     n2 = 30,000
Zβ    = 1.282      deff = 6.02
δ     = 0.0568     *δ   = 0.0231

Adaptive Design Experiment (returning sample) – 10,000 cases in experimental group

α*    = 0.100      p1   = 0.5     n1 = 10,000
Zα*/2 = 1.645      p2   = 0.5     n2 = 49,000
Zβ    = 1.282      deff = 4.95
δ     = 0.0460     *δ   = 0.0207
Prenotice Experiment (new sample) – 10,000 cases in experimental group

α*    = 0.100      p1   = 0.5     n1 = 10,000
Zα*/2 = 1.6449     p2   = 0.5     n2 = 40,000
Zβ    = 1.282      deff = 4.65
δ     = 0.0353     *δ   = 0.0164

Prenotice Experiment (returning sample) – 10,000 cases in experimental group

α*    = 0.100      p1   = 0.5     n1 = 10,000
Zα*/2 = 1.6449     p2   = 0.5     n2 = 50,000
Zβ    = 1.282      deff = 5.42
δ     = 0.0373     *δ   = 0.0160
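As a sanity check, the hypothetical min_detectable_difference() sketch from section II (our illustration, not part of the appendix) reproduces the Prenotice Experiment (new sample) entries:

```python
# Prenotice Experiment (new sample): n1 = 10,000, n2 = 40,000
print(round(min_detectable_difference(10_000, 40_000), 4))             # *δ = 0.0164
print(round(min_detectable_difference(10_000, 40_000, deff=4.65), 4))  # δ = 0.0353
```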


