Sample Size Justification

P_Att04_JustificationSampleSizeCalcs 20180802.docx

Human Health Effects of Drinking Water Exposures to Per- and Polyfluoroalkyl Substances (PFAS) at Pease International Tradeport, Portsmouth, NH (The Pease Study)

OMB: 0923-0061

Attachment 4 – Justification for Sample Size Calculations

Sample size calculations were conducted using OpenEpi Version 3.03 (Dean AG, Sullivan KM, Soe MM. OpenEpi: Open Source Epidemiologic Statistics for Public Health, www.OpenEpi.com, updated 2014/09/22). For some health-related endpoints, calculations could not be conducted because of a lack of information in the studies on the parameters needed to make the calculations.

Sample size calculation for mean difference:

N₁ = (σ₁² + σ₂²/k) × (Z_{1-α/2} + Z_{1-β})² / Δ²

Where σ₁ and σ₂ are the standard deviations of the two groups, Δ is the mean difference to be detected, and k = N₂/N₁ is the ratio of the two sample sizes. Then N₂ is simply this ratio multiplied by N₁. For a type 1 error (or α error) of .05, the Z_{1-α/2} value is 1.96. This calculation is for a two-tailed hypothesis test and is equivalent to using a 95% confidence interval to determine statistical significance. For a one-tailed test with α = .05, the Z_{1-α/2} in the above equation is replaced by Z_{1-α} and its value is 1.65, equivalent to using a 90% confidence interval to determine statistical significance. The Z_{1-β} in the above equation is the Z value for the selected power. For 80% power, Z_{1-β} = 0.84; for 90% power, Z_{1-β} = 1.28; and for 95% power, Z_{1-β} = 1.65. (See Rosner B. Fundamentals of Biostatistics, 7th Edition, equation 8.27, p. 302).
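As a rough illustration, the mean-difference calculation can be sketched in Python. The sketch assumes the unequal-allocation formula cited from Rosner, N₁ = (σ₁² + σ₂²/k)(Z_{1-α/2} + Z_{1-β})²/Δ² with N₂ = k·N₁, and rounds up to whole participants. The function name and its arguments are illustrative and are not part of OpenEpi.

```python
import math
from statistics import NormalDist


def sample_size_mean_difference(sd1, sd2, delta, alpha=0.05, power=0.80, ratio=1.0):
    """Return (n1, n2) for a two-sided test of a difference in means.

    sd1 and sd2 are the group standard deviations, delta is the mean
    difference to detect, and ratio is n2/n1 (an assumed naming).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    n1 = (sd1 ** 2 + sd2 ** 2 / ratio) * (z_alpha + z_beta) ** 2 / delta ** 2
    n1 = math.ceil(n1)                             # round up to whole participants
    return n1, math.ceil(ratio * n1)


equal_groups = sample_size_mean_difference(11.85, 11.85, 2.45)
two_to_one = sample_size_mean_difference(11.85, 11.85, 2.45, ratio=2)
```

With the testosterone inputs given in Attachment 4a (SD of 11.85 in both groups, mean difference of 2.45), this sketch returns 368 per group for equal allocation, and 276 and 552 for a ratio of 2.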

The sample size calculations for odds ratios, risk ratios, etc. are as follows:

The sample size formula without the correction factor by Fleiss is:

For the Fleiss method with the correction factor, take the sample size from the uncorrected sample size formula and place into the following formula:

When the input is provided as an odds ratio (OR) rather than the proportion of exposed with disease, the proportion of exposed with disease is calculated as:

When the input is provided as a risk (or prevalence) ratio (RR) rather than the proportion of exposed with disease, the proportion of exposed with disease is calculated as:

Fleiss JL. Statistical Methods for Rates and Proportions. John Wiley & Sons, 1981.
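A minimal Python sketch of the proportion-based calculations follows. It assumes the standard forms of the Fleiss formulas for comparing two proportions, which are written out in the comments; the symbols p1 (proportion of exposed with disease), p2 (proportion of unexposed with disease), and ratio (unexposed per exposed), as well as the function names, are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def exposed_proportion_from_or(p_unexposed, odds_ratio):
    """p1 = OR * p2 / (1 + p2 * (OR - 1)), from the definition of the odds ratio."""
    return odds_ratio * p_unexposed / (1 + p_unexposed * (odds_ratio - 1))


def exposed_proportion_from_rr(p_unexposed, risk_ratio):
    """p1 = RR * p2, from the definition of the risk (or prevalence) ratio."""
    return risk_ratio * p_unexposed


def fleiss_sample_size(p1, p2, alpha=0.05, power=0.80, ratio=1.0, corrected=True):
    """Return (n_exposed, n_unexposed) for a two-sided comparison of proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    q1, q2 = 1 - p1, 1 - p2
    p_bar = (p1 + ratio * p2) / (1 + ratio)  # pooled proportion under the null
    q_bar = 1 - p_bar
    # Uncorrected formula:
    # n1 = [z_a*sqrt((r+1)*p_bar*q_bar) + z_b*sqrt(r*p1*q1 + p2*q2)]^2 / (r*(p1-p2)^2)
    n1 = (z_alpha * math.sqrt((ratio + 1) * p_bar * q_bar)
          + z_beta * math.sqrt(ratio * p1 * q1 + p2 * q2)) ** 2 / (ratio * (p1 - p2) ** 2)
    if corrected:
        # Continuity correction applied to the uncorrected n1:
        # n1_cc = (n1/4) * [1 + sqrt(1 + 2*(r+1) / (n1*r*|p2-p1|))]^2
        n1 = n1 / 4 * (1 + math.sqrt(1 + 2 * (ratio + 1) / (n1 * ratio * abs(p2 - p1)))) ** 2
    n1 = math.ceil(n1)
    return n1, math.ceil(ratio * n1)
```

For example, under these assumptions proportions of 0.6 and 0.4 with equal group sizes require 97 per group without the correction and 107 per group with it.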

Note: In some studies, the interquartile range (IQR) is presented instead of the standard deviation. Assuming a normal distribution for the outcome under evaluation (e.g., thyroid function measures), the standard deviation can be calculated by dividing the width of the IQR by 1.35. However, if the outcome is not normally distributed, this formula could underestimate the standard deviation. In particular, if the outcome under evaluation has been log-transformed, presumably to achieve a normal distribution, the untransformed outcome is unlikely to have a normal distribution. In that case, this formula may underestimate the SD by as much as 20%, according to simulations conducted in Wan X. 2014. A higher SD would increase the sample size requirement.
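The IQR conversion described in the note can be sketched as follows. The divisor 1.35 is the rounded width of the IQR of a standard normal distribution (about 1.349); the function name is illustrative, and the result is only as good as the normality assumption.

```python
from statistics import NormalDist

# Width of the IQR of a standard normal distribution, about 1.349;
# the text rounds this to 1.35.
EXACT_IQR_WIDTH = NormalDist().inv_cdf(0.75) - NormalDist().inv_cdf(0.25)


def sd_from_iqr(lower_quartile, upper_quartile, divisor=1.35):
    """Approximate the SD from the quartiles, assuming a normal distribution."""
    return (upper_quartile - lower_quartile) / divisor


igf1_sd = sd_from_iqr(116, 187)       # IGF-1 quartiles used in Attachment 4a
testosterone_sd = sd_from_iqr(5, 21)  # lower quartile set to LOD/2 = 5
```

These two calls reproduce the standard deviations of about 52.6 and 11.85 used in the child study calculations.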

Attachment 4a. Sample Size for Child Study

The following notes provide comments and information on the parameters (e.g., standard deviation, disease prevalence) used in the sample size calculations provided in Table 1 for the children study.

Sample size calculations were conducted with type 1 error (“α error”) set at .05 and type 2 error (“β error”) set at .20. Sample sizes per stratum group were calculated. It was considered important that the study have a total sample size large enough that exposures could be categorized into tertiles (i.e., reference level, medium level, and high level) or quartiles (i.e., reference level, low, medium and high).

Studies were selected that were considered the most representative of U.S. populations exposed via drinking water to PFOA, PFOS and/or PFHxS as a result of the migration of these PFAS chemicals into ground water or surface water sources from the use of aqueous film forming foam (AFFF). The PFAS serum results from the Pease International Tradeport testing program were used as representative PFAS serum levels.

Studies conducted using NHANES data had PFOA and PFHxS serum levels similar to or lower than those observed at Pease. Therefore, the PFOS, PFOA and PFHxS results in the NHANES studies were used in many of the sample size calculations. For those outcomes not included in NHANES studies, the C8 studies were used. Where applicable, studies from Taiwan or other major industrialized countries were also used.

Examples:

Lipids

In the C8 study (Frisbee 2010), the mean total cholesterol level in the study population was 160.7 mg/dL and the standard deviation (SD) was 29.3. The sample size calculations assumed the same SD in the Pease children and the unexposed group. For hypercholesterolemia (total cholesterol ≥ 170 mg/dL), the prevalence in the C8 study was 34.2%.

Uric Acid

In the NHANES study (Geiger 2013), the mean uric acid level in the study population was 5.07 mg/dL with a SD of 1.19. The sample size calculations assumed the same SD in the Pease children and the unexposed group. The prevalence of hyperuricemia (uric acid ≥ 6 mg/dL) in the NHANES study was 16%.

Kidney Function

The mean estimated glomerular filtration rate (eGFR) in the C8 study of children and adolescents (Watkins 2013) was 133 mL/min/1.73 m² with a SD of 23.9. The sample size calculations assumed the same SD in the Pease children and the unexposed group.

Attention Deficit/Hyperactivity Disorder (ADHD)

In the C8 study (Stein 2011), the prevalence of participant-reported ADHD was 12.4% and the prevalence for participant-reported + used medications for ADHD was 5.1%. Sample size calculations used the 12.4% prevalence.

Hypersensitivity-related Outcomes

From an NHANES study (Stein 2016), the prevalences of current asthma and rhinitis among those aged 12-19 were 10.9% and 25.6%, respectively. For atopic dermatitis, the prevalence for children and adolescents (ages 5-17) is about 12% based on data from the National Health Interview Survey.

Sex hormones and Insulin-like growth factor-1 (IGF-1)

C8 study of children (Lopez-Espinosa 2016)

a. Testosterone

For PFOS, there was a -6.6% difference in the natural log testosterone among girls (per interquartile range of the natural log of PFOS). Among girls, the median testosterone level was 15 ng/dL with an IQR of <LOD to 21 ng/dL and an LOD of 10 ng/dL. For the sample size calculation of mean difference, the standard deviation was assumed to be equal for the exposed and unexposed groups and equal to 11.85. (Assuming LOD/2 was the lower limit of the IQR, the width of the IQR = 21 - 5 = 16. Assuming a normal distribution, dividing 16 by 1.35 converts the IQR to a standard deviation, which equaled 11.85)¹. To obtain the mean difference, the median testosterone level (15 ng/dL) was assumed to be the reference level (i.e., the level among the unexposed). The natural log of the median equals 2.71. A 6.6% decrease in this log value equals 2.53. Exponentiating 2.53 gives 12.55. The mean difference is then 15 – 12.55 = 2.45.

Assuming a 95% CI and 80% power, the sample size = 368/group; for a ratio of 2, the sample sizes = 552 and 276.
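The back-transformation used for testosterone can be sketched as below. Following the text, the percent difference is applied to the natural log of the reference level and the result is exponentiated; the function name is illustrative. Because the sketch does not round intermediate values, its results differ slightly from the figures in the text.

```python
import math


def mean_difference_from_log_percent_change(reference_level, percent_change):
    """Return the reference level minus the back-transformed shifted level.

    percent_change is the percent difference in the natural log of the
    outcome (e.g., -6.6), applied to the log of the reference level.
    """
    log_reference = math.log(reference_level)
    log_shifted = log_reference * (1 + percent_change / 100)
    return reference_level - math.exp(log_shifted)


testosterone_difference = mean_difference_from_log_percent_change(15, -6.6)
```

For the testosterone example this gives a mean difference of roughly 2.45 ng/dL.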

b. IGF-1

For PFHxS, there was a -2.5% difference in the natural log IGF-1 among boys (per interquartile range of the natural log of PFHxS). Among boys, the median IGF-1 level was 147 ng/mL with an IQR of 116 to 187 ng/mL. For the sample size calculation of mean difference, the standard deviation was assumed to be equal for the exposed and unexposed groups and equal to 52.6. (The width of the IQR was 187 – 116 = 71. Assuming a normal distribution, dividing 71 by 1.35 converts the IQR to a standard deviation, which equaled 52.6)¹. To obtain the mean difference, the median IGF-1 level (147 ng/mL) was assumed to be the reference level (i.e., the level among the unexposed). The natural log of the median equals 4.99. A 2.5% decrease in this log value equals 4.865. Exponentiating 4.865 gives 129.7. The mean difference is then 147 – 129.7 = 17.3.

Assuming a 95% CI and 80% power, the sample size = 146/group; for a ratio of 2, the sample sizes = 218 and 109.

For PFOS, there was a -5.9% difference in the natural log IGF-1 among boys (per interquartile range of the natural log of PFOS). This would require considerably smaller sample sizes for IGF-1 than those for PFHxS.

Thyroid function – Children/Adolescents

Taiwan study of children. (Lin 2013)

a. For males aged 12-19, there was a mean difference in the log TSH of -0.50 mIU/L for PFOA levels in the 90th percentile (>9.71 ng/mL) compared to the reference level of PFOA exposure. The standard error for the reference group was 0.26 with N=32 in this group, and the standard error for the 90th percentile group was 0.33 with N=6. The standard deviations (standard error × √N) for the reference and 90th percentile groups were therefore 1.47 and 0.81, respectively.

Assuming a 95% CI and 80% power, the sample size = 89/group; for a ratio of 2, the sample sizes = 158 and 79.

b. For females aged 12-19, there was a mean difference in the log TSH of -0.35 mIU/L for PFOA levels in the 90th percentile (>9.71 ng/mL) compared to the reference level of PFOA exposure. The standard error for the reference group was 0.18 with N=71, and the standard error for the 90th percentile group was 0.24 with N=14. The standard deviations (standard error × √N) for the reference and 90th percentile groups were therefore 1.52 and 0.90, respectively.

Assuming a 95% CI and 80% power, the sample size = 200/group; for a ratio of 2, the sample sizes = 348 and 174.
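The standard deviations in the two thyroid examples follow from the reported standard errors and group sizes. A minimal sketch, assuming the usual relation SD = SE × √N (the function name is illustrative):

```python
import math


def sd_from_se(standard_error, n):
    """Convert a standard error of the mean to a standard deviation."""
    return standard_error * math.sqrt(n)


male_sds = (sd_from_se(0.26, 32), sd_from_se(0.33, 6))      # reference, 90th percentile
female_sds = (sd_from_se(0.18, 71), sd_from_se(0.24, 14))   # reference, 90th percentile
```

Rounded to two decimals, these reproduce the values 1.47 and 0.81 for males and 1.52 and 0.90 for females.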

Additional notes:

Sample sizes for the categorical outcomes in Table 1 were based on the following prevalences in children (also listed in Appendix C Table 1):

Hypercholesterolemia: 34.2%

Hyperuricemia: 16%

Thyroid disease: 0.6%

ADHD: 12.4% (reported only); 5.1% (reported with additional reporting on medications used for ADHD)

Asthma: 11%

Rhinitis: 25.6%

Atopic dermatitis: 10.7%

Hypertension: 23.4%

Obesity: 17%

Table 1. Minimum detectable effects for a Pease children study with 350 exposed and 175 unexposed.

Endpoint	α = .05, β=.20
Mean difference:
Total cholesterol	9.8 mg/dL
Uric acid	0.40 mg/dL
eGFR	9.3 mL/min/1.73 m²
IGF-1	22.5 ng/mL
TT₄	0.37 µg/dL*
Wechsler Full Scale IQ	3.4 points*

Odds Ratio (OR)
Hypercholesterolemia	OR = 2.00
Hypertension	OR = 2.12
Overweight/Obese	OR = 2.18
Hyperuricemia	OR = 2.30
ADHD¶	OR = 2.47
Asthma	OR = 2.56
Atopic dermatitis	OR = 2.49

¶ eGFR – estimated glomerular filtration rate; TT₄ – total thyroxine; IGF-1 – insulin-like growth factor 1; ADHD – attention-deficit/hyperactivity disorder.

Attachment 4b. Sample Size for Adult Study

The following provides information on the parameters and sample size calculation used in Table 2 for the adult study.

Sample size calculations were conducted with type 1 error (“α error”) set at .05 and type 2 error (“β error”) set at .20. It was considered important that the study have a total sample size large enough that exposures could be categorized into tertiles (i.e., reference level, medium level, and high level) or preferably into quartiles (i.e., reference level, low, medium and high).

Studies conducted using NHANES data had PFOA and PFHxS serum levels similar to or lower than those observed at Pease. In some of the more recent NHANES studies, the PFOS serum levels were only moderately higher than at Pease. Therefore the PFOS, PFOA and PFHxS results in the NHANES studies were used in many of the sample size calculations. For those outcomes not included in NHANES studies, the C8 studies were used.

Example:

Liver Function – Adults

In the C8 study (Darrow 2016), the mean alanine aminotransferase (ALT) level was 26 IU/L and the standard deviation was 19. The linear regression coefficient for the natural log ALT in the fifth quintile level of cumulative natural log PFOA was 0.058. Assuming that the reference group had an ALT level equal to the mean, the natural log of the mean ALT would be 3.26. Therefore the natural log of ALT for the fifth quintile cumulative log PFOA would be 3.32. Exponentiating 3.32 equals 27.6. The mean difference in the untransformed ALT is then 1.6.

Assuming a 95% CI and 80% power, the sample size = 2,214/group.

In the C8 study (Gallo 2012), the linear regression on the natural log of ALT resulted in a regression coefficient for the natural log PFOS of 0.029. The top quintile of PFOS level in the Pease adult population was about 15 ng/mL. The natural log of 15 is 2.71; multiplying by 0.029 results in a natural log ALT increase of 0.08. From the graph in the article, the reference level of ALT is about 21.3 IU/L. The natural log of 21.3 is 3.06. Adding 0.08 to 3.06 equals 3.14, and exponentiating 3.14 equals 23.1. Therefore the mean difference is 23.1 – 21.3 which equals 1.8.

The ALT standard deviation for the entire population was 20.1, and it was assumed that this was the standard deviation for each quintile PFOS.

Assuming a 95% CI and 80% power, the sample size = 1,958/group.
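The two ALT examples share the same arithmetic: add the estimated change in the natural log of ALT to the log of the reference level, exponentiate, and subtract the reference level. A minimal sketch follows; the function name is illustrative, and because intermediate values are not rounded, the results differ slightly from the rounded figures in the text.

```python
import math


def mean_difference_from_log_increase(reference_level, log_increase):
    """Back-transform an additive change on the natural log scale."""
    shifted_level = math.exp(math.log(reference_level) + log_increase)
    return shifted_level - reference_level


# Darrow 2016: coefficient 0.058 for the fifth quintile, reference ALT of 26 IU/L.
darrow_difference = mean_difference_from_log_increase(26, 0.058)

# Gallo 2012: coefficient 0.029 per unit of ln(PFOS), PFOS of 15 ng/mL,
# reference ALT of 21.3 IU/L.
gallo_difference = mean_difference_from_log_increase(21.3, 0.029 * math.log(15))
```

The unrounded differences are close to the 1.6 and 1.8 IU/L used in the sample size calculations above.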

Thyroid Function – Adults (not included in Table 2)

In Shrestha 2015, the sample size was 87 adults aged 55-74. The mean and SD for TSH were 2.58 µIU/mL and 1.47 µIU/mL, respectively. The linear regression of the natural log TSH resulted in a coefficient for the natural log PFOS of 0.129. Using a PFOS level of 15 ng/mL, the natural log of 15 is 2.71; multiplied by 0.129 equals 0.35. The reference level TSH was assumed to be the median TSH of 2.15 µIU/mL. The natural log of 2.15 is 0.77; adding 0.35 equals 1.12. Exponentiating 1.12 gives 3.06. The mean difference is then 3.06 – 2.15 = 0.91. The standard deviation of 1.47 was used for each group.

Assuming a 95% CI and 80% power, the sample size = 41/group.

Assuming a 95% CI and 95% power, the sample size = 68/group.

a. TSH

In Ji 2012, the sample size was 633 participants ≥12 years of age, and the median TSH level was 1.37 µIU/mL with an IQR of 0.90 to 2.01. The standard deviation was estimated as the width of the IQR divided by 1.35: (2.01 - 0.90)/1.35 = 0.82. This standard deviation was assumed for each group. For TSH, the linear regression coefficients for PFOS and PFHxS were 0.062 and 0.013, respectively. Multiplying these coefficients by a PFOS level of 15 ng/mL and a PFHxS level of 9 ng/mL gives mean differences for PFOS and PFHxS of 0.93 and 0.12, respectively.

Assuming a 95% CI and 80% power, the sample size = 13/group for PFOS

Assuming a 95% CI and 80% power, the sample size = 733/group for PFHxS

b. TT₄ (total thyroxine)

In Ji 2012, the sample size was 633 participants ≥12 years of age, and the median TT₄ level was 7.4 µg/dL with an IQR of 6.7 to 8.1. The standard deviation was estimated as the width of the IQR divided by 1.35: (8.1 – 6.7)/1.35 = 1.04. This standard deviation was assumed for each group. For TT₄, the linear regression coefficients for PFOS and PFHxS were -0.021 and -0.007, respectively. Multiplying these coefficients by a PFOS level of 15 ng/mL and a PFHxS level of 9 ng/mL gives mean differences for PFOS and PFHxS of -0.32 and -0.06, respectively.

Assuming a 95% CI and 80% power, the sample size = 166/group for PFOS

Assuming a 95% CI and 80% power, the sample size = 4,716/group for PFHxS

Sample sizes for the categorical outcomes in Table 2 were based on the following prevalences in adults:

Hypercholesterolemia: 15%

Hyperuricemia: 24%

Thyroid disease: 6.5% (reported and confirmed by medical records); 11.5% (reported only)

Elevated ALT: 11.2%

Elevated GGT: 14%

Elevated bilirubin: 1.1%

Osteoporosis: 5%

Osteoarthritis: 7.6%

Cardiovascular disease: 13%

Ulcerative colitis: 0.5%

Rheumatoid arthritis: 1.2%

Health-related endpoints not shown in Table 2:

Chronic kidney disease: 1.4%

Liver disease: 2%

Hypertension: 37%

Pregnancy-induced hypertension: 8.5%

Endometriosis: 7%

Lupus: 0.2%

Multiple sclerosis: 0.32%

Kidney cancer: 0.3%

Table 2 presents alternative sample size calculations assuming 1,000 adults from Pease and 100 referent adults will participate in the study. The calculations also assume a type 1 error of .05 and a type 2 error of .20. There is sufficient power to detect mean differences in total cholesterol, uric acid, and ALT based on estimates from other epidemiological studies. For categorical outcomes such as hyperuricemia and hypertension, ORs below 2.0 are detectable. For health outcomes such as hypercholesterolemia, thyroid disease, elevated liver enzymes, cardiovascular disease and osteoarthritis, ORs between 2.0 and 3.0 are detectable, in agreement with odds ratios reported in other well-designed environmental health studies. In addition, several epidemiological studies of adults exposed to PFAS that reported robust statistical associations with these health outcomes had similar or smaller sample sizes, e.g., NHANES studies (Nelson 2010, Wen 2013), a C8 longitudinal study (Fitz-Simon 2013), a C8 immune study (Looker 2014), and studies in China (Fu 2014) and Korea (Ji 2012).

Table 2. Minimum detectable effects for a Pease adult study of 1,000 exposed and 100 referents.

Endpoint	α = .05, β=.20
Mean difference in:
Total cholesterol	12.4 mg/dL
Uric acid	0.46 mg/dL
ALT	5.92 IU/L

Odds Ratio (OR)
Hyperuricemia	OR=1.96
Elevated GGT (>55 IU/L, men; >38 IU/L, women)	OR=2.26
Hypercholesterolemia	OR=2.21
Thyroid disease	OR=3.03
Elevated ALT (>45 IU/L, men; >34 IU/L, women)	OR=2.43
Cardiovascular disease	OR=2.31
Hypertension	OR=1.83
Osteoarthritis	OR=2.83
Osteoporosis	OR=3.44
Rheumatoid arthritis	OR=4.65
Ulcerative colitis	OR=8.0

Author	Stephanie Davis
File Created	2021-10-07