National-Scale Activity Survey (N-SAS)

OMB: 2060-0627

Supporting Statement for a Request for OMB Review under the Paperwork Reduction Act

Part B

Survey Objectives, Key Variables, and Other Preliminaries

Survey Objectives

EPA supports the Air Quality Index (AQI), a program that uses data from air quality monitors to forecast pollution levels and to notify the public of health hazards associated with air pollution, primarily ozone and particulate matter (PM). EPA's Office of Air Quality Planning and Standards (OAQPS), which manages the AQI program, is interested in assessing the public's awareness and knowledge of AQI warnings and both their stated and actual behavioral responses to them. To address this need, OAQPS plans to conduct the National-Scale Activity Survey (N-SAS) to gather information on perceptions, awareness, attitudes, and stated and actual behaviors in response to AQI warnings. This study is exploratory in nature and is not intended to serve as a formal program evaluation or to produce generalizable estimates; however, EPA expects that it will provide useful insights into the current AQI program.

The N-SAS consists of a series of nine surveys. A screening survey at the beginning and a debriefing survey at the end will provide information on the research participants, their awareness and knowledge of air pollution and the AQI, their risk perceptions regarding health effects, and their reported behaviors on high ozone days. After the screening survey, research participants will complete a set of seven activity diaries on both high and low ozone days to collect information on actual behavior.

EPA will use the information obtained from N-SAS to assess hypotheses about the N-SAS research participants regarding:

Extent of awareness of and knowledge about the AQI
The effects of the AQI-based warnings on behavior in eight cities with significant pollution problems
The correlation between awareness, knowledge, stated behavior on high pollution days and data on behavior reported in the activity diaries
Differences in behavior, awareness and knowledge among different sub-samples of the research participants

Below, we provide additional detail on the use of N-SAS results by EPA for (1) accountability, education and outreach and (2) exposure analysis.

1. Accountability and Education/Outreach

Information from analysis of questions on awareness and knowledge will be used to

Identify the elements of the AQI message that are understood and not understood among research participants
Identify whether and how understanding of the elements of the AQI message varies between sub-groups of the research participants by age, gender, race, and health condition
Respond to the directive from the Clean Air Act Advisory Committee to evaluate the impact of the AQI on behavior
State and local air quality regulators are likely to be interested in the information gained from this sample and its possible implications for how they communicate with the public about ozone levels and air quality.

Information from the analysis of the activity diary data for changes in behavior will be used to

Quantify the extent to which research participants change their behavior on high ozone days (Code Red days), including time outdoors, exertion level, and time spent driving a personal vehicle, to assess the impact of high ozone days on behavior within the N-SAS sample
Compare behavior change measured in the activity diaries with stated behavior change from the debriefing survey to assess the reliability of stated behavior reports within the N-SAS sample

2. Exposure Analysis

Provide more recent activity data for the Consolidated Human Activity Database (CHAD) that fill a gap in information about older adults, for use in creating exposure profiles
Provide multiple observations for each individual, which most CHAD data do not, to better estimate within-person variation in time outdoors and improve exposure profiles

B1. Respondent Universe and Sampling Methods

Sampling Frame and Methods for Research Participants

This study will include approximately 1,600 adults from the Knowledge Networks panel. The respondents will be adults age 35 and over living in Washington, D.C.; Sacramento (including other San Joaquin Valley areas: San Joaquin, Stanislaus, Merced, Madera, and Fresno); Chicago; Dallas; Houston; Atlanta; Philadelphia; or St. Louis. These cities were selected from the 25 cities with the worst ozone pollution in 2006.

In addition, we will limit the sample to adults who spend some time being physically active and outdoors. To screen potential research participants, we will use a question from the Behavioral Risk Factor Surveillance System (BRFSS): "During the past month, did you participate in any physical activities or exercises such as running, calisthenics, golf, gardening, or walking for exercise?" All else equal, completely sedentary adults are at less risk from ozone exposure than active adults. Based on BRFSS data, 30% to 35% of adults age 55 and older did not engage in any physical activity in the past month.

The current Knowledge Networks (KN) panel consists of approximately 40,000 adults actively participating in research. Panel members are recruited through random-digit dialing (RDD) using a 1+ directory-listed RDD telephone methodology, which provides a probability-based starting sample of U.S. telephone households. The web-enabled panel includes both Internet and non-Internet households; KN supplies non-Internet households with an Internet appliance and an Internet connection.

Samples drawn from the KN panel for individual studies are selected using probability methods. The sampling approach for this project is designed to make the completed interviews demographically similar to U.S. Census benchmarks for the study cities by factoring demographic groups' estimated response rates into the initial sample draw. By oversampling groups that tend to have lower response and consistency rates and undersampling groups that tend to have higher rates, the completed interviews closely mirror the Census demographic benchmarks. The procedure draws on an analysis of patterns of response and nonresponse to KN panel surveys by demographic segment and consists of the following steps:

1. Select the demographic variables that are important for a given study.

2. Determine the subgroups (categories) for each selected variable. The variables used in this step may include gender, age, race/ethnicity, education, city (the eight study cities), Internet access, and household income.

3. Determine the number of cells needed for sample stratification. The number of cells depends on the number of demographic variables selected and the number of categories for each. With the variables listed in step 2 (2 gender × 2 age × 4 race/ethnicity × 4 education × 8 city × 2 Internet access × 2 income categories), a total of 2,048 cells would need to be created.

4. Calculate the percentage of the final sample that each of the 2,048 cells should represent. For example, cell 1 (male, age 35-54, white, less than high school, in Chicago, with Internet access, income less than $75K) receives a percentage equal to the product of the percentages for male (gender), 35-54 (age), white (race), less than high school (education), Chicago (city), yes (Internet access), and less than $75K (income). Each cell is assigned a percentage in this manner.

5. Multiply the percentage assigned to each cell by the desired sample size. This gives the number of cases to be randomly selected from each cell.

6. Round the products to the nearest integer. Cells that round to zero are excluded from the sampling frame, because each cell must contain at least one case. The result is the final number of cells and the number of people to select for the desired sample.
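Steps 3 through 6 can be sketched in a few lines of Python. This is a minimal illustration, not Knowledge Networks' production procedure: the three variables and their marginal proportions below are hypothetical placeholders (real targets would come from Census benchmarks), and the response-rate adjustment described above is not shown.

```python
from itertools import product
from math import prod

# Hypothetical marginal proportions for three stratification variables.
MARGINALS = {
    "gender": {"male": 0.48, "female": 0.52},
    "age": {"35-54": 0.55, "55+": 0.45},
    "internet": {"yes": 0.70, "no": 0.30},
}

def allocate(marginals, sample_size):
    """Steps 3-6: enumerate every cell, give it the product of its marginal
    proportions, scale by the sample size, round, and drop zero cells."""
    names = list(marginals)
    allocation = {}
    for levels in product(*(marginals[n] for n in names)):
        share = prod(marginals[n][lvl] for n, lvl in zip(names, levels))
        count = round(share * sample_size)  # note: round() sends exact halves to even
        if count >= 1:                      # cells that round to zero are excluded
            allocation[levels] = count
    return allocation

# Step 3 for the full design in the text: 2*2*4*4*8*2*2 cells.
n_cells = prod([2, 2, 4, 4, 8, 2, 2])

cells = allocate(MARGINALS, 1600)
```

With these placeholder marginals, the eight cells sum to 1,600 and `n_cells` is 2,048. Multiplying marginals assumes the variables are independent, which is what step 4 describes.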

In practice, it is generally not possible to construct 2,048 cells, because the sample size of a typical survey is too small. For this approach to be efficient, the sample size must be substantially larger than the number of cells. Most commonly, only a few of the demographic variables listed above are used for stratification, and some subgroups are collapsed to reduce the number of cells. We have frequently used a few hundred cells for surveys requiring a sample of a few hundred to a few thousand cases. The stratification process may also need to be repeated several times to determine the optimal number of cells and breakdowns of the demographic variables.

Table B1 presents the expected sample size and response rate for each survey.

Table B1. Expected Sample Size and Response Rate

Survey Component	Description	Number of Respondents	Expected Response Rate
Invitation email	Panel members who receive an initial email invitation to take the survey	3,266	n/a
Screening survey	Number and percent of panel members receiving an invitation who take the screening survey	2,286	70%
Baseline	Number and percent of panel members who take the screening survey, qualify for the study, and complete the baseline survey	1,600	70%
Activity Diaries	Number and percent of baseline survey respondents who complete each activity diary	960	60%¹
Debriefing Survey	Number and percent of panel members who took the baseline survey who complete the debriefing survey	1,120	70%

¹We expect that 60% of the 1,600 respondents will complete each diary. However, the same people may not complete all of the diaries; some people may complete only a subset.
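The counts in Table B1 follow from chaining the expected response rates. The sketch below carries an invitation count forward and also works backward from the 1,600 baseline target; the invitation figure appears consistent with that backward calculation, although the source does not state how it was derived. The function names are illustrative only.

```python
import math

# Expected response rates from Table B1; each rate applies to the previous step.
SCREENING_RATE = 0.70   # invited -> complete screening
BASELINE_RATE = 0.70    # screened -> qualify and complete baseline
DIARY_RATE = 0.60       # baseline completers expected to finish each diary
DEBRIEF_RATE = 0.70     # baseline completers expected to take the debriefing
N_DIARIES = 7

def expected_counts(invited):
    """Carry an invitation count forward through the Table B1 funnel."""
    screened = round(invited * SCREENING_RATE)
    baseline = round(screened * BASELINE_RATE)
    per_diary = round(baseline * DIARY_RATE)
    return {
        "Invitation email": invited,
        "Screening survey": screened,
        "Baseline survey": baseline,
        "Each activity diary": per_diary,
        "Debriefing survey": round(baseline * DEBRIEF_RATE),
        "Total diaries": per_diary * N_DIARIES,
    }

def invitations_needed(baseline_target):
    """Work backward: invitations required to expect the baseline target."""
    return math.ceil(baseline_target / (SCREENING_RATE * BASELINE_RATE))
```

`expected_counts(3266)` reproduces the table (2,286 screened, 1,600 baseline, 960 per diary, 1,120 debriefed) and the 6,720 total diaries (960 × 7) cited in the sample size discussion.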

Sampling Frame and Methods for Activity Diary Days

The activity diaries will be administered on both Code Red days (high ozone days) and non-Code Red days, as defined using the AQI. To ensure that each research participant's set of diaries includes at least two Code Red days, we will ask research participants to fill out four diaries the first time a Code Red day is forecast in their city, starting the day before the forecast Code Red day and continuing for the next 3 days. The remaining three diaries will be administered when the next Code Red day is forecast. All of the cities in our sample are expected to have at least six Code Red days this summer, so we are confident that our sampling plan will yield enough Code Red day diaries.

EPA and the contractor will work with EPA air quality monitoring staff, Sonoma Technologies (the contractor that provides the forecast information for the AQI and the AirNow website), and a panel of monitoring and communications staff from the cities included in the sample. These groups will provide up-to-date forecasts and information about activities or special events in the cities to help determine suitable days for administering the activity diaries, so that the diaries cover a mixture of high and low ozone days.

B2. Procedures for the Collection of Information

Procedures

The design for N-SAS was modeled on previous studies, including a 2002 EPA Science to Achieve Results (STAR) grant led by Dr. Mansfield, regional surveys on air quality warnings, other time and activity diary studies used for exposure assessment, and national surveys including the BRFSS, the American Time Use Survey, the National Health Interview Survey (NHIS), and the California Health Interview Survey (CHIS). In the screening survey, the question used to screen for physical activity comes from the BRFSS. Questions about chronic disease status and health are the same as those in the NHIS or CHIS. The definition of moderate-intensity physical activity was taken from the National Health and Nutrition Examination Survey (NHANES). Questions on commuting patterns were taken from the American Community Survey.

A number of considerations went into the design of the activity diary. The diary covers a 24-hour period, which meets the criteria for inclusion in the Consolidated Human Activity Database (CHAD). The diary follows an "event-based" model rather than a "time-based" model. A time-based diary asks what the research participant was doing during each specified time period. An event-based diary takes the research participant through their day activity by activity and lets the participant define the length of time for each activity. Based on our own prior diary studies and advice from EPA exposure staff with significant activity diary experience, event-based diaries are easier for research participants to complete.

With regard to the activities, we are most interested in the research participant's location (primarily outdoors versus indoors) and exertion level. To minimize research participant burden, the diaries include a small number of general activity categories designed to fit existing CHAD categories. The choices for location and exertion level are based on the needs of the exposure analysis and past research using activity diaries.
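To make the event-based design concrete, the sketch below represents one diary day as a list of events, each with its own start and end time, location, and exertion level, then checks that the events cover the full 24 hours and totals time outdoors. The field names and category codes are hypothetical; they are not the actual N-SAS or CHAD coding scheme. Times are kept in whole minutes to avoid floating-point comparison problems.

```python
from dataclasses import dataclass

MINUTES_PER_DAY = 24 * 60

@dataclass
class DiaryEvent:
    start_min: int   # minutes after midnight when the activity began
    end_min: int     # minutes after midnight when it ended
    location: str    # hypothetical codes, e.g. "outdoors" or "indoors"
    exertion: str    # hypothetical codes, e.g. "low", "moderate", "high"

def outdoor_hours(events):
    """Validate that events tile one 24-hour day, then total outdoor hours
    overall and by exertion level."""
    clock = 0
    for e in sorted(events, key=lambda e: e.start_min):
        if e.start_min != clock or e.end_min <= e.start_min:
            raise ValueError(f"gap, overlap, or empty event at minute {e.start_min}")
        clock = e.end_min
    if clock != MINUTES_PER_DAY:
        raise ValueError("diary does not cover a full 24-hour day")
    by_exertion = {}
    for e in events:
        if e.location == "outdoors":
            minutes = e.end_min - e.start_min
            by_exertion[e.exertion] = by_exertion.get(e.exertion, 0) + minutes
    total = sum(by_exertion.values()) / 60
    return total, {k: v / 60 for k, v in by_exertion.items()}
```

Because participants define each event's length, the 24-hour coverage check is where an event-based instrument can catch gaps or overlaps that a fixed time-slot grid avoids by construction.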

As discussed in Section B4, the questions were pretested for understanding and ease of response.

The surveys are self-administered computer surveys that Knowledge Networks will administer to the N-SAS research participants. The screening survey will be administered at the beginning of the summer to recruit research participants. The invitation will be sent to potential participants, and the survey will remain open (available to take) for approximately 10 to 14 days.

The activity diaries for each research participant need to cover both Code Red (high ozone) days and non-Code Red days. EPA and RTI will monitor the ozone and weather forecasts in the cities included in the sample. When the first Code Red day for ozone is forecast, we will administer diary surveys for the day before the forecast Code Red day, the Code Red day, and the two days after. When the second Code Red day is forecast, we will administer diary surveys for the day before the Code Red day, the Code Red day, and the day after. Capturing at least two Code Red days for each research participant is an important administration objective, but it must be balanced against research participant burden. Based on past experience, we believe that research participants will find it easier to complete several diaries in a row than to receive single diaries on unrelated days. Because Code Red days often occur two or more at a time during periods of high temperatures, administering diaries before and after a forecast Code Red day may also capture additional Code Red days.
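The two diary sets described above translate directly into target dates. This is a minimal sketch: the function names and the example dates are hypothetical, and the overlap check is an illustrative safeguard rather than a stated study rule.

```python
from datetime import date, timedelta

def diary_days(code_red_day, days_before, days_after):
    """Target dates for one set of diaries around a forecast Code Red day."""
    return [code_red_day + timedelta(days=d)
            for d in range(-days_before, days_after + 1)]

def diary_schedule(first_code_red, second_code_red):
    # First set: day before, Code Red day, and the two days after (4 diaries).
    first = diary_days(first_code_red, 1, 2)
    # Second set: day before, Code Red day, and the day after (3 diaries).
    second = diary_days(second_code_red, 1, 1)
    if first[-1] >= second[0]:
        raise ValueError("second diary set overlaps the first")
    return first + second
```

For hypothetical forecast Code Red days of July 10 and July 20, `diary_schedule(date(2009, 7, 10), date(2009, 7, 20))` yields seven dates: July 9 through 12 and July 19 through 21.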

Two to three days before each set of diaries is administered (based on the forecast for each city), research participants will receive an email and an automated phone call alerting them that they will receive an invitation to the activity diaries in the next 2 to 3 days. Participants will also receive an automated phone call, placed 24 to 36 hours after the first diary invitation is sent, reminding them to fill out the activity diaries. Finally, a follow-up email will be sent the next day if the research participant has not completed the activity diary. Each activity diary will remain open for 48 hours after the target day; after that, the respondent will not be able to complete that diary, which helps prevent recall problems.

The debriefing survey will be administered a few weeks after all the diaries have been sent in every city, and the entire sample of research participants will take it at the same time. The debriefing survey will remain open (available to take) for approximately 10 to 14 days.

The web panel and self-administered computer survey method offer a number of distinct advantages in collecting diary data.

Compared to a paper diary, the web diary should produce higher quality data because the computer program guides the research participant through the process of filling out the diary. In addition, the computer survey eliminates the need for data entry.
Unlike with a paper diary, we know when each diary was filled out. We will also give research participants no more than 48 hours to complete each diary to minimize recall problems.
Compared to both paper and telephone diaries, administering the survey on a particular day is inexpensive. Because we need to target Code Red and Code Green ozone days, research participants will be sent an email link to the survey the day before we want them to fill it out, and reminders will be sent after 24 and 36 hours.

Sample Size and Power Calculations

The target sample size for the baseline survey is 1,600 (800 adults age 35-54 and 800 age 55 and older). We estimate that this baseline sample will yield approximately 6,720 activity diaries (960 respondents completing each of the 7 diaries). Appendix A presents the formula used to calculate the longitudinal sample sizes. Data from a diary survey that examined hours outdoors on Code Red and non-Code Red days (funded by an EPA STAR grant, Mansfield et al. 2004) provided the estimates needed for the calculations: the variance in hours outdoors across days within an individual and across individuals in different cities. The STAR grant survey collected data on children's activities (ages 2-12).

Assuming the response variable of interest is the difference in outdoor hours between Code Red and non-Code Red days, Tables B1 and B2 below present sample sizes by minimum detectable difference and number of diaries per type of day (Code Red and non-Code Red), using the median and the mean, respectively, of the distribution of variances in the STAR grant data.

Table B1. Sample Size by Minimum Detectable Difference (Hours) and Number of Days (Replicates) for Each Type of Day, Using the Median of the Distribution of Variances from STAR data

Note: Significance Level = 5% and Power Level = 80%

Minimum Detectable Difference (Hours)	Number of Days (Replicates) for Each Type of Day (Code Red or Non-Code Red)
	1	2	3	4	5	6	7	8	9	10
0.25	583	529	511	502	496	493	490	488	487	486
0.50	146	132	128	125	124	123	123	122	122	121
0.75	65	59	57	56	55	55	54	54	54	54
1.00	36	33	32	31	31	31	31	31	30	30
1.25	23	21	20	20	20	20	20	20	19	19
1.50	16	15	14	14	14	14	14	14	14	13
1.75	12	11	10	10	10	10	10	10	10	10
2.00	9	8	8	8	8	8	8	8	8	8

Table B2. Sample Size by Minimum Detectable Difference (Hours) and Number of Days (Replicates) for Each AQI Type of Day, Using the Mean of the Distribution of Variances

Note: Significance Level = 5% and Power Level = 80%

Minimum Detectable Difference (Hours)	Number of Days (Replicates) for Each Type of Day (Code Red or Non-Code Red)
	1	2	3	4	5	6	7	8	9	10
0.25	954	748	680	645	625	611	601	594	588	584
0.50	239	187	170	161	156	153	150	149	147	146
0.75	106	83	76	72	69	68	67	66	65	65
1.00	60	47	42	40	39	38	38	37	37	36
1.25	38	30	27	26	25	24	24	24	24	23
1.50	27	21	19	18	17	17	17	17	16	16
1.75	19	15	14	13	13	12	12	12	12	12
2.00	15	12	11	10	10	10	9	9	9	9
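Both tables show the same pattern: adding diary days per type reduces the required sample size quickly at first and then levels off, and halving the detectable difference roughly quadruples it. The sketch below illustrates why, using a simplified two-group variance model in which averaging over t days shrinks only the day-to-day (within-person) variance. This is an assumed textbook form, not the Appendix A formula (which also includes a between-city term), and the variance values are placeholders, not the STAR grant estimates, so it does not reproduce the table entries.

```python
from math import ceil
from statistics import NormalDist

def n_per_group_repeated(delta, t, var_between, var_within, alpha=0.05, power=0.80):
    """Per-group sample size for comparing two groups when each person's
    response is averaged over t diary days (simplified variance model)."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    # Averaging t days shrinks only the day-to-day (within-person) part.
    var_response = var_between + var_within / t
    return ceil(2 * z ** 2 * var_response / delta ** 2)

# Placeholder variances (hours squared), chosen only for illustration.
for t in (1, 2, 5, 10):
    print(t, n_per_group_repeated(0.5, t, var_between=1.0, var_within=1.0))
```

The between-person variance sets a floor that extra diary days cannot lower, which is consistent with the columns in both tables converging as the number of days grows.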

We also plan to use questions from the debriefing survey to compare proportions across subsamples of N-SAS research participants for differences in awareness of AQI-based warnings, risk perceptions, and stated behaviors on high ozone days. Table B3 presents the sample sizes needed to compare proportions for different minimum detectable differences. Equation (B.1) was used to calculate the sample size, n, needed to detect a difference of size δ between the compared subgroups with probability 1–β. Here δ is the minimum detectable difference, 1–β is the statistical power, α is the significance level, and p is the assumed population proportion.

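$$n = \frac{2\,p\,(1-p)\,\left(Z_{\alpha/2} + Z_{\beta}\right)^{2}}{\delta^{2}} \qquad \text{(B.1)}$$

where Z_{α/2} and Z_β are the (1–α/2) and (1–β) quantiles of the standard normal distribution, respectively, and n is rounded up to the next whole number.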

Table B3. Sample Size Needed for Each Group Based on Paired Comparisons with Power Equal to 0.80 by True Proportion and Minimum Detectable Difference

True Proportion	Minimum Detectable Difference
	0.05	0.06	0.07	0.08	0.09	0.1	0.11	0.12	0.13	0.14	0.15	0.16	0.17	0.18	0.19	0.2
0.10	566	393	289	221	175	142	117	99	84	73	63	56	49	44	40	36
0.15	801	556	409	313	248	201	166	139	119	103	89	79	70	62	56	51
0.20	1,005	698	513	393	311	252	208	175	149	129	112	99	87	78	70	63
0.25	1,178	818	601	460	364	295	244	205	175	151	131	115	102	91	82	74
0.30	1,319	916	673	516	407	330	273	229	196	169	147	129	115	102	92	83
0.35	1,429	993	729	559	441	358	296	249	212	183	159	140	124	111	99	90
0.40	1,507	1,047	769	589	466	377	312	262	223	193	168	148	131	117	105	95
0.45	1,555	1,080	793	608	480	389	322	270	230	199	173	152	135	120	108	98
0.50	1,570	1,091	801	614	485	393	325	273	233	201	175	154	136	122	109	99
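The entries in Table B3 appear consistent with the standard two-sample formula for proportions, n = 2p(1–p)(Z_{α/2} + Z_β)²/δ², evaluated at α = 0.05 and power 0.80 and rounded up. The sketch below uses only the Python standard library; the function name is illustrative.

```python
from math import ceil
from statistics import NormalDist

def n_per_group_proportions(p, delta, alpha=0.05, power=0.80):
    """Per-group n to detect a difference delta between two proportions near p."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # Z_{alpha/2}
    z_beta = NormalDist().inv_cdf(power)            # Z_beta
    return ceil(2 * p * (1 - p) * (z_alpha + z_beta) ** 2 / delta ** 2)
```

For example, `n_per_group_proportions(0.50, 0.05)` gives 1,570 and `n_per_group_proportions(0.10, 0.20)` gives 36, matching the corresponding corners of Table B3.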

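Table B4 shows the expected precision (half-width of the confidence interval) at 90% and 95% confidence for selected proportions, given a sample size of 1,600. Precision is calculated as

$$\text{precision} = Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$$

where p is the assumed proportion and n = 1,600; Table B4 reports precision in percentage points.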

Table B4. Expected Precision Values (Percentage Points)

Assumed Proportion	Precision with 95% Confidence (n = 1,600)	Precision with 90% Confidence (n = 1,600)
0.10	1.47	1.23
0.15	1.75	1.47
0.20	1.96	1.64
0.25	2.12	1.78
0.30	2.25	1.88
0.35	2.34	1.96
0.40	2.40	2.01
0.45	2.44	2.05
0.50	2.45	2.06
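The precision values in Table B4 can be reproduced from the half-width formula Z_{α/2}·√(p(1–p)/n), scaled to percentage points. A minimal standard-library sketch (the function name is illustrative):

```python
from statistics import NormalDist

def precision_pct_points(p, n=1600, confidence=0.95):
    """Half-width of a normal-approximation confidence interval for a
    proportion, expressed in percentage points."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # Z_{alpha/2}
    return 100 * z * (p * (1 - p) / n) ** 0.5
```

Rounded to two decimals, `precision_pct_points(0.50)` gives 2.45 and `precision_pct_points(0.50, confidence=0.90)` gives 2.06, the last row of Table B4.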

B3. Methods to Maximize Response Rates and Deal with Nonresponse

The plan for administering N-SAS includes a number of methods to maximize the response rate and minimize response error, including:

Pretesting survey questions so that they are easy to understand and correctly interpreted by respondents
Providing an incentive of $4 per survey, for a total of $36 to respondents who complete all 9 surveys
Sending a reminder email at least 2 days before the activity diaries are sent
Placing an automated telephone call to alert respondents that the activity diary survey is coming
Placing an automated telephone call 1 day after the invitation to the first diary in a set is sent, to remind respondents that they have an activity diary to complete
Sending a reminder email the day after the activity diary invitation has been sent if the diary has not been completed

For this study, we do not require a representative sample of the population in each of the eight cities. However, we would like to assess the degree to which N-SAS participants are representative of the Knowledge Networks panel in the eight cities. Using available profile data, Knowledge Networks will recruit individuals age 35 and older; prospective N-SAS participants must also answer the physical activity question in the screener. This gives us three main sets of Knowledge Networks panelists to compare:

Knowledge Networks panelists who qualify for and participate in N-SAS.
Knowledge Networks panelists who take the screening survey but do not qualify based on the activity criteria.
Knowledge Networks panelists who are sent an email invitation to take the survey but who do not open the email and/or complete the screener.

Nonsampling error can arise in any survey from factors such as response error and nonresponse error. Response error is being minimized through careful design of the survey questionnaires, which have been reviewed and pretested. The three main possible sources of nonresponse error for N-SAS participants are (1) nonresponse that may have arisen in creating the Knowledge Networks web panel, (2) nonresponse among web panel members who are invited to take the survey and decline, and (3) attrition over the course of the nine surveys. Because we do not need a fully representative sample to achieve the goals of N-SAS, we are less concerned with nonresponse error that might exist in the Knowledge Networks panel relative to the U.S. population (a number of studies have examined this issue, and additional studies are underway). Our efforts focus on the second and third sources: panel members who are invited and decline, and attrition over the nine surveys. We are particularly concerned with detecting nonresponse bias that might affect the key variables in our study: hours of outdoor activity and the debriefing survey questions about awareness and risk perceptions related to air pollution.

The nonresponse bias tests we plan for the N-SAS study include

Representativeness of the recruited research participants relative to the Knowledge Networks panel. Knowledge Networks will recruit, from its probability sample of pre-recruited households, a sample that will volunteer to be part of the diary study. Because the demographic characteristics of the entire panel are already on file, N-SAS research participants can be compared directly with other members of the Knowledge Networks panel. Statistical comparisons will be made among the following groups of Knowledge Networks active panelists: those in the age-appropriate group in the targeted areas; those sampled and invited for N-SAS; those who initially agree to participate in the project (complete the screening and baseline surveys); those who complete the minimum number of interviews required for the study; and those who complete all the diary surveys. These simple analyses will provide information on the demographic correlates of survey participation.

Representativeness of the sample in terms of chronic illness prevalence. Participants' current or past chronic illness may affect both their propensity to join N-SAS and their responses to the study questions. For instance, individuals with a history of lung problems might be more or less likely to join and might have higher awareness of the AQI. The analysis of demographic representativeness of N-SAS participants relative to the Knowledge Networks panel will therefore also use chronic illness prevalence information already recorded by Knowledge Networks. Knowledge Networks has complete information on the chronic illness status of research subjects before they are invited to join the study; this information covers self-reported, doctor-diagnosed cardiac disease and lung problems. Statistical comparisons will be made to determine whether having a study-relevant chronic illness predicts, first, the propensity to join the study and, second, the propensity to participate in all waves of data collection.

Attrition over the set of activity diaries

In addition to nonresponse bias created when panel members invited to take the survey decline, we are also concerned about attrition over the series of surveys, particularly the activity diaries. If individuals with characteristics related to the amount of time they spend outdoors are more likely to drop out and not complete the activity diaries, our measures of total outdoor time on Code Red and non-Code Red days may be biased. We will use two methods to check for differences between research participants who complete the full set of activity diaries and those who do not.

First, we will compare demographics and responses to other questions in the baseline and debriefing surveys (if taken) between those who complete the full set of activity diaries and those who do not. Second, we will estimate a count-data model (either Poisson or negative binomial) of the number of diaries completed, which measures each participant's "survival" on the panel, to identify individual characteristics that influence survival or the probability of completing the entire set of activity diaries.
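The Poisson version of the count model can be fitted by Newton-Raphson in a few lines of NumPy. This sketch is illustrative only: the covariate (a hypothetical age 55+ indicator) and the data are synthetic, the negative binomial alternative is not shown, and a Poisson model does not enforce the seven-diary cap, so a binomial model for "k of 7 completed" is another option the analysts might weigh.

```python
import numpy as np

def fit_poisson(X, y, max_iter=50, tol=1e-10):
    """Poisson regression (log link) fitted by Newton-Raphson.
    X: (n, k) covariates; y: (n,) counts of diaries completed."""
    X = np.column_stack([np.ones(len(y)), X])   # prepend an intercept column
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())                   # start at the overall mean count
    for _ in range(max_iter):
        mu = np.exp(X @ beta)                    # fitted expected counts
        score = X.T @ (y - mu)                   # gradient of the log-likelihood
        info = X.T @ (X * mu[:, None])           # Fisher information (negative Hessian)
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Synthetic illustration only: a hypothetical 0/1 covariate (e.g. age 55+).
rng = np.random.default_rng(0)
older = rng.integers(0, 2, size=20_000)
diaries_done = rng.poisson(np.exp(0.5 + 0.2 * older))
coef = fit_poisson(older.reshape(-1, 1), diaries_done)
```

A positive coefficient on a characteristic would indicate that participants with it tend to complete more diaries; in practice, a statistics package would also supply standard errors for judging whether such differences are meaningful.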

B4. Tests of Procedures or Methods to be Undertaken

To ensure the feasibility of the proposed data collection, the surveys have been pretested using cognitive interviews and reviewed by experts as described more fully in Section A8. The survey instruments have been revised to address issues that arose in these processes. The cognitive interviews identified areas of the survey where the language and ordering of questions needed to be improved; however, overall the research participants had very little difficulty in understanding or responding to the survey questions. The instruments were also reviewed by survey methodologists, content experts and Knowledge Networks’ staff.

B5. Individuals Consulted on Statistical Aspects and Individuals Collecting and/or Analyzing Data

The agency official responsible for receiving and approving contract deliverables is:

Zachary Pekar

919-541-3704

U.S. Environmental Protection Agency

Office of Air Quality Planning and Standards

Health and Environmental Impacts Division

Mail Code C504-06

Research Triangle Park, NC 27711

The person who supported EPA in the design of the data collection is:

Carol Mansfield, Ph.D.

919-541-8053

RTI International

3040 Cornwallis Road

Research Triangle Park, NC 27709

The person who will collect the data is:

J. Michael Dennis, Ph.D.

650-289-2000

Knowledge Networks, Inc.

1350 Willow Road, Suite 102

Menlo Park, CA 94025

The person who will lead the analysis of the data is:

Carol Mansfield, Ph.D.

919-541-8053

RTI International

3040 Cornwallis Road

Research Triangle Park, NC 27709

Appendix A: Sample Size and Power Calculations for Activity Diary Sample

Objective: Determine if there are differences in hours spent outdoors on high and low ozone days (Code Red and non-Code Red days). Each person in the sample needs to have observations for both Code Red and non-Code Red days. In this case, the response of interest is the difference:

Response = (hours outdoors on Code Red days) – (hours outdoors on non-Code Red days)

If this is the objective of interest, it would be desirable to have at least two repeated days of each type to provide some measure of variability for each type of day. For each person, calculate the mean hours outdoors on Code Red days and the mean hours outdoors on non-Code Red days; the difference between these two means is the response defined above. The formula below, however, also applies to other response variables.
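The per-person response can be computed directly from a participant's diaries. A minimal sketch, assuming each diary has already been reduced to a (day type, hours outdoors) pair; the "red"/"nonred" labels are hypothetical.

```python
from statistics import mean

def person_response(diaries):
    """Mean hours outdoors on Code Red days minus mean hours outdoors on
    non-Code Red days for one person.
    diaries: list of (day_type, hours_outdoors), day_type "red" or "nonred"."""
    red = [h for kind, h in diaries if kind == "red"]
    nonred = [h for kind, h in diaries if kind == "nonred"]
    if not red or not nonred:
        raise ValueError("need at least one diary of each day type")
    return mean(red) - mean(nonred)
```

For a person with Code Red diaries of 1.0 and 2.0 hours and non-Code Red diaries of 3.0, 2.0, and 4.0 hours, the response is 1.5 − 3.0 = −1.5 hours, meaning less time outdoors on Code Red days.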

Formula for sample size:

where

i = 1, 2 (the two cities being compared)

j = 1, …, n (individuals within a city)

k = Code Red or non-Code Red

l = 1, 2, …, t (days of each type, Code Red and non-Code Red)

Z_{α/2} and Z_β = (1–α/2) and (1–β) quantiles of the standard normal distribution

δ = Minimum detectable difference

σ_IC = Variance in outdoor hours between cities

σ_ = Variance in outdoor hours between persons

File Type	application/msword
File Title	Supporting Statement for a Request for OMB Review under
Author	George Van Houtven
Last Modified By	zachp
File Modified	2009-04-22
File Created	2009-04-22