Population Assessment of Tobacco and Health (PATH) Study (NIDA)

OMB: 0925-0664

Supporting Statement B
for
Population Assessment of
Tobacco and Health (PATH) Study (NIDA)





August 14, 2013


















Submitted by:

Kevin P. Conway, Ph.D.

Deputy Director

Division of Epidemiology, Services, and Prevention Research

National Institute on Drug Abuse

6001 Executive Blvd., Room 5185

Rockville, MD 20852

Phone: 301-443-8755

Email: [email protected]


Table of Contents

B. Collections of Information Employing Statistical Methods

B.1 Respondent Universe and Sampling Methods
B.2 Procedures for the Collection of Information
B.3 Methods to Maximize Response Rates and Deal with Nonresponse
B.4 Test of Procedures or Methods to be Undertaken
B.5 Individuals Consulted on Statistical Aspects and Individuals Collecting and/or Analyzing Data

References



LIST OF ATTACHMENTS

Attachment 2. Field Test Report

Attachment 3. PATH Study Data Collection Instruments

Attachment 9. Follow-up, Retention, and Tracking Materials

Attachment 12. Advance Materials

Attachment 13. Consent Materials

Attachment 20. Field Data Collection Materials

Attachment 22. Precision Calculations for Blood Sample under Projected and Worst-case Scenarios

Attachment 23. List of Statistical Consultants



B. Collections of Information Employing Statistical Methods

This section describes the statistical methods planned for the PATH Study. Section B.1 describes the target population of the PATH Study as well as the respondent universe and the desired sample composition by various age, tobacco-use, and race-ethnic subgroups. It includes tables summarizing the number of persons in the universe and the expected sample composition. An overview of the sampling frame and sample design is also provided. The section ends with a description of the PATH Study’s expected response rates. Section B.2 describes the procedures for collecting PATH Study data. Weighting and estimation procedures are presented, followed by an elaboration of the degree of precision expected for the analyses of various domains of interest. Section B.3 describes procedures for maximizing the participation and retention of the PATH Study respondents. Section B.4 presents details of the field testing of the PATH Study data collection procedures and operations. Lastly, Section B.5 presents a list of statistical consultants for the PATH Study.



B.1 Respondent Universe and Sampling Methods

B.1a Target Population

The target population of the PATH Study is the civilian household population 18 years of age or older in the U.S. (the 50 states and the District of Columbia), and youth ages 12 to 17. College students will be sampled at their permanent residence rather than at their dormitory as described later in this document. Active-duty members of the military (Army, Navy, Marines, Air Force, and Coast Guard) will be excluded, as will all persons living in group quarters other than college dormitories. The exclusion applies to both institutional and noninstitutional group quarters. Spouses and children of active-duty military living off post in the 50 states and D.C. will be covered.


Consideration was given to sampling other noninstitutional group quarters such as group homes, half-way houses, and shelters. However, important factors weighed against their inclusion: (1) a limited ability to analyze these groups separately given small estimated sample sizes; and (2) the high mobility among persons in such dwellings would lead to high attrition, thereby reducing the information to be gained from this longitudinal cohort study.



B.1b Respondent Universe and Estimated Sample Composition

One component of the PATH Study sample design is the selection of a “shadow” sample of 9 to 11 year-olds at baseline (see Section B.1d). Sampled children in this age range are not interviewed until they enter the youth cohort in later waves of the study on reaching 12 years of age. However, for completeness, the estimated respondent universe and sample size of 9 to 11 year-olds are shown in the first row of Table 1a.


Estimates of the PATH Study youth respondent universe and estimated respondent sample size are shown in the second row of Table 1a. Under the planned sample design, the estimated number of completed interviews with youth ages 12 to 17 at baseline is approximately 16,186. The estimates in the first two rows of the table are based on data from the 2010 Census and 2010 American Community Survey (ACS).


Estimates of the PATH Study adult respondent universe are also shown in Table 1b, which presents the number of persons in specific age, tobacco usage, and race domains derived from population projections. There are varying definitions of “tobacco user.” Table 1b presents estimated sample sizes for each of three definitions of interest for PATH. The first, called the “wide net” definition, classifies a person as a tobacco user if he or she has smoked a cigarette, cigar, or pipe or used smokeless tobacco in the last 30 days; or has ever used an e-cigarette, dissolvable tobacco, or smoked tobacco in a hookah. This “wide net” is intended to capture adults who have had experience with tobacco products and who may be at risk of progressing to more frequent use. A “current user” of tobacco is anyone who (1) has smoked at least 100 cigarettes in their lifetime and smokes cigarettes every day or some days, and/or (2) smokes cigars/cigarillos/pipe and/or uses smokeless tobacco every day or some days, and/or (3) uses e-cigarettes, hookah tobacco, snus, or dissolvable tobacco every day or some days1. Finally, a “current or experimental user” of tobacco is either (a) anyone who is a “current user” or (b) anyone who has used any of these tobacco products in the past month but not every day or some days.
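To make the nesting of these three definitions concrete, the following Python sketch classifies a single adult's responses. It is illustrative only; the field names are hypothetical simplifications of the actual PATH screener items.

    def wide_net_user(r):
        """Smoked a cigarette, cigar, or pipe or used smokeless tobacco in the last
        30 days; or ever used an e-cigarette, dissolvable tobacco, or hookah."""
        return (r["cig_cigar_pipe_or_smokeless_past_30_days"]
                or r["ever_used_ecig_dissolvable_or_hookah"])

    def current_user(r):
        """Established use of at least one product every day or some days."""
        return ((r["lifetime_cigarettes"] >= 100 and r["smokes_every_day_or_some_days"])
                or r["cigar_pipe_or_smokeless_every_day_or_some_days"]
                or r["ecig_hookah_snus_or_dissolvable_every_day_or_some_days"])

    def current_or_experimental_user(r):
        """Current user, or past-month use that falls short of every day or some days."""
        return current_user(r) or r["any_tobacco_past_month_but_not_every_day_or_some_days"]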


The tobacco use rates from the 2006 and 2007 Current Population Survey-Tobacco Use Supplement were originally used as the basis for estimating the number of tobacco users and non-users overall and in various subgroups. At that time, these data were seen as the best available source for this purpose. Although the CPS-TUS data do not cover the use of new products or experimental use of products that may be of scientific interest for PATH, it was originally assumed that these uses were rare. The CPS-TUS smoking rates are also recognized as lower than those derived from the National Survey of Drug Use and Health (NSDUH). One of the purposes of the field test was to develop data about rates of tobacco use based on the PATH questionnaire and definitions. Experience from the field test showed that the PATH questioning approach yielded current use rates that are similar to the NSDUH rates and also that a substantial fraction of respondents were experimental tobacco product users who did not meet the more stringent CPS-TUS definition of tobacco user. These experimental users are of particular interest to the PATH Study because they often represent early indicators of changes in tobacco use patterns. As a result of the field test experience, the tobacco prevalence rates for the baseline have been revised upward and, applied to the adult civilian household population from the 2010 Census, within age/race domains, provide the respondent universes under the “wide net” definition presented in the second column of Table 1b. These revised rates were informed by the CPS-TUS and NSDUH rates in addition to a detailed modeling of the field test data that took the purposive selection of PSUs in the field test into account. At this point, the respondent universe counts in the second column of Table 1b are best estimates; they should therefore be recognized as necessarily imprecise. Any significant deviations from these rates will be addressed by varying the rates over the year of data collection; the sample will be issued in replicates over time, making it possible to adjust sampling rates in later replicates if the results in the first replicate indicate a need to do so.


As discussed at the beginning of this section, Table 1a displays the estimated sample sizes for the youth and shadow youth. Table 1b shows the expected number of adult respondents in the PATH Study baseline sample classified by race and age group and also by their tobacco use status under each of the three tobacco use definitions. Under the planned sample design, the estimated number of completed adult interviews at baseline is 42,730, including approximately 10,709 young adults (18 to 24 year-olds) and 6,000 Blacks or African Americans (Black/AA)2. The third column of Table 1b shows the relative sampling weights (the inverses of relative sampling rates) for the eight groups within the adult cohort, relative to the 18-24 Black/AA tobacco users group (which is the most heavily oversampled). These relative weights apply to the groups formed by using the “wide net” definition. The number of tobacco users is, of course, largest and the number of non-users is smallest under the “wide net” definition whereas the reverse is true under the “current user” definition.


Except for the number of youth in the shadow sample, i.e., 9 to 11 year-olds selected at baseline for the purpose of replenishing the 12 to 17 year-old youth sample in later waves but not for the purpose of interviews, the sample size estimates in Tables 1a and 1b apply to the baseline completed interviews (with or without biological specimens for adults). Specific subgroups in these tables were selected because they represent the major sampling strata at the person level. Power projections provided later in this submission focus on subgroups of potential analytic interest.


Table 1a. PATH Study youth and shadow youth respondent universes and estimated baseline sample sizes


Group                            Respondent universe    Estimated baseline sample size
Children 9-11 (shadow sample)    12,639,240             8,202
Youth 12-17                      25,611,322             16,186


Table 1b. PATH Study adult respondent universes and estimated baseline sample sizes


Group                          Respondent universe        Relative sampling       Estimated baseline sample size
                               ("wide net" definition)    weight ("wide net")     "Wide net"    "Current user"    "Current or experimental user"
18-24 Black/AA user            2,688,087                  1                       1,367         807               1,176
18-24 Black/AA non-user        1,946,545                  2.2                     443           1,003             634
18-24 non-Black/AA user        14,780,375                 1                       7,283         3,933             5,608
18-24 non-Black/AA non-user    9,058,939                  2.9                     1,616         4,966             3,291
25+ Black/AA user              10,394,556                 1.8                     2,817         1,972             2,507
25+ Black/AA non-user          14,354,387                 5.3                     1,373         2,218             1,683
25+ non-Black/AA user          67,096,238                 1.5                     20,279        13,181            15,209
25+ non-Black/AA non-user      114,244,946                7.7                     7,552         14,650            12,622
All adults                     234,564,073                                        42,730        42,730            42,730

B.1c Sampling Frames

The baseline sample for the PATH Study will be selected using a four-stage, stratified probability sample design involving the selection of: (1) primary sampling units (PSUs) consisting of counties or groups of contiguous counties; (2) second-stage sampling units (referred to as segments); (3) mailing addresses; and (4) eligible persons within households occupying dwelling units (DUs) at sampled addresses. In addition to the four stages of selection, a two-phase approach will be used for the fourth stage of sampling (persons within households). The sampling frames to be used at each stage are described here.


For the initial stage of sampling, a PSU frame will be created using the Census 2010 county-level data files. The PSUs will be formed as single counties or groups of contiguous counties, depending on the population size and the end-to-end distance within a PSU. The objective of the PSU formation process will be to simultaneously maximize internal PSU heterogeneity and minimize travel distance within a PSU (e.g., to ensure that the maximum distance is no more than 100 miles), subject to a specified minimum PSU population size of 8,000 DUs. Data from the 2010 Census, and data from other sources used for stratification purposes, will be appended to the PSU frame. For example, data will be appended from the National Cancer Institute (NCI) small area estimates of county-specific current smoking rates (http://sae.cancer.gov/estimates/tables/both_current.html) and estimates of socio-demographic characteristics from the 5 year ACS.


The second-stage sampling units (referred to as segments) will be based on Census-defined blocks. The frame of segments will be created within the sampled PSUs using the 2010 Census Redistricting Data (P.L. 94-171) Summary File block data, together with address data from the U.S. Postal Service (USPS) Computerized Delivery Sequence Files (CDSFs) of residential addresses. The CDSFs are derived from mailing addresses maintained and updated by the USPS, and they are available from commercial vendors. The second-stage frame will take data from the most recent CDSF file at the time that the segment sampling is being implemented. Within the sampled PSU, where possible, the associated CDSF addresses will be geocoded to Census blocks and then, as necessary, the blocks will be grouped to create list segments of CDSF addresses. Note that post office (PO) box addresses cannot be geocoded and hence will be excluded from this process: thus, DUs that only have a PO box address are not covered by the list segments (however, approximately 90 percent of DUs with PO boxes also have street mailing addresses). Blocks with no population in the 2010 Census will be included in the segment formation process to ensure that all areas are covered. The addresses geocoded to a single block will be used as a list segment if the number of such addresses is larger than a minimum threshold. Otherwise, addresses geocoded to neighboring blocks will be combined to reach the required threshold number of addresses per list segment. Associated with each resulting list segment, will be one or more Census blocks, and the physical boundaries of these blocks will delineate areas of land, referred to as area segments. Note that the size of a list segment is based on the number of geocoded CDSF addresses and may well be different from the size of the associated area segment based on 2010 Census data. Differences will arise in part because of date differences but mainly because of geocoding errors made in assigning CDSF addresses to the area segments. Some of the CDSF addresses geocoded to a given area segment may actually be outside the segment’s geographical boundaries, and some CDSF addresses that are geocoded to other area segments may be in the given area segment. With the exception of the procedure for providing coverage to addresses not on the CDSFs (discussed later), addresses sampled from a segment will be drawn from the CDSF addresses geocoded to the area segment—that is, from the list segment—irrespective of whether the addresses fall in the area segment or not.


A frame of list segments will be constructed within each sampled PSU by using the prime contractor’s software that is designed to create segments that are contiguous and as compact as possible given the size constraints. The frame of list segments will contain details about the numbers of addresses from the CDSF, the number of households in the associated area segment, and characteristics of the associated area segments and census tracts from sources such as the 5 year ACS (e.g., urban-rural status, percent Black or African American, percent Hispanic, percent of occupied housing that is owned, and average tract-level household income). In a few rural PSUs, only a small number of geocodeable addresses will appear on the CDSF; in these PSUs, rather than using list segments, conventional area listing procedures will be applied to construct a frame of DUs in the sampled segments.


At the third stage of selection, a sample of addresses will be selected from the sampled list segments in the sampled PSUs (except for the few rural PSUs noted earlier). Recent studies indicate that the coverage of the CDSF lists of geocodeable addresses is generally high for urban and large suburban areas, and sometimes reasonably high for parts of rural areas (Montaquila, Hsu, Brick, English, and O’Muircheartaigh, 2009; Dohrmann, Han, and Mohadjer, 2007; Iannacchione, Staab, and Redden, 2003; O’Muircheartaigh, Eckman, and Weiss, 2002). To handle any address noncoverage in the CDSF lists, a coverage enhancement procedure, referred to as address verification, will be applied for a subsample of segments. Although applied only in a subsample of segments, this procedure in effect gives coverage for unlisted and non-geocodeable addresses in all segments.


When a segment is subsampled for address verification, the entire area segment is canvassed by the field interviewer and any addresses not on the CDSF for that list segment are listed for potential inclusion on the supplementary address sampling frame. To handle geocoding errors, the addresses so identified are then matched against the addresses on the CDSF for the ZIP area containing the area segment, and only those not on that CDSF list are retained as a supplementary frame of addresses that will be sampled. The address verification procedure will be applied at higher rates for segments where CDSF undercoverage of geocodeable addresses is likely to be more problematic (e.g., segments where the number of CDSF addresses falls well short of the Census number of households), and the rates for sampling addresses from the supplementary lists will be determined to counterbalance the segment subsampling rate for verification. In most urban areas, the plan will be to subsample segments for verification at a low rate and then sample all the addresses on the supplementary frame.


A special issue arises in the case of multi-unit structures that are identified on the CDSF as a “drop point.” A drop point is a set of housing units that receive their mail at a single mail “drop.” The individual units at a drop point are called “drop units.” The CDSF frame includes a flag that identifies drop point addresses, as well as a variable containing a count of the number of drop units associated with each drop point. If a drop point is associated with a sizable number of drop units, then the drop point will be sampled at a higher rate than other addresses in order that, in combination with subsampling of the DUs at the drop point if selected, the sampled DUs will retain the desired selection probabilities.
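The probability arithmetic behind this adjustment can be illustrated with a short Python sketch; the rates and drop-point size below are placeholders, not the study's values.

    base_rate = 1 / 100      # desired selection probability for any dwelling unit (placeholder)
    drop_units = 10          # number of drop units sharing one drop-point address (placeholder)

    # Oversample the drop point in proportion to its size, then subsample drop units
    # at the counterbalancing rate if the drop point is selected.
    drop_point_rate = min(1.0, drop_units * base_rate)
    within_drop_rate = base_rate / drop_point_rate        # here, 1 drop unit in 10 is retained

    du_probability = drop_point_rate * within_drop_rate
    assert abs(du_probability - base_rate) < 1e-12        # each drop unit keeps the desired rate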


While the CDSF contains flags for nearly all drop points, there are a few cases of multi-structure units (e.g., duplexes, single homes converted into multiple units) with single mailing addresses that are not flagged. When a field interviewer encounters such an address, the interviewer will initiate the add unit procedure and record each unit that he or she can identify. If a small number of units is identified (for example, no more than three), the field interviewer will attempt to administer the screener to all households residing in the identified units. If a larger number of units are identified, the units recorded by the field interviewer will be transferred to the home office where a proportion of them will be sampled for interview.


Another coverage improvement procedure, called the hidden DU procedure, will be applied during the administration of the screener. The hidden DU procedure is carried out by the field interviewer at the end of the screener interview for the base DU. Note that a DU is defined as “a group of rooms or a single room occupied as separate living quarters (or if vacant, intended for occupancy as separate living quarters); that is, the occupants do not live with any other person in the structure and there is direct access to the base DU from the outside or through a common hall or area.” The term “household” includes all persons who occupy a DU. The hidden DU procedure aims to identify DUs that are attached to the base DU where the screener interview is taking place by having the same mailing address or that were not apparent to the canvasser during conventional listing of the segment. Once identified, the hidden DU(s) will be entered into the field interviewer’s computer-assisted personal interviewing (CAPI) application, and interviewing will take place within the newly identified unit(s).


At the fourth stage of selection, the sampling frame for a selected household completing the screener interview will consist of a roster of all the eligible persons in the household. All those 12 years of age and older on the roster are then eligible to be sampled for either the youth cohort or the adult cohort. In addition, a “shadow sample” of up to two children ages 9 to 11 at the time of screening will be selected from a household for use as a refresher sample for the youth cohort for later waves of the study. After the children in the “shadow sample” have turned 12, they will become eligible to be included in the refresher sample for the youth cohort.


B.1d Sample Design

As described earlier, the sample will be selected in a four-stage stratified probability design, with a two-phase sample design for sampling the adult cohort at the final stage. The selection processes for these stages are described in turn here.


At the first stage, a stratified sample of 156 PSUs will be selected using probability proportional to size (PPS) sampling. The measure of size (MOS) will be defined to be a weighted sum of estimated PSU population counts by the adult subgroups given in Table 1b where the weights used to construct the MOS are proportional to the expected overall sampling rates to be applied for each subgroup. The PSU population counts by age and race will be obtained from Census Bureau population estimates. The breakdowns of adult age/race groups by tobacco usage will be based on a simple model that takes account of the variability of current smoking rates across PSUs as indicated by the NCI small area estimates. Any PSU that is by itself more than 0.67 percent of the national population (about 2.1 million people) will be treated as an initial “certainty PSU”. Then additional certainty PSUs will be identified from any Core Based Statistical Area (CBSA) that is large enough to be treated as a stratum after the original certainty PSUs are removed. The total number of certainty PSUs will be 35. Each certainty PSU is in effect a separate stratum.


After accounting for the certainty PSUs, the remaining PSUs will be selected using a carefully stratified design in which the PSUs are selected without replacement and with probability proportionate to size. The stratification factors will include such variables as the geographic region, CBSA status, percent minority population, poverty rate, education, and other variables where appropriate. Approximately 57 equal sized strata (in terms of aggregate MOS) are expected to be formed.
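A minimal Python sketch of the composite measure of size and the resulting PPS selection probabilities within a stratum is shown below; the subgroup structure, function names, and rates are illustrative assumptions.

    def composite_mos(psu_counts, overall_rates):
        """psu_counts: estimated adult counts by subgroup (e.g., the eight groups in
        Table 1b) for one PSU; overall_rates: values proportional to the planned
        overall sampling rates for those subgroups."""
        return sum(psu_counts[g] * overall_rates[g] for g in overall_rates)

    def pps_selection_probabilities(mos_by_psu, n_to_select):
        """Selection probabilities for a PPS-without-replacement draw of n_to_select
        PSUs from one stratum (probabilities capped at 1 for very large PSUs)."""
        total = sum(mos_by_psu)
        return [min(1.0, n_to_select * m / total) for m in mos_by_psu]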


Within the selected PSUs, segments will be formed, and a systematic PPS sample of about 40 segments will be drawn within each noncertainty PSU, more in the larger certainty PSUs, for a total of 6,000 segments. The systematic selection will be with respect to a sort of the segments. The sort variables to be considered include urban-rural status, percent of occupied housing that is owned, race/ethnicity, and possibly average tract-level household income (based on data from the American Community Survey) for the associated area segment.


At the third stage of sampling, a systematic sample of addresses will be drawn from the CDSF list frame for the list segments, and DUs from the conventional list frame constructed for the area segments for the PSUs for which the proportion of geocodeable CDSF addresses is very low. For segments in which the verification procedure is applied, a sample of any supplementary addresses will also be selected. The hidden DU procedure will be applied at all sampled DUs, with the end product being a sample of households.


A roster of all the members of each sampled household will then be constructed by interviewing a household informant, together with information on the person’s age and, for adults, on their race (Black/African-American vs. all others), and tobacco use. The three components of the sampling of household members are as follows:


  1. Shadow sample

If any children ages 9 to 11 are in a household, up to two will be selected at random for the shadow sample. Sampled children in this age range may enter the youth cohort in later waves of the study on reaching 12 years of age.

  2. Youth cohort

If any youth ages 12 to 17 are in a household, one or two will be selected at random for the youth cohort.

  3. Adult cohort

No more than two adults will be sampled for the adult cohort.

Given a special analytic interest in monozygotic and dizygotic twins in the youth cohort, the shadow and youth cohort sampling procedures are modified when households containing twins are encountered. In such households, a twin pair will be sampled, and if another youth or youths are in the same sample group (shadow or youth) as the twins, those youth will be given a probability of 1/3 of also being selected with at most one being selected.


The sampling method for selecting adults for the first phase of the adult cohort depends on the desired selection probabilities for each of the adult subgroups listed in Table 1b within the sampled PSU. To describe the method, let π_l denote the desired within-household sampling rate for an adult l. This probability depends on the person’s age, race, and tobacco use. Let S = Σ π_l be the sum of these probabilities for all adults in the household. The following two classes of households can be distinguished (a sketch of this selection procedure appears after the list):


  1. Households with S ≤ 2. Select 0, 1, or 2 adults by a systematic PPS sample with measures of size π_l and an interval of 1.

  2. Households with S > 2. Select two adults by PPS with an interval of 1 and with adjusted probabilities 2π_l/S.
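The following Python sketch illustrates the within-household selection in the two cases above. It is illustrative only: the function names are hypothetical, and the adjusted probabilities in the second case are taken to be 2π_l/S so that they sum to 2 (each adjusted probability is assumed to be at most 1).

    import random

    def systematic_pps(measures, interval=1.0, rng=random):
        """Systematic PPS selection; with an interval of 1 and measures of size at most 1,
        each unit is selected at most once, with probability equal to its measure."""
        total = sum(measures)
        point = rng.uniform(0, interval)
        points = []
        while point < total:
            points.append(point)
            point += interval
        selected, cum, j = [], 0.0, 0
        for i, m in enumerate(measures):
            cum += m
            while j < len(points) and points[j] < cum:
                selected.append(i)
                j += 1
        return selected

    def select_adults(desired_rates):
        """desired_rates: within-household sampling rates pi_l for each adult, reflecting
        the person's age, race, and tobacco-use group."""
        s = sum(desired_rates)
        if s <= 2:
            return systematic_pps(desired_rates)            # selects 0, 1, or 2 adults
        adjusted = [2 * r / s for r in desired_rates]       # probabilities now sum to 2
        return systematic_pps(adjusted)                     # selects exactly 2 adults

    # Example: three adults with rates 0.9, 0.5, and 0.8 (sum 2.2 > 2), so the
    # adjusted probabilities 0.818, 0.455, and 0.727 are used and two are selected.
    print(select_adults([0.9, 0.5, 0.8]))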

The second phase of sampling is included to address classification errors in the responses of household screener respondents, in particular the misclassification of a sampled person as a non-tobacco user when the self-report would indicate the person is a user. At the first phase, the sampling rates for non-users are kept within reasonable bounds, compared to the rates for users, in order to ensure that the weights of any persons sampled at the first phase as non-users, who then report themselves at the second phase to be users, are not too much larger than the weights of those who are correctly classified as users at the first phase. Misclassification in the other direction ― with the household informant reporting the person as a user when the person then reports himself or herself as a non-user ― will be handled by deselecting some members of this group so that those retained have the same sampling rates as other non-users.
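A brief numeric illustration of the deselection step (the rates are placeholders, not the study's values): an adult screened in at the higher "user" rate who then self-reports as a non-user is retained with a probability chosen so that the resulting overall rate equals the rate applied to correctly classified non-users.

    first_phase_rate_user = 1 / 50        # within-group rate for reported users (placeholder)
    first_phase_rate_nonuser = 1 / 150    # within-group rate for reported non-users (placeholder)

    # Retention probability for adults reported as users by the informant but
    # self-reporting as non-users in the second phase.
    retention_probability = first_phase_rate_nonuser / first_phase_rate_user

    overall_rate = first_phase_rate_user * retention_probability
    assert abs(overall_rate - first_phase_rate_nonuser) < 1e-12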


B.1e Group Quarters

The types of noninstitutionalized group quarters that are of interest to the PATH Study include only college dormitories. College students living in dormitories and fraternity and sorority houses will be sampled through their permanent residence. If a student who lives in a dormitory for much of the year is identified as a sample person and is at home (their permanent residence) when the screening occurs, an attempt to administer the interview will be made before the student returns to the dormitory. Otherwise, a time will be found during the field period when the student will be at home and an interview will be scheduled for that time. If this is not possible, the student will be contacted and interviewed on campus if the dormitory is close to any sampled PSU (which need not be the PSU of the family residence). Identifying students in dormitories via their family residence is a simpler process than constructing a separate dormitory sampling frame from which to select students. It avoids the costs and complications of contacting and gaining permission from college and university officials, of obtaining and sampling from lists of dormitories, and of listing and sampling within selected dormitories.


B.1f Estimated Response Rates

Once a DU is selected, a household screener will be administered to determine the race, age, and tobacco usage of each adult, and the age of each child in the household. Based in large part on the PATH Study field test and informed by recent experience with large, national surveys, including the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC), the PATH Study’s estimated screener response rate in the baseline wave is 70 percent. However, the 40 percent response rate obtained in the field test, though a worst-case scenario, suggests that 70 percent may be optimistic, in spite of several protocol changes implemented since completion of the field test. Consequently, a range of response rates is presented in Table 2, based on alternative assumptions. Section B.3 describes the approaches the PATH Study is planning to employ to achieve this target.


In terms of screening, the expected eligibility rate for households rounds to 100 percent, because only a small percentage (less than 0.1 percent) of households in the U.S. consist solely of military personnel on active duty. The response rate assumptions include an adjustment for the expected small percentage (less than 2 percent) of households that would not respond because of language difficulties.3 Depending on the age, race, and tobacco usage of the adult persons in the household as reported by the household informant, up to two adults will be selected at the first phase of sampling. Selected adults will proceed to the second phase of sampling, where they will be administered a short series of questions at the beginning of the extended interview to determine their self-reported tobacco usage. Based on these self-reports, approximately 72 percent of the adults selected at the first phase of sampling are expected to be retained at the second phase of sampling for the full extended interview. At the same time, up to two youth ages 12 to 17 will be sampled from the household (or, in the case of multiple births, up to 4 youth per household). Within each household, independent sampling will be conducted for adults and for youth. For the baseline wave, response rates for the extended interviews are expected to be 85 percent for sampled adults and 75 percent for sampled youth. For both adults and youth, expected response rates for the extended interviews are 92 percent for Wave 2, 95 percent for Wave 3, and 96 percent for each of the remaining follow-up waves. These attrition rates are based on the 2008-2011 Medical Expenditure Panel Survey (MEPS). A series of measures will be undertaken to achieve these response rates, as described in Section B.3. Thus, the overall baseline response rate is estimated to be 60 percent for adults and 53 percent for youth (i.e., the product of the expected screener response rate and the expected person-level response rate). Table 2 and Attachment 22 summarize the estimated response rates for the PATH Study, based on alternative assumptions.
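The overall baseline response rates cited above are the products of the stage-specific rates, as the following short calculation shows.

    screener_rate = 0.70            # expected screener response rate
    adult_extended_rate = 0.85      # expected extended-interview response rate, sampled adults
    youth_extended_rate = 0.75      # expected extended-interview response rate, sampled youth

    adult_overall = screener_rate * adult_extended_rate   # 0.595, i.e., about 60 percent
    youth_overall = screener_rate * youth_extended_rate   # 0.525, i.e., about 53 percent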

Table 2. Range of estimated response rates for baseline wave, based on alternative assumptions


Sampling unit                                                 Best-estimate        Estimated    Worst-case rate      Estimated
                                                              response rate        number       (field test)         number
Primary sampling unit (PSU)                                   –––                  156          –––                  156
Area segments/CDSF segments                                   40 per PSU           6,000        40 per PSU           6,000
Addresses                                                     28.1 per segment     168,857      28.1 per segment     168,857
Occupied dwelling units                                       88.6%                149,607      88.6%                149,607
Households completing screener enumeration                    70%                  104,725      39.7%                59,394

Adult sample (persons ages 18+)
Eligible households with adults                               100%*                104,725      100%*                59,394
Number of adults sampled at first stage                       Up to 2 per HH       70,000       Up to 2 per HH       39,700
Number of adults completing second-phase sampling
  questions at beginning of extended interview                85%                  59,500       58.1%                23,066
Number of adults retained at second phase of sampling
  and completing full extended interview                      72%                  42,730       72%                  16,565
Number of adults completing extended interview who
  provide buccal cells                                        80%                  34,184       73%                  12,092
Number of adults completing extended interview who
  provide urine                                               80%                  34,184       49%                  8,075
Number of adults completing extended interview who
  provide blood                                               65%                  27,775       39%                  6,460
Number of adults completing extended interview who
  provide all biospecimens                                    65%                  27,775       39%                  6,460

Youth sample (persons ages 12-17)
Eligible households with youth                                16%                  16,756       16%                  9,503
Eligible households reporting youth                           100%                 16,756       100%                 9,503
Number of sampled youth                                       Up to 2 per HH       21,582       Up to 2 per HH       12,240
Number of youth completing extended interview                 75%                  16,186       62.6%                7,662

Forty segments will be selected from most PSUs; more from large certainty PSUs, and fewer from smaller PSUs, such that the expected total number of segments is 6,000.

* A very small number of screened households may contain only persons under 18 or on active duty.



B.2 Procedures for the Collection of Information

B.2a Overview

The PATH Study involves four main components. These components are: (1) an automated CAPI (Computer-Assisted Personal Interviewing) household screening instrument, (2) automated ACASI (Audio Computer-Assisted Self-Interview) extended instruments (separate instruments for youth and adults), (3) an automated CAPI parent instrument, and (4) collection of biospecimens from adults (three biospecimens will be collected at baseline: buccal cells, urine, and blood). Collection of biospecimens is not a requirement for adult participation; however, completion of an extended interview at the first home visit is required.


The primary objective of the field interviewer working on the PATH Study is to obtain complete and accurate information from sampled persons in each eligible household in their assignment. In addition to gathering the substantive information required for the study, meeting this objective will allow for proper nonresponse analysis. Obtaining complete and accurate information requires that the field interviewer have a thorough understanding of the PATH Study’s protocol, as well as an understanding of the techniques required to gain the respondent’s cooperation and maintain rapport throughout the interaction. All field interviewers working on the PATH Study will receive extensive in-person training on the exact procedures to be followed in the administration of the data collection instruments themselves, as well as techniques to gain cooperation, such as understanding the importance of the study, answering respondent questions, and addressing respondent concerns.


The training provided to field interviewers will be in two forms: home study and in-person. The 12-hour home study program will be designed to introduce trainees to the PATH Study, with a focus on the respondent contact materials. The home study will also provide field interviewers with practice in gaining cooperation and establishing rapport. In-person training techniques are designed to maximize trainee involvement, maintain the interest of the trainees, and produce well-trained field interviewers who have the necessary skills for gaining respondent cooperation, correctly answering questions about the study, and adeptly completing all components of the interviews. Training materials will be developed by experienced PATH Study team members. In the 5 and a half-day in-person session, field interviewers will be trained on techniques for obtaining consent; conducting the CAPI screener, ACASI extended interviews, and CAPI parent interview; collecting buccal cell and urine samples; issuing respondent incentives; and completing administrative procedures such as data transmission and reporting to the supervisor. Experienced phlebotomists also will be trained to return to the homes of consenting adults to collect blood samples. In addition to the in-person training, field interviewers will be provided with a field interviewer manual, providing detailed reference materials on locating sampled addresses, determining household membership, the interviewing process, questionnaire content, and biospecimen procedures. The phlebotomists will be provided with a study-specific phlebotomist manual on collecting blood.


During the data collection period, numerous quality control procedures will be used to ensure that field interviewers are following the specified procedures and protocols and that the data collected are of the highest quality. Field interviewers who successfully complete training but show any area of potential weakness will be observed in person at least once by a supervisor or home office staff member. Observing field interviewers conducting their job in the field is a very effective way of monitoring their skills in conducting the interview, as well as their adherence to the PATH Study’s procedures. It also provides the observer with an appreciation of the field interviewers’ tasks and the opportunity to experience first-hand the administration of the PATH Study instruments and biospecimen collection procedures. Observations will be concentrated in the early weeks of data collection so that problems are detected as early as possible and corrective feedback can be provided to the field interviewers.


Brief quality control interviews will be conducted to verify that an interview was administered or attempted as reported by the field interviewer. Quality control procedures will be implemented to verify at least ten percent of each field interviewer’s finalized work to ensure that the interview was conducted according to study procedures. This includes cases finalized as complete, as well as those with non-complete dispositions, such as vacant or refusal. Quality control will begin early in the data collection period to allow for any identified problems to be addressed immediately. As part of quality control, selected items from the CAPI interviews will be audio-recorded (with the consent of respondents) and reviewed to assess interviewer performance. As needed (e.g., when a respondent refuses audio-recording), quality control interviews will be conducted by telephone with a sample of adult respondents. For some non-complete dispositions (e.g., the dwelling unit is vacant), an experienced, specially trained field interviewer will validate the disposition in person.


Additionally, throughout the field period, supervisors will remain in close contact with the field interviewers. Scheduled weekly telephone conferences will be held in which all non-finalized cases assigned to the field interviewer will be reviewed to determine the best approach for working the cases and the need for additional resources.


Management staff at all levels will have access to a supervisor management system, including automated management and production reports that will be used to monitor the data collection effort and ensure that the data collection and quality control goals are being attained. Field interviewers will be required to transmit data on a daily basis. Data will be transmitted to a secure server at the office of the prime contractor, which will then be used to update the automated management reports. These data are also used to produce weekly reports that might provide evidence of suspicious field interviewer behavior, such as overall interview administration length, individual instrument administration time, amount of time between interviews, interviews conducted very early in the morning or late in the evening, and number of interviews conducted per day.

B.2b Household Screener

The random selection of up to two adults and two youth (unless a household includes twins, in which case additional youth could be selected) per eligible household (as described in Section B.1) is conducted through the use of an automated screening instrument (see Attachment 3). The screener respondent will be an adult household member age 18 or older. The screener uses a full household enumeration process to collect information on age for each reported household member; and race, active military service status, ability to speak in English or Spanish, and tobacco use for each adult household member. The relationship of all household members to the screener respondent is also collected. In addition to household enumeration information, the household respondent and each sample person’s telephone numbers are collected to allow the recontact of the household for quality control purposes, or to set appointments for the extended and parent interviews if the sample person is unavailable at the time of the screening. Finally, if the mailing address differs from the street address, the household mailing address is collected. Mailing address allows written follow-up with nonresponse cases and regular contact with respondents between data and biospecimen collection waves, as discussed in Section B3.


The proposed sampling algorithm for selecting up to two adults and two youth (except in the case of twins) per household has been programmed within the CAPI screener software. To check that the screener is working properly, it will be tested extensively by professional software testers.


B.2c Extended Interview

The data collection procedures differ for adults and youth.



Adults

Following the administration of the screener, if the selected sample adult is available and has an adequate amount of time to complete the interview, the field interviewer: (1) obtains informed consent (see Attachment 13); (2) administers the adult extended interview, which includes gathering additional contact information about the adult; (3) obtains consent for the biospecimen collection; (4) gathers the biospecimens (buccal cell sample and urine); (5) arranges a follow-up appointment for a phlebotomist to collect a blood sample; and (6) pays the incentive to the respondent at the completion of the first home visit. (The biospecimen collection is discussed further in Section B.2d.) If a sample adult is unavailable or unable to complete the interview at that time, the field interviewer will attempt to schedule an appointment for a return visit or, at a minimum, determine the best time for a return visit.


After obtaining consent, the field interviewer provides a brief automated tutorial on using ACASI and launches the automated ACASI extended interview. The first part of the extended interview is the individual screener; these items may confirm or contradict the information provided in the first-phase household screener by the screener respondent. Depending on the individual’s self-reports (e.g., on tobacco usage), the sample person may be de-selected and not asked to complete the remainder of the extended instrument. As required throughout the interview, the field interviewer will aid the sample person in providing a response. At the end of the extended interview, the field interviewer gathers additional contact information for that person, and asks the respondent to consent to providing biospecimens. (See Section B.2d.)


The sample adult who completes the extended interview or is excluded based on his/her responses to the individual screener items (which constitutes the second phase of screening described in Section B.1d) will receive $35 (the adult extended interview incentive) as a thank you for completing the interview. These respondents will also receive a thank you letter (Attachment 9). A refusal conversion letter will be sent to sample adults who initially decline to participate or are difficult to contact (Attachment 20).



Youth

Following the administration of the screener, if the parent or guardian of the selected youth is available and has an adequate amount of time, the field interviewer: (1) obtains parent permission for the youth to participate; (2) obtains consent for the short parent interview; and (3) administers the CAPI parent interview, which includes gathering additional contact information about the youth from the parent. If a parent of a sampled youth is unavailable or unable to participate at that time, the field interviewer will attempt to schedule an appointment for a return visit or, at a minimum, determine the best time for a return visit. The youth interview will not be conducted until parental informed consent has been obtained. The parent who completes a parent interview for the youth will receive $10 as a thank you for completing the interview.


For a selected youth with parental permission, if the youth is available and has an adequate amount of time to complete the interview, the field interviewer obtains youth assent (see Attachment 13) and then attempts to complete the automated ACASI extended instrument. If a sample youth is unavailable or unable to complete the interview at that time, the field interviewer will attempt to schedule an appointment for a return visit or, at a minimum, determine the best time for a return visit.


After obtaining assent from the selected youth, the field interviewer provides a brief automated tutorial on using ACASI and launches the automated ACASI extended interview. As required throughout the interview, the field interviewer will aid the sample person in providing a response.


The youth respondent who completes the extended interview will receive $25 (the youth extended interview incentive) as a thank you for completing the interview. The parents of youth respondents will receive a thank you letter (Attachment 9). A refusal conversion letter will also be sent to the parents of respondents who are difficult to contact (Attachment 20). A youth respondent will also receive $5 each time, up to two times, his/her parent updates the youth’s contact information, for a total of $10.


Burden Reduction by Avoiding Redundant Data Collection

The parent interview collects personal information about the parent of a sampled youth, some general characteristics of the household as a whole, and information about the youth, plus contact information to support reaching the parent and youth for future data collection activities. Because more than one youth may be sampled per household, one parent may be asked to respond to a parent interview in regard to more than one youth. In this instance, the parent will not be asked to again provide his or her personal information, the household information, or the contact information after the first instance of the parent interview. Only the information relevant to each sampled youth about whom the parent is providing information will be collected after the first administration of the parent interview.

In a few instances, the PATH Study will purposely collect some information that has been previously provided, in order to validate the earlier responses, to give respondents the opportunity to consider their answers in a private setting, and to collect information in a way that provides broader context and background to the respondent on particular items. The main instance where this occurs is in the second-phase adult individual screener.


Among other purposes, the household screener collects a minimum amount of high-level information about each adult’s tobacco use in order to classify him/her sufficiently for potential selection to the study based on the PATH Study sampling algorithm.


The first-phase household screener obtains tobacco use information about all adults from the single household respondent. Various reasons why this approach may yield inconsistent or imperfect information are described in Section A.2b. To obtain more complete, consistent information from an individual adult, the second-phase screener (i.e., the adult individual screener) is used to ask a more extensive panel of tobacco use questions. Using a second-phase screener such as this helps to reduce bias and increase the validity of responses obtained from an individual respondent rather than from the household respondent who completed the first-phase screener. Even if the person who completed the first-phase screener is the same individual who completes the second-phase screener, the second-phase screener is designed to reduce bias and enhance the validity of responses because: (1) questions are asked in a more private setting using ACASI (rather than CAPI); (2) questions are more detailed and given a more detailed context; and (3) questions are asked in an open format such that it is clear to the respondent that answers are neither “right” nor “wrong.”


B.2d Biospecimens

The field interviewer will ask an adult who completes an extended interview to consent to provide biospecimens as part of the PATH Study. However, providing biospecimens is voluntary and not a condition of participation. Completion of the extended interview at the first home visit is required from all respondents who choose to join the longitudinal cohort.



Buccal Cells and Urine

For adults who consent to provide buccal cells and urine, the field interviewer will collect those specimens following the completion of the interview. The field interviewer will provide written and oral instructions to the respondent for collection of the buccal cells and urine specimen. The field interviewer will pack the specimen(s) and ship them to the PATH Study biorepository.


The respondent who provides biospecimens during a first home visit will receive $25 as a thank you for participating in the buccal cell and urine sample component of the study.



Blood

For adults who consent to provide blood, the field interviewer will schedule the appointment for the visit by the phlebotomist to obtain the blood specimen. After the initial home visit by the field interviewer, the phlebotomist will contact the adult to confirm the appointment for collecting a blood specimen.


Upon visiting the respondent’s home, the phlebotomist will administer the blood suitability exclusion questions for blood collection (CAPI instrument) and ask the respondent to answer items about his/her recent use of tobacco products (CASI instrument) (see Attachment 3 for both instruments). The phlebotomist will then collect the blood specimen, and pack and ship it to the PATH Study biorepository.


The respondent who provides a blood specimen during a second home visit will receive $25 as a thank you for participating in the blood sample component of the study.


B.2e Weighting and Estimation Procedures

Sample weights will be developed for the PATH Study respondents to permit estimation for and inference about the population from which the sample is drawn. The sample weights will be produced to accomplish the following objectives:


  1. Permit the appropriate development of estimates, taking account of the fact that not all persons in the target population will have the same probability of selection;

  2. Limit the potential for biases arising from differences between cooperating and noncooperating sample persons and households;

  3. Use auxiliary data on known population characteristics in such a way as to reduce coverage biases and benchmark the PATH Study’s estimates to the corresponding population totals;

  4. Reduce the variation of the weights and prevent a small number of observations from dominating domain estimates; and

  5. Facilitate sampling error estimation appropriate to the complex sample design.

The data used in weighting will undergo careful edit, frequency, and consistency checks to prevent errors in the sample weights. The checks will be performed on items to be used in the weighting procedures and will be limited to records that require weights. These checks are important because errors in the weights can have sizable effects on the PATH Study’s estimates.


The first step in the weighting process is to compute the selection probabilities for all households sampled (responding households and nonresponding households). The household base weights are then the inverses of these selection probabilities. The household base weights of responding households (i.e., those for which the screener is completed) are then inflated to compensate for the nonresponding households. The adjusted household weights are then the starting point for the computation of the person weights.


Persons are selected with different probabilities within responding households in order to achieve required sample sizes by tobacco use, age, and race. The next step is then to modify the adjusted household weights to create person base weights that compensate for the unequal selection probabilities of sampled persons (respondents and nonrespondents). At this point, a decision will need to be made as to what constitutes a “response.” Persons who start the interview but break off early on are commonly treated as nonrespondents.


More significant for the PATH Study is the response classification of those adults who complete the interview but do not provide any of the biospecimens, and of those who complete the interview and provide the buccal cell and urine samples but not the blood samples. A complication here is that some of the biospecimens will turn out not to be analyzable (i.e., biospecimen nonresponses). However, because biospecimens will not be analyzed until later, those not analyzable will only be identified when the weighting is being conducted.


Two alternative approaches can be used for handling component nonresponse (here, biospecimen nonresponse). One is to treat the component nonresponse as a set of item nonresponses in a respondent record, using imputation as the means of compensation for the missing data. In this case, the analytic data file for the baseline data collection would comprise all sampled adults who completed the interview, irrespective of whether they provided any of the biospecimens, and all sampled youth who completed the interview.

The other approach for handling component nonresponse is to create separate sets of weights according to which components were completed. For example, for the PATH Study, one set of weights could be for all the adults who completed the interview, and these weights would be used for analyses that are confined to the interview data. A second set of weights could be computed for all adults who completed the interview and provided the buccal cell and urine samples, for use in analyses that require those biospecimens only. A third set of weights could be computed for adults who completed the interview and provided all three biospecimens. Such an approach maximizes the sample size for each type of analysis. However, computing all of these sets of weights may not be worthwhile if the data set containing only the cases with all of the data required for a given analysis is not much smaller than the more inclusive data set.


Given that the biospecimens will be stored, and analysis undertaken at a later date, no immediate decision will be made between these two approaches. At this point, it is sufficient to note that only one set of adult weights will be constructed: the weights for all those who complete the interview. Decisions about whether to impute for missing biospecimens or whether to create separate sets of weights for different patterns of biospecimen response can be deferred until the biospecimen data are to be analyzed. This approach also allows for adults who may refuse to provide one or more biospecimens at baseline, but agree to do so at a subsequent wave of the study. Upon completion of the baseline, a nonsubstantive change request will be submitted to OMB that describes the decisions made as well as their implications. No biospecimen data will be made public, and no analyses of biospecimen data will be disseminated, until approval of the change request has been granted.


The steps described in detail in the next section indicate the weighting process to be used for the development of the baseline weights for the respondents to the baseline interviews. For subsequent waves, nonresponse adjustments that account for cohort attrition across waves will be undertaken to produce longitudinal weights. In addition, sampled persons who age into the youth or adult cohort study (i.e., reach age 12 or 18) in subsequent waves will need to be assigned weights for cross-sectional analyses for the wave they enter, and for cross-sectional and possibly also for some longitudinal analyses thereafter.



Development of the Sample Weights for Baseline Respondents

Screener Base Weights

The first step in the development of the baseline person weights is to calculate base weights for all sampled households. The screener base weight will initially be computed as the reciprocal of the household's overall probability of selection. The final, adjusted screener weight will include the adjustments needed to compensate for nonresponding households.


The screener base weight is given by

W_ijk = 1 / π_ijk,

where π_ijk represents the overall probability of selection of household k in segment j of PSU i. For most cases, households will be sampled straightforwardly from the USPS address list, in which case π_ijk is simply the product of the PSU, the segment-within-PSU, and the address-within-segment selection probabilities. The same probability also applies to households sampled through the address verification or hidden DU procedures, provided all households "discovered" at the sampled address are sampled. If a sample of households is taken at the address, then the probability of sampling the household given the address has to be included as an additional multiplier. The value of π_ijk for households sampled from the address verification procedure is the product of the PSU and segment-within-PSU probabilities multiplied by the probability of the address verification procedure being applied and the probability of the household being selected given that the address verification procedure was used.
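For illustration only, the following sketch (in Python, with hypothetical stage-level probabilities and variable names that are not part of the PATH Study specification) shows how a screener base weight is assembled as the reciprocal of the overall household selection probability.

    # Illustrative sketch: screener base weight as the reciprocal of the product
    # of the stage-level selection probabilities (all values hypothetical).
    def screener_base_weight(p_psu, p_segment, p_address, p_hh_given_address=1.0):
        """Return the screener base weight for one sampled household.

        p_psu              -- probability of selecting the PSU
        p_segment          -- probability of selecting the segment within the PSU
        p_address          -- probability of selecting the address within the segment
        p_hh_given_address -- probability of selecting the household given the address
                              (1.0 unless households are subsampled at the address)
        """
        overall_probability = p_psu * p_segment * p_address * p_hh_given_address
        return 1.0 / overall_probability

    # Example with made-up probabilities: this sampled household "represents"
    # roughly 10,000 households in the population.
    print(round(screener_base_weight(p_psu=0.02, p_segment=0.05, p_address=0.10)))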



Adjustment for Nonresponse to the Screener

The household base weights are computed for all sampled households. However, the screener will not be completed for all of them. Adjustments will therefore be made to the base weights of responding households to compensate for the nonresponding households. All adjustments will be made within weighting classes based on information available for both responding and nonresponding households, namely the segments in which they are located. Census 2010 data at the area segment level and geographical proximity will be used to group segments into weighting classes.


Then, within a weighting class, the base weights for the responding households will be inflated proportionately so that they produce the same sum as the sum of the base weights of the responding and nonresponding households combined.
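As an illustration of this step, the sketch below (Python; the weighting classes, base weights, and response statuses are hypothetical) inflates respondent base weights within each weighting class so that they sum to the combined base weights of responding and nonresponding households.

    # Illustrative sketch: weighting-class adjustment for screener nonresponse
    # (all households and weights are hypothetical).
    from collections import defaultdict

    households = [
        # (weighting_class, base_weight, responded_to_screener)
        ("class_A", 9500.0, True),
        ("class_A", 10200.0, False),
        ("class_A", 9800.0, True),
        ("class_B", 11000.0, True),
        ("class_B", 10500.0, False),
    ]

    total_weight = defaultdict(float)       # all sampled households in the class
    respondent_weight = defaultdict(float)  # responding households in the class
    for cls, weight, responded in households:
        total_weight[cls] += weight
        if responded:
            respondent_weight[cls] += weight

    # Inflation factor applied to respondents; nonrespondents drop to weight zero.
    factor = {cls: total_weight[cls] / respondent_weight[cls] for cls in total_weight}
    adjusted = [(cls, weight * factor[cls] if responded else 0.0)
                for cls, weight, responded in households]
    print(factor)
    print(adjusted)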



Person Base Weights

To produce unbiased estimates, a weighting factor is needed to account for the within-household selection rate. The person base weights will be computed as the product of the screener nonresponse-adjusted weight and the reciprocal of the within-household probability of selection for person l within household k of PSU i and segment j, as shown in the following formula:

W_ijkl = W*_ijk × (1 / π_l|ijk),

where

π_l|ijk = the within-household probability of person l being selected into the sample, which will depend on the number of persons in household k, their ages, races, and tobacco use, and

W*_ijk = the screener nonresponse-adjusted weight.


Adjustment for Person-Level Nonresponse

Similar to the adjustment for screener nonresponse, a nonresponse adjustment will be performed to account for nonrespondents to the extended interview. The weights of respondents to the extended interview will be inflated to account for the nonrespondents. In addition to segment identification available for the screener nonresponse adjustment, screener variables such as age, gender, and race/ethnicity also can be used to form weighting classes. A variety of methods, such as CHAID (Chi-squared Automatic Interaction Detector), logistic modeling of response propensity, and data mining, exists for determining the weighting classes.
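As one illustration of the response-propensity approach mentioned above, the sketch below (Python; the covariates, simulated response mechanism, and quintile cut points are hypothetical) fits a logistic model of response and groups cases into weighting classes by predicted propensity.

    # Illustrative sketch: weighting classes formed from logistic response
    # propensities (simulated, hypothetical data).
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 1000
    age = rng.integers(18, 90, size=n)       # age reported on the screener
    female = rng.integers(0, 2, size=n)      # gender indicator from the screener
    responded = rng.binomial(1, 0.55 + 0.10 * female)  # 1 = completed extended interview

    X = np.column_stack([age, female])
    propensity = LogisticRegression().fit(X, responded).predict_proba(X)[:, 1]

    # Group cases into five weighting classes by propensity quintile; within each
    # class, respondent weights would be inflated to cover the nonrespondents,
    # as in the screener-level adjustment.
    cut_points = np.quantile(propensity, [0.2, 0.4, 0.6, 0.8])
    weighting_class = np.digitize(propensity, cut_points)
    print(np.bincount(weighting_class))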



Trimming

Trimming is a process in which inordinately large weights are reduced. It is used because very large weights can substantially increase sampling errors. A trimming algorithm will be used to reduce the variation in the nonresponse-adjusted weights. In general, trimming procedures introduce some bias into the sample estimates. However, when the trimming adjustment is applied to only a very small number of weights, the expectation is that the reduction in the sampling error component of the overall mean square error will more than offset the increase in bias. A preassigned rule will be applied within prespecified sampling and analytical domains to determine which weights should be trimmed.
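The following sketch (Python) illustrates one possible trimming rule; the cap of 3.5 times the median weight is a hypothetical choice and is not the preassigned rule to be used by the PATH Study.

    # Illustrative sketch: cap inordinately large weights and redistribute the
    # trimmed excess so the weighted total is preserved (hypothetical rule).
    import numpy as np

    def trim_weights(weights, cap_multiplier=3.5):
        w = np.asarray(weights, dtype=float)
        cap = cap_multiplier * np.median(w)
        trimmed = np.minimum(w, cap)
        excess = w.sum() - trimmed.sum()
        untrimmed = trimmed < cap
        # Spread the excess proportionately over the untrimmed weights.
        # (A production rule would iterate if this pushes other weights above the cap.)
        trimmed[untrimmed] += excess * trimmed[untrimmed] / trimmed[untrimmed].sum()
        return trimmed

    weights = [800, 950, 1020, 1100, 9000]   # one inordinately large weight
    print(trim_weights(weights).round(1))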



Computing Final Weights—Poststratification Through Raking

Undercoverage of the target population is a common problem in large, national research studies and surveys. Undercoverage occurs when some population units are not included in the sampling frame and have no chance of being selected into the sample. It also can arise in household enumeration when not all the eligible household members are enumerated for sampling. Techniques such as the address verification and hidden DU procedures are used to improve coverage rates for households. Methodologically sound approaches to screening households can limit the degree of undercoverage experienced during household enumeration. To account for undercoverage and other sources of bias remaining after the nonresponse adjustment, the PATH Study will adjust person weights resulting from the previously applied steps to independent estimates of population parameters. This will be accomplished by “raking” (as described here) the weights so that numerous totals calculated with the resulting full sample weights will agree with population totals from the Census Bureau’s Population Estimates Program and/or the American Community Survey (ACS).


A particular form of poststratification referred to as raking ratio adjustments is planned. The final sampling weights will be computed by raking the weights to known population totals. In poststratification, classes are formed from cross-tabulations of certain variables. In some instances, such cross-tabulations may lead to sparse cells, or population distributions may be known for the marginal but not the joint distributions of variables used to define the weighting classes. Weighting class adjustments based on small cell sizes can result in a large amount of variation in the adjusted weights. Raking ratio adjustments are useful for maintaining the weighted marginal distributions of variables used to define weighting classes. For this type of adjustment, population distributions are required for the marginal distributions of the weighting class variables and not for their joint distribution.


The weights of all persons who complete the interview will be included in the raking. Segment-level and screener variables can be used to form raking cells, as well as variables from the extended interview.
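For illustration, the raking ratio adjustment can be sketched as the following iterative proportional fitting routine (Python); the raking dimensions, categories, and control totals shown are hypothetical.

    # Illustrative sketch: raking person weights to marginal control totals
    # (hypothetical dimensions, categories, and totals).
    import numpy as np

    def rake(weights, dimensions, controls, iterations=25):
        """dimensions: dict of name -> per-person category codes;
        controls: dict of name -> {category: population control total}."""
        w = np.asarray(weights, dtype=float).copy()
        for _ in range(iterations):
            for name, codes in dimensions.items():
                for category, target in controls[name].items():
                    in_cell = codes == category
                    current = w[in_cell].sum()
                    if current > 0:
                        w[in_cell] *= target / current
        return w

    weights = np.array([900.0, 1100.0, 1000.0, 950.0, 1050.0, 1000.0])
    dimensions = {
        "sex": np.array(["F", "F", "M", "M", "F", "M"]),
        "age": np.array(["18-44", "45+", "18-44", "45+", "45+", "18-44"]),
    }
    controls = {
        "sex": {"F": 3300.0, "M": 3000.0},
        "age": {"18-44": 3100.0, "45+": 3200.0},
    }
    raked = rake(weights, dimensions, controls)
    print(raked.round(1))   # weighted marginals now agree with the control totals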


Specially developed SAS macros will be used to compute the weights for the PATH Study sample. These macros perform such tasks as cell weighting adjustments for nonresponse, poststratification, raking, generalized regression estimation, creation of replicates for variance estimation, and weighting adjustments (i.e., nonresponse adjustment, poststratification, generalized regression estimation, and raking) of the replicate weights.



Replicated Weights for Variance Estimation

Variance estimation must take into account the sample design. In particular, the estimate of sampling variance for any statistic should account for the effects of clustering, stratification, unequal selection probabilities, and the use of nonresponse, trimming, and poststratification adjustments. For the PATH Study, treating the data as having been selected by simple random sampling will produce underestimates of the true sampling variability.


Several alternative replication methods have been developed for computing valid variance estimates for estimates based on complex sample designs. The PATH Study plans to use the jackknife method. It can be used to estimate variances for most statistics. The jackknife method drops one PSU from a stratum and increases the base weights of the units in the other PSUs in the stratum to compensate. An estimate of the statistic of interest, say θ̂_(hi), is then computed from the reduced sample. This process is repeated, dropping PSUs in turn to create a series of estimates of θ, one for each replicate. A general version of the jackknife drops each of the PSUs in turn. For this version, the variance of the estimate of θ based on the full sample, θ̂, is computed as

v(θ̂) = Σ_h [(n_h - 1) / n_h] Σ_i (θ̂_(hi) - θ̂)²,

where n_h sampled PSUs are in stratum h and θ̂_(hi) is the estimate computed from the replicate that omits PSU i of stratum h.


After computing the base weights for each replicate, all the weight adjustment steps leading to the final person weight will be performed on each replicate. By repeating these adjustments on the revised base weights for each replicate, the impact of these procedures on the sampling variance of the estimator is appropriately reflected in the variance estimator.
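As a concrete illustration of the replication approach, the sketch below (Python, with hypothetical strata, PSUs, weights, and outcomes) computes the delete-one-PSU jackknife variance of a weighted proportion; in the PATH Study, each replicate's weights would additionally pass through the full set of nonresponse, trimming, and raking adjustments described above.

    # Illustrative sketch: stratified delete-one-PSU jackknife variance for a
    # weighted proportion (all data hypothetical).
    import numpy as np

    # Columns: stratum, PSU within stratum, person weight, outcome (0/1)
    data = np.array([
        [1, 1, 1000, 1], [1, 1, 1200, 0], [1, 2, 900, 1], [1, 2, 1100, 0],
        [2, 1, 800, 0], [2, 1, 1000, 1], [2, 2, 950, 0], [2, 2, 1050, 0],
    ], dtype=float)
    stratum, psu, weight, outcome = data.T

    def weighted_proportion(w, y):
        return np.sum(w * y) / np.sum(w)

    theta_full = weighted_proportion(weight, outcome)

    variance = 0.0
    for h in np.unique(stratum):
        in_stratum = stratum == h
        psus = np.unique(psu[in_stratum])
        n_h = len(psus)
        for dropped in psus:
            keep = ~(in_stratum & (psu == dropped))
            w_rep = weight.copy()
            # Inflate the retained PSUs in stratum h to compensate for the dropped PSU.
            w_rep[in_stratum & keep] *= n_h / (n_h - 1)
            theta_rep = weighted_proportion(w_rep[keep], outcome[keep])
            variance += (n_h - 1) / n_h * (theta_rep - theta_full) ** 2

    print(theta_full, variance, np.sqrt(variance))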


Various modifications may be made if the number of replicates produced by the general procedure is very large. In the case of the PATH Study, this issue will arise only with respect to the certainty PSUs, which are in reality strata; for these strata, the segments serve as the effective PSUs. A standard procedure of combining groups of segments will be applied to avoid an excessive number of replicates. This combining procedure does not introduce bias into the variance estimation.


A number of programs can be used for computing variances with replication methods including the prime contractor’s “WesVar” software, which is freely downloadable from the web. Alternatively, strata and PSU identifiers enable a linearization approach to variance estimation to be used, if required.



Longitudinal Weighting

The previous discussion describes how the weighting will be carried out for respondents to the baseline wave. At the second wave of the cohort study, interview data will not be obtained for some of the baseline respondents, and some form of compensation is required for the resultant missing data. Those who respond at both waves will constitute the data set for longitudinal analyses. For cross-sectional analyses, sampled 17 year-olds who have reached age 18 in the intervening year will be added to the adult cohort, which represents the adult population of inference.


Two alternative approaches can be used for compensating for baseline respondents who do not respond at the second wave: imputation and weighting adjustments (see, for example, Kalton, 1986). Each approach has its advantages and disadvantages – a recommendation for using one of them will be made after the data are collected and patterns of nonresponse can be assessed.


The imputation approach keeps the second wave nonrespondents in the analytic file, imputing all their missing second wave responses based on their baseline data. Performing all these imputations in an effective way that does not distort relationships between items both cross-sectionally and longitudinally is the major concern with this approach. Until recently, this concern has resulted in the weighting approach generally being preferred. However, recent developments in imputation software, such as the prime contractor’s “AutoImpute” software, make the imputation approach more competitive. With this approach, the baseline weights of interview respondents are not altered for longitudinal analyses.


To date, the usual way to compensate for wave nonresponse has been by a weighting adjustment. Because so much information about the second-wave nonrespondents will be available from their baseline responses, the challenges with this approach are to select the auxiliary variables to be used in making the adjustments and to determine the form of adjustment to use. For example, Rizzo, Kalton, and Brick (1996) describe analyses they performed under a contract with the Census Bureau to examine these issues for handling panel attrition in the Survey of Income and Program Participation.


A complication arises in later waves if a respondent misses one wave but returns to the cohort in the following wave. With the imputation approach, the imputed values for the missing wave should be made consistent with the responses for the adjacent waves. With the weighting approach, those missing a wave can be incorporated in cross-sectional estimates for the later wave, but they will not provide data for longitudinal analyses involving the missing wave. A possible compromise approach is to apply weighting adjustments for second-wave respondents for analyses at that time, but then to impute responses for those nonrespondents at the second wave who respond at the third wave, incorporating both first and third-wave responses in the imputation model. NIDA and FDA will work with the prime contractor to determine the best approach to use in the analyses for handling respondents with a missing wave of data that is bounded by reported data for adjacent waves.


Those 9 to 11 year-olds selected as part of the shadow sample will be included in the baseline weighting process. Their weights will be the household base weights adjusted for nonresponse at the household level. These weights will serve as the “base weights” for the shadow sample members when they become 12 years old and join the youth cohort. Consideration may be given to doing a poststratification adjustment for 12 year-olds each year to help ensure that they are fully reflected among the group ages 12 to 17.


B.2f Expected Levels of Precision of the PATH Study

The PATH Study is designed to produce reliable estimates of between-person differences and within-person changes in tobacco-related attitudes, behaviors, and health conditions among various population subgroups and over time. Characteristics of most interest are dichotomous, having “yes” or “no” outcomes. The percentage of “yes” responses is denoted by p and represents the prevalence estimate for a particular characteristic (e.g., cigarette smoking). Based on past research and cumulative professional expertise, the majority of characteristics measured in the PATH Study are expected to have magnitudes of prevalence exceeding 10 percent, while the expected magnitude of a few characteristics (such as initiation of tobacco use) will lie between 1 and 5 percent.


One measure of the precision associated with cross-sectional prevalence rates is the relative standard error (RSE), defined as the standard error divided by the prevalence estimate and expressed as a percentage. More specifically, RSE = 100 × SE(p)/p, where the standard error SE(p) is given by the square root of the variance of the estimate, taking into account the complex sample design of the PATH Study. A measure of power associated with longitudinal analyses of change in prevalence rates is the minimum detectable absolute difference (MDAD). Herein, the MDADs represent the smallest change (up or down) from a given baseline prevalence rate that can be detected with 80 percent power using a two-sided test at the 5 percent level of significance, taking into account the complex sample design of the study. The impact of the various complex features of the sample design on variances, and therefore on RSEs and MDADs, is reflected through inflation factors called design effects (DEFFs). The extent to which these design effects exceed one indicates the extent to which the variance of an estimate based on the complex sample design is greater than the corresponding variance based on a simple random sample (SRS) design. Several key features of the PATH Study sample design contribute to the overall design effect.
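For illustration, the sketch below (Python) evaluates an RSE and an MDAD-style minimum detectable change under the usual normal approximations; the design effect and the assumed over-time correlation are hypothetical inputs and are not the PATH Study's values.

    # Illustrative sketch: RSE and minimum detectable absolute difference (MDAD)
    # under a design-effect-inflated binomial variance (hypothetical inputs).
    from math import sqrt
    from scipy.stats import norm

    def rse(p, n, deff):
        """Relative standard error (%) of a prevalence estimate p based on n respondents."""
        return 100 * sqrt(deff * p * (1 - p) / n) / p

    def mdad(p0, n, deff, rho=0.5, alpha=0.05, power=0.80):
        """Smallest detectable change (in percentage points) from baseline prevalence p0,
        two-sided test, approximating the variance of the estimated change for an
        overlapping sample measured at two waves with over-time correlation rho."""
        z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
        se_change = sqrt(2 * deff * p0 * (1 - p0) * (1 - rho) / n)
        return 100 * z * se_change

    print(round(rse(0.15, 42730, deff=2.0), 2))    # RSE for a 15 percent item
    print(round(mdad(0.10, 42730, deff=2.0), 2))   # MDAD for a 10 percent item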


The first feature is the clustering at both the PSU and segment levels. In general, for a fixed sample size, the greater the number of units to be sampled per cluster, and the more homogeneous the sampling units are with respect to a characteristic of interest within clusters, the greater the DEFF and hence the inflation in the variance (resulting in decreased precision). The level of homogeneity within a cluster is reflected through two types of intraclass correlations: ρ1 for PSUs and ρ2 for segments. Note that ρ1 and ρ2 will vary in value for different characteristics of interest. The expected standard errors for prevalence estimates for the PATH Study have been calculated taking into account the contributions due to clustering at both the PSU and segment levels under the assumptions that the intraclass correlations (ρ1, ρ2) are (.01, .05). These values were based on estimates taken from various sources in the survey research literature. The calculations reflect the fact that "certainty PSUs" are, in fact, strata not PSUs, so that no contribution to the variance from clustering at the PSU level occurs for these PSUs. Thirty-five of the 156 PSUs selected are certainties, representing 24 percent of the U.S. population.
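A common approximation relates these intraclass correlations and the average cluster sizes to the clustering component of the design effect; the sketch below (Python) uses that approximation with hypothetical cluster sizes, while the intraclass correlation values (.01 for PSUs, .05 for segments) are those assumed above.

    # Illustrative sketch: approximate clustering design effect for a design with
    # clustering at the PSU and segment levels,
    #   DEFF ~ 1 + (b_seg - 1)*rho_seg + (b_psu - b_seg)*rho_psu,
    # where b_psu and b_seg are the average numbers of interviews per PSU and per
    # segment (the cluster sizes below are hypothetical).
    def cluster_deff(b_psu, b_seg, rho_psu=0.01, rho_seg=0.05):
        return 1 + (b_seg - 1) * rho_seg + (b_psu - b_seg) * rho_psu

    print(round(cluster_deff(b_psu=100, b_seg=15), 2))   # 2.55 under these assumptions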


A second feature of the PATH Study design that contributes to the overall sampling variability is the plan to sample adults with different selection probabilities according to their age, race, and tobacco use (the latter both as reported by the household screener respondent and as self-reported by the adult at the second phase of screening). The unequal weighting DEFFs due to this feature of the sample design are expected to range from 1.00 to 1.63, depending on the domain of interest. For analyses that combine all adult respondents, this component of the unequal weighting DEFF is expected to be approximately 1.76.
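The unequal weighting component of the design effect is commonly summarized by Kish's factor, 1 + CV²(w); the sketch below (Python) computes it for a small set of hypothetical person weights.

    # Illustrative sketch: Kish's unequal weighting design effect,
    #   DEFF_wt = 1 + CV^2(w) = n * sum(w^2) / (sum(w))^2,
    # computed for hypothetical person weights.
    import numpy as np

    def unequal_weighting_deff(weights):
        w = np.asarray(weights, dtype=float)
        return len(w) * np.sum(w ** 2) / np.sum(w) ** 2

    print(round(unequal_weighting_deff([500, 800, 800, 1200, 1500, 2600]), 2))   # 1.31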


The third feature of the PATH Study design that contributes to the overall sampling variability is the restriction that no more than two adults be sampled from a participating household. This requirement contributes to the variability of weights because adults in some multi-person households will be sampled at lower rates than persons of the same age, race, and tobacco use group in single- or two-person households. The unequal weighting DEFFs due to this feature of the sample design are expected to range from 1.00 to 1.02, depending on the domain of interest. For analyses that combine all adult respondents, this component of the unequal weighting DEFF is expected to be negligible (i.e., approximately equal to 1). Note that for analyses of finer subgroups (e.g., race subgroups broken out by age or sex), these DEFFs will diminish, because generally fewer members of the subgroups will contribute to the clustering effect.


Estimates of precision and power for the PATH Study were calculated (taking into account the DEFFs resulting from the three sample design features described previously) and are shown in Tables 3a-4b. Tables 3a and 3b are for adults and Tables 4a and 4b are for youth. The projected RSEs are for a generic statistic estimating a prevalence rate of 15 percent (such as the percentage of the adult population who are everyday cigarette smokers). The MDADs are for a generic statistic estimating change from a baseline prevalence rate of 10 percent (such as any non-cigarette tobacco use). Both the RSEs and MDADs presented here are for illustrative purposes.


In Tables 3a and 4a, the RSEs are for cross-sectional estimates at baseline and the MDADs are for a change from baseline to Wave 2. In Tables 3b and 4b, the RSEs are for cross-sectional estimates at Wave 4 and the MDADs are for a change from baseline to Wave 4. The subgroups of interest are defined in terms of tobacco-related behaviors, which are subject to change over time. This presents a challenge when trying to estimate the subgroup sample sizes in future waves of the PATH Study, particularly given the recent expansion of tobacco products on the market. Over time, participants sampled as youth will become young adults and those sampled as young adults (18 to 24 years of age) will move into the older age group. As a result, variation in weights among members of most subgroups will increase, and it is necessary to inflate the DEFFs assumed due to unequal weighting. It is not possible to predict the precise inflation factor for each subgroup given the complication of unknown future rates of switching, substituting, or multiple tobacco product use. For these reasons, one inflation factor was estimated for each follow-up wave and applied to all subgroups, and the estimates of cross-sectional precision for later waves and of detectable changes across several waves are presented for a reduced set of subgroups (i.e., those for which the estimates are expected to be fairly robust to the assumptions made). As a consequence, the estimates herein should be interpreted with caution.


Table 3a. Baseline adult sample sizes, relative standard errors (RSEs) and minimum detectable absolute differences (MDADs) at Wave 2 under assumption of 70 percent screener response rate and 72 percent baseline interview retention/response rate*


Group | Baseline sample size | RSE on 15% item | MDAD on 10% item
All adults | 42,730 | 2.6 | 0.4
  Current users | 19,893 | 2.6 | 0.5
  Current or experimental users | 24,500 | 2.5 | 0.4
  Experimental and potential users | 11,853 | 3.0 | 0.5
  Menthol smokers | 5,769 | 3.8 | 0.7
  Dual (smokers and smokeless tobacco users) | 2,969 | 4.9 | 0.8
  Daily users | 15,914 | 2.8 | 0.5
  Less-than-daily users | 3,978 | 4.3 | 0.8
  Current non-users | 10,984 | 3.1 | 0.5
Adults ages 18-24 | 10,709 | 3.1 | 0.8
  Current users | 4,739 | 4.0 | 1.0
  Current or experimental users | 6,784 | 3.5 | 0.9
  Experimental and potential users | 3,911 | 4.3 | 1.1
  Menthol smokers | 1,612 | 6.2 | 1.6
  Dual (smokers and smokeless tobacco users) | 1,164 | 7.2 | 1.9
  Daily users | 3,413 | 4.5 | 1.2
  Less-than-daily users | 1,327 | 6.8 | 1.7
  Current non-users | 2,059 | 5.6 | 1.4
Black/African American adults | 6,000 | 4.1 | 0.7
  Current users | 2,779 | 5.0 | 0.9
  Current or experimental users | 3,683 | 4.5 | 0.8
  Experimental and potential users | 1,406 | 5.8 | 1.2
  Menthol smokers | 2,029 | 5.8 | 1.0
  Dual (smokers and smokeless tobacco users) | 415 | 12.2 | 2.1
  Daily users | 2,056 | 5.7 | 1.0
  Less-than-daily users | 723 | 9.3 | 1.6
  Current non-users | 1,816 | 6.1 | 1.1

*As indicated in Table 2, the 72 percent retention/response rate is the percentage of adults retained at the second phase of sampling and completing the full extended interview.


Under the three definitions of "tobacco use" discussed in Section B.1b, the large sample of adult tobacco users expected in the baseline wave will allow analyses for many user subgroups as well as for persons who are considered at risk for becoming tobacco users. Table 3a highlights some subgroups of particular analytic interest by breaking out sample sizes and measures of precision and power for tobacco users, menthol smokers, users of both smoked and smokeless tobacco, and daily/non-daily tobacco users; the same statistics are shown for each of these subgroups among young adults (18 to 24 years of age) and among African Americans. The subgroup sample sizes for daily and less-than-daily users and for menthol smokers were estimated by multiplying projected sample sizes for all users by percentages estimated from the (averaged) 2001–2007 CPS-TUSs. Because smokeless tobacco consumption has been increasing in recent years, dual use of cigarettes and smokeless tobacco among all users was estimated based on data from the 2010 NSDUH. The "current user" definition from Section B.1b was applied in estimating sample sizes for the menthol smoker, dual-user, daily-user, and less-than-daily-user groups; the "wide net" definition was used to estimate the sample sizes for nonuser groups and the experimental and potential users; the "current or experimental user" definition was used to estimate sample sizes for current or experimental user groups. These are the definitions that give the smallest sample size, and hence the largest RSEs and MDADs, for each of these groups. If another definition of tobacco user is employed, the estimated RSEs and MDADs will be smaller than those displayed in the tables. For both the RSEs and the MDADs, smaller is better. The RSEs for a 15 percent prevalence rate are below 7 percent for most subgroups and at or below 5 percent for more than half of them. The MDADs for a 10 percent baseline prevalence rate are almost all below 2 percentage points, and a one-year change of 1 percentage point or less can be reliably detected for more than half of the subgroups.


Table 3b suggests that the RSEs for a 15 percent prevalence rate at Wave 4 will be only slightly larger than at baseline. This is as expected because the number of households in the PATH Study was chosen to produce a youth sample of sufficient size to replenish the adult sample in future waves. A larger difference is seen when comparing the one-year versus three-year MDADs for a 10 percent baseline prevalence rate; this is due to the increased DEFFs and reduced correlations between samples by Wave 4.

Table 3b. Adult sample sizes, relative standard errors (RSEs) and minimum detectable absolute differences (MDADs) at Wave 4 under assumption of 70 percent screener response rate and 72 percent baseline interview retention/response rate*


Group | Wave 4 sample size | RSE on 15% item | MDAD on 10% item
All adults | 43,064 | 2.6 | 0.6
  Current users | 20,048 | 2.8 | 0.6
  Menthol smokers | 5,845 | 4.0 | 0.9
  Dual (smokers and smokeless tobacco users) | 3,009 | 5.2 | 1.1
  Daily users | 16,125 | 2.9 | 0.7
  Less-than-daily users | 4,031 | 4.6 | 1.0
  Current non-users | 11,070 | 3.2 | 0.7
Adults ages 18-24 | 12,347 | 3.2 | 1.2

*As indicated in Table 2, the 72 percent retention/response rate is the percentage of adults retained at the second phase of sampling and completing the full extended interview.


The adult sample sizes considered in this section are based on completed interviews and therefore the estimates of precision and power apply to tobacco and health outcomes collected via the interview instrument. However, another important component of the PATH Study is the collection and analysis of biospecimens from those who consent to provide them. Blood specimens, for example, may be analyzed to detect markers of risk for tobacco-related disease. Table 2 in Section B.1f shows that 27,775 adults are expected to provide a blood sample at baseline. If one assumes that 76 percent of these adults will still be participating in the study at Wave 4 and will provide a blood sample if requested, a change in risk of about 0.42 percentage points could be detected with 80 percent power. As with all the estimates presented in this section, precision and power for finer divisions of the subgroups (e.g., by gender) are expected to be reduced. (Attachment 22 provides the power calculations for the projected and worst-case scenarios of response rates).


As stated above, the initial sample of 16,186 youth at baseline is necessary both to replenish the adult cohort in future waves of the PATH Study and to provide sufficient power for analyses of youth subgroups. Table 4a shows sample sizes and measures of precision and power for the youth sample overall and by subgroups of possible interest: tobacco users, smokers, menthol smokers, “experimenters”, never smokers, susceptible never smokers, and never users of tobacco; the same statistics are shown for each of these subgroups among 12 to 13 year-olds and among 14 to 17 year-olds. Subgroup sample sizes were estimated using data from the 2009 National Youth Tobacco Survey (NYTS). Experimenters were defined as youth who have ever smoked any cigarette, even one or two puffs, but fewer than 100 cigarettes. Susceptibility to initiate cigarette smoking among never smokers was defined as providing any response other than "no" to the question, "Do you think that you will try a cigarette soon?" and any response other than "definitely not" to the questions, "Do you think you will smoke a cigarette anytime during the next year?" and "If one of your best friends offered you a cigarette, would you smoke it?"


Table 4a. Baseline youth sample sizes, relative standard errors (RSEs), and minimum detectable absolute differences (MDADs) at Wave 2 under assumption of 70 percent screener response rate and 75 percent baseline interview response rate*


Group | Baseline sample size | RSE on 15% item | MDAD on 10% item
All youth | 16,186 | 2.7 | 0.7
  Current users | 3,034 | 4.7 | 1.2
  Current smokers | 2,113 | 5.5 | 1.4
  Menthol smokers | 1,014 | 7.7 | 2.0
  Experimenters | 4,867 | 3.9 | 1.0
  Never smokers | 10,124 | 3.1 | 0.8
  Susceptible never smokers | 2,531 | 5.1 | 1.3
  Never users | 9,423 | 3.2 | 0.8
Youth ages 12 to 13 | 5,319 | 3.8 | 1.4
  Current users | 426 | 11.7 | 4.4
  Current smokers | 266 | 14.7 | 5.6
  Menthol smokers | 128 | 21.1 | 8.0
  Experimenters | 1,064 | 7.5 | 2.9
  Never smokers | 4,255 | 4.1 | 1.6
  Susceptible never smokers | 1,064 | 7.5 | 2.9
  Never users | 3,989 | 4.3 | 1.6
Youth ages 14 to 17 | 10,867 | 3.0 | 0.9
  Current users | 2,608 | 5.1 | 1.5
  Current smokers | 1,847 | 5.9 | 1.7
  Menthol smokers | 887 | 8.2 | 2.4
  Experimenters | 3,804 | 4.3 | 1.3
  Never smokers | 5,868 | 3.7 | 1.1
  Susceptible never smokers | 1,467 | 6.5 | 1.9
  Never users | 5,434 | 3.8 | 1.1

*As indicated in Table 2, the 75 percent response rate is the percentage of youth completing the extended interview.


Overall, there are large samples in many of the subgroups of interest. For example, there are an estimated 9,423 never users for whom tobacco use initiation rates will be tracked. Tobacco cessation is more of an issue in the older adolescent group (ages 14 to 17), and there are an estimated 2,608 tobacco users and 1,847 cigarette smokers whose quitting behavior over time will be monitored. The smallest subgroup summarized that may be of interest is menthol smokers. If some regulatory action relating to menthol cigarettes were to be taken, these youth might respond by quitting, switching brands, or switching to other forms of tobacco use. There are 887 such participants in the 14 to 17 age range, which provides statistical power to examine all but the rarest outcomes. The RSEs for a 15 percent prevalence rate are below 8 percent for most subgroups and at or below 5 percent for more than half of them. Among all youth 12 to 17 years of age, the sample size overall and in each of the subgroups except menthol smokers is sufficient to reliably detect a one-year change of 1.5 percentage points in a 10 percent baseline behavior. This is a critically important threshold because measures of quitting, initiation, and non-cigarette tobacco use tend to be in this 10 percent range (depending on the definitions used) and a statistically significant increase of 1.5 percentage points would be of policy interest. However, Table 4b highlights the importance of recruiting a large sample of youths at baseline. For detecting change in a 10 percent baseline behavior across three years, the MDADs among all 12 to 17 year-olds are closer to 2 percentage points for several subgroups. Again, this is due to the increased DEFFs and reduced correlations between samples by Wave 4.


Table 4b. Youth sample sizes, relative standard errors (RSEs), and minimum detectable absolute differences (MDADs) at Wave 4 under assumption of 70 percent screener response rate and 75 percent baseline interview response rate*


Group | Wave 4 sample size | RSE on 15% item | MDAD on 10% item
All youth | 14,000 | 3.1 | 1.1
  Current users | 2,624 | 5.3 | 1.9
  Current smokers | 1,828 | 6.2 | 2.2
  Menthol smokers | 877 | 8.7 | 3.1
  Experimenters | 4,210 | 4.4 | 1.6
  Never smokers | 8,756 | 3.5 | 1.2
  Susceptible never smokers | 2,189 | 5.8 | 2.1
  Never users | 8,150 | 3.6 | 1.3
Youth ages 12 to 13 | 4,964 | 4.1 | 2.0
  Current users | 397 | 12.7 | 6.1
  Current smokers | 248 | 15.9 | 7.6
  Menthol smokers | 119 | 22.9 | 11.0
  Experimenters | 993 | 8.2 | 3.9
  Never smokers | 3,971 | 4.5 | 2.2
  Susceptible never smokers | 993 | 8.2 | 3.9
  Never users | 3,723 | 4.6 | 2.2
Youth ages 14 to 17 | 9,036 | 3.5 | 1.4
  Current users | 2,169 | 5.8 | 2.4
  Current smokers | 1,536 | 6.7 | 2.8
  Menthol smokers | 737 | 9.4 | 3.9
  Experimenters | 3,163 | 5.0 | 2.1
  Never smokers | 4,880 | 4.2 | 1.8
  Susceptible never smokers | 1,220 | 7.5 | 3.1
  Never users | 4,518 | 4.4 | 1.8

*As indicated in Table 2, the 75 percent response rate is the percentage of youth completing the extended interview.



B.3 Methods to Maximize Response Rates and Deal with Nonresponse

The emphasis for the baseline wave will be on maximizing the participation of selected households and selected persons in the PATH Study. For annual follow-up waves, the focus will be on maintaining contact with respondents and maximizing their retention in the study. OMB generally requires nonresponse analysis when extended response rates fall below 80 percent. Furthermore, in the case of this study, if screener response rates are below 70 percent or subsequent participation rates and/or completion of the extended interview are below 70 percent of those who agree to participate, NIDA and FDA will submit a report to ASPE and a subsequent nonsubstantive change request to OMB that includes: (1) the results of that nonresponse analysis; (2) the planned statistical analysis approach; and (3) the implications of the response rates and non-response bias for the types of conclusions that can be drawn from this study.



Baseline Wave

As a large longitudinal cohort study, the PATH Study expects some attrition among study participants to occur, whether from loss to follow-up or refusal to participate. However, methods to maximize response rates will be implemented by the PATH Study in advance of baseline data collection and throughout each of its follow-up waves.


The PATH Study will have a team of experienced field interviewers and field supervisors sufficient in size to work all cases thoroughly. These field staff will be strategically located within or in close proximity to PSUs, which will expedite visits to the sample dwelling units and will also ensure that they are familiar with the communities within which the cases are located. Field interviewers will also be thoroughly trained in gaining respondent cooperation through refusal aversion and conversion. Field management will ensure that data collection efforts are thoroughly planned down to the field interviewer level; for example, production goals will be developed that will set a pace for individual field interviewers, field supervisor teams, and the nation as a whole.


Several tools and approaches to address nonresponse and maximize response rates will be used, in addition to the respondent incentives described in Section A.9. First, the interviews will be conducted in English and Spanish; all of the instruments will be translated into Spanish and bilingual field staff will administer them. Second, extensive respondent materials, also translated into Spanish, will be developed to encourage participation. These materials will include an advance letter to inform selected households of the study prior to in-person contact (Attachment 12). The advance letter will contain assurances of privacy, describe the voluntary nature of the PATH Study and the principal purposes and uses of its data, emphasize the importance of the study, and underline the study's interest in including tobacco users and non-users. A PATH Study website and a respondent telephone line will be established to answer respondents' questions and to reassure them of the study's credibility. Tailored letters will be developed for use with reluctant respondents/sample persons and with selected units located in limited-access situations (doorperson buildings, gated communities, etc.); these letters may be sent via FedEx or priority mail to reinforce the perceived importance of participation. (See Attachment 20 for an example of a refusal letter.)


In addition, the PATH Study will maximize the biospecimen response rates for the baseline wave by implementing the following approaches.


  1. Ensure that Interviewers are “On Board.” The PATH Study will hire and train interviewers who understand the importance of collecting biospecimens as part of this research effort. Early in the selection process, candidates will be required to view a short video that highlights this requirement and the importance of being comfortable with carrying it out.

  2. Phase the Consent for Biospecimens. The PATH Study will present information to respondents in phases to help minimize the amount of information to be simultaneously considered before consenting. This approach includes providing information about the interview immediately prior to obtaining consent for the interview; providing information on biospecimen collection before obtaining consent for biospecimen collection, etc. Moreover, because biospecimen collection follows completion of the interview, this approach also allows additional time for the development of rapport, trust, and comfort between the interviewer and the respondent, which will positively influence consent to provide the biospecimens.

  3. Present the Biospecimens in a Positive Light. Based on an effective approach used by the National Health and Nutrition Examination Surveys (NHANES), the PATH Study will include nicely-formatted consent pamphlets with messages that emphasize the importance of the respondent’s contributions of biospecimens to the study’s scientific success.

  4. Enhance Training of Interviewers. The PATH Study will provide extensive interviewer training on collecting biospecimens, including home-study training and practice in requesting consent and averting refusals. With classroom and home-study training and additional practice sessions, interviewers will be able to gain proficiency and comfort with the study protocol, including obtaining consent, averting refusals, and collecting biospecimens.

  5. Equip Interviewers with Refusal Conversion Tool. The PATH Study will use computer-assisted personal interviewing (CAPI) screens that, in real time, point interviewers to tailored responses to types of reasons respondents give for biospecimen refusals. Having these available at the moment they are needed can improve the interviewer’s ability to quickly allay respondent concerns about providing biospecimens.

  6. Streamline Biospecimen-Collection Procedures. The PATH Study will collect both urine and buccal cell samples at the time of the interview. This approach will simplify the procedures and decision-making for the visit for interviewers and respondents alike. Rapport that develops during this first visit between the interviewer and respondent may also have a positive influence on the respondent’s willingness to provide the biospecimens.

  7. Enhance Quality Control. The PATH Study quality-control approaches include closely monitoring interviewer-by-interviewer consent and collection rates for biospecimens, and providing rapid feedback to interviewers and refresher training to maximize performance.

A web-based Supervisor Management System (SMS) will allow field supervisors to closely monitor each field interviewer’s work, which facilitates the development of strategies to address nonresponse. These strategies will include reassigning difficult or reluctant cases among local field interviewers and the use of specially trained, traveling field interviewers who are highly skilled in refusal conversion.


Data collection efforts will also follow a phased approach that anticipates refusal conversion efforts. In this approach, new samples of households will be released to field interviewers approximately every 2 to 3 months. Therefore, it will not be necessary to close out cases from an earlier period in the same data wave before releasing a new sample, thus allowing additional time to complete challenging cases. Further, the number of new cases assigned to interviewers is expected to be lowest during later periods in the data wave, thereby ensuring interviewers will have additional time in those periods to complete open cases remaining from an earlier period. Front-loading the sample release in this manner allows field interviewers the opportunity to implement the full contact strategy, including nonresponse conversion as needed.


Adjustments will be performed as necessary for non-interviews that cannot be converted using the procedures described in Section B.2. The specific procedure selected will ensure the accuracy of resulting estimators and the suitability of the compensated data set for addressing the major objectives of the study.


The baseline response rate (reflecting both the household screener and extended interview response rates) is estimated to be 60 percent for adults and 55 percent for youth. (See Section B.1 for a discussion of these estimated response rates, as well as worst-case scenarios based on field test results.) Response rates for the baseline extended interview are expected to be 85 percent for adults and 75 percent for youth; these will be calculated as the number of respondents divided by the number of eligible sample persons. Ineligible persons include persons under the age of 12 years; persons whose mental and/or physical impairments preclude participation in the PATH Study; military personnel on active duty; and persons who are unable to conduct their interviews in English or Spanish.



Follow-up Waves

In the follow-up waves, the PATH Study team will seek to maintain respondent engagement as well as track respondents, so they can be contacted for follow-up data and biospecimen collection. In terms of maintaining engagement, many of the same activities conducted at the baseline wave will be repeated, potentially supplemented by one or more of the following:


  • Visit respondents who have moved up to 100 miles from a study PSU,

  • Offer a web-based version of the interview for movers who are located more than 100 miles from any PSU, and

  • Collect biospecimens (urine only) from movers via kits that can be mailed to the movers and returned to the biorepository.

With regard to locating respondents after the baseline wave, PATH Study staff will conduct ongoing tracking of study respondents (so they can be contacted for follow-up data and biospecimen collection) and tracing of those who become lost at follow-up. Management of tracking/tracing activities will be supported through the home office's centralized Home Management System (HMS). This component of the study management system will house the database of contact information and provide for real-time access in the field and at the home office to the most current information available. Field and home office staff involved in tracking/tracing will provide updates, and supervisors will generate reports for monitoring purposes and to determine next steps.


Using the centralized HMS as a tool, PATH Study staff will implement the following routine tracking steps to minimize the number of cases requiring intensive tracing.


  • Collect contact information at baseline for tracing references. At baseline, respondents will be asked for the names, addresses, and telephone numbers of two people to serve as tracing references who will always know how to reach the respondent and do not live in the same household. Given that a sizeable percentage of respondents will be young adults, respondents also will be asked for information not traditionally requested (e.g., names of colleges attended) for use in social networking site searches.

  • Use interim contacts to determine if contact information has changed or if tracing is needed. Contacts by mail or email will ask respondents to report any address changes, and the study will provide a number of easy ways this can be done, including visiting the study website, calling a toll-free number, or sending updated information via mail or email. The PATH Study also will send respondents mailings marked "address correction requested," which prompts the return of new address information for people who may have moved. In addition to supporting tracing, these interim contacts will help to maintain respondent motivation to cooperate and continued engagement with the study. PATH Study respondents will also be provided with an incentive ($5) as a thank you for updating their contact information.

  • At each in-person visit, update contact information. During household visits for each follow-up wave, the field interviewers will update contact information on the respondent as well as on relatives or persons not living in the household who are likely to know the whereabouts of the respondent.

For those respondents lost at follow-up, PATH Study staff will implement a systematic approach to tracing their whereabouts. If the current occupants of the last known address cannot guide the field interviewer to the respondent’s whereabouts, the field interviewer will carry out the first line of tracing, using the respondent’s last known telephone number(s), tracing references, directory assistance, and neighbors to locate the respondent. If unsuccessful, the case will be sent to the PATH Study home office for a second line of more intensive tracing. A small team of tracers at the home office will follow protocols to trace PATH Study respondents, using tracing resources such as the following.


  • LexisNexis. This database, compiled from public records, can return respondent address histories and telephone numbers. Submissions will be made at least quarterly, and the tracers will review and follow up on the results.

  • Internet searches. These searches include free and paid services. Examples of the services include online telephone directories and limited public information records.

  • In-person tracing. As the need arises and the resources are available, in-person tracing (i.e., “skip tracing”) will be used. This approach involves intensive in-person tracing at the respondent’s last known addresses and in his/her old neighborhoods to identify contact information or current whereabouts. Because it is expensive per case, in-person tracing will be used judiciously and only after more cost-effective approaches have been attempted.


B.4 Test of Procedures or Methods to be Undertaken

In preparation for the baseline wave of data and biospecimen collection, the PATH Study conducted a field test. This section presents an overview of the field test; highlights results of the field test on the performance of the study; and summarizes the nonsubstantive changes to the study protocol, procedures, instruments, and materials based on those results. In addition, this section describes plans for assessing the performance of the study during the baseline wave.


B.4a Field Test

Upon receipt of OMB approval (OMB # 0925-0664, expiration 11/30/2015), the PATH Study conducted a field test to assess the planned baseline data collection procedures and operations, as well as alternative incentive schemes and household screeners that might help reduce respondent burden and study costs. The field test report is presented in Attachment 2.


The field test was conducted from December 6, 2012, through February 17, 2013. Fifteen PSUs were purposively selected to reflect the diversity of PSUs selected for the main study. For example, the field test PSUs included urban and rural PSUs, and PSUs within states that have relatively high and low tobacco use prevalence rates. Households and individual respondents were selected using the same methods planned for the main study. The field test included 1,170 household screener respondents, 480 adult interview respondents, 122 youth interview respondents, and 128 parent interview respondents. Field interviewers obtained informed consent from field test respondents (see Attachment 13).


The field test for the baseline was designed to fine-tune the data collection protocol and to inform decisions on the first-phase household screener incentive amount and the length of the screener. Its objectives were to test: (1) the administration and performance of the data collection instruments; (2) biospecimen collection in a household setting (buccal cells, urine, and blood specimens were collected, packaged, shipped, and analyzed); (3) field interviewer training procedures and materials; (4) data processing and the interface between the biorepository and the prime contractor; (5) systems and security architectures; (6) alternative incentive levels for completing the household screener ($0 vs. $5 vs. $10) and the incentive procedure; and (7) a short and a long version of the household screener. Households were randomly assigned to one of the three incentive levels and to the short or long version of the household screener. The field test sample sizes were set to provide adequate power (.60 or better) to detect effects on screener response rates from the different levels of incentive and screener lengths.


To provide for a full test of the data collection procedures and operations to be used at baseline and throughout the follow-up waves, personally identifiable information and contact information were collected from field test respondents, but this information will not be linked to respondent data or biospecimens. Similarly, the same confidentiality procedures described in Section A.10 for the baseline and follow-up waves were used in the field test.


Nonsubstantive changes based on the field test have been previously discussed with OMB and are incorporated into this supporting statement for the PATH Study baseline. Table 5 summarizes these changes and the field test findings on which they are based. OMB’s approval (OMB # 0925-0664, expiration 11/30/2015) of the field test included terms of clearance that the PATH Study is required to meet for approval to conduct the baseline wave. These are to clearly justify the structure of incentives, address OMB concerns regarding the wording for some items, and otherwise ensure that the protocol and instruments maximize the utility of the data collection, minimize burden on participants, avoid duplication with existing Federal surveys, and comply with HHS data standards.



Table 5. Summary of field test results and proposed changes to protocol and procedures for baseline wave


Field test result | Proposed change to protocol/procedures | Estimated effect on performance measure

Structure of incentives

Phase 1 screener--Results of experiment indicated only small differences in response rates to screener based on $10, $5, and no incentive. | No incentive proposed. | Not applicable.

Biospecimen collections--Urine collection rates were lower than expected. | Interviewer will always attempt to collect urine at Visit #1 (vs. field test procedure of potentially splitting collection between Visit #1 and Visit #2). Also, will enhance interviewer recruitment and training. | Increase urine collection rates and volume of urine collected (which enhances analytical power).

Biospecimen collections--Blood collection rates were lower than expected. | In addition to collecting only blood at Visit #2, offer $25 incentive for this visit (vs. $25 for blood and urine collection in field test). | Increase blood collection rates.

Study items and materials

Phase 1 screener--Results of experiment on long vs. short form of screener identified strongest items in each form. | Blend strongest items from long and short forms. | Reduce respondent burden (compared to long version), and increase utility of data by including items that improve measurement of constructs.

Extended interviews--Results identified ways to improve individual questionnaire items. (In addition to field test results, cognitive testing results informed revisions to instruments.) | Revise instruments to improve or add questionnaire items. Changes include simplifying language, using fewer response categories for some items, and adding skip patterns to avoid less relevant items. | Increase utility of data by including items that improve measurement of constructs. Minimize burden on participants.

Study materials--Results identified ways to improve individual materials, including advance materials, consent materials, and field data collection materials. | Revise or add materials. Changes include simplifying advance materials, addressing additional reasons for refusal in conversion letters, phasing the introduction of information on study activities, and adding email messages to remind respondents about appointments or prompt responding. | Increase utility of data by enhancing response rates.

Sample design

Response rate assumptions--Results suggested original assumed response rates were too high. | Reduce assumed response rates for several collections. Increase number of addresses fielded to compensate for the lower assumed response rates. | Increase utility of data by achieving targeted number of responses.

Sampling rates within households--Results indicate tobacco use rates are higher than expected using the study criteria. | Modify within-household sampling rates to account for the higher use rates found. This will change allocation of sample to major strata. | Increase utility of data by reducing the number of large sampling weights and capturing data on adults in some small user subgroups.

Selection of youth within households for potential future enrollment--Results indicate selecting 2 youth 9 to 11 years old per household has advantages. | Select 2 youth 9 to 11 years old per household (vs. 1 in field test). | Increase utility of data by maintaining selection probabilities as youth enter youth cohort. Effect on participant burden will be minimal for additional youth per household.


B.4b Assessment of Performance during Baseline Wave

As indicated in Section B.4a and Attachment 2, the nonsubstantive changes presented in this supporting statement are designed to improve the performance of the PATH Study during the baseline wave. The changes focus on boosting response rates achieved in the field test to meet the projections presented in Section B.1. Hence, the assessment of study performance during the baseline wave will include an examination of progress toward the projected response rates.


The PATH Study plans to assess performance during the baseline wave using results from the first 6 months of interview data and biospecimen collection. This plan will facilitate a timely report to OMB on the study's performance regarding actual and projected response rates (see Table 2) for: (1) Phase 1 Screener, (2) Phase 2 Screener/Adult Extended Interview, (3) Parent Interview, (4) Youth Extended Interview, (5) cheek cell collection, (6) urine collection, and (7) blood collection. The PATH Study will base its estimates of the response rates achieved at 6 months on the finalized cases for a representative "predictor sample" of addresses, which will be a subsample (approximately 13,000 addresses) of the sample released at the launch of the baseline wave. Because 6 months is a short field period for this assessment, the predictor sample will be assigned a higher priority for data collection than the rest of the baseline wave sample. Cases in the predictor sample not finalized at 6 months will continue to be worked until the end of the baseline wave. Based on experience with other comparable studies, a significant number of additional interviews and biospecimen collections will be completed after the 6-month reporting period, and the response rates will change accordingly. To assist with interpretation of the PATH Study's progress through the first 6 months, the assessment report will also provide information on the experience of similar studies at the same data collection point and will discuss the likely implications for the non-finalized cases.


Response rates will be monitored carefully over time in order to detect deviations from projected rates. Deviations from projected response rates will prompt a close examination of nonresponse bias. For example, if response rates fall short of projected rates during the first 6 months after launch of the baseline wave, the PATH Study will compare responding and nonresponding dwelling units and sampled persons using available demographic and tobacco use information. For dwelling units, this information would be limited to that on the sampling strata from which addresses were selected; for sampled persons, additional information would be available from the Phase 1 screener for the data collections and from the adult extended interview for the biospecimen collections.


In addition, the PATH Study's interim report to OMB on the study's performance will include results regarding actual and projected performance on other measures presented in the summary of field test results (see Attachment 2). Among others, these include consent rates, interview length, time between interview and blood collection visit, time between biospecimen collection and processing, number of aliquots obtained from collected biospecimens, housing unit eligibility rate, distribution of enumerated adults in race/age/tobacco use subgroups, tobacco use misclassification, and frame coverage rate. The projected rates or numbers for many of these measures are included in Supporting Statement A or B. Again, this interim assessment will focus on the study's progress during the first 6 months post-launch; for comparison purposes, results will be discussed relative to the experience of large, national health studies at the same time point.




B.5 Individuals Consulted on Statistical Aspects and Individuals Collecting and/or Analyzing Data

A list of individuals who consulted on statistical aspects of the PATH Study design and will collect and/or analyze the PATH Study data is included in Attachment 23.


References

Dohrmann, S., Han, D., and Mohadjer, L. (2007). Improving coverage of residential address lists in multistage area samples. Proceedings of the Joint Statistical Meetings [CD-ROM], 3219–3226. Alexandria, VA: American Statistical Association.

Iannacchione, V., Staab, J., and Redden, D. (2003). Evaluating the use of residential mailing lists in a metropolitan household survey. Public Opinion Quarterly, 67(2): 202–210.

Kalton, G. (1986). Handling wave nonresponse in panel surveys. Journal of Official Statistics, 2: 303–314.

Montaquila, J.M., Hsu, V., Brick, J.M., English, N., and O’Muircheartaigh, C. (2009). A comparative evaluation of traditional listing vs. address-based sampling frames: Matching with field investigation of discrepancies. Proceedings of the Joint Statistical Meetings [CD-ROM], 4855–4862. Alexandria, VA: American Statistical Association.

O’Muircheartaigh, C., Eckman, S., and Weiss, C. (2003). Traditional and enhanced field listing for probability sampling. Proceedings of the Joint Statistical Meetings [CD-ROM], 2563–2567. Alexandria, VA: American Statistical Association.

Rizzo, L., Kalton, G., and Brick, J.M. (1996). A comparison of some weighting adjustment methods for panel nonresponse. Survey Methodology, 22(1): 43–53.



1 The definition of tobacco use in the Current Population Survey-Tobacco Use Supplement (CPS-TUS) encompasses items (1) and (2) of the “current user” definition, but not item (3). Note that, although the “current user” definition is considered to be scientifically more appropriate for most of the analyses of the PATH data, analysts wishing to employ the CPS-TUS definition in their analyses will have the data available to do so.

2 Questions in the PATH Study’s instruments that collect data on race or ethnicity will be consistent with the most recent revision of the OMB Statistical Policy Directive No. 15, Race and Ethnic Standards for Federal Statistics and Administrative Reporting. However, the term “Black/AA” as used here refers to anyone who chooses African American or Black as a race category (irrespective of whether one or more race categories are chosen and irrespective of their reported ethnicity).

3 Using the 2011 American Community Survey Public Use Microdata Files, fewer than 2 percent of households are estimated to be linguistically isolated (i.e., no one over age 14 speaks English very well) with no one speaking Spanish.

