1205-0453 Supporting Statement Part B 7.01.2015

National Agricultural Workers Survey

OMB: 1205-0453

[1205-0453: The National Agricultural Workers Survey, Part B]





B. Collection of Information Employing Statistical Methods


The objective of the National Agricultural Workers Survey (NAWS) is to provide descriptive statistics of the characteristics of crop workers using a statistical methodology designed to address the difficulties of surveying a mobile and seasonal population often living in non-standard and sometimes hidden housing. In addition, the NAWS is designed to address the information needs of various Federal agencies that oversee farm worker programs. These stakeholders include agencies concerned with occupational injury and health surveillance, Migrant and Seasonal Head Start, and Migrant Health. Another purpose of the NAWS is to produce accurate regional estimates of the share of farm workers who are eligible for training and employment services through the Employment and Training Administration’s (ETA) National Farmworker Jobs Program (NFJP).


1. Respondent Universe and Samples


a) Respondent Universe


The universe for the NAWS is the population of workers active in crop agriculture in the continental United States. Since the NAWS samples farm workers at the worksite, the definition of the respondent universe involves the definitions of both an eligible employer and an eligible worker.


The universe of eligible employers includes all employers in North American Industrial Classification System (NAICS) codes 111 and 1151. NAICS code 111 is Crop Agriculture, which includes employers hiring workers on farms and ranches or in greenhouses. NAICS code 1151 is Support Activities for Crop Production, which includes employers such as farm labor contractors, custom harvesters, and crop dusting companies who contract with agricultural producers to supply support services and hire workers to carry out these contracts at farms, ranches and greenhouses. Eligible employers must have workers who are actively engaged in crop production.


Eligible workers must be employed by an eligible employer and have worked for that employer for at least four hours on a single day in the prior 15 days. For an employer who has a packing operation, the workers in the packing operation are eligible if the canning or packing plant is adjacent to or located on a farm and at least 50 percent of the produce being packed or canned originated from the farm of the contacted grower.


The following criteria define ineligible employees at an otherwise eligible employer. The employee is ineligible if he/she:


  • Was interviewed in the NAWS within the last 12 months in the same location.

  • Holds an H-2A visa.

  • Has not worked for the contacted employer for four hours or more on at least one day in the last 15 days.

  • Does “non-farm work” for the employer (e.g., mechanic, sales, office).

  • Is a family member of the employer and does not receive a paycheck like other farm workers.

  • Is the employer (grower or contractor).

  • Is a sharecropper who makes all operational decisions, such as when, where, and how to plant or harvest.

  • Works for a landscaping company that only sells, installs, maintains or preserves trees or plants. This includes the planting of ornamental plants and placement of sod.


b) Samples


The NAWS will use a complex sampling design that includes both stratification and clustering. The sampling unit is an eligible crop worker. Multi-stage sampling will be implemented to randomly select and interview approximately 3,369 crop workers. Since migrant and seasonal farm workers are a mobile and time-sensitive population, the sample will utilize strata that incorporate seasonality and agricultural region. Within each stratum, there is clustering. The primary sampling unit is the Farm Labor Area (county cluster). Samples of employers and of workers within employers are also selected. This section describes the stratification, primary sampling units, and employer and worker universe size and samples for the survey (see Table 1). Section 2 provides more details on the statistical methods used in sampling.

Table 1: NAWS Stratification and Sampling Units

| Entity | Universe | Sample |
| --- | --- | --- |
| Cycle | 3 | 3 |
| Agricultural Region | 12 | 12 |
| Farm Labor Area | 497 | 105 |
| Crop Employer | 422,000* | 702 |
| Hired Crop Worker | 1,600,000* | 3,369 |

*Estimate


Stratification

To account for seasonal and geographic variation in farm employment, the year is divided into three four-month cycles: October through January, February through May, and June through September. The NAWS sampling will use 12 distinct agricultural regions, which are based on the USDA’s 17 regions. Table 2 shows the correspondence between USDA and NAWS regions. At the start of the survey in 1988, the 17 USDA regions were collapsed into 12 NAWS regions by combining smaller regions that were similar (e.g., Mountain I and Mountain II) based on statistical analysis of cropping patterns. This reduced the number of regions and increased the size of the smallest strata.


Table 2: Correspondence between NAWS and USDA Regions

| NAWS Sampling Region | USDA Region Code | USDA Region Name | States in USDA Region |
| --- | --- | --- | --- |
| AP12 | AP1 | Appalachian I | NC, VA |
| AP12 | AP2 | Appalachian II | KY, TN, WV |
| CBNP | CB1 | Corn Belt I | IL, IN, OH |
| CBNP | CB2 | Corn Belt II | IA, MO |
| CBNP | NP | Northern Plains | KS, NE, ND, SD |
| CA | CA | California | CA |
| DLSE | DL | Delta | AR, LA, MS |
| DLSE | SE | Southeast I | AL, GA, SC |
| FL | FL | Florida | FL |
| LK | LK | Lake | MI, MN, WI |
| MN12 | MN1 | Mountain I | ID, MT, WY |
| MN12 | MN2 | Mountain II | CO, NV, UT |
| MN3 | MN3 | Mountain III | AZ, NM |
| NE1 | NE1 | Northeast I | CT, ME, MA, NH, NY, RI, VT |
| NE2 | NE2 | Northeast II | DE, MD, NJ, PA |
| PC | PC | Pacific | OR, WA |
| SP | SP | Southern Plains | OK, TX |


The annual sample will include all 12 regions for each of the three cycles. The USDA’s Farm Labor Survey (FLS) provides the size of each region-cycle stratum. The NAWS region definitions and the FLS regions are congruent. USDA reports FLS data for each quarter. NAWS will prorate the quarterly data into the three cycles. The population of each region-cycle stratum is as follows:









Table 3. Estimated Crop Labor Force Population (1000s) by Region-Cycle Strata

| Region | Fall | Winter/Spring | Summer |
| --- | --- | --- | --- |
| NE1 | 22.0 | 13.9 | 35.2 |
| NE2 | 23.3 | 19.0 | 49.0 |
| AP12 | 51.8 | 35.7 | 57.6 |
| FL | 40.5 | 50.7 | 34.0 |
| DLSE | 43.3 | 36.4 | 55.4 |
| LK | 48.8 | 27.3 | 41.7 |
| CBNP | 87.0 | 70.2 | 98.1 |
| SP | 29.2 | 31.4 | 34.6 |
| MN12 | 23.8 | 19.8 | 32.9 |
| MN3 | 18.1 | 17.3 | 21.1 |
| PC | 77.7 | 43.2 | 116.4 |
| CA | 255.1 | 190.5 | 261.6 |
| TOTALS | 720.6 | 555.0 | 837.6 |

Table derived from USDA FLS numbers for 2011-2012


Primary Sampling Unit

The next level of sampling is to select the Farm Labor Areas (FLAs), which are multi-county units that form the primary sampling units, within the cycle-region strata. FLAs consist of either large single counties or, more often, aggregates of counties that have similar farm labor usage patterns and are roughly similar in labor force size. The FLAs were originally developed for the FY 1993 NAWS interviewing year using 1987 Census of Agriculture (CoA) data on hired and contract farm labor expenditures as the size data, plus ETA mappings of seasonal farm labor concentrations for potential location grouping. The ETA maps were prepared for field offices advising migrant farm workers on locating seasonal employment. This procedure resulted in 497 FLAs. The FLA definitions are reviewed every five years when new CoA data become available. The reason for developing the FLAs was to have similar-sized primary sampling units within each stratum. Each of the approximately 3,000 counties in the continental United States is assigned to one of the 497 FLAs.


In each of the three cycles per year, NAWS staff will select a sample of FLAs in each of the 12 regions using methods described in Section 2c below. For the FY 2016 data collection, visits to 105 FLAs are planned. The actual number of unique FLAs may be lower, as the sampling plan may include a single FLA in more than one cycle during the year.


Employer and Worker Samples


The universe of crop employers is estimated to be 422,000. This estimate is derived by adding the number of agricultural employers from the 2012 CoA that directly hired crop workers (410,994) to the maximum number of agricultural support firms (farm labor contractors and custom harvesters) across the four quarters of the 2012 Quarterly Census of Employment and Wages (10,654) and rounding the sum to the nearest 1,000. The sample of crop employers is estimated to be 702. This estimate is derived by dividing the target farm worker sample size (3,369) by the average number of workers interviewed per farm in FY 2014 (4.8).
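The arithmetic behind both estimates can be checked with a short sketch. All figures come from the paragraph above; the variable names are illustrative only.

```python
# Figures quoted in the paragraph above.
coa_direct_hire_employers = 410_994   # 2012 CoA employers that directly hired crop workers
qcew_support_firms_max = 10_654       # maximum across the four quarters of the 2012 QCEW
target_worker_sample = 3_369          # target farm worker sample size
workers_per_farm_fy2014 = 4.8         # average workers interviewed per farm in FY 2014

# Universe: sum of the two sources, rounded to the nearest 1,000.
employer_universe = round(coa_direct_hire_employers + qcew_support_firms_max, -3)

# Sample: target worker sample divided by average interviews per farm.
employer_sample = round(target_worker_sample / workers_per_farm_fy2014)

print(employer_universe, employer_sample)
```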


Although there is no census of farm workers, the universe of the crop farm labor force can be estimated using labor expenditure data from the CoA and the FLS, and hours-per-worker data from a combination of FLS and NAWS data. Based on 2012 data from these sources, there are an estimated 1.6 million hired crop farm workers, including workers who are brought to farms by labor intermediaries. As described below, the proposed sample size for FY 2016 is 3,369 farm workers.


Sample size


The NAWS faces several challenges in defining an optimal design. First, the NAWS has a complex sampling design that includes both stratification and clustering. Second, the NAWS uses administrative data that come in clumps that are not optimal for some variables of interest. Third, surveys are generally optimized for one variable of interest, whereas the NAWS collects data on over 250 variables of interest.


For complex survey designs, the design effect measures the efficiency of a sampling design that deviates from simple random sampling. The design effect for a variable of interest is defined as the ratio of the variance calculated under the complex design divided by the variance calculated under the assumption of a simple random sample. Design effects greater than one mean that the complex survey design delivers higher variance than a simple random sample. While it is possible to achieve a design effect of less than one, many factors can make a design less efficient. Lower design effects are generally achieved when strata are homogeneous and clusters are heterogeneous.

We calculated design effects for demographic and employment characteristics using the data collected for the NAWS in FY 2011-2012. NAWS design effects varied from less than one to 10, depending on the variable. Generally, higher design effects are associated with both heterogeneous strata and relatively homogeneous clusters. This usually happens for variables that tend to be relatively similar within farms and FLAs, but heterogeneous in some regions, particularly in the East and Midwest. Examples of such variables include work force characteristics, place of birth, and visa status.

The NAWS design is more efficient for collecting information on workers’ household composition, key variables for estimating service program size, and immigration policy impacts. NAWS design effects for these variables range from less than one to three, and most fall between two and three. These types of variables have more heterogeneity within clusters and are more homogeneous across strata.


Originally, the NAWS design was driven by a single variable, the exit rate of newly legalized Special Agricultural Workers. Currently, the NAWS collects data for a variety of Federal agencies and Federal farm worker programs, two of which – NFJP and Migrant and Seasonal Head Start (MSHS) – draw on specific subpopulations. The NAWS collects data that are used in the NFJP funding formula, and also collects congressionally mandated information on barriers to participating in MSHS. In FY 2016, the NAWS will field a questionnaire supplement to collect information on farm worker education and training to provide DOL with a greater understanding of this service population.


The FY 2016 sampling calculations examine the sample sizes needed to report characteristics of farm workers who fall into four subpopulations:

  • The MSHS-eligible subpopulation, which consists of farm workers who have children under the age of six and family income below the Federal poverty level;

  • The NFJP subpopulation, which consists of poor farm workers who have work authorization;

  • The subpopulation of workers who have taken or are taking adult education or training classes;

  • The subpopulation of workers who have taken English or ESL classes, because these are the most common types of courses taken by farm workers and there is a desire for more information on this group.

For each of these subpopulations, the first step in estimating the desired sample size was to calculate the size of the simple random sample (SRS) needed to produce estimates with the desired precisions (see Table 4) using the formula:

$$n = \frac{z^{2}\,p\,(1-p)}{e^{2}},$$

where

n = desired sample size under SRS,

z = z statistic for the desired confidence level (1.96 for 95%),

p = proportion being estimated, and

e = desired half-width of the confidence interval.


The proportion was set to 0.50 because that is the proportion that requires the highest sample size to achieve a confidence interval with a desired precision. At that sample size, all other proportions will meet the desired precision.
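As a quick check of the SRS formula, the sketch below evaluates it for the two half-widths used in Table 4 (5% and 4%). The function name and defaults are illustrative, not part of the NAWS documentation.

```python
def srs_sample_size(e: float, p: float = 0.5, z: float = 1.96) -> float:
    """Sample size under simple random sampling: n = z^2 * p * (1 - p) / e^2."""
    return z ** 2 * p * (1 - p) / e ** 2


# Half-widths used in Table 4, with p = 0.50 and 95% confidence.
for half_width in (0.05, 0.04):
    print(half_width, round(srs_sample_size(half_width)))

# Any proportion other than 0.50 needs a smaller sample for the same half-width.
print(srs_sample_size(0.05, p=0.3) < srs_sample_size(0.05, p=0.5))
```

With p = 0.50 the formula gives about 384 for a 5% half-width and about 600 for a 4% half-width, matching the SRS column of Table 4.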


Table 4, below, shows the additional calculations needed to adjust the subpopulation sample sizes to account for the NAWS’ complex sampling design. First, the desired sample size for the subpopulation is calculated by multiplying the SRS sample size by the design effect for the subpopulation. The design effects are calculated from several years of data and are fairly stable. The desired NAWS sample size then takes into account the expected proportion of the subpopulation within the NAWS sample. This number is based on previous NAWS data. The final number in the calculation is the number of years of data to be combined for reporting purposes. This number reflects the agency’s desired reporting frequency, which varies by subpopulation.


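The desired annual sample size can then be calculated using the following formula:


$$\text{Desired annual sample size} = \frac{n \times \mathit{deff}}{S \times Y},$$

where

n = the sample size needed under SRS,

deff = the design effect for the subpopulation,

S = the proportion of the subpopulation in the NAWS sample, and

Y = the number of years of data to be combined for reporting.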


For example, in Table 4 below, the SRS sample size for the subpopulation with a child under age six and income below poverty is 384, the design effect is 2.6, the proportion of the subpopulation within the NAWS sample (S) is 10 percent, and the reporting frequency (Y) is every three years. Using the formula above, the result is a desired annual NAWS sample size of 3,298.
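The design adjustment can be sketched as a one-line function. The inputs below are the rounded values quoted in the worked example; the function and argument names are illustrative.

```python
def desired_annual_sample_size(n: float, deff: float, s: float, y: float) -> float:
    """Desired annual sample size = n * deff / (S * Y)."""
    return n * deff / (s * y)


# Rounded inputs from the worked example: n = 384, deff = 2.6, S = 10%, Y = 3.
approx = desired_annual_sample_size(n=384, deff=2.6, s=0.10, y=3)
print(round(approx))
```

With these rounded inputs the result is roughly 3,328 rather than the published 3,298. The gap is presumably because the published inputs are themselves rounded, as the note under Table 4 indicates.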


Table 4 shows that the overall design-adjusted sample sizes needed to meet the data collection goals for these four subpopulations range from 3,141 to 3,369. The desired NAWS sample size for FY 2016 is based on the highest of these sample sizes, which is 3,369. This sample size would achieve all of the sampling objectives.

Table 4. Sample Size Calculations*

| Subpopulation | Planned confidence interval half-width (e) | Proportion being estimated (p) | Sample size for 95% confidence interval under SRS (n) | Design effect (deff) | Desired subpopulation sample size within NAWS (n*deff) | Current proportion of subpopulation in the NAWS sample (S) | Years of data to combine for reporting (Y) | Desired annual sample size |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Child under age six and below poverty | 5% | 50.00% | 384 | 2.6 | 1,000 | 10% | 3 | 3,298 |
| Adult education or training | 4% | 50.00% | 600 | 3.55 | 2,127 | 34% | 2 | 3,141 |
| English / ESL classes | 4% | 50.00% | 600 | 3.17 | 1,900 | 19% | 3 | 3,369 |
| Authorized and below poverty | 4% | 50.00% | 600 | 4.13 | 2,477 | 50% | 4 | 3,143 |

* The values in the table do not multiply exactly due to rounding.


The desired annual NAWS sample size is distributed across the region-cycle strata proportionate to the farm worker population numbers from the FLS. See Table 5 below.


Table 5. Estimated Sample Size by Region-Cycle Strata for FY 2016*

| Region | Fall | Winter/Spring | Summer |
| --- | --- | --- | --- |
| NE1 | 35 | 22 | 56 |
| NE2 | 37 | 30 | 77 |
| AP12 | 83 | 57 | 92 |
| FL | 65 | 81 | 54 |
| DLSE | 70 | 58 | 89 |
| LK | 77 | 44 | 66 |
| CBNP | 138 | 112 | 156 |
| SP | 46 | 51 | 55 |
| MN12 | 38 | 31 | 53 |
| MN3 | 29 | 28 | 34 |
| PC | 124 | 69 | 185 |
| CA | 407 | 303 | 417 |
| TOTALS | 1149 | 887 | 1334 |

*Note that numbers include a small amount of rounding error.
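The proportional allocation can be illustrated with a minimal sketch that uses the Table 3 population figures and the annual sample of 3,369. Simple rounding to the nearest integer is an assumption here; the exact rounding rule behind Table 5 is not stated.

```python
def allocate(stratum_pop: float, total_pop: float, annual_sample: int) -> int:
    """Interviews for one stratum, proportional to its share of the population."""
    return round(annual_sample * stratum_pop / total_pop)


# Table 3 column totals (thousands): Fall, Winter/Spring, Summer.
TOTAL_POP = 720.6 + 555.0 + 837.6
ANNUAL_SAMPLE = 3369

ne1_fall = allocate(22.0, TOTAL_POP, ANNUAL_SAMPLE)     # NE1, Fall
ca_summer = allocate(261.6, TOTAL_POP, ANNUAL_SAMPLE)   # CA, Summer
print(ne1_fall, ca_summer)
```

For these two strata the sketch gives 35 and 417, matching Table 5. A few other cells differ by one interview from what simple rounding produces (PC in the summer cycle, for example), which is consistent with the rounding note under the table.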


2. Statistical Methods for Sample Selection


a) Overview


The goal of the NAWS sampling methodology is to select a nationally representative, random sample of crop workers. Stratified multi-stage sampling will be used to account for seasonal and regional fluctuations in the level of farm employment. There are two levels of stratification: three four-month cycles and 12 geographic regions, resulting in 36 time-by-space strata. For each cycle, within each region, NAWS staff will draw a random sample of FLAs. Within each FLA, counties are the secondary level of sampling units, ZIP Code regions are the third, agricultural employers are the fourth, and workers are the fifth.

For each cycle, the number of interviews allocated to each region is proportional to the estimated seasonal number of farm workers employed in the region. The regional allocation is distributed proportionately across the sampled FLAs. Within each FLA, interviewers will visit the sampled counties and ZIP Code regions to contact employers and select a random sample of eligible workers employed on the day of the visit.



b) Stratification


Interviewing Cycles


As mentioned above, interviews are conducted in three cycles each year with each cycle lasting four months. The reason for this is to account for agricultural seasonality. The number of interviews conducted in each cycle is proportional to the estimated number of crop workers employed during the cycle. The seasonal agricultural employment figures are based on the USDA FLS. The FLS provides quarterly employment figures for the continental United States. These quarterly figures are pro-rated into the three interview cycles.


Regions


As mentioned in Section 1, in each cycle, all 12 regions are included in the sample. The number of interviews per region is proportional to the size of the seasonal farm labor force in a region at a given time of the year. The size of the seasonal labor force in each region is derived from FLS quarterly regional data, which are pro-rated into the three cycles.


c) Sampling within Strata


Farm Labor Areas


FLAs serve two purposes. First, they reduce travel costs by providing larger groupings of crop workers in areas where crop workers are sparse, so that regional allocations can be completed efficiently. Second, they produce similar-sized primary sampling units within a region by accounting for the varying sizes of the counties in the region, as measured by CoA farm labor expenditures. In the East, for example, FLAs may consist of several adjacent counties that have low concentrations of farm labor expenditures. Areas with fewer crop workers may have more counties per FLA. In the West, a FLA may include only a single agriculture-intensive county.


Sampling FLAs will be a two-stage process. In the first stage, a roster of ten FLAs will be drawn in each cycle-region stratum using probabilities proportional to the seasonal labor force for that stratum. NAWS staff will conduct systematic probability proportional to size (PPS) sampling of FLAs within regions using SAS PROC SURVEYSELECT. The size measure for the FLA seasonal labor force will be calculated using farm labor expenditure data obtained from the CoA and seasonal adjustment factors derived from the Bureau of Labor Statistics (BLS) Quarterly Census of Employment and Wages (QCEW). The seasonal adjustment factors will be calculated by aggregating the QCEW’s reported monthly employment figures for the months that correspond to each of the NAWS cycles (e.g., June, July, August, and September for the summer cycle). The percentage of annual employment corresponding to each cycle is the FLA’s seasonal adjustment factor. The size measure for the FLA labor force will be calculated by multiplying the FLA’s hired and contract labor expenditures from the CoA by the seasonal adjustment factor from the QCEW.
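The size measure and the systematic PPS draw can be sketched in Python. This is only an illustration of the general technique, not a reproduction of SAS PROC SURVEYSELECT. The function names, the dictionary inputs, and the choice to express the random start as a fraction of the sampling interval are all assumptions.

```python
import random


def seasonal_adjustment_factor(monthly_employment, cycle_months):
    """Share of annual QCEW employment that falls in the cycle's months."""
    annual = sum(monthly_employment.values())
    return sum(monthly_employment[m] for m in cycle_months) / annual


def size_measure(labor_expenditures, factor):
    """CoA hired and contract labor expenditures times the seasonal factor."""
    return labor_expenditures * factor


def systematic_pps(sizes, n, start=None):
    """Systematic PPS: walk the cumulative sizes with a fixed interval.

    sizes: dict mapping unit id -> size measure, in list order.
    start: random start as a fraction in [0, 1) of the interval; drawn if None.
    A unit larger than the interval can be hit more than once.
    """
    total = sum(sizes.values())
    interval = total / n
    if start is None:
        start = random.random()
    points = [(start + k) * interval for k in range(n)]
    selected, cumulative, idx = [], 0.0, 0
    for unit, size in sizes.items():
        cumulative += size
        while idx < n and points[idx] < cumulative:
            selected.append(unit)
            idx += 1
    return selected


# Hypothetical size measures for four FLAs; two are drawn with a fixed start.
example_sizes = {"A": 10, "B": 30, "C": 20, "D": 40}
print(systematic_pps(example_sizes, n=2, start=0.5))
```

In the NAWS design the roster size would be ten per cycle-region stratum; the example draws two units only to keep the hypothetical data small.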


No data source describes when and where local crop work is occurring accurately enough to plan the NAWS interviewing locations precisely. The local timing and location of seasonal farm employment depend on changes in cropping patterns and weather conditions, including disasters such as a drought or flood. As a result, crop workers may not be found at the same times and locations as in previous years. It is not unusual to visit a location, even after consulting local experts, and be unable to complete the interview allocation. Thus, the number of FLAs needed to complete the interview allocation within a cycle-region stratum cannot be determined in advance.


The second stage in the two-stage FLA sampling process is the progression through the roster of ten FLAs drawn in the first stage, to complete the interview allocation. The roster of ten FLAs will be randomly sorted, and the interviewers will begin in the FLA at the top of the list. They will move to additional FLAs as needed to complete the interview allocation for the region by proceeding down the list, in order.


For planning purposes, the starting number of FLA visits for the NAWS annual sample will be 105. To ensure there will be an adequate number of FLAs visited in each region, a minimum of two FLAs will be assigned in each region per cycle. Thus, 12 regions x 2 FLAs x 3 cycles = 72 FLAs. Most of the remaining FLAs will be assigned to regions proportionate to the size of the regions’ seasonal farm labor force for a particular cycle, according to the FLS size numbers. Additional FLAs will be allocated to regions where difficulty in meeting interview allocations is anticipated, usually due to seasonal factors. The starting number of FLAs selected for each cycle-region stratum is anticipated to range from two to five.


While each cycle-region stratum has its own roster of FLAs, there will likely be some overlap because a FLA can be visited more than once per year. Since the exact number of FLAs visited each cycle is unknown until after a cycle is completed, the number of unique FLAs visited in a year is unknown until the end of the year.


Counties


Similarly, it is not possible to know in advance how crop workers are distributed within a FLA and the exact number of counties needed to encounter enough crop workers to complete the FLA allocation. In most cases, interviews are completed in the first county and no additional counties are needed. However, because there is tremendous uncertainty about the number of workers in a county, additional counties may be needed to complete the FLA interview allocation. Counties will be selected one at a time, without replacement, using probabilities proportional to the size of the farm labor expenditures in a county during a given season. Seasonality is considered constant within a FLA.


The process of selecting counties will begin with a randomly sorted list of the counties within the FLA. A cumulative sum of the hired and contract labor expenditures, derived from CoA data, will be constructed for this list. When selecting a county, the selection number is the product of a random number drawn from the uniform distribution and the total of the cumulative sum (its final value). The county whose cumulative range includes the selection number is chosen. Table 6 illustrates an example of the algorithm used for selecting counties within FLAs.


Table 6: Example Showing Algorithm for Selecting Counties within FLAs

| Counties in FLA | Hired and Contract Labor Expenditures | Cumulative Sum of Hired and Contract Labor Expenditures |
| --- | --- | --- |
| A | 100,000 | 100,000 |
| B | 300,000 | 400,000 |
| C | 800,000 | 1,200,000 |
| D | 450,000 | 1,650,000 |
| E | 600,000 | 2,250,000 |

| Step | Value |
| --- | --- |
| Random number selected from uniform distribution | 0.657 |
| Selection number (random number * cumulative sum of hired and contract labor expenditures) | 1,478,250 (0.657 * 2,250,000) |
| County selected | D |


As shown in the example in Table 6, the cumulative sum of hired and contract labor expenditures for all counties in the FLA is 2,250,000. The random number selected from the uniform distribution is 0.657. The random number is multiplied by the cumulative sum of hired and contract labor expenditures to produce a selection number of 1,478,250. This selection number falls between the cumulative sum for county C (1,200,000) and the cumulative sum for county D (1,650,000), so county D is selected.
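The Table 6 example can be reproduced with a short sketch. The function name and the strict less-than comparison at range boundaries are assumptions. The sketch covers a single draw; for sampling without replacement, the selected county would be removed and the draw repeated.

```python
import random
from itertools import accumulate


def select_county(expenditures, u=None):
    """Pick one county with probability proportional to its expenditures.

    expenditures: list of (county, hired and contract labor expenditures),
                  already in random order.
    u: uniform random number in [0, 1); drawn if not supplied.
    """
    if u is None:
        u = random.random()
    cumulative = list(accumulate(amount for _, amount in expenditures))
    selection_number = u * cumulative[-1]
    for (county, _), upper in zip(expenditures, cumulative):
        if selection_number < upper:
            return county, selection_number
    # Guard against floating-point edge cases at the very top of the range.
    return expenditures[-1][0], selection_number


# Counties and expenditures from Table 6, with the same random number.
fla = [("A", 100_000), ("B", 300_000), ("C", 800_000), ("D", 450_000), ("E", 600_000)]
county, number = select_county(fla, u=0.657)
print(county, round(number))
```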

So as not to delay field operations, in situations where NAWS staff anticipate visiting more than one county, several counties will be selected using the selection method above. Each county will be marked with the order in which it was selected (e.g., 1st, 2nd, 3rd). Interviews will begin in the first selected county, and when that county is exhausted, interviewers will move to the next selected county on the list until all the allocated interviews in the FLA have been completed. In FLAs where farm work is sparse, interviewers may need to travel to several counties to encounter sufficient workers to complete the FLA’s allocation.


ZIP Code Regions


Sampled counties are divided into ZIP Code regions, which are smaller areas based on geographic proximity and the number of employers in the area. The purpose of ZIP Code regions is to group together employers that are close to one another, reducing the cost of driving from employer to employer within a county. A county may consist of a single ZIP Code region (for example, in the case of a small county) or multiple regions (for example, when a county is large). In a county with multiple ZIP Code regions, the goal is for the ZIP Code regions to be roughly equal in size.


The process of constructing the regions begins by sorting the employers in the county by ZIP Code and, within each ZIP Code, by a random number. Beginning at the top of the list, each successive group of seven employers is assigned to a ZIP Code region. Some ZIP Code regions may include growers from more than one ZIP Code. For example, the last three employers in one ZIP Code and the first three in the next ZIP Code could comprise a ZIP Code region. The final ZIP Code region will differ in size if the number of growers in the county is not evenly divisible by seven. If there are five or six growers in the final group, it will stand alone as a ZIP Code region. If the final group has four or fewer employers, it will be combined with the previous group. Thus, the final ZIP Code region could vary in size from five to 11 growers.
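The grouping rule can be sketched as follows. The function name, the tuple layout, and the reading that the random number orders employers within each ZIP Code are assumptions made for illustration.

```python
import random


def build_zip_code_regions(employers, group_size=7, min_standalone=5, seed=None):
    """Group a county's employers into ZIP Code regions of about `group_size`.

    employers: list of (employer_id, zip_code) tuples for one county.
    """
    rng = random.Random(seed)
    # Sort by ZIP Code, using a random number to order employers within it.
    ordered = sorted(employers, key=lambda e: (e[1], rng.random()))
    groups = [ordered[i:i + group_size] for i in range(0, len(ordered), group_size)]
    # A final group of four or fewer is merged into the previous group.
    if len(groups) > 1 and len(groups[-1]) < min_standalone:
        last = groups.pop()
        groups[-1].extend(last)
    return groups


# Hypothetical county with 18 employers spread over several ZIP Codes.
county_employers = [(i, 90000 + i // 4) for i in range(18)]
print([len(g) for g in build_zip_code_regions(county_employers, seed=0)])
```

With 18 employers the remainder of four is folded into the previous group, giving regions of 7 and 11. With 20 employers the remainder of six stands alone, giving 7, 7, and 6.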


When there are multiple ZIP Code regions in a county, the regions will be randomly sorted to produce a list that determines the order in which the areas will be visited. Interviewers will make three attempts to contact each agricultural employer in the first ZIP Code region on the list and then move down the list, following the random order, until the interview allocation for the FLA is filled or the county’s workforce is exhausted.


Employers


One of the challenges of the NAWS is that there is no agreed-upon universe list of crop employers. NAWS staff will compile a crop employer universe list using administrative lists, marketing lists, and an online search. The BLS provides names of agricultural employers in NAICS codes 111 and 1151 directly to the NAWS contractor per the terms of an agreement between the ETA and the BLS. Employers on the BLS list are those who pay unemployment insurance (UI) taxes. In states where UI is not mandatory for all agricultural employers, the list of employers from BLS will be supplemented with other sources.


It is not possible to know in advance which employers will be active employers at the time of sampling. While NAWS staff rely on the best available information when preparing the employer sampling lists, several factors make it difficult to get an accurate list. First, there is a great deal of turnover in agricultural enterprises and lists are easily out of date. Second, as discussed above in the Farm Labor Areas section, seasons vary from year to year and employers change cropping patterns and practices, which in turn modifies labor utilization. Third, sources of farm employers are incomplete.


The NAWS policy for developing the employer list is to be inclusive and to allow employers onto the list who may have a low probability of eligibility. NAWS staff balance bias from exclusion of potentially eligible employers against lower response rates arising from the difficulty of screening and excluding ineligible employers. The NAWS employer list inclusion procedures tend to err on the side of including possibly inactive employers.


Employers will be selected using simple random sampling, for several reasons. First, there is no reliable information on employers’ workforce size before the interviewing cycle starts. As such, using PPS to select employers is not possible. Second, simple random sampling results in selection of a greater variety of farm sizes, whereas PPS favors larger farms.


Because of uncertainty about the conditions of local seasonal farm labor, the number of eligible employers in a specific area cannot be known in advance. Interviewers will receive a randomly sorted list of all employers in the ZIP Code region (as described above). Interviewers will start with the first employer on the list to determine that employer’s eligibility for the survey. As mentioned above, interviewers will make three attempts to contact each employer as they move down the list in the randomized order. They will do this until either they complete the allocation for the FLA or the list is exhausted.


Workers


The maximum number of interviews allocated to each employer is roughly proportional to the FLA allocation. Were the allocation to be based on employer size, the NAWS would run the chance of collecting all interviews from a single employer if a FLA allocation was small and the first participating employer had a large workforce. To ensure that interviews come from more than one employer per FLA, the following schedule is used.


If the total number of interviews allocated for the FLA is:

  • Less than 25, the maximum number of interviews allowed per employer is five.

  • 26-40, the maximum number of interviews allowed per employer is eight.

  • 41-75, the maximum number of interviews allowed per employer is ten.

  • More than 75, the maximum number of interviews allowed per employer is twelve.


If the number of workers at an employer is less than the maximum number allowed per the criteria listed above, then all workers at the employer will be interviewed.
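The schedule above amounts to a small lookup, sketched below. The function names are illustrative. Because the listed brackets do not say where an allocation of exactly 25 falls, the sketch assumes it belongs to the lowest bracket.

```python
def max_interviews_per_employer(fla_allocation: int) -> int:
    """Cap on interviews at one employer, given the FLA's total allocation."""
    if fla_allocation <= 25:   # assumption: exactly 25 is treated as the lowest bracket
        return 5
    if fla_allocation <= 40:
        return 8
    if fla_allocation <= 75:
        return 10
    return 12


def interviews_at_employer(fla_allocation: int, workers_present: int) -> int:
    """Interview every worker when the workforce is at or below the cap."""
    return min(workers_present, max_interviews_per_employer(fla_allocation))


print(interviews_at_employer(fla_allocation=30, workers_present=20))
print(interviews_at_employer(fla_allocation=30, workers_present=5))
```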


Most of the NAWS interviews take place on farms where there is only one group of workers. On some farms, however, workers are organized into crews consisting of several workers and a supervisor. Crew size can range from a handful of workers to more than 100, but crews of 30 or fewer workers are most typical based on prior years’ data. When the number of crews is large, randomly selecting workers from each crew is not feasible and can be an imposition on the farm employer. For this reason, on farms where there are multiple crews, interviewers will first select one crew. They will then select workers only from within that crew for interview.


When a crew has to be selected from multiple crews, the crew will be selected randomly. Under some field conditions, the crew selection cannot be done using simple random sampling. In these situations, the crew will be selected using a structured approach, employing a sampling rule based on factors such as proximity or scheduling. For example, the interviewer might select the crew that is next scheduled to take a lunch break. In cases where the crew cannot be selected using simple random sampling, interviewers will record the factors that determined crew selection. As in prior years of the survey, crew selection is expected to be relatively rare.


Worker selection


When the number of workers at an establishment is greater than the maximum number that may be interviewed, interviewers follow procedures that are designed to ensure the selection of a random sample of workers. A lottery system will be used, though in some cases where workers are evenly spaced and assembling them for a lottery is impractical, workers will be selected by interval. For example, the interviewer might identify a random point at which to start, and then select every third row in a field or every fourth packing table.


In the case of lottery selection, let n be the number of interviews allowed for an employer (e.g., 8) and N the total number of workers available for interview (e.g., 20). Interviewers place n marked tags (8) and N - n unmarked tags (20 - 8, or 12) in a pouch and shuffle them. Each worker then draws a tag, and those who draw marked tags will be interviewed. A refusal is noted if someone who drew a marked tag is not interviewed, e.g., because the person walked away after drawing a marked tag or stated that he/she does not wish to be interviewed. A refusal would also be noted if a marked tag is left in the pouch after workers have drawn tags.
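
The tag lottery can be simulated as follows. This is a minimal sketch rather than the field procedure itself: the function name and the use of a seeded random generator are assumptions made for illustration.

```python
import random


def lottery_draw(workers, n, rng=None):
    """Assign n marked and N - n unmarked tags at random; return workers holding marked tags."""
    rng = rng or random.Random()
    total = len(workers)
    if total <= n:
        return list(workers)  # fewer workers than the cap: everyone is interviewed
    tags = [True] * n + [False] * (total - n)  # True = marked tag
    rng.shuffle(tags)  # shuffle the pouch
    return [worker for worker, marked in zip(workers, tags) if marked]


# Example from the text: 8 interviews allowed, 20 workers available.
selected = lottery_draw(list(range(20)), 8, random.Random(0))
```

Each worker has the same chance, n/N, of drawing a marked tag, which is what makes the draw a simple random sample.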


d) Calculating the Weights


The NAWS will use sampling weights, non-response weights, and post-sampling population weights.

  • Sampling weights provide each sampled worker’s probability of selection within the cycle-region stratum, including probabilities of being selected at the FLA, county, ZIP Code region, employer, and worker level.

  • Non-response weights correct sampling weights for deviations from the sampling plan, such as discrepancies in the number of interviews planned and completed in specific locations.

  • Post-sampling weights adjust the weight given to each interview in order to compute unbiased population estimates from the sample data.


The data used for calculating weights will come from several sources. For the sampling weights, the number of crews and workers at the farm on the day of sampling will be collected from the employer by the interviewer as part of the sampling documentation. Employer weight calculations will use information from the employer universe list and employer response codes recorded by interviewers. The county and FLA size information will come from CoA farm labor expenditure data. Data for post-sampling weights to adjust for part-time and seasonal work will come from the NAWS questionnaire. Stratum weight data will come from the USDA FLS.


Calculations of the non-response weights at the worker and employer level will be done as part of calculating the sampling weights, as explained below; non-response weights at the cycle and region level will be calculated simultaneously with cycle and region post-sampling adjustment weights.


Sampling Weights


Each worker in the sample has a known probability of selection. Information collected at each stage of sampling is used to construct the sampling weights.


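The worker probability of selection is:

$$\text{workprob} = \frac{\text{number of workers selected at the employer}}{\text{total number of workers in the crew}}$$

where the number selected at the employer is the minimum of either the total number of workers in the crew or the FLA allocation per employer, as described in the schedule above.

The crew probability of selection is:

$$\text{crewprob} = \frac{\text{number of crews selected}}{\text{total number of crews at the employer}}$$

It is anticipated that the numerator will be one as the interviewers are instructed to select workers from a single crew where there are multiple crews. If there is only one crew of workers at an employer, then crewprob = 1.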


The number of employers selected includes all employers from the beginning of the randomly sorted list until the last employer where interviews were completed.


Calculating counprob, the county within FLA weight, is more complicated as counties are selected sequentially using probabilities proportional to size. For example, if one of the sampled counties is larger than another, then its probability of selection should be higher than that of the other county. If, for example, three counties are selected from a particular FLA, then the selection probability for a particular county is calculated as: (1) its probability of selection on the first draw, plus (2) the probability of its selection on the second draw, plus (3) the probability of its selection on the third draw.


For the standard method of sampling several counties with probabilities proportional to size, without replacement, closed-form formulas for the exact inclusion probabilities do not exist. However, these probabilities can be calculated exactly using multiple summations. This procedure can be implemented in SAS within PROC IML.


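Suppose that the population at a particular sampling stage consists of N objects with sizes $s_1, s_2, \ldots, s_N$, having total size $S = \sum_{j=1}^{N} s_j$. Let $p_i(j)$ be the probability that the jth item is selected on the ith draw. Then for $j = 1, \ldots, N$,

$$p_1(j) = \frac{s_j}{S},$$

$$p_2(j) = \sum_{k \ne j} \frac{s_k}{S} \cdot \frac{s_j}{S - s_k},$$

$$p_3(j) = \sum_{k \ne j} \; \sum_{l \ne j,k} \frac{s_k}{S} \cdot \frac{s_l}{S - s_k} \cdot \frac{s_j}{S - s_k - s_l},$$

$$p_4(j) = \sum_{k \ne j} \; \sum_{l \ne j,k} \; \sum_{m \ne j,k,l} \frac{s_k}{S} \cdot \frac{s_l}{S - s_k} \cdot \frac{s_m}{S - s_k - s_l} \cdot \frac{s_j}{S - s_k - s_l - s_m},$$

and so forth.

These ith-draw probabilities each have the property that $\sum_{j=1}^{N} p_i(j) = 1$. Finally, the probability that the jth item is included in a sample of size n is $\pi_j = \sum_{i=1}^{n} p_i(j)$. These inclusion probabilities have the property that $\sum_{j=1}^{N} \pi_j = n$.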

The county selection probabilities can be calculated exactly using these formulas.
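
The text notes that this calculation can be implemented in SAS PROC IML. As a language-neutral illustration, the sketch below computes the same ith-draw and inclusion probabilities in Python by enumerating every ordered sequence of draws. The function names and the example sizes are hypothetical, and brute-force enumeration is only practical for the small numbers of counties drawn per FLA.

```python
from itertools import permutations


def draw_probabilities(sizes, n):
    """probs[i][j] = probability that unit j is selected on draw i (0-based),
    for n successive draws with probability proportional to size, without replacement."""
    if n > len(sizes):
        raise ValueError("cannot draw more units than exist")
    total = sum(sizes)
    probs = [[0.0] * len(sizes) for _ in range(n)]
    for seq in permutations(range(len(sizes)), n):
        p, remaining = 1.0, total
        for unit in seq:
            p *= sizes[unit] / remaining  # chance of drawing this unit from what is left
            remaining -= sizes[unit]
        for draw, unit in enumerate(seq):
            probs[draw][unit] += p  # accumulate the sequence probability per draw position
    return probs


def inclusion_probabilities(sizes, n):
    """Probability that each unit appears anywhere in the sample of size n."""
    probs = draw_probabilities(sizes, n)
    return [sum(probs[i][j] for i in range(n)) for j in range(len(sizes))]


# Hypothetical county sizes within one FLA, two counties drawn.
pi = inclusion_probabilities([10, 20, 30, 40], 2)
```

The inclusion probabilities returned for a sample of size n sum to n, which gives a quick check on any implementation.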


The calculation for flaprob, which is the probability that the FLA was selected within the region, has two components: the probability that the FLA was selected for the roster and the probability that a FLA on the roster was selected for the cycle.


For the probability that a FLA was selected from the roster, consider that:


$N$ is the number of FLAs in the region.


$s_1$ through $s_N$ are the sizes of the FLAs.

$S$ is the sum of the FLA sizes, so $S = \sum_{j=1}^{N} s_j$.

$n$ is the number of the FLAs to be selected with probabilities proportional to size.

In selecting the FLAs, they were listed in a random order. A column of cumulative FLA sizes was constructed. That is, the cumulative size at the jth FLA will be $C_j = \sum_{i=1}^{j} s_i$.

A random starting point, $k_0$, was chosen between 1 and $S/n$. The integers $k_0,\ k_0 + S/n,\ k_0 + 2S/n,\ \ldots,\ k_0 + (n-1)S/n$ can be listed. The jth FLA will be selected if one of these integers falls between $C_{j-1} + 1$ and $C_j$ (where $C_0$ is interpreted to be 0).


Without loss of generality, consider the first FLA on the randomized list. It will be selected if $k_0$ lies between 1 and $s_1$. Thus, its probability of selection is $s_1 / (S/n) = n s_1 / S$.

In general, the probability that the jth FLA is selected for the roster is $n s_j / S$.
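
The systematic selection described above can be checked numerically. The sketch below is illustrative only: the function names and FLA sizes are hypothetical, and it assumes that the total size is divisible by the number of FLAs to select and that no single FLA is larger than the sampling interval.

```python
from itertools import accumulate


def systematic_pps(sizes, n, k0):
    """Indices of units selected by the points k0, k0 + S/n, ..., k0 + (n-1)S/n."""
    total = sum(sizes)
    if total % n:
        raise ValueError("this sketch assumes the total size is divisible by n")
    step = total // n
    if not 1 <= k0 <= step:
        raise ValueError("k0 must lie between 1 and S/n")
    points = [k0 + i * step for i in range(n)]
    selected = []
    for j, upper in enumerate(accumulate(sizes)):
        lower = upper - sizes[j]  # cumulative size at the previous unit (0 for the first)
        if any(lower < p <= upper for p in points):
            selected.append(j)
    return selected


def roster_probability(sizes, n, j):
    """Share of the equally likely starting points that select unit j."""
    step = sum(sizes) // n
    hits = sum(j in systematic_pps(sizes, n, k0) for k0 in range(1, step + 1))
    return hits / step


sizes = [5, 10, 15, 20, 10]  # hypothetical FLA sizes, total 60
probabilities = [roster_probability(sizes, 3, j) for j in range(len(sizes))]
```

With these sizes and three FLAs selected, each FLA's selection probability equals three times its size divided by 60, matching the proportional-to-size result derived above.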


The flacycprob is the probability that the FLA is selected in a specific cycle. The calculation is as follows:




Non-Response Weighting


Non-response weights adjust the probabilities and sampling weights for deviations from the sampling design. If, for example, ten interviews should have been completed at a farm but only two interviews were completed, those two interviews could be given five times the weight they would have received otherwise. Thus, each interviewee’s probability will be adjusted for deviations in the number of interviews completed at the farm. The adjusted probabilities are composite factors calculated by multiplying the worker response rate by the worker probability of being selected.


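The calculation of the worker probability adjustment is as follows. The response rate for workers is:

$$\text{workerresprate} = \frac{\text{number of interviews completed at the employer}}{\text{number of workers selected at the employer}}$$

and the adjusted worker probability is: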


workprobadj = workprob * workerresprate
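
A short numeric sketch of this adjustment follows. It takes the worker response rate to be completed interviews divided by workers selected, which is how the ten-versus-two example above reads; that definition, the function names, and the crew size of 40 are assumptions made for illustration.

```python
def worker_response_rate(completed: int, selected: int) -> float:
    """Assumed definition: interviews completed divided by workers selected."""
    return completed / selected


def adjusted_worker_probability(workprob: float, completed: int, selected: int) -> float:
    """workprobadj = workprob * workerresprate"""
    return workprob * worker_response_rate(completed, selected)


workprob = 10 / 40  # hypothetical: ten workers selected from a crew of 40
workprobadj = adjusted_worker_probability(workprob, completed=2, selected=10)
weight_ratio = (1 / workprobadj) / (1 / workprob)  # how much heavier each interview becomes
```

Here the adjusted probability is one-fifth of the unadjusted probability, so each of the two completed interviews carries five times the weight it would otherwise have had.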


Non-response adjustment will also be calculated at the employer level. The region is the geographic level at which the interviews are allocated. Many of the features of the NAWS sampling were set up to overcome the lack of reliable information on seasonal employment at the county and ZIP Code level. Each cycle, the NAWS confronts glaring discrepancies in the number of predicted and actual eligible employers and workers at the FLA, county and ZIP Code region level.


Given these demonstrated data issues, non-response adjustment at the region level will be done to account for employer non-response as well as non-response within the cycle-region stratum. This is because the region is the level at which the interviews are allocated. All other allocations are derivative as the regional allocation is distributed across FLAs, counties and ZIP Code regions in a rolling manner so that non-response in one area is made up for in another FLA, in order to meet the regional allocation. Additionally, by calculating a non-response adjustment at the region level overall, size information will, generally, be based on better quality data. This is due to the availability of more recent data and the lower likelihood of the absence or suppression of data due to privacy considerations. In addition, the region is the lowest level with enough interview coverage to adjust the weights for non-response because if, for some reason, there are too few interviews in a region, the region can be combined with adjacent regions for weighting purposes.


Employer non-response adjustment at the region level also takes into account ZIP Code regions where no eligible employers were found. The probability of selecting the ZIP Code region within county, county within FLA, and FLA within region includes non-responding units. Adjusting at the ZIP Code region level would result in the omission of employer non-response in ZIP Code regions where no interviews were done. Since the sampling process allows for the possibility that there might only be one ZIP Code region selected in a FLA, the region level is the preferred level where the non-response adjustment can be calculated reliably.


It is important to account for the two stages of the employer selection process. First, employers are contacted and screened to determine employer eligibility. The second phase is persuading eligible employers to allow interviewers to access and interview their workers. The potential for nonresponse exists at both stages. Interviewers may be unable to contact the employer or the employer may refuse to provide the information needed to determine employer eligibility. Eligible employers may refuse to allow access to their workers.

For the first stage in the employer selection process, we will calculate an employer screening adjustment:


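$$\text{emplscreenrate} = \frac{\text{number of employers with completed eligibility screening}}{\text{number of employers selected for contact}},$$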


where employers with completed eligibility screening include all contacted employers where NAWS field staff are able to determine whether the employer is eligible or ineligible.


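For the second stage of employer selection, the formula for the response rate among eligible employers is:

$$\text{emplresprate} = \frac{\text{number of eligible employers that allowed interviews}}{\text{number of eligible employers}}$$
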
The adjusted employer probability is:


emplprobadj = emplprob * emplscreenrate * emplresprate


Non-response adjustment at the cycle-region allocation will also be calculated. The calculation of the region non-response adjustments will be done simultaneously with the post-sampling weights to take advantage of the most recent USDA FLS data used for population weighting.


Sampling Weight


The combined probability of selection within a cycle-region is:


prob = workprobadj * crewprob * emplprobadj * zipprob * counprob * flacycprob * flarosterprob.


The individual worker sampling weight, WT_k, equals the inverse of the selection probability:


WT_k = 1/prob.
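
The product-and-inverse step can be sketched as follows. The component names follow the text, but the function name and all numeric values are made up purely for illustration.

```python
from math import prod


def sampling_weight(components: dict) -> float:
    """Inverse of the combined selection probability (the product of all stage probabilities)."""
    probability = prod(components.values())
    if not 0 < probability <= 1:
        raise ValueError("combined probability must lie in (0, 1]")
    return 1 / probability


# Hypothetical stage probabilities for one interviewed worker.
components = {
    "workprobadj": 0.5,
    "crewprob": 1.0,
    "emplprobadj": 0.25,
    "zipprob": 0.5,
    "counprob": 0.5,
    "flacycprob": 0.5,
    "flarosterprob": 1.0,
}
weight = sampling_weight(components)  # 1 / 0.015625
```

A worker with a combined selection probability of 0.015625 receives a sampling weight of 64.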


Post-Sampling Weights


Post-sampling weights will adjust the relative value of each interview in order for national estimates to be obtained from the sample. There are five post-sampling weights. Two of the weights adjust for unequal probabilities of selection that can only be determined after the interviews are conducted. These include the unequal probabilities of finding part-time versus full-time workers (day weight) and the unequal probabilities of finding seasonal versus year-round workers (seasonal weight). The other three weights (region, cycle, and year) adjust for the relative importance of a region’s data, a sampling cycle, and a sampling year. As discussed below, the calculation for the region weight will be done simultaneously with the region non-response adjustment weights. The cycle weight and year weight allow different cycles and sampling years to be combined for statistical analysis.


The region and cycle weights will use measures of size obtained from the USDA FLS that are reported by quarter and region. The USDA FLS is the only information source on levels of farm worker employment. The CoA, for instance, collects data annually rather than quarterly, and provides the desired statistic once every five years. By using USDA FLS figures to make the size adjustment, the NAWS can adjust the weights by stratum (cycle and region) and construct unbiased population estimates. Non-response adjustments for size, therefore, take place at the region-within-cycle level to create corrected region weights.


The NAWS sampling plan calculates sampling allocations using USDA FLS data collected in the year before the interviews. For example, fiscal year 2012 data is used to plan the NAWS 2013 sample. The weights, however, will use FLS data collected during the interview year. This corrects for any discrepancies in allocations due to projecting farm worker distributions based on past years’ data.


The Day Weight


The day weight adjusts for the probability of finding part-time versus full-time crop workers. Interviewers will conduct interviews during one to two week visits to a specific FLA. A part-time worker, who works only two or three days per week, has a lower likelihood of being encountered than a worker employed full time. The day weights reflect these different probabilities of selection.


It is assumed that a worker has an equal likelihood of being sampled on each day worked. Thus, the probability of sampling a worker is related to the number of days worked by individual workers. It is therefore possible to calculate a day weight that is simply the inverse of the number of days the worker did farm work during the week.

A respondent is always present on the day he/she was sampled. From the NAWS interview form, it can be determined how many days the respondent worked during the week. A worker who worked one day a week would have a day weight of one. A worker who worked two days per week would have a sampling probability twice that of someone working one day per week, and thus a day weight of 1/2.


The day weight (DWTS) is computed as:

$$\text{DWTS} = \frac{1}{\text{number of days worked per week}}$$

The days per week worked is reported by the farm worker. In prior surveys, almost all workers sampled worked five or six days per week. The NAWS will not sample on Sundays; therefore, workers at establishments reporting at least six workdays per week have the maximum chance of selection and the minimum day weight of one-sixth.


The few workers who do not report a number of days worked per week will receive a default value of one-sixth, the most commonly reported value.
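
The day-weight rules above can be sketched as a small function. The function name is hypothetical, and treating any report of more than six days as six is an interpretation of the statement that six workdays give the minimum weight.

```python
def day_weight(days_worked_per_week=None) -> float:
    """Inverse of days worked per week, bounded between 1/6 and 1."""
    if days_worked_per_week is None:
        return 1 / 6  # default for workers who do not report days worked
    days = min(max(int(days_worked_per_week), 1), 6)  # assumption: clamp reports to 1..6
    return 1 / days
```

For example, a worker reporting two days per week receives a day weight of 0.5, while a worker reporting six or seven days receives one-sixth.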


The Season Weight


The calculation of the worker weights is complicated by the fact that workers could, in general, be sampled several times a year. Furthermore, neither the USDA CoA nor the FLS provides figures that can be used for the annual number of crop workers. The USDA CoA reports the number of directly-hired crop workers employed on each farm, but does not adjust for the fact that some workers are employed on more than one farm in the census years. In addition, CoA farm worker counts exclude employees of farm labor contractors. Similarly, the FLS is administered quarterly and reports the number of crop workers employed each quarter, so the same worker could be reported in multiple quarters. Because of this repetition of workers across seasons, it would be invalid to derive the total number of persons working in agriculture during the year by summing quarterly figures from the FLS.


As employment information is not available for every worker for each quarter of the year, the only way to avoid double-counting of crop workers is to use the 12-month retrospective work history collected in the NAWS. Specifically, predicting future-period employment is achieved by imposing the assumption that workers who report having worked in a previous season would work in the next corresponding season. For example, a worker sampled in spring 2015 who reported working the previous summer 2014 is assumed to work in the following summer 2015. For some purposes, including the calculation of year-to-year work history changes, this assumption cannot be used. For purposes such as obtaining demographic descriptions of the worker population, however, this assumption provides satisfactory estimates.

Furthermore, it is assumed that a worker has an equal likelihood of being sampled in each season worked. Thus, the probability of sampling a worker is related to the number of seasons worked by individual workers. It is therefore possible to calculate a seasonal weight that is simply the inverse of the number of seasons the respondent did farm work during the previous year.

For the purposes of the NAWS, there are only three seasons per year. An interviewee always performed farm work during the trimester he/she was sampled. From the NAWS interview, it can be determined during which of the two previous trimesters the respondent also did farm work. If the interviewee only worked during the current trimester, the season weight is 1/1 or 1.00. If the interviewee worked during the current trimester and only one of the two prior trimesters, the season weight is 1/2 or 0.50. Finally, if the interviewee worked during the current and both of the prior trimesters, the season weight is 1/3 or 0.33.


This season weight is similar to the day weight in the sense that respondents who spend more time (seasons) working in agriculture have a greater chance of being sampled. Therefore, the weighting has to be inversely proportional to the number of seasons worked in order to account for the unequal sampling probability.
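
The three season-weight cases can be expressed compactly. This is a minimal sketch with hypothetical function and parameter names; the two flags stand for the work-history answers about the two prior trimesters.

```python
def season_weight(worked_prior_trimester_1: bool, worked_prior_trimester_2: bool) -> float:
    """Inverse of the number of trimesters worked; the current trimester always counts."""
    seasons_worked = 1 + int(worked_prior_trimester_1) + int(worked_prior_trimester_2)
    return 1 / seasons_worked
```

A worker who worked in the current trimester and in one of the two prior trimesters receives a season weight of 0.50.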


The Region Weight


The region weight adjusts the relative weight of a region’s data in relation to the number of interviews completed in that region. If the number of interviews completed is smaller than the regional allocation in the sampling plan, an adjustment weight greater than one will be assigned to each interview in the region, and vice versa. These adjustments ensure that the population estimates are unbiased.


The region weight will be based on USDA FLS measures of regional farm employment activity. This is the best source of information available about crop workers. The USDA FLS figures reported by region and quarter allow the weight to be sensitive to seasonal fluctuations.


Correspondence between USDA Data and the NAWS Sampling Cycles


The calculation of the region weight will rely on two pieces of information: the USDA FLS regional measures of size and the number of interviews completed in each region. The first step in the process of calculating the region weight is to apportion the USDA quarterly size figures among the three NAWS sampling cycles.


The USDA figures are reported quarterly. The NAWS sampling years, however, will cover non-overlapping 12-month periods (from September to August), which are divided into three cycles. Accordingly, it is necessary to adjust the USDA figures to fit the NAWS sampling frame by apportioning the four quarters into three cycles.


For example, the number of crop workers in the fall cycle for a region is assumed to be the total number of workers for that region in USDA Quarter 4 (October FLS data) of the current fiscal year (FYc) plus one‑third the number of workers for that region in USDA Quarter 1 (January FLS data) of the next calendar year (FYp). The formula for the winter, spring, and summer cycles is constructed similarly.
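
The fall-cycle apportionment described above amounts to the following arithmetic. The function name and worker counts are hypothetical, and only the fall cycle is shown because the text does not spell out the shares used for the other cycles.

```python
def fall_cycle_size(q4_october_workers: float, q1_january_workers_next_year: float) -> float:
    """Fall cycle size = all of Quarter 4 plus one-third of the following Quarter 1."""
    return q4_october_workers + q1_january_workers_next_year / 3


# Hypothetical regional FLS figures.
fall_size = fall_cycle_size(90_000, 60_000)
```

With 90,000 workers reported in Quarter 4 and 60,000 in the following Quarter 1, the fall cycle would be assigned 110,000 workers for that region.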


Determining the NAWS Region Grouping According to Interview Coverage


The calculation of the region weight (within cycle) is as follows for each region j ( ) in cycle i:

,

where USDAij is the USDA estimate for region j in cycle i, Xij is the sum of the sampling weights for region j in cycle i, and DWTSij is the sum of farm worker day weights for region j in cycle i. Also, 1/6 ≤ DWTSijk ≤ 1 (where k refers to a farm worker), so that DWTSij = Xij if all crop workers in region j in cycle i are working one day per week, and DWTSij = 1/6*Xij if all crop workers are working six days per week in region j in cycle i.


The Cycle Weight


The NAWS will combine data from the different sampling cycles (seasons) within the same sampling year in order to generate more observations for statistical analysis. In order to combine cycles, it is necessary to adjust for the number of crop workers represented in each cycle in relation to the number of interviews completed in the cycle. For instance, suppose the NAWS did not do proportional sampling as explained above, but rather interviewed the same number of people in all three cycles in the 2016 fiscal year. If the USDA reported more workers for the fall and spring/summer cycles than for the winter cycle, then the interviews in the fall and spring/summer would need to be weighted relatively more in terms of size than the interviews conducted in the winter cycle. Accordingly, the interviews in the winter would have to be down-weighted in relation to the interviews in the other seasons (cycles) before the cycles could be combined.


The cycle weight is calculated similarly to the region weight, but at the cycle-level rather than region-level. The sum of the USDA size for a cycle is divided by the number of interviews in that cycle. The calculation of the cycle weight (or region weight within year) is as follows for each region j , cycle i in year Y:


where

and


(k refers to a farm worker) and if the farm worker worked only one cycle during the year, so that if all crop workers for region j in cycle i worked one day per week and only one cycle in the corresponding year and .


The Year Weight


The year weight allows different sampling years to be combined for statistical analysis. It follows the same rationale as the cycle weight, but at the sampling-year level. If the same number of interviews are collected in each sampling year, those interviews taking place in years with more farm work activity are weighted more heavily in the combined sample.

Sampling years cannot be combined if the interviews are not comparable in terms of agricultural representation. In an extreme case, suppose that the NAWS budget tripled for one of the sampling years, consequently tripling the number of interviews. If the two sampling years were joined without adjustment, the larger sampling year would have an unduly large effect on the results.


To avoid this, the year weight is calculated as a ratio of the total number of crop workers reported in the USDA FLS for each sampling year to the number of interviews in that sampling year. This is done on a cycle-by-cycle basis, but the intent is to even out annual allocations that do not represent similar proportions of the population. The year weight calculation (or region weight related to all years of interviews) is as follows for each region j ( ), cycle i (the sum over i, j means all crop workers, all cycles all years):

with the same notations as the preceding weights.


Obtaining the Final Weights


Once the individual weight components are calculated, final composite weights are calculated as the product of the day weight, the season weight, the region weight, the cycle weight, the year weight, and the sampling weight. The cycle and year are also factored into the composite weights when multiple cycles or sampling years are used. The composite weights are adjusted so that the sum of the weights is equal to the total number of interviews at the next higher level of stratification. These adjusted composite weights based on crop workers are then used for calculating the estimated proportion of workers with various attributes.


The individual observation weights are obtained at the farm worker level:


This is the weight within cycle; it includes an adjustment for the length of the workweek, but no seasonal adjustment.


This is the weight within a year; it includes both the length of the workweek and seasonal adjustment. This weight may be used for the analysis of one particular year of interviews.


The composite weight (PWTYCRD) is used for almost all NAWS analysis. This weight allows several years of data to be merged for analysis. It is included in the public access dataset.


3. Statistical Reliability


a) Maximizing Response Rates


The NAWS response rate is the product of the employer response rate and the worker response rate. In FY 2014, NAWS interviewers attempted to contact 5,568 agricultural employers, of which 22 percent were eligible to participate in the survey, 35 percent were ineligible, and 43 percent could not be contacted and screened. A large number of potentially eligible employers have undetermined eligibility despite multiple contacts. The likelihood that an employer on the sampling list is eligible varies considerably. Many issues are responsible for the problems contacting and screening an employer. First, there is a lag in receiving employer information from BLS, so some information is out-of-date. Other sources of employer information vary in terms of completeness and the degree to which the list is vetted. At the same time, agricultural operations are not static and changes to crops, technology, or labor practices may affect the employment and timing of agricultural workers, and thus the establishment’s eligibility to participate in the survey. Also, there are businesses that are bought and sold, as well as those that start up, liquidate, or cease production. Even employers that have agricultural workers may not be eligible at all times of the year since agriculture is a seasonal industry and workers may only be needed for tasks at certain times of the year. Interviewers code the reasons for inability to screen as well as reasons for ineligibility for each selected grower.


In FY 2014, 54 percent of the randomly selected eligible employers (or their surrogates) who employed workers on the day they were contacted agreed to cooperate with the survey, and interviews were conducted at 48 percent of these eligible establishments. Taking into account employers for whom eligibility could not be determined, the employer response rate in FY 2014 was 27 percent using the unweighted response rate (URR) formula from the OMB Standards and Guidelines for Statistical Surveys (2006). 


Once interviewers have the employers’ agreement to cooperate with the survey, a random sample of workers is selected. In FY 2014, 96 percent of the sampled workers at eligible establishments agreed to be interviewed. The overall response rate, including employer and worker response, was 26 percent.
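
The overall rate follows from multiplying the two reported component rates, as this short check shows. The function name is hypothetical; the inputs are the FY 2014 figures reported above.

```python
def overall_response_rate(employer_rate: float, worker_rate: float) -> float:
    """Overall response rate as the product of employer and worker response rates."""
    return employer_rate * worker_rate


overall = overall_response_rate(0.27, 0.96)  # FY 2014 employer and worker rates
overall_percent = round(overall * 100)
```

The product is about 0.259, which rounds to the 26 percent reported.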


Previous NAWS data show that item response rates are high. Item response rates are lowest for the three income questions. Using 2011-2012 data, family income has the lowest response rate at 92.7 percent. Non-response includes 0.6 percent that are missing, 4.1 percent who do not recall their income and 2.6 percent who had no income in the year in question, for a total of 7.3 percent of unusable responses.


The NAWS is expected to have similar unit and item response rates for FY 2016.


Employer Response


To maximize employer response, the NAWS contractor will send an advance letter to agricultural employers and provide them with a brochure explaining the survey. The letter will be signed by the survey director and will include the names of the interviewers and their contact information. For further information or questions, the letter and brochure will direct employers to contact either the survey contractor at a toll free number, or the Department of Labor’s (DOL) Contracting Officer’s Technical Representative (COTR). Employer calls will be returned quickly. In addition, the NAWS contractor will provide the COTR a list of scheduled interview trips. The list will include the counties and states where interviews will be conducted, the names of the interviewers who will be visiting the selected counties, and the dates the interviewers will be in the selected counties. The COTR will refer to the list whenever an employer calls to confirm the interviewers’ association with the survey.


Both DOL and the contractor will make presentations on the survey and will provide survey information (e.g., questionnaires) to officials and organizations that work with agricultural employers. The NAWS has received the endorsement of several employer organizations. This improves the response rate since agricultural employers sometimes call their employer organization when considering survey participation.


Before interviewers receive employer lists, office workers attempt to verify the employer’s address and contact information and search for addresses and phone numbers for employers for whom address and phone information is missing. In addition, office staff search for physical addresses for employers whose listed addresses are post office boxes or lawyers’ or accountants’ offices. Results of successful searches, along with information received when advance letters are returned, are incorporated in the employer contact list that is distributed to interviewers.


To increase employer response, interviewers will be instructed to make at least three contact attempts at different times of the day and on different days of the week. At least one of these contacts is to be an in-person attempt at the employer address. Interviewer contact attempts will be logged and the logs will be monitored for compliance. Interviewers will be instructed to accommodate an employer’s preference for scheduling surveys and, if needed, the interviewer can request an extension of the field period.


Intensive and frequent interviewer training will also be conducted as a means to increase employer response rates. Interviewers will be trained in pitching the survey in various situations and they will be trained to understand the history, purpose, and use of the questionnaire. They will be prepared to easily answer any questions or address any concerns an employer might have. In addition, when explaining the purpose of the survey to employers, interviewers will be trained to clearly distinguish the survey from enforcement efforts by the Department of Homeland Security, DOL and other Federal agencies, and assure employers that their information will be kept private.


Worker Response

The survey’s methodology has been adapted to maximize response from this hard-to-survey population. Interviewers will pitch workers in English or Spanish, as necessary. All interviewers are bilingual. In addition, interviewers will make sure that potential respondents know that they are not associated with any enforcement agency (e.g., Immigration and Customs Enforcement). Interviewers will explain the survey to workers and obtain their informed consent verbally.


Crop workers receive a $20 honorarium to enable the survey to achieve an estimated worker response rate above 90 percent. Research indicates incentives increase response rates in social research (Ryu, Couper, & Marans, 2006). According to the National Science Foundation, monetary incentives improve study participation and offset the costs of follow-up and recruitment of non-respondents (Zhang, 2010).



b) Addressing Non-Response


Possible worker non-response bias


Based on previous years of NAWS data collection, it is anticipated that the worker response rate will be above 80 percent. This high level of worker response exceeds the level at which a non-response bias analysis is needed.


Possible employer non-response bias


High rates of employer non-response are a concern. It is important to determine if non-response is random or if there may be bias due to systematic differences in characteristics among respondents and non-respondents. NAWS staff conducted several analyses of sampling frame data and paradata to examine whether there was the potential for employer non-response bias due to lower-than-recommended employer response rates (see the supplemental report Summary of NAWS Non-Response Bias Studies included in this submission). The results of these analyses did not support the need for non-response bias adjustment beyond what is already incorporated in the non-response weights.


NAWS staff will assess possible non-response bias using four different analyses, as listed below. These analyses will utilize NAWS sampling frame data, paradata, 2012 Census of Agriculture data, and data from the QCEW.


  1. The first analysis will assess NAWS non-response bias by comparing information in the sampling frame on eligible respondents and non-respondents. While the sampling data is somewhat sparse for non-respondents, three pieces of information are useful: geographic location, NAICS code, and the source used to obtain employer names. The NAWS will use three sources of employer names: a) the BLS UI list, b) marketing lists, and c) internet searches and contacts with knowledgeable local individuals. Geographic area and source lists are available for all employers, while NAICS codes are available for all employers who pay UI taxes, marketing list employers, and some additional employers. Employers without a NAICS code will be analyzed as a distinct group.


Using all three variables (source, NAICS, and geography), we will make the following comparisons:

  1. Employers allowing interviews compared to sampled employers that refused or could not be screened (i.e., excluding the ineligible),

  2. Employers allowing interviews compared to eligible employers who refused, and

  3. Eligible employers compared to unscreened sample members (employers whose eligibility could not be determined).



Nonresponse bias will be calculated using the bias calculation formula from OMB’s Standards and Guidelines for Statistical Surveys (2006). The formula defines bias for a particular estimate, $B(\bar{y}_r)$, as the following:

$$B(\bar{y}_r) = \bar{y}_r - \bar{y}_t = \left(\frac{n_{nr}}{n}\right)\left(\bar{y}_r - \bar{y}_{nr}\right)$$

where:

$\bar{y}_t$ = the mean based on all sample cases;

$\bar{y}_r$ = the mean based only on respondent cases;

$\bar{y}_{nr}$ = the mean based only on the nonrespondent cases;

$n$ = the number of cases in the sample; and

$n_{nr}$ = the number of nonrespondent cases.
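
The bias calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the NAWS production code: the function name and the example data (a 0/1 indicator for whether a sampled employer is in NAICS 111) are hypothetical.

```python
from statistics import mean


def nonresponse_bias(respondent_values, nonrespondent_values):
    """Bias of the respondent mean: (n_nr / n) * (ybar_r - ybar_nr)."""
    n_nr = len(nonrespondent_values)
    n = len(respondent_values) + n_nr
    ybar_r = mean(respondent_values)
    ybar_nr = mean(nonrespondent_values)
    return (n_nr / n) * (ybar_r - ybar_nr)


# Hypothetical frame variable: 1 = employer is in NAICS 111, 0 = otherwise.
respondents = [1, 1, 1, 0, 1, 1]
nonrespondents = [1, 0, 0, 1]
bias = nonresponse_bias(respondents, nonrespondents)
```

The result equals the respondent mean minus the mean over all sampled cases, so a value near zero suggests that respondents and nonrespondents are similar on that frame variable.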


  2. The second analysis will compare the characteristics of respondents to national data on NAICS and geographic distribution separately, and where sample size and data allow, on NAICS and geographic region combined. While there are no exact matches to the NAWS universe in a single Federal data source, some comparisons can be made. To learn more about possible differences between employers allowing interviews and the universe of employers, we will compare the following:

  1. NAICS 111 employers allowing interviews with the 2012 CoA data on farms with hired farm labor, and

  2. NAICS 1151 employers allowing interviews with QCEW data on NAICS 1151 employers.



  3. The third analysis will examine employers who are not successfully screened during the data collection cycle. After interviewers leave a location, NAWS staff will continue contact attempts with unscreened employers to determine their eligibility status. Some growers may be unavailable during the interview period, either because they are too busy or because they are out of season. The goal will be to determine whether further contact attempts or lengthening the onsite data collection period would result in finding eligible employers who are characteristically different from the responding employers, and thus the extent of any non-response bias.



  4. The fourth analysis will use a Markov chain analysis to incorporate information from prior data periods about growers’ states – whether eligible, ineligible, or unable to be determined – and will look at the impact on response rates. A small number of agricultural employers appear on the survey’s sampling list in multiple administrations of the survey. Attempts to contact these employers may have had different outcomes at different time periods. Predicted states from the model can be used to examine possible bias.
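
As one illustration of the Markov chain idea in the fourth analysis, the sketch below estimates transition probabilities between employer states from repeated contact outcomes. The state labels, the example histories, and the function name are hypothetical, and the actual NAWS model specification may differ.

```python
from collections import Counter, defaultdict

STATES = ("eligible", "ineligible", "undetermined")


def transition_matrix(histories):
    """Estimate P[a][b] as the share of observed transitions out of state a that go to b."""
    counts = defaultdict(Counter)
    for history in histories:
        # Pair each outcome with the outcome in the next survey administration.
        for prev, curr in zip(history, history[1:]):
            counts[prev][curr] += 1
    matrix = {}
    for a in STATES:
        total = sum(counts[a].values())
        matrix[a] = {b: (counts[a][b] / total if total else 0.0) for b in STATES}
    return matrix


# Hypothetical screening outcomes for four employers over successive cycles.
histories = [
    ["eligible", "eligible", "undetermined"],
    ["undetermined", "eligible", "eligible"],
    ["ineligible", "ineligible"],
    ["undetermined", "undetermined", "eligible"],
]
P = transition_matrix(histories)
```

Each row of `P` gives the estimated chance that an employer in a given state at one administration is found in each state at the next, which is the kind of predicted state that could feed a bias assessment.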



Possible item non-response bias


As discussed previously, the NAWS has item response rates that exceed 90 percent, eliminating the need for a non-response bias analysis for specific items. Due to the low rates of missing data, NAWS data analysis generally uses case-wise deletion. No imputations are included in the public access data.


c) Reliability


A probability sampling methodology will be used and estimates of the sampling errors will be calculated from the survey data.


d) Estimation Procedure


At the highest level of the sampling design, the region/cycle level, stratified sampling is used. Sampling is then carried out at the lower levels, independently within each stratum.


The following description is excerpted from Obenauf¹:



The stratified sampling technique divides the entire population into relatively homogenous groups that are mutually exclusive and exhaustive. Samples are then drawn from each of these groups (strata) by simple random sampling or an alternate method. The entire sample is a compilation of these independent samples from each of the strata.

In stratified sampling, an estimate of the population mean can be made for each of the strata.


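Estimate of population mean:

$$\bar{y}_{st} = \frac{1}{N}\sum_{k=1}^{L} N_k \bar{y}_k,$$

where $N_k$ is the population size of stratum $k$, $N = \sum_{k=1}^{L} N_k$, $\bar{y}_k$ is the sample mean in stratum $k$, and $L$ is the number of strata into which the population is divided.


If a simple random sample is taken within each stratum (recall that other schemes can be used to draw a sample from each of the strata), the following represents an unbiased estimate of the variance of $\bar{y}_{st}$:

$$v(\bar{y}_{st}) = \frac{1}{N^2}\sum_{k=1}^{L} N_k^2 \left(\frac{N_k - n_k}{N_k}\right)\frac{s_k^2}{n_k},$$

where $n_k$ is the sample size and $s_k^2$ is the sample variance in stratum $k$.


The standard error of the estimator is the square root of this estimated variance, or

$$SE(\bar{y}_{st}) = \sqrt{v(\bar{y}_{st})}.$$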
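
The stratified estimator in the excerpt above can be sketched as follows, assuming a simple random sample within each stratum. The function name, stratum sizes, and wage values are hypothetical and serve only to show the arithmetic.

```python
from math import sqrt
from statistics import mean, variance


def stratified_mean_and_se(strata):
    """strata: list of (N_k, sample_values) pairs, one pair per stratum."""
    N = sum(N_k for N_k, _ in strata)
    # Weight each stratum mean by its share of the population.
    est = sum(N_k * mean(ys) for N_k, ys in strata) / N
    var = 0.0
    for N_k, ys in strata:
        n_k = len(ys)
        s2_k = variance(ys)  # sample variance, n_k - 1 in the denominator
        var += (N_k / N) ** 2 * (1 - n_k / N_k) * s2_k / n_k
    return est, sqrt(var)


# Hypothetical strata: (population size, sampled hourly wages).
strata = [(100, [9.0, 10.0, 11.0]), (300, [8.0, 8.5, 9.5, 10.0])]
est, se = stratified_mean_and_se(strata)
```

The term `1 - n_k / N_k` is the within-stratum finite population correction; it approaches 1 when each stratum sample is a small fraction of the stratum.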


At the second stage of the sampling design, within each stratum, counties (or groups of counties) are treated as clusters (FLAs).



The following description is another excerpt from Obenauf²:


The population is again divided into exhaustive, mutually exclusive subgroups and samples are taken according to this grouping. Once the population has been appropriately divided into clusters, one or more clusters are selected … to comprise the sample. There are several methods of estimating the population mean for a cluster sample. The method most pertinent to this study is that involving cluster sampling proportional to size (PPS).


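With PPS sampling, the probability ($z_j$) that a cluster $j$ is chosen on a specific draw is given by $z_j = M_j / M$, where $M_j$ is the size of the $j$th cluster and $M$ is the population size. An unbiased estimate of the population total is given by

$$\hat{Y}_{pps} = \frac{1}{n}\sum_{j=1}^{n}\frac{y_j}{z_j} = \frac{M}{n}\sum_{j=1}^{n}\bar{y}_j = M\bar{\bar{y}},$$

where $y_j$ is the sample total for $y$ in the $j$th cluster, $\bar{y}_j = y_j / M_j$ is the mean of the $j$th cluster, $n$ is the number of clusters in the sample and $\bar{\bar{y}}$ represents the average of the cluster means.


To estimate the population mean, this estimate must be divided by $M$, the population size.


The variance of the estimator of the population total is given by

$$V(\hat{Y}_{pps}) = \frac{1}{n}\sum_{j=1}^{K} z_j\left(\frac{y_j}{z_j} - Y\right)^2,$$

where $K$ is the number of clusters in the population and $Y$ is the population total. This is estimated by $v(\hat{Y}_{pps}) = M^2 s^2 / n$, where $s^2$ is the sample variance of the $\bar{y}_j$ values.


For an estimate of the population mean,

$\hat{\bar{Y}}_{pps} = \bar{\bar{y}}$ and $v(\hat{\bar{Y}}_{pps}) = s^2 / n$.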

In two-stage cluster sampling, the estimated variance of the estimator is then given by an iterative formula:

.

This iterative formula is then generalized to compute the variance of the estimators in multi-stage sampling schemes with three or more levels. Exact formulas become intractable at this point, and the various statistical software packages rely upon either re-sampling methodology or linear approximations in order to estimate the variances and standard errors of the estimators.
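
For the single-stage PPS case described in the excerpt, a minimal sketch of the estimator is shown below. It assumes clusters are drawn with replacement with probability $M_j / M$ and that each cluster's total and size are known; the function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pps_estimates(cluster_totals, cluster_sizes, M):
    """Return (estimated total, estimated mean, SE of the mean) under PPS with replacement."""
    n = len(cluster_totals)
    # y_j / z_j = M * (y_j / M_j), so each draw estimates the total as M times a cluster mean.
    cluster_means = [y / m for y, m in zip(cluster_totals, cluster_sizes)]
    mean_est = mean(cluster_means)
    total_est = M * mean_est
    se_mean = sqrt(variance(cluster_means) / n)
    return total_est, mean_est, se_mean


# Hypothetical clusters: total wages and number of workers in each sampled cluster.
total_est, mean_est, se_mean = pps_estimates(
    cluster_totals=[900.0, 2100.0, 1600.0],
    cluster_sizes=[100, 200, 160],
    M=1000,
)
```

Because the selection probabilities are proportional to cluster size, the estimate of the mean reduces to the simple average of the sampled cluster means.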


The following is an excerpt from the SAS documentation for PROC SURVEYMEANS³.


The SURVEYMEANS procedure produces estimates of survey population means and totals from sample survey data. The procedure also produces variance estimates, confidence limits, and other descriptive statistics. When computing these estimates, the procedure takes into account the sample design used to select the survey sample. The sample design can be a complex survey sample design with stratification, clustering, and unequal weighting.


PROC SURVEYMEANS uses the Taylor expansion method to estimate sampling errors of estimators based on complex sample designs. This method obtains a linear approximation for the estimator and then uses the variance estimate for this approximation to estimate the variance of the estimate itself (Woodruff 1971, Fuller 1975)⁴,⁵.
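
The Taylor expansion idea can be illustrated for a weighted mean. The sketch below is a simplified stand-in for the linearization approach, not a reproduction of the SAS implementation: it assumes PSUs are treated as sampled with replacement within strata, omits the finite population correction, and uses hypothetical stratum and PSU labels.

```python
from collections import defaultdict
from math import sqrt


def linearized_mean_se(rows):
    """rows: iterable of (stratum, psu, weight, y). Returns (weighted mean, SE)."""
    rows = list(rows)
    w_sum = sum(w for _, _, w, _ in rows)
    est = sum(w * y for _, _, w, y in rows) / w_sum
    # Linearized score per observation, accumulated into PSU totals within each stratum.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for h, c, w, y in rows:
        psu_totals[h][c] += w * (y - est) / w_sum
    var = 0.0
    for totals in psu_totals.values():
        n_h = len(totals)
        if n_h < 2:
            continue  # a single-PSU stratum contributes no variance in this sketch
        zbar = sum(totals.values()) / n_h
        var += n_h / (n_h - 1) * sum((z - zbar) ** 2 for z in totals.values())
    return est, sqrt(var)


# Hypothetical records: (stratum, PSU, weight, hourly wage).
rows = [
    ("R1-C1", "FLA01", 120.0, 9.25),
    ("R1-C1", "FLA01", 80.0, 8.75),
    ("R1-C1", "FLA02", 100.0, 10.00),
    ("R2-C1", "FLA03", 150.0, 9.50),
    ("R2-C1", "FLA04", 90.0, 11.00),
]
est, se = linearized_mean_se(rows)
```

With equal weights, one stratum, and one observation per PSU, this reduces to the familiar simple-random-sampling standard error, which is a convenient sanity check.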


SAS (e.g., PROC SURVEYMEANS) allows the user to specify the details of the first two stages of a complex sampling plan. In the present case, the stratification and clustering at the first two levels are specified in PROC SURVEYMEANS (strata cycle and region; cluster FLA). At the lower levels of the sampling scheme, the design attempts to mimic, as closely as is practical, simple random sampling. The software is not able to calculate exact standard errors, since it presumes true simple random sampling beyond the first two levels. The sampling weights will remedy any differences in selection probabilities so that the estimators will be unbiased.


In the SURVEYMEANS procedure, the STRATA, CLUSTER, and WEIGHT statements are used to specify the variables containing the stratum identifiers, the cluster identifiers, and the variable containing the individual weights.


For the NAWS, the STRATA are defined as the cycle/region combinations used for the first level of sampling. The CLUSTER statement contains the primary sampling unit, which is the FLA.


The WEIGHT statement references a variable that is for each observation i, the product of both the sampling weight Wti and the non-response weight PWTYCRDi. This variable is called PWTYCRD for historic reasons. The PWT refers to a weight for the population and thus includes the season weight, and the YCRD means that the weight includes year, region, cycle, and day components.


The SURVEYMEANS procedure also allows for a finite population correction. This correction is requested with the TOTAL= option on the PROC statement, which supplies the total number of PSUs in each stratum. SAS then determines the number of PSUs selected per stratum from the data and calculates the sampling rate. In cases such as the NAWS, where the sampling rate is different for each stratum, the TOTAL= option references a data set that contains information on all the strata and a variable _TOTAL_ that contains the total number of PSUs in each stratum.
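
To make the sampling-rate calculation concrete, the sketch below computes, for each stratum, the rate f = n / N and the correction factor 1 - f from a table of PSU totals and the PSU identifiers present in the data. The stratum keys, FLA labels, and function name are hypothetical.

```python
def sampling_rates(psu_totals, sampled_psus):
    """Per-stratum sampling rate f_h = n_h / N_h and correction factor 1 - f_h."""
    rates = {}
    for stratum, N_h in psu_totals.items():
        # Count distinct PSUs, since many observations share the same PSU.
        n_h = len(set(sampled_psus.get(stratum, [])))
        f_h = n_h / N_h
        rates[stratum] = (f_h, 1.0 - f_h)
    return rates


# Hypothetical strata keyed by (region, cycle); values play the role of _TOTAL_.
psu_totals = {("R1", 1): 20, ("R2", 1): 8}
sampled_psus = {
    ("R1", 1): ["FLA01", "FLA07", "FLA07", "FLA12"],
    ("R2", 1): ["FLA33", "FLA40"],
}
rates = sampling_rates(psu_totals, sampled_psus)
```

A stratum in which a large share of PSUs is sampled has a smaller correction factor, which shrinks its contribution to the estimated variance.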


We include here the sample code for PROC SURVEYMEANS to calculate the standard errors for our key estimator WAGET1.


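proc surveymeans data=naws.crtdvars total=naws.regioninfo;

strata region12 cycle; /* first-stage strata: region by cycle */

cluster FLA; /* primary sampling unit */

var waget1; /* key estimator: hourly wage */

weight pwtycrd; /* composite weight described above */

run;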


e) Precision of Key Estimators


The NAWS is primarily a surveillance survey that provides descriptive statistics about the United States crop worker population. Periodic reports posted to the website and presentations at conferences and stakeholder meetings are used to disseminate the survey results. In addition, the data are used by researchers, policy analysts and service program staff primarily for program planning and policy analysis. Two key variables of interest to these groups are FWRDAYS, which is the number of days worked per year by a respondent, and WAGET1, which is the average hourly wage of a respondent. Based on data collected in fiscal years 2011 through 2012, with a combined sample size of 3,025 respondents and using the current NAWS weights, the 2-standard-error confidence interval for FWRDAYS was 191 days ± 9.4 days. That is, with approximately 95 percent confidence, the average number of days worked annually per person lies between 182 and 200 days. This constitutes a margin of error of ±4.9 percent of the estimated value.


For average wage (WAGET1), the 2-standard-error confidence interval was $9.31 ± $0.23. With approximately 95 percent confidence, the average wage lies between $9.08 and $9.54. This yields a margin of error of ±2.5 percent of the estimated value.
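
The interval arithmetic for the wage estimate can be checked with a short sketch. The function name is hypothetical; the inputs are the WAGET1 figures reported above.

```python
def two_se_interval(estimate, half_width):
    """Return (lower, upper, relative margin in percent) for estimate ± half_width."""
    lower = estimate - half_width
    upper = estimate + half_width
    rel_pct = 100.0 * half_width / estimate
    return lower, upper, rel_pct


# WAGET1: estimate of 9.31 with a 2-standard-error half-width of 0.23.
lower, upper, rel_pct = two_se_interval(9.31, 0.23)
```

The relative margin is the half-width divided by the estimate, which makes precision comparable across variables measured in different units.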


There are numerous other variables of interest, whose standard errors vary greatly. These two are offered as examples that show some of the range of possible precisions obtained.


4. Tests


The questionnaire to be used in the survey was developed by the DOL with input from various Federal agencies. Most of the survey questions will be unchanged from the version that OMB approved in the last submission. The majority of these questions have been used for over 20 years, are well understood by the sampled respondents, and the data they provide are of high quality.


This submission seeks approval for questionnaire supplements sponsored by the Health Resources and Services Administration, and the DOL’s Office of the Assistant Secretary for Policy, Chief Evaluation Office. To increase utility, each supplement has been developed in concert with the sponsoring agency and experts in the question domains. In addition, questions on the supplements not previously piloted undergo pilot testing with no more than nine respondents answering identical questions. None of the pilot testing will involve having 10 or more respondents answer identical questions. Results of the pilot tests completed to date will be provided in a separate document included with this submission.


5. Statistical Consultation


The following individuals have been consulted on statistical aspects of the survey design:

Stephen Reder and Robert Fountain, Professors, Portland State University, (503) 725-3999 and (503) 725-5204; Phillip Martin, Professor, University of California at Davis (916) 752-1530; Jeff Perloff, Professor, University of California at Berkeley (510) 642-9574; John Eltinge, the Bureau of Labor Statistics (BLS) (202) 691-7404; Daniel Kasprzyk, Frank Potter, Eric Grau, Steve Williams, Amang Sukasih, Raquel af Ursin, and Yuhong Zheng of Mathematica Policy Research (609) 799-3535; China Layne, Summit Consulting, LLC (202) 407-8328; and Richard Valliant, Joint Program for Survey Methodology, University of Maryland (301) 405-0932.


The data will be collected under contract to JBS International, Inc. (650) 373-4900. Analyses of the data will be conducted by Daniel Carroll, ETA (202) 693-2795, and JBS International, Inc.


REFERENCES


Fuller, W.A. (1975). Regression Analysis for Sample Survey, Sankhyā, 37, Series C, Pt. 3, 117-132.


Obenauf, W. (2003). An Application of Sampling Theory to a Large Federal Survey. Portland State University, Department of Mathematics and Statistics.


Ryu, E., Couper, M., & Marans, R. (2006). Survey incentives: Cash vs. in-kind; Face-to-face vs. mail; Response rate vs. nonresponse error. International Journal of Public Opinion Research, 18 (1): 89-106.


SAS Institute Inc., SAS/STAT® User’s Guide, Version 8, Cary, NC: SAS Institute Inc., 1999, 61, 3.


Woodruff, R. S. (1971). A Simple Method for Approximating the Variance of a Complicated Estimate, Journal of the American Statistical Association, 66, 411–414.


Zhang, F. (2010). Incentive experiments: NSF experiences. (Working Paper SRS 11-200). Arlington, VA: National Science Foundation, Division of Science Resources Statistics.


1 Obenauf, W. (2003), “An Application of Sampling Theory to a Large Federal Survey”, Portland State University Department of Mathematics and Statistics.

2 Obenauf, W. (2003), “An Application of Sampling Theory to a Large Federal Survey”, Portland State University Department of Mathematics and Statistics.

3 SAS Institute Inc., SAS/STAT® User’s Guide, Version 8, Cary, NC: SAS Institute Inc., 1999, 61, 3.

4 Woodruff, R. S. (1971). A Simple Method for Approximating the Variance of a Complicated Estimate, Journal of the American Statistical Association, 66, 411–414.

5 Fuller, W. A. (1975). Regression Analysis for Sample Survey, Sankhyā, 37, Series C, Pt. 3, 117–132.









File Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document
Author: carroll.daniel.j
File Modified: 0000-00-00
File Created: 2021-01-24
