[1205-0453: The National Agricultural Workers Survey, Part B]
The objective of the National Agricultural Workers Survey (NAWS) is to provide descriptive statistics of the characteristics of crop workers using a statistical methodology designed to address the difficulties of surveying a mobile and seasonal population often living in non-standard and sometimes hidden housing. In addition, the NAWS is designed to address the information needs of various Federal agencies that oversee farm worker programs. These stakeholders include agencies concerned with assessing agricultural productivity, international transaction accounts, and migrant and seasonal farm worker health and education. Another purpose of the NAWS is to produce accurate regional estimates of the share of farm workers who are eligible for training and employment services through the Employment and Training Administration’s (ETA) National Farmworker Jobs Program (NFJP).
1. Respondent Universe and Samples
a) Respondent Universe
The universe for the NAWS is the population of workers active in crop agriculture in the continental United States. Since the NAWS samples farm workers at the worksite, the definition of the respondent universe involves the definitions of both an eligible employer and an eligible worker.
The universe of eligible employers includes all employers in North American Industry Classification System (NAICS) codes 111 and 1151. NAICS code 111 is Crop Agriculture, which includes employers hiring workers on farms and ranches or in greenhouses. NAICS code 1151 is Support Activities for Crop Production, which includes employers such as farm labor contractors, custom harvesters, and crop-dusting companies who contract with agricultural producers to supply support services and hire workers to carry out these contracts at farms, ranches and greenhouses. Eligible employers must have workers who are actively engaged in crop production.
Eligible workers must be employed by an eligible employer and have worked for that employer for at least four hours on a single day in the prior 15 days. For an employer who has a packing operation, the workers in the packing operation are eligible if the canning or packing plant is adjacent to or located on a farm and at least 50 percent of the produce being packed or canned originated from the farm of the contacted grower.
The following criteria define ineligible employees at an otherwise eligible employer. The employee is ineligible if he/she:
Was interviewed in the NAWS within the last 12 months in the same location.
Holds an H-2A visa.
Has not worked for the contacted employer for four hours or more on at least one day in the last 15 days.
Does “non-farm work” for the employer (e.g., mechanic, sales, office).
Is a family member of the employer and does not receive a paycheck like other farm workers.
Is the employer (grower or contractor).
Is a sharecropper who makes all operational decisions, such as when, where, and how to plant or harvest.
Works for a landscaping company that only sells, installs, maintains or preserves trees or plants. This includes the planting of ornamental plants and placement of sod.
b) Samples
The NAWS will use a complex sampling design that includes both stratification and clustering. The sampling unit is an eligible crop worker. Multi-stage sampling will be implemented to randomly select and interview approximately 1,500 crop workers. Since migrant and seasonal farm workers are a mobile and time-sensitive population, the sample will utilize strata that incorporate seasonality and agricultural region. Within each stratum, there is clustering. The primary sampling unit is the Farm Labor Area (FLA) (county cluster). Samples of employers and of workers within employers are also selected. This section describes the stratification, primary sampling units, and employer and worker universe size and samples for the survey (see Table 1). Section 2 provides more details on the statistical methods used in sampling.
Table 1: NAWS Stratification and Sampling Units
Entity | Universe | Sample
Cycle | 3 | 3
Agricultural Region | 12 | 12
Farm Labor Area | 928 | 105
Crop Employer | 525,000* | 395
Hired Crop Worker | 1,600,000* | 1,500
*Estimate
Stratification
To account for seasonal and geographic variation in farm employment, the year is divided into three four-month cycles: October through January, February through May, and June through September. The NAWS sampling will use 12 distinct agricultural regions, which are based on the United States Department of Agriculture’s (USDA) 17 regions. Table 2 shows the correspondence between USDA and NAWS regions. At the start of the survey in 1988, the 17 USDA regions were collapsed into 12 NAWS regions by combining smaller regions that were similar (e.g., Mountain I and Mountain II) based on statistical analysis of cropping patterns. This reduced the number of regions and increased the size of the smallest strata.
Table 2: Correspondence between NAWS and USDA Regions
NAWS Sampling Region | USDA Region Code & Name | States in USDA Region
Appalachian I, II (AP12) | Appalachian I | NC, VA
Appalachian I, II (AP12) | Appalachian II | KY, TN, WV
Corn Belt Northern Plains (CBNP) | Corn Belt I | IL, IN, OH
Corn Belt Northern Plains (CBNP) | Corn Belt II | IA, MO
Corn Belt Northern Plains (CBNP) | Northern Plains | KS, NE, ND, SD
California (CA) | California | CA
Delta Southeast (DLSE) | Delta | AR, LA, MS
Delta Southeast (DLSE) | Southeast | AL, GA, SC
Florida (FL) | Florida | FL
Lake (LK) | Lake | MI, MN, WI
Mountain I, II (MN12) | Mountain I | ID, MT, WY
Mountain I, II (MN12) | Mountain II | CO, NV, UT
Mountain III (MN3) | Mountain III | AZ, NM
Northeast I (NE1) | Northeast I | CT, ME, MA, NH, NY, RI, VT
Northeast II (NE2) | Northeast II | DE, MD, NJ, PA
Pacific (PC) | Pacific | OR, WA
Southern Plains (SP) | Southern Plains | OK, TX
The annual sample will include all 12 regions for each of the three cycles. The USDA’s Agricultural Labor Survey (ALS) provides the size of the hired crop labor force, and the Bureau of Labor Statistics’ Quarterly Census of Employment and Wages (QCEW) provides the size of the contract crop labor force. Added together, these numbers form the size estimate for each region-cycle stratum. The NAWS region definitions and the ALS regions are congruent. USDA reports ALS data for each quarter; the quarterly data will be prorated into the three NAWS cycles. The population of each region-cycle stratum is as follows:
Table 3. Estimated Crop Labor Force Population (1000s) by Region-Cycle Strata*
Region | Fall | Winter/Spring | Summer
NE1 | 30.7 | 16.8 | 25.2
NE2 | 37.3 | 25.8 | 42.6
AP12 | 59.1 | 32.5 | 64.9
FL | 47.4 | 63.2 | 44.6
DLSE | 76.7 | 45.3 | 63.0
LK | 40.6 | 24.6 | 46.1
CBNP | 96.0 | 65.2 | 105.6
SP | 38.8 | 35.7 | 38.8
MN12 | 42.8 | 21.4 | 38.0
MN3 | 32.5 | 31.1 | 22.1
PC | 126.6 | 75.9 | 172.8
CA | 446.0 | 383.3 | 485.6
TOTALS | 1,074.3 | 820.7 | 1,149.2
*Table derived from 2017-2018 ALS data and 2018 QCEW data.
Values may not add exactly due to rounding.
Primary Sampling Unit
The next level of sampling is to select the FLAs, the multi-county primary sampling units, within the cycle-region strata. FLAs consist of either single large counties or, more often, aggregates of counties with similar farm labor usage patterns. Each of the approximately 3,000 counties in the continental United States is assigned to one of the 928 FLAs.
FLAs reduce travel costs by providing larger groupings of crop workers in areas where crop workers are sparse, so that regional allocations can be completed efficiently. Areas with fewer crop workers and smaller sized counties may have more counties per FLA. In the West, a FLA may include only a single large agriculture-intensive county.
The FLA definitions are reviewed every five years when new Census of Agriculture (CoA) data become available. FLAs were originally developed for the fiscal year (FY) 1999 NAWS interviewing year using 1992 CoA data on hired and contract farm labor expenditures as the size data, plus 1970s-era ETA mappings of seasonal farm labor concentrations for potential location grouping. One reason for developing the FLAs was to have similar-sized primary sampling units within each stratum when simple random sampling was used to select the FLAs.
When the 2017 CoA data became available, a more extensive update was conducted on the FLAs. This was done for several reasons. First, cropping and labor patterns have changed since the FLAs were originally developed. Second, the NAWS now utilizes probability proportional to size (PPS) sampling for the FLA roster, and having roughly equal size units did not allow the sample to take advantage of the properties of PPS. Furthermore, some units were unwieldy since making equal size FLAs for sparsely populated counties meant that some FLAs were extremely large, encompassing half a state. These FLAs caused increased travel costs and also resulted in instances where interviewers went to the first county in the FLA but found few or no farm workers and then had to travel a far distance to the second county in the FLA.
The new FLAs were defined using CoA and QCEW data as well as maps showing major geographic barriers and freeways to ensure the ability to travel within a FLA. In certain cases, the definitions were not changed since there were some regions where FLAs were working well. For the following areas the old FLAs were kept: California, Mountain I, II, Mountain III, and Pacific. The expectation is that making FLAs with smaller groupings of counties with sparse farm work and regrouping based on new USDA data will improve the efficiency of sampling and result in lower survey costs.
To test the new FLAs, FLA rosters were drawn 30 times for both the new and the old FLA definitions. The efficiency of a roster was measured as the total dollar amount of farm labor expenditures across all FLAs on the roster. For each cycle, a two-sided, two-sample t-test of the difference in mean FLA labor expenditures between the old and new FLAs was conducted. The null hypothesis (no difference between the means of the old and new FLAs) was rejected at the 95 percent confidence level in two cases, fall and spring; all three cycles were statistically significant at the 90 percent confidence level.
For each of the three cycles per year, NAWS staff will select a sample of FLAs in each of the 12 regions using methods described in Section 2c below. For the FY 2020 data collection, visits to 105 FLAs are planned. The actual number of unique FLAs may be lower, as the sampling plan may include a single FLA in more than one cycle during the year.
Employer and Worker Samples
The universe of crop employers is estimated to be 525,000. This estimate is derived by adding the number of agricultural employers from the 2017 CoA that directly hired crop workers (513,137) with the maximum number of agricultural support firms (farm labor contractors and custom harvesters) across the four quarters of the 2018 QCEW (11,819) and rounding the sum to the nearest 1,000. The sample of crop employers is estimated to be 395. This estimate is derived by dividing the target farm worker sample size (1,500) by the average number of workers interviewed per farm in FY 2017 (3.8).
Although there is no census of farm workers, the universe of the crop farm labor force can be estimated using labor expenditure data from the CoA and the ALS, and hours-per-worker data from a combination of ALS and NAWS data. Based on 2012 data from these sources, there are an estimated 1.6 million hired crop farm workers, including workers who are brought to farms by labor intermediaries. As described below, the proposed sample size for FY 2020 is 1,500 farm workers.
Sample size
The NAWS faces several challenges in defining an optimal design. First, the NAWS has a complex sampling design that includes both stratification and clustering. Second, the NAWS uses administrative data that comes in clumps that are not optimal for some variables of interest. Third, surveys are generally optimized for one variable of interest and the NAWS collects data on over 250 variables of interest.
For complex survey designs, the design effect measures the efficiency of a sampling design that deviates from simple random sampling. The design effect for a variable of interest is defined as the ratio of the variance calculated under the complex design divided by the variance calculated under the assumption of a simple random sample. Design effects greater than one mean that the complex survey design delivers higher variance than a simple random sample. While it is possible to achieve a design effect of less than one, many factors can make a design less efficient. Lower design effects are generally achieved when strata are homogeneous and clusters are heterogeneous.
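The definition above can be made concrete with a short sketch. The variance figures below are hypothetical, chosen only to illustrate the ratio; the function names are illustrative, not NAWS code.

```python
def design_effect(var_complex: float, var_srs: float) -> float:
    """Ratio of the variance under the complex design to the variance
    under simple random sampling for the same estimator."""
    return var_complex / var_srs

def effective_sample_size(n: int, deff: float) -> float:
    """A sample of n under the complex design carries roughly the same
    information as a simple random sample of n / deff."""
    return n / deff

# Hypothetical variances for one variable of interest.
deff = design_effect(var_complex=0.0008, var_srs=0.0002)
print(deff)                               # 4.0
print(effective_sample_size(1500, deff))  # 375.0
```

A design effect of 4 thus means the complex sample of 1,500 is, for that variable, only as informative as an SRS of 375.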
We calculated design effects for demographic and employment characteristics using the data collected for the NAWS in fiscal years 2013-2016. NAWS design effects varied from less than one to 10, depending on the variable. Generally, higher design effects are associated with both heterogeneous strata and relatively homogeneous clusters. This usually happens for variables that tend to be relatively similar within farms and FLAs, but heterogeneous in some regions, particularly in the East and Midwest. Examples of such variables include work force characteristics, place of birth, and visa status.
The NAWS design is more efficient for collecting information on worker’s household composition, key variables for estimating service program size, and immigration policy impacts. These types of variables have more heterogeneity within clusters and are more homogeneous within strata.
Originally, the NAWS design was driven by a single variable, the exit rate of newly legalized Special Agricultural Workers. Based on this requirement, the NAWS sample size was set at 2,500 interviews annually. This sample size was also sufficient to report national numbers on an annual basis. After the exit rate requirement was dropped, the sample size fluctuated between 1,500 and 3,200 interviews per year, which allowed for reporting national and regional information using two and four years of data, respectively.
Currently, the NAWS collects data for a variety of Federal agencies and Federal farm worker programs, two of which – NFJP and Migrant and Seasonal Head Start– draw on specific subpopulations. The NAWS has, since its inception, collected data that are used in the NFJP funding formula. Beginning in FY 2018, new questions on education and training, use of digital information devices, and preventive health were added to the survey to inform ETA and Department of Health and Human Services programs and services for agricultural worker populations. The FY 2020 sampling calculations examine the sample sizes needed to report characteristics of farm workers who fall into four subpopulations:
The subpopulation of workers who are poor;
The subpopulation of workers who are authorized to work in the United States;
The subpopulation of workers who have taken or are taking adult education or training classes; and
The subpopulation of workers who have taken English or ESL classes, the most common types of courses taken by farm workers.
For each of these subpopulations, the first step in estimating the desired sample size was to calculate the size of the simple random sample (SRS) needed to produce estimates with the desired precisions (see Table 4) using the formula:
n = z²p(1 − p)/e²,
where
n = desired sample size under SRS,
z = z-statistic for the desired confidence interval (1.96 for 95%),
p = proportion being estimated, and
e = desired half-width of the confidence interval.
The proportion was set to 0.50 because that is the proportion that requires the highest sample size to achieve a confidence interval with a desired precision. At that sample size, all other proportions will meet the desired precision.
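As a check on the calculation, the SRS formula can be evaluated with the document's inputs (the helper function below is illustrative):

```python
def srs_sample_size(z: float, p: float, e: float) -> int:
    """SRS sample size for a proportion: n = z^2 * p * (1 - p) / e^2."""
    return round(z**2 * p * (1 - p) / e**2)

# z = 1.96 (95% confidence), p = 0.50 (worst case), e = 0.05 half-width.
print(srs_sample_size(1.96, 0.50, 0.05))  # 384, as in Table 4
```

The unrounded value is 384.16; rounding to the nearest integer reproduces the document's 384 (a more conservative convention would take the ceiling, 385).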
Table 4, below, shows the additional calculations needed to adjust the subpopulation sample sizes to account for the NAWS’s complex sampling design. First, the desired sample size for the subpopulation is calculated by multiplying the SRS sample size by the design effect for the subpopulation. The design effects are calculated from several years of data and are fairly stable. The desired NAWS sample size then takes into account the expected proportion of the subpopulation within the NAWS sample. This number is based on previous NAWS data. The final number in the calculation is the number of years of data that would ideally be combined for reporting purposes for each subpopulation.
The desired sample size can then be calculated using the following formula:
Desired sample size = n*deff/(S*Y),
where
n = sample size for 95% confidence interval under SRS,
deff = the design effect,
S = the proportion of the subpopulation in the NAWS sample, and
Y = the number of years of data to be combined for reporting.
For example, in Table 4 below, the SRS sample size for workers who are authorized to work in the United States is 384, the design effect is 8.43, the proportion of the subpopulation within the NAWS sample (S) is 52 percent, and the reporting period (Y) is five years. Using the formula above, the desired annual NAWS sample size is 1,245.
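The worked example can be reproduced directly (the function name is illustrative; the inputs are the Table 4 values for the work-authorized subpopulation):

```python
def desired_annual_sample(n: int, deff: float, s: float, y: int) -> int:
    """Design-adjusted annual sample size: n * deff / (S * Y)."""
    return round(n * deff / (s * y))

# Work-authorized subpopulation: n = 384, deff = 8.43, S = 0.52, Y = 5.
print(desired_annual_sample(384, 8.43, 0.52, 5))  # 1245
```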
Table 4 shows that the overall design-adjusted sample sizes needed to meet the data collection goals for these four subpopulations range from 1,245 to 1,441. The desired NAWS sample size for FY 2020 is based on the highest of these sample sizes and rounded up to 1,500. This sample size would achieve all of the sampling objectives.
Table 4. Sample Size Calculations*
Sub-population | Planned Confidence Interval (e) | Proportion Being Estimated (p) | Sample Size for 95% Confidence Interval Under SRS (n) | Design Effect (deff) | Desired Sub-population Sample Size within NAWS (n*deff) | Current Proportion of Sub-population in the NAWS Sample (S) | Years of Data to Combine for Reporting (Y) | Desired Annual Sample Size
Adult education or training | 5% | 50.00% | 384 | 6.57 | 2,523 | 36% | 5 | 1,395
English/ESL classes | 5% | 50.00% | 384 | 5.41 | 2,077 | 14% | 10 | 1,441
Work-authorized | 5% | 50.00% | 384 | 8.43 | 3,237 | 52% | 5 | 1,245
Below poverty | 5% | 50.00% | 384 | 8.011 | 3,076 | 31% | 7 | 1,414
* The values in the table do not multiply exactly due to rounding.
The desired annual NAWS sample size is distributed across the region-cycle strata proportionate to the farm worker population numbers from the ALS. See Table 5 below.
Table 5. Estimated Sample Size by Region-Cycle Strata for FY 2020*
Region | Fall | Winter/Spring | Summer
NE1 | 15 | 8 | 12
NE2 | 18 | 13 | 21
AP12 | 29 | 16 | 32
FL | 23 | 31 | 22
DLSE | 38 | 22 | 31
LK | 20 | 12 | 23
CBNP | 47 | 32 | 52
SP | 19 | 18 | 19
MN12 | 21 | 11 | 19
MN3 | 16 | 15 | 11
PC | 62 | 37 | 85
CA | 220 | 189 | 239
TOTALS | 528 | 404 | 566
*Note that numbers include a small amount of rounding error.
2. Statistical Methods for Sample Selection
a) Overview
The goal of the NAWS sampling methodology is to select a nationally representative, random sample of crop workers. Stratified multi-stage sampling will be used to account for seasonal and regional fluctuations in the level of farm employment. There are two levels of stratification: three four-month cycles and 12 geographic regions, resulting in 36 time-by-space strata. For each cycle, within each region, NAWS staff will draw a random sample of FLAs. Within each FLA, counties are the secondary level of sampling units, ZIP Code regions are the third, agricultural employers are the fourth, and workers are the fifth.
For each cycle, the number of interviews allocated to each region is proportional to the estimated seasonal number of farm workers employed in the region. The regional allocation is distributed proportionately across the sampled FLAs expected to be visited. Within each FLA, interviewers will visit the sampled counties and ZIP Code regions to contact employers and select a random sample of eligible workers employed on the day of the visit.
b) Stratification
Interviewing Cycles
As mentioned above, interviews are conducted in three cycles each year with each cycle lasting four months. The reason for this is to account for agricultural seasonality. The number of interviews conducted in each cycle is proportional to the estimated number of crop workers employed during the cycle. The seasonal agricultural employment figures are based on the ALS and QCEW. The ALS provides quarterly employment figures for the continental United States. These quarterly figures are pro-rated into the three interview cycles.
Regions
As mentioned in Section 1, in each cycle, all 12 regions are included in the sample. The number of interviews per region is proportional to the size of the seasonal farm labor force in a region at a given time of the year. The size of the seasonal labor force in each region is derived from ALS and QCEW quarterly regional data, which are pro-rated into the three cycles before being distributed to the regions.
c) Sampling within Strata
Farm Labor Areas
Sampling FLAs is a two-stage process. In the first step, a roster of ten FLAs will be drawn in each cycle-region stratum using probabilities proportional to the seasonal labor force for that stratum. NAWS staff will conduct systematic probability proportional to size (PPS) sampling of FLAs within regions using SAS PROC SURVEYSELECT. The size measure for the FLA seasonal labor force will be calculated using farm labor expenditure data obtained from the CoA and seasonal adjustment factors derived from the QCEW. The seasonal adjustment factors will be made by aggregating the QCEW’s reported monthly employment figures for the months that correspond to each of the NAWS cycles (e.g., June, July, August, and September for the summer cycle). The percentage of annual employment corresponding to each cycle is the FLA’s seasonal adjustment factor. The size measure for the FLA labor force will be calculated by multiplying the FLA’s annual hired and contract labor expenditures from the 2017 CoA by the seasonal adjustment factor from the 2018 QCEW.
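The survey itself uses SAS PROC SURVEYSELECT; for readers unfamiliar with systematic PPS selection, the Python sketch below shows the general algorithm with hypothetical FLA identifiers and size measures (it is not the NAWS production code).

```python
import random

def systematic_pps(units, sizes, n):
    """Systematic probability-proportional-to-size (PPS) sampling.

    units: unit identifiers; sizes: positive size measures; n: sample size.
    Assumes no single size exceeds total/n, so all selections are distinct.
    """
    total = sum(sizes)
    step = total / n                   # sampling interval
    start = random.uniform(0, step)    # random start within the first interval
    points = [start + k * step for k in range(n)]
    selected, cum, i = [], 0.0, 0
    for unit, size in zip(units, sizes):
        cum += size
        # A unit is selected once for each selection point falling in
        # its cumulative-size interval.
        while i < n and points[i] <= cum:
            selected.append(unit)
            i += 1
    return selected

# Hypothetical FLAs with seasonally adjusted labor-expenditure size measures.
flas = ["FLA-1", "FLA-2", "FLA-3", "FLA-4", "FLA-5"]
expend = [1.2e6, 0.4e6, 2.0e6, 0.9e6, 1.8e6]
print(systematic_pps(flas, expend, 3))
```

Each FLA's chance of selection is proportional to its size measure, which here stands in for the CoA labor expenditures multiplied by the QCEW seasonal adjustment factor.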
No data source details when and where local crop work is occurring accurately enough to permit precise planning of the NAWS interviewing locations. The local timing and location of seasonal farm employment depend on changes in cropping patterns and weather conditions, including disasters such as a drought or flood. As a result, crop workers may not be found at the same times and locations as in previous years. It is not unusual to visit a location, even after consulting local experts, and be unable to complete the interview allocation. Thus, the number of FLAs needed to complete the interview allocation within a cycle-region stratum cannot be determined in advance.
The second stage in the two-stage FLA sampling process is the progression through the roster of ten FLAs drawn in the first stage, to complete the regional interview allocation. The roster of ten FLAs will be randomly sorted, and the interviewers will begin in the FLA at the top of the list. They will move to additional FLAs as needed to complete the interview allocation for the region by proceeding down the list, in order.
For planning purposes, the starting number of FLA visits for the NAWS annual sample will be 105. To ensure there will be an adequate number of FLAs visited in each region, a minimum of two FLAs will be assigned in each region per cycle. Thus, 12 regions x 2 FLAs x 3 cycles = 72 FLAs. Most of the remaining FLAs will be assigned to regions proportionate to the size of the regions’ seasonal farm labor force for a particular cycle, according to the ALS size numbers. Additional FLAs will be allocated to regions where difficulty in meeting interview allocations is anticipated, usually due to seasonal factors. The starting number of FLAs selected for each cycle-region stratum is anticipated to range from two to five.
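The text specifies only the two-FLA minimum per stratum and a roughly proportional distribution of the remainder. The sketch below assumes a largest-remainder rounding rule to make the allocation concrete; the stratum names and sizes are hypothetical.

```python
def allocate_flas(sizes, total=105, minimum=2):
    """Allocate FLA visits across strata: a fixed minimum per stratum,
    with the remainder distributed proportionally to stratum size using
    the largest-remainder method (an assumed rounding rule)."""
    alloc = {s: minimum for s in sizes}
    remaining = total - minimum * len(sizes)
    grand = sum(sizes.values())
    quotas = {s: remaining * v / grand for s, v in sizes.items()}
    alloc = {s: alloc[s] + int(quotas[s]) for s in sizes}
    leftover = remaining - sum(int(q) for q in quotas.values())
    # Give one extra visit to the strata with the largest fractional quotas.
    for s in sorted(sizes, key=lambda s: quotas[s] - int(quotas[s]),
                    reverse=True)[:leftover]:
        alloc[s] += 1
    return alloc

# Three hypothetical strata with seasonal labor-force sizes (thousands).
print(allocate_flas({"A": 50, "B": 30, "C": 20}, total=12, minimum=2))
# {'A': 5, 'B': 4, 'C': 3}
```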
While each cycle-region stratum has its own roster of FLAs, there will likely be some overlap because a FLA can be visited more than once per year. Since the exact number of FLAs visited each cycle is unknown until after a cycle is completed, the number of unique FLAs visited in a year is unknown until the end of the year.
Counties
Similarly, it is not possible to know in advance how crop workers are distributed within a FLA and the exact number of counties needed to encounter enough crop workers to complete the interview allocation. In most cases, interviews are completed in the first county and no additional counties are needed. However, because there is tremendous uncertainty about the number of workers in a county, additional counties may be needed to complete the interview allocation. Counties will be selected one at a time, without replacement, using probabilities proportional to the size of the farm labor expenditures in a county during a given season. Seasonality is considered constant within a FLA.
The process of selecting counties will begin with a randomly sorted list of the counties within the FLA. A cumulative sum of the size of the hired and contract labor expenditures, derived from CoA data, will be constructed for this list. When selecting a county, the selection number is the product of a random number selected from the uniform distribution, multiplied by the cumulative sum. The county that includes the selection number is chosen. Tables 6 and 7 illustrate an example of the algorithm used for selecting counties within FLAs.
Table 6: Example Counties and Labor Expenditures within an FLA
Counties in FLA | Hired and Contract Labor Expenditures | Cumulative Sum of Hired and Contract Labor Expenditures
A | 100,000 | 100,000
B | 300,000 | 400,000
C | 800,000 | 1,200,000
D | 450,000 | 1,650,000
E | 600,000 | 2,250,000
Table 7: Example Showing Algorithm for Selecting Counties within FLAs
Step in the Algorithm | Result
Random number selected from uniform distribution | 0.657
Selection number (random number * cumulative sum of hired and contract labor expenditures) | 1,478,250 (0.657 * 2,250,000)
County selected | D
As shown in the example in Tables 6 and 7, the cumulative sum of hired and contract labor expenditures for all counties in the FLA is 2,250,000. The random number selected from the uniform distribution is 0.657. The random number is multiplied by the cumulative sum of hired and contract labor expenditures to produce a selection number of 1,478,250. This selection number is included in the cumulative sum of county D, so county D is selected.
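The algorithm in Tables 6 and 7 can be expressed compactly. The sketch below reproduces the worked example; in production, the random number u would be freshly drawn rather than fixed.

```python
import bisect
import itertools

def select_county(counties, expenditures, u):
    """Select one county with probability proportional to hired and
    contract labor expenditures, following the Table 6/7 algorithm."""
    cumulative = list(itertools.accumulate(expenditures))
    selection_number = u * cumulative[-1]
    # The county whose cumulative-sum interval contains the selection number.
    return counties[bisect.bisect_left(cumulative, selection_number)]

counties = ["A", "B", "C", "D", "E"]
expenditures = [100_000, 300_000, 800_000, 450_000, 600_000]
print(select_county(counties, expenditures, u=0.657))  # D, as in Table 7
```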
So as not to delay field operations, in situations where NAWS staff anticipate visiting more than one county, several counties will be selected using the selection method above. Each county will be marked and ordered by its selection number (e.g., 1st, 2nd, 3rd). Interviews will begin in the first selected county; when that county's employer list is exhausted, interviewers will move to the next randomly selected county on the list until all the allocated interviews in the FLA have been completed. In FLAs where farm work is sparse, interviewers may need to travel to several counties to encounter sufficient workers to complete the FLA's allocation.
Sampled counties are divided into ZIP Code regions, which are smaller areas based on geographic proximity and the number of employers in the area. The purpose of ZIP Code regions is to group together employers that are close in proximity, reducing the cost of driving from employer to employer within a county. A county may consist of a single ZIP Code region (for example, a small county) or multiple regions (for example, a large county). In a county with multiple ZIP Code regions, the goal is for the ZIP Code regions to be roughly equal in size.
The process of constructing the regions begins by sorting the employers in the county by ZIP Code and then by a random number. Beginning at the top of the list, groups of seven employers are assigned as a ZIP Code region. Some ZIP Code regions may include growers from more than one ZIP Code; for example, the last three employers in one ZIP Code and the first three in the next could comprise a region. The final ZIP Code region will be of unequal size if the number of growers in the county is not evenly divisible by seven: if the final group contains five or six growers, it stands alone as a ZIP Code region; if it contains four or fewer employers, it is combined with the previous group. Thus, the final ZIP Code region can vary in size from five to 11 growers.
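The grouping rule can be sketched as follows (the function name is illustrative):

```python
def zip_code_regions(employers, group_size=7):
    """Chunk a sorted employer list into ZIP Code regions of seven.
    A final group of four or fewer is merged into the previous region,
    so the last region holds five to eleven employers."""
    regions = [employers[i:i + group_size]
               for i in range(0, len(employers), group_size)]
    if len(regions) > 1 and len(regions[-1]) <= 4:
        regions[-2].extend(regions.pop())
    return regions

# 18 employers -> groups of 7, 7, 4; the final 4 merge into the previous group.
print([len(r) for r in zip_code_regions(list(range(18)))])  # [7, 11]
```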
When there are multiple ZIP Code regions in a county, the regions will be randomly sorted to produce a list that determines the order in which the areas will be visited. Interviewers will make three attempts to contact each agricultural employer in the first ZIP Code region on the list and then move down the list, following the random order, until the interview allocation is filled or the county’s workforce is exhausted.
Employers
One of the challenges of the NAWS is that there is no agreed-upon universe list of crop employers. NAWS staff will compile a crop employer universe list using administrative lists, marketing lists, and an online search. The BLS provides names of agricultural employers in NAICS codes 111 and 1151 directly to the NAWS contractor per the terms of an agreement between the ETA and the BLS. Employers on the BLS list are those who pay unemployment insurance (UI) taxes. In states where UI is not mandatory for all agricultural employers, the list of employers from BLS will be supplemented with other sources.
One issue with the BLS lists is that farm labor contractors (FLCs) report payroll taxes from a single location but may work in multiple areas; the list provides only the address at which an FLC reports wages and pays unemployment insurance taxes, not the county or counties in which the FLC primarily works. Research has shown this is the case for California FLCs. To enrich the sampling frame, lists of California farm labor contractors obtained from the Agricultural Commissioner's office were added in the last year to the employer lists for California counties selected as interview sites, providing a more accurate depiction of FLCs in the state.
It is not possible to know in advance which employers will be active employers at the time of sampling. While NAWS staff relies on the best available information when preparing the employer sampling lists, several factors make it difficult to get an accurate list. First, there is a great deal of turnover in agricultural enterprises and lists are easily out of date. Second, as discussed above in the ZIP Code Regions section, seasons vary from year to year and employers change cropping patterns and practices, which in turn modifies labor utilization. Third, sources of farm employers are incomplete.
The policy for developing the employer list is to be inclusive, allowing onto the list employers who may have a low probability of eligibility. NAWS staff balance the bias from excluding potentially eligible employers against the lower response rates arising from the difficulty of screening and excluding ineligible employers. The employer list inclusion procedures therefore err on the side of including possibly inactive employers.
Employers will be selected using simple random sampling, for several reasons. First, there is no reliable information on employers’ workforce size before the interviewing cycle starts. As such, using PPS to select employers is not possible. Second, simple random sampling results in selection of a greater variety of farm sizes, whereas PPS favors larger farms.
Because of uncertainty about the conditions of local seasonal farm labor, the number of eligible employers in a specific area cannot be known in advance. Interviewers will receive a randomly sorted list of all employers in the ZIP Code region (as described above). Interviewers will start with the first employer on the list to determine that employer's eligibility for the survey. As mentioned above, interviewers will make up to three attempts to contact each employer as they move down the list, following the randomized order. They will do this until either they complete the allocation for the FLA or the list is exhausted.
Workers
The maximum number of interviews allocated to each employer is roughly proportional to the FLA allocation. Were the allocation to be based on employer size, all interviews could be conducted at a single employer if a FLA allocation was small and the first participating employer had a large workforce. To ensure that interviews come from more than one employer per FLA, the following schedule is used.
If the total number of interviews allocated for the FLA is:
25 or fewer, the maximum number of interviews allowed per employer is five.
26-40, the maximum number of interviews allowed per employer is eight.
41-75, the maximum number of interviews allowed per employer is ten.
More than 75, the maximum number of interviews allowed per employer is twelve.
If the number of workers at an employer is less than the maximum number allowed per the criteria listed above, then all workers at the employer will be interviewed.
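The allocation schedule above can be sketched as a small lookup function. This is an illustration only; the function names are hypothetical, and treating an allocation of exactly 25 as part of the lowest band is an assumption.

```python
def max_interviews_per_employer(fla_allocation: int) -> int:
    """Cap on interviews at a single employer, given the FLA's total
    allocation. Hypothetical helper illustrating the schedule in the text;
    an allocation of exactly 25 is assumed to fall in the lowest band."""
    if fla_allocation <= 25:
        return 5
    elif fla_allocation <= 40:
        return 8
    elif fla_allocation <= 75:
        return 10
    else:
        return 12


def interviews_at_employer(fla_allocation: int, workers_present: int) -> int:
    # If fewer workers are present than the cap, all workers are interviewed.
    return min(workers_present, max_interviews_per_employer(fla_allocation))
```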
Most of the NAWS interviews take place on farms where there is only one group of workers. On some farms, however, workers are organized into crews consisting of several workers and a supervisor. Crew size can range from a handful of workers to more than 100, but crews of 30 workers or less are most typical based on prior years’ data. When the number of crews is large, randomly selecting workers from each crew is not feasible, and can be an imposition on the farm employer. For this reason, on farms where there are multiple crews, interviewers will first select one crew. They will then select workers to interview only from within that crew.
When a crew has to be selected from multiple crews, the crew will be selected randomly. Under some field conditions, the crew selection cannot be done using simple random sampling. In these situations, the crew will be selected using a structured approach, employing a sampling rule based on factors such as proximity or scheduling. For example, the interviewer might select the crew that is next scheduled to take a lunch break. In cases where the crew cannot be selected using simple random sampling, interviewers will record the factors that determined crew selection. As in prior years of the survey, crew selection is expected to be relatively rare.
Worker selection
When the number of workers at an establishment is greater than the maximum number that may be interviewed, interviewers follow procedures that are designed to ensure the selection of a random sample of workers. A lottery system is the preferred method.
In the case of lottery selection, let lower-case n be the number of interviews allowed for an employer (e.g., 8) and upper-case N be the total number of workers available for interview (e.g., 20). Interviewers place n marked tags (8) and N-n unmarked tags (20 - 8 = 12) in a container and shuffle them. Workers then draw tags, and those who draw marked tags are interviewed. A refusal is noted if someone who drew a marked tag is not interviewed, e.g., because the person walked away after drawing the tag or stated that he/she did not wish to be interviewed. A refusal is also noted if a marked tag is left in the bag after workers have drawn.
Though the lottery is the preferred method, it is not always feasible due to variation in crop farm labor use patterns. In cases where a lottery is impractical, interviewers use an alternate method to select workers. Interval sampling is the endorsed alternative: the interviewer identifies a random starting point and then selects workers at evenly spaced intervals. In cases where workers are not visible because they are inside structures such as housing units or greenhouses, an additional method is available. The interviewer selects a random starting structure, begins interviewing in that structure, and proceeds through the structures in order (e.g., clockwise) until the allocation is filled. If a structure contains more workers than the remaining allocation, a raffle is conducted. Thus, the structures have a known probability of being sampled and each worker has a known probability of being sampled.
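The two worker-selection methods can be sketched as follows. This is a minimal illustration with hypothetical function names; in the field, the lottery uses physical tags rather than software.

```python
import random


def lottery_selection(n_select, n_workers, rng=random):
    """Lottery method: shuffle n marked and N-n unmarked 'tags';
    workers (here, index positions) drawing marked tags are selected."""
    tags = [True] * n_select + [False] * (n_workers - n_select)
    rng.shuffle(tags)
    return [i for i, marked in enumerate(tags) if marked]


def interval_selection(n_select, n_workers, rng=random):
    """Interval sampling: a random start, then evenly spaced picks."""
    step = n_workers / n_select
    start = rng.uniform(0, step)
    return [int(start + k * step) for k in range(n_select)]
```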
Sampling weights, nonresponse weights, and post sampling population weights will be used when analyzing NAWS data.
Sampling weights provide each sampled worker’s probability of selection within the cycle-region stratum, including probabilities of being selected at the FLA, county, ZIP Code region, employer, and worker level.
Nonresponse weights correct sampling weights for deviations from the sampling plan, such as discrepancies in the number of interviews planned and completed in specific locations.
Post-sampling adjustments modify the weight given to each interview so that unbiased population estimates can be computed from the sample data.
The data used for calculating weights will come from several sources. For the sampling weights, the number of crews and workers at the farm on the day of sampling will be collected from the employer by the interviewer as part of the sampling documentation. Employer weights calculations will use information from the employer universe list and employer response codes recorded by interviewers. The county and FLA size information will come from CoA farm labor expenditure data. Data for post-sampling weights to adjust for part-time and seasonal work will come from the NAWS questionnaire. Stratum weight data will come from the ALS and QCEW.
Calculations of the nonresponse weights at the worker and employer level will be done as part of calculating the sampling weights, as explained below; nonresponse weights at the cycle and region level will be calculated simultaneously with cycle and region post-sampling adjustment weights.
Sampling Weights
Each worker in the sample has a known probability of selection. Information collected at each stage of sampling is used to construct the sampling weights.
The worker probability of selection is:
prob = flaprob x counprob x zipprob x emplprob x crewprob x workprob,
where flaprob, counprob, zipprob, emplprob, crewprob, and workprob are the selection probabilities at the FLA, county, ZIP Code region, employer, crew, and worker levels, respectively, and the number selected at the employer is the minimum of either the total number of workers in the crew or the FLA allocation per employer, as described on page 10.
It is anticipated that the numerator of crewprob will be one, as interviewers are instructed to select workers from a single crew where there are multiple crews. If there is only one crew of workers at an employer, then crewprob = 1.
The number of employers selected includes all employers from the beginning of the randomly sorted list until the last employer where interviews were completed.
Calculating counprob, the probability of selecting a county within its FLA, is more complicated, as counties are selected sequentially using probabilities proportional to size. For example, if one of the sampled counties is larger than another, then its probability of selection should be higher than the other county's. If several counties are selected from a particular FLA, then the selection probability for a particular county is calculated as: (1) its probability of selection on the first draw, plus (2) the probability of its selection on the second draw, plus (3) the probability of its selection on the third draw.
For the standard method of sampling several counties with probabilities proportional to size, without replacement, closed-form formulas for the exact inclusion probabilities do not exist. However, these probabilities can be calculated exactly using multiple summations. This procedure can be implemented in SAS within PROC IML.
Suppose that the population at a particular sampling stage consists of N objects with sizes s1, s2, ..., sN, having total size S = s1 + s2 + ... + sN. Let Pi(j) be the probability that the jth item is selected on the ith draw. Then for j = 1, 2, ..., N,
P1(j) = sj / S,
P2(j) = sum over k not equal to j of P1(k) x sj / (S - sk),
P3(j) = sum over distinct k and l (both not equal to j) of P1(k) x [sl / (S - sk)] x [sj / (S - sk - sl)], and so forth.
These ith-draw probabilities each have the property that P_i(1) + P_i(2) + ... + P_i(N) = 1. Finally, the probability that the jth item is included in a sample of size n is pi(j) = P1(j) + P2(j) + ... + Pn(j). These inclusion probabilities have the property that pi(1) + pi(2) + ... + pi(N) = n.
The county selection probabilities can be calculated exactly using these formulas.
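The text notes that these multiple summations can be implemented in SAS within PROC IML. As an illustration only, the same exact calculation can be sketched in Python by enumerating ordered draw sequences, which is feasible for the small sample sizes involved (`inclusion_probs` is a hypothetical name):

```python
from itertools import permutations


def inclusion_probs(sizes, n):
    """Exact PPS-without-replacement inclusion probabilities, computed by
    summing the probability of every ordered sequence of n draws."""
    N = len(sizes)
    S = float(sum(sizes))
    pi = [0.0] * N
    for seq in permutations(range(N), n):
        p, remaining = 1.0, S
        for j in seq:                 # draw-by-draw selection probability
            p *= sizes[j] / remaining
            remaining -= sizes[j]
        for j in seq:                 # every unit in the sequence is included
            pi[j] += p
    return pi
```

The inclusion probabilities sum to n, matching the property stated above.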
The calculation for flaprob, which is the probability that the FLA was selected within the region, has two components: the probability that the FLA was selected for the roster and the probability that a FLA on the roster was selected for the cycle.
For the probability that a FLA was selected from the roster, consider that:
N is the number of FLAs in the region.
s1 through sN are the sizes of the FLAs.
S is the sum of the FLA sizes, so S = s1 + s2 + ... + sN.
n is the number of the FLAs to be selected with probabilities proportional to size.
In selecting the FLAs, they were listed in a random order. A column of cumulative FLA sizes was constructed. That is, the cumulative size at the jth FLA will be Sj = s1 + s2 + ... + sj.
A random starting point, k0, was chosen between 1 and the sampling interval k = S/n. The integers k0, k0 + k, k0 + 2k, ..., k0 + (n-1)k can be listed. The jth FLA will be selected if one of these integers falls between Sj-1 + 1 and Sj (where S0 is interpreted to be 0).
Without loss of generality, consider the first FLA on the randomized list. It will be selected if k0 lies between 1 and s1. Thus, its probability of selection is s1/k = n x s1/S.
In general, the probability that the jth FLA is selected for the roster is n x sj/S (provided sj does not exceed the sampling interval k).
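The systematic PPS roster selection described above can be sketched as follows. This is a minimal illustration with hypothetical names; sizes and the sampling interval are assumed to be integers, as in the cumulative-size listing.

```python
import random


def systematic_pps(sizes, n, rng=random):
    """Select n units with probability proportional to size using a random
    start k0 and a fixed interval k over the cumulative sizes."""
    S = sum(sizes)
    k = S // n                          # sampling interval (assumed integer)
    k0 = rng.randint(1, k)              # random start between 1 and k
    targets = [k0 + i * k for i in range(n)]
    # Walk the cumulative sizes S_1..S_N and record which unit each
    # target integer falls into.
    selected, cum = [], 0
    it = iter(targets)
    target = next(it)
    for j, s in enumerate(sizes):
        cum += s
        while target is not None and target <= cum:
            selected.append(j)
            target = next(it, None)
    return selected
```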
The flacycprob is the probability that the FLA is selected in a specific cycle. The calculation is as follows:
Nonresponse Weighting
Nonresponse weights adjust the probabilities and sampling weights for deviations from the sampling design. If, for example, ten interviews should have been completed at a farm but only two interviews were completed, those two interviews could be given five times the weight they would have received otherwise. Thus, each interviewee's probability will be adjusted for deviations in the number of interviews completed at the farm. The adjusted probabilities are composite factors calculated by multiplying the worker response rate by the worker probability of being selected.
The calculation of the worker probability adjustment is as follows. The response rate for workers is:
workerresprate = (number of worker interviews completed at the employer) / (number of worker interviews planned at the employer),
and the adjusted worker probability is:
workprobadj = workprob * workerresprate
Nonresponse adjustment will also be calculated at the employer level. The region is the geographic level at which the interviews are allocated. Many features of the NAWS sampling were set up to overcome the lack of reliable information on seasonal employment at the county and ZIP Code level. Each cycle, there are substantial discrepancies between the predicted and actual numbers of eligible employers and workers at the FLA, county, and ZIP Code region level.
Given these demonstrated data issues, nonresponse adjustment will be done at the region level to account for employer nonresponse as well as nonresponse within the cycle-region stratum. This is because the region is the level at which the interviews are allocated; all other allocations are derivative, as the regional allocation is distributed across FLAs, counties, and ZIP Code regions in a rolling manner, so that nonresponse in one FLA is made up in another in order to meet the regional allocation. Additionally, size information at the region level is generally of better quality, owing to the availability of more recent data and the lower likelihood that data are absent or suppressed for privacy reasons. Finally, the region is the lowest level with enough interview coverage to adjust the weights for nonresponse; if, for some reason, there are too few interviews in a region, the region can be combined with adjacent regions for weighting purposes.
Employer nonresponse adjustment at the region level also takes into account ZIP Code regions where no eligible employers were found. The probabilities of selecting the ZIP Code region within the county, the county within the FLA, and the FLA within the region include non-responding units. Adjusting at the ZIP Code region level would omit employer nonresponse in ZIP Code regions where no interviews were done. Since the sampling process allows for the possibility that only one ZIP Code region is selected in a FLA, the region is the preferred level at which the nonresponse adjustment can be calculated reliably.
It is important to account for the two stages of the employer selection process. First, employers are contacted and screened to determine employer eligibility. The second phase is persuading eligible employers to allow interviewers to access and interview their workers. The potential for nonresponse exists at both stages. Interviewers may be unable to contact the employer, or the employer may refuse to provide the information needed to determine employer eligibility. Eligible employers may refuse to allow access to their workers.
For the first stage in the employer selection process, we will calculate an employer screening adjustment:
emplscreenrate = (number of employers with completed eligibility screening) / (number of employers contacted),
where employers with completed eligibility screening include all contacted employers for whom NAWS field staff were able to determine whether the employer is eligible or ineligible.
For the second stage of employer selection, the formula for the response rate among eligible employers is:
emplresprate = (number of eligible employers where worker interviews were conducted) / (number of eligible employers).
The adjusted employer probability is:
emplprobadj = emplprob * emplscreenrate * emplresprate
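The two-stage adjustment is a simple product of the base probability and the two rates. As a worked illustration with assumed counts (all names and numbers here are hypothetical):

```python
def adjusted_employer_prob(emplprob, screened, contacted, cooperating, eligible):
    """emplprobadj = emplprob * emplscreenrate * emplresprate, with the two
    rates formed from assumed counts (hypothetical parameter names)."""
    emplscreenrate = screened / contacted    # eligibility screening completed
    emplresprate = cooperating / eligible    # eligible employers granting access
    return emplprob * emplscreenrate * emplresprate
```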
Nonresponse adjustment at the cycle-region allocation will also be calculated. The calculation of the region nonresponse adjustments will be done simultaneously with the post-sampling weights to take advantage of the most recent ALS data used for population weighting.
Sampling Weight
The combined probability of selection within a cycle-region is:
prob = flaprob x counprob x zipprob x emplprobadj x crewprob x workprobadj.
The individual worker sampling weight, WTk, equals the inverse of the selection probability:
WTk = 1/prob.
Post-Sampling Weights
Post-sampling weights will adjust the relative value of each interview in order for national estimates to be obtained from the sample. There are five post-sampling weights. Two of the weights adjust for unequal probabilities of selection that can only be determined after the interviews are conducted. These include the unequal probabilities of finding part-time versus full-time workers (day weight) and the unequal probabilities of finding seasonal versus year-round workers (seasonal weight). The other three weights (region, cycle, and year) adjust for the relative importance of a region’s data, a sampling cycle, and a sampling year. As discussed below, the calculation for the region weight will be done simultaneously with the region nonresponse adjustment weights. The cycle weight and year weight allow different cycles and sampling years to be combined for statistical analysis.
The region and cycle weights will use measures of size obtained from the ALS that are reported by quarter and region. The ALS is the only information source on levels of farm worker employment. The CoA, for instance, collects data annually rather than quarterly, and provides the desired statistic once every five years. By using ALS figures to make the size adjustment, the NAWS can adjust the weights by stratum (cycle and region) and construct unbiased population estimates. Nonresponse adjustments for size, therefore, take place at the region-within-cycle level to create corrected region weights.
The NAWS sampling plan calculates sampling allocations using ALS data collected in the year before the interviews. For example, FY 2012 data is used to plan the NAWS 2013 sample. The weights, however, will use ALS data collected during the interview year. This corrects for any discrepancies in allocations due to projecting farm worker distributions based on past years’ data.
The day weight adjusts for the probability of finding part-time versus full-time crop workers. Interviewers will conduct interviews during one to two week visits to a specific FLA. A part-time worker, who works only two or three days per week, has a lower likelihood of being encountered than a worker employed full time. The day weights reflect these different probabilities of selection.
It is assumed that a worker has an equal likelihood of being sampled on each day worked. Thus, the probability of sampling a worker is related to the number of days worked by individual workers. It is therefore possible to calculate a day weight that is simply the inverse of the number of days the worker did farm work during the week.
A respondent is always present on the day he/she was sampled. From the NAWS interview form, it can be determined how many days the respondent worked during the week. A worker who worked one day a week would have a day weight of one. A worker who worked two days per week would have a sampling probability twice that of someone working one day per week, and thus a day weight of 1/2.
The day weight (DWTS) is computed as:
DWTS = 1 / (number of days worked per week).
The days per week worked is reported by the farm worker. In prior surveys, almost all workers sampled worked five or six days per week. The NAWS will not sample on Sundays; therefore, workers at establishments reporting at least six workdays per week have the maximum chance of selection and the minimum day weight of one-sixth.
The few workers who do not report a number of days worked per week will receive a default value of one-sixth, the most commonly reported value.
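The day-weight rule can be sketched as follows. The function name is hypothetical, and clamping reported days to the range 1-6 (no Sunday sampling) is an assumption consistent with the bounds stated above.

```python
def day_weight(days_worked_per_week):
    """DWTS = 1 / days worked per week; a missing report defaults to 1/6.
    Reported days are assumed to be clamped to the range 1-6."""
    if days_worked_per_week is None:
        return 1 / 6
    return 1 / min(max(days_worked_per_week, 1), 6)
```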
The Season Weight
The calculation of the worker weights is complicated by the fact that workers could, in general, be sampled several times a year. Furthermore, neither the CoA nor the ALS provides figures that can be used for the annual number of crop workers. The CoA reports the number of directly-hired crop workers employed on each farm, but does not adjust for the fact that some workers are employed on more than one farm in the census years. In addition, CoA farm worker counts exclude employees of farm labor contractors. Similarly, the ALS is administered quarterly and reports the number of crop workers employed each quarter, so the same worker could be reported in multiple quarters. Because of this repetition of workers across seasons, it would be invalid to derive the total number of persons working in agriculture during the year by summing quarterly figures from the ALS.
As employment information is not available for every worker for each quarter of the year, the only way to avoid double-counting of crop workers is to use the 12-month retrospective work history collected in the NAWS. Specifically, predicting future-period employment is achieved by imposing the assumption that workers who report having worked in a previous season would work in the next corresponding season. For example, a worker sampled in spring 2015 who reported working the previous summer 2014 is assumed to work in the following summer 2015. For some purposes, including the calculation of year-to-year work history changes, this assumption cannot be used. For purposes such as obtaining demographic descriptions of the worker population, however, this assumption provides satisfactory estimates.
Furthermore, it is assumed that a worker has an equal likelihood of being sampled in each season worked. Thus, the probability of sampling a worker is related to the number of seasons worked by individual workers. It is therefore possible to calculate a seasonal weight that is simply the inverse of the number of seasons the respondent did farm work during the previous year.
For the purposes of the NAWS, there are only three seasons per year. An interviewee always performed farm work during the trimester he/she was sampled. From the NAWS interview, it can be determined during which of the two previous trimesters the respondent also did farm work. If the interviewee only worked during the current trimester, the season weight is 1/1 or 1.00. If the interviewee worked during the current trimester and only one of the two prior trimesters, the season weight is 1/2 or 0.50. Finally, if the interviewee worked during the current and both of the prior trimesters, the season weight is 1/3 or 0.33.
This season weight is similar to the day weight in the sense that respondents who spend more time (seasons) working in agriculture have a greater chance of being sampled. Therefore, the weighting has to be inversely proportional to the number of seasons worked in order to account for the unequal sampling probability.
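The season-weight rule can be sketched in the same style as the day weight (the function name is hypothetical):

```python
def season_weight(trimesters_worked):
    """SEASWTS = 1 / number of trimesters worked, out of the three
    trimesters in a NAWS year; the sampled trimester always counts."""
    assert 1 <= trimesters_worked <= 3
    return 1 / trimesters_worked
```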
The region weight adjusts the relative weight of a region’s data in relation to the number of interviews completed in that region. If the number of interviews completed is smaller than the regional allocation in the sampling plan, an adjustment weight greater than one will be assigned to each interview in the region, and vice versa. These adjustments ensure that the population estimates are unbiased.
The region weight will be based on ALS measures of regional farm employment activity. This is the best source of information available about crop workers. The ALS figures reported by region and quarter allow the weight to be sensitive to seasonal fluctuations.
Correspondence between USDA Data and the NAWS Sampling Cycles
The calculation of the region weight will rely on two pieces of information: the ALS regional measures of size and the number of interviews completed in each region. The first step in the process of calculating the region weight is to apportion the ALS quarterly size figures among the three NAWS sampling cycles.
The USDA (ALS) figures are reported quarterly. The NAWS sampling years, however, will cover non-overlapping 12-month periods (from September to August), which are divided into three cycles. Accordingly, it is necessary to adjust the USDA figures to fit the NAWS sampling frame by apportioning the four quarters into three cycles.
For example, the number of crop workers in the fall cycle for a region is assumed to be the total number of workers for that region in USDA Quarter 4 (October ALS data) of the current fiscal year (FYc) plus one-third the number of workers for that region in USDA Quarter 1 (January ALS data) of the next calendar year (FYp). The formulas for the winter, spring, and summer cycles are constructed similarly.
Determining the NAWS Region Grouping According to Interview Coverage
The calculation of the region weight (within cycle) is as follows for each region j (1,…,ni) in cycle i:
REGWTSij = USDAij / DWTSij,
where USDAij is the USDA estimate for region j in cycle i, Xij is the sum of the sampling weights for region j in cycle i, and DWTSij is the sum of the day-weighted sampling weights for region j in cycle i. Also, 1/6 <= DWTSijk <= 1 (where k refers to a farm worker), so that DWTSij = Xij if all crop workers in region j in cycle i are working one day per week and DWTSij = (1/6) x Xij if all crop workers are working six days per week in region j in cycle i.
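As a sketch of this ratio adjustment (the function name is hypothetical, and forming the denominator as the sum of day-weighted sampling weights is an assumption consistent with the bounds stated above):

```python
def region_weight(usda_size, interviews):
    """interviews: list of (sampling_weight, day_weight) pairs for the
    completed interviews in one region-cycle. The region weight is the
    USDA size estimate divided by the sum of day-weighted sampling
    weights (an assumed form consistent with the stated bounds)."""
    dwts = sum(wt * dw for wt, dw in interviews)
    return usda_size / dwts
```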
The Cycle Weight
NAWS data will be combined from the different sampling cycles (seasons) within the same sampling year in order to generate more observations for statistical analysis. In order to combine cycles, it is necessary to adjust for the number of crop workers represented in each cycle in relation to the number of interviews completed in the cycle. For instance, suppose sampling is not proportional, as explained above, but rather the same number of crop workers is interviewed in all three cycles in the fiscal year. If the USDA reported more workers for the fall and spring/summer cycles, as compared to the winter cycle, then the interviews in the fall and spring/summer would be weighted relatively more in terms of size than the interviews conducted in the winter cycle. Accordingly, the interviews in the winter would have to be down weighted in relation to the interviews in the other seasons (cycles) before the cycles could be combined.
The cycle weight is calculated similarly to the region weight, but at the cycle-level rather than region-level. The sum of the USDA size for a cycle is divided by the number of interviews in that cycle. The calculation of the cycle weight (or region weight within year) is as follows for each region j 1,…,ni, cycle i in year Y:
where SEADWTSij is the sum of the day- and season-weighted sampling weights for region j in cycle i, and 0.33 <= SEASWTSk <= 1 (where k refers to a farm worker), with SEASWTSk = 1 if the farm worker worked only one cycle during the year, so that SEADWTSij = Xij if all crop workers for region j in cycle i worked one day per week and only one cycle in the corresponding year.
The year weight allows different sampling years to be combined for statistical analysis. It follows the same rationale as the cycle weight, but at the sampling-year level. If the same number of interviews are collected in each sampling year, those interviews taking place in years with more farm work activity are weighted more heavily in the combined sample.
Sampling years cannot be combined if the interviews are not comparable in terms of agricultural representation. In an extreme case, suppose that the NAWS interview budget tripled for one of the sampling years, consequently tripling the number of interviews. If the two sampling years were joined without adjustment, the larger sampling year would have an unduly large effect on the results.
To avoid this, the year weight is calculated as a ratio of the total number of crop workers reported in the USDA ALS for each sampling year to the number of interviews in that sampling year. This is done on a cycle-by-cycle basis, but the intent is to even out annual allocations that do not represent similar proportions of the population. The year weight calculation (or region weight related to all years of interviews) is as follows for each region j (1,...,ni) and cycle i (the sum over i and j covers all crop workers, all cycles, and all years):
with the same notations as the preceding weights.
Obtaining the Final Weights
Once the individual weight components are calculated, final composite weights are calculated as the product of the sampling weight, the day weight, the season weight, the region weight, and, when multiple cycles or sampling years are combined, the cycle and year weights. The composite weights are adjusted so that the sum of the weights equals the total number of interviews at the next higher level of stratification. These adjusted composite weights are then used to calculate the estimated proportion of workers with various attributes.
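The composite-weight construction can be sketched as follows. All names are hypothetical, and the normalization shown is a simplified single-stratum version of the adjustment described above.

```python
def composite_weights(records):
    """records: list of dicts with keys 'wt' (sampling weight), 'day',
    'season', 'region', 'cycle', and 'year'. Returns composite weights
    rescaled so they sum to the number of interviews (a simplified
    single-stratum normalization)."""
    raw = [r['wt'] * r['day'] * r['season'] * r['region'] * r['cycle'] * r['year']
           for r in records]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]
```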
The individual observation weights are obtained at the farm worker level:
This is the weight within cycle; it includes an adjustment for the length of the workweek, but no seasonal adjustment.
This is the weight within a year; it includes both the length of the workweek and seasonal adjustment. This weight may be used for the analysis of one particular year of interviews.
The composite weight (PWTYCRD) is used for almost all NAWS analysis. This weight allows merging several years of analysis together. It is included in the public access dataset.
4. Statistical Reliability
a) Maximizing Response Rates
The NAWS response rate is the product of the response rates at all levels of sampling, including the employer and worker levels. Since, in FY 2018, all region-cycle strata were used and all FLAs, counties, and ZIP Code regions in the sample were visited in order, the response rates for these levels of sampling were 100 percent. The employer response rate was 24 percent and the worker response rate was 91 percent; thus, the overall response rate was 21 percent.
NAWS interviewers attempted to contact 4,256 agricultural employers, of which 22 percent were eligible to participate in the survey, 37 percent were ineligible, and 41 percent could not be contacted and screened. A large number of potentially eligible employers had undetermined eligibility despite multiple contact attempts, and the likelihood that an employer on the sampling list is eligible varies considerably. Several issues underlie the difficulty of contacting and screening employers. First, there is a lag in receiving employer information from BLS, so some information is out of date. Other sources of employer information vary in completeness and in the degree to which the list is vetted. At the same time, agricultural operations are not static: changes to crops, technology, or labor practices may affect the employment and timing of agricultural workers, and thus the establishment's eligibility to participate in the survey. Also, businesses are bought and sold, and others start up, liquidate, or cease production. Even employers that have agricultural workers may not be eligible at all times of the year, since agriculture is a seasonal industry and workers may only be needed for tasks at certain times of the year. Interviewers code the reasons for inability to screen, as well as the reasons for ineligibility, for each selected grower.
In FY 2018, 48 percent of the randomly selected eligible employers (or their surrogates) who employed workers on the day they were contacted agreed to cooperate with the survey, and interviews were conducted at 50 percent of these eligible establishments. Taking into account employers for whom eligibility could not be determined, the employer response rate in FY 2018 was 24 percent using the unweighted response rate (URR) formula from the OMB Standards and Guidelines for Statistical Surveys (2006).
Once interviewers have the employers’ agreement to cooperate with the survey, a random sample of workers is selected. In FY 2018, 91 percent of the sampled workers at eligible establishments agreed to be interviewed.
Previous NAWS data show that item response rates are high (i.e., item nonresponse rates are low). Using fiscal year 2011-2016 data, the average nonresponse rate for items that do not have a skip pattern is less than 0.5 percent for each fiscal year, with individual items ranging from 0 percent to 2.4 percent. The item "When was the last time your parents did hired farm work in the U.S.A.?" had the highest nonresponse rate, ranging from 1.1 percent in fiscal years 2014 and 2016 to 3.5 percent in FY 2011.
The average nonresponse rate for items that have a skip pattern was less than 2 percent for each fiscal year, ranging from 0 to 9.4 percent. The item “Does this employer keep in contact with you about future employment before leaving at the end of the season?” had the highest nonresponse rate, ranging from 5.1 percent in FY 2016 to 9.4 percent in FY 2013.
The NAWS is expected to have similar unit and item response rates for FY 2020.
Employer Response
To maximize employer response, the NAWS contractor will send an advance letter to agricultural employers and provide them with a brochure explaining the survey. The letter will be signed by the survey director and will include the names of the interviewers and their contact information. For further information or questions, the letter and brochure will direct employers to contact either the survey contractor at a toll-free number, or the Department of Labor’s (the Department) Contracting Officer’s Technical Representative (COTR). Employer calls will be returned quickly. In addition, the NAWS contractor will provide the COTR a list of scheduled interview trips. The list will include the counties and states where interviews will be conducted, the names of the interviewers who will be visiting the selected counties, and the dates the interviewers will be in the selected counties. The COTR will refer to the list whenever an employer calls to confirm the interviewers’ association with the survey.
Both the Department and the contractor will make presentations on the survey and will provide survey information (e.g., questionnaires) to officials and organizations that work with agricultural employers. The NAWS has received the endorsement of several employer organizations. This improves the response rate since agricultural employers sometimes call their employer organization when considering survey participation.
Before interviewers receive employer lists, NAWS office staff attempts to verify the employer’s address and contact information and searches for addresses and phone numbers for employers for whom address and phone information is missing. In addition, NAWS office staff searches for physical addresses for employers with listed addresses that are post office boxes, or lawyers’ or accountants’ offices. Results of successful searches along with information received when advance letters are returned are incorporated in the employer contact list that is distributed to interviewers.
To increase employer response, interviewers will be instructed to make at least three contact attempts at different times of the day and on different days of the week. At least one of these contacts is to be an in-person attempt at the employer address. Interviewer contact attempts will be logged and the logs will be monitored for compliance. Interviewers will be instructed to accommodate an employer’s preference for scheduling surveys and, if needed, the interviewer can request an extension of the field period.
Intensive and frequent interviewer training will also be conducted as a means to increase employer response rates. Interviewers will be trained in pitching the survey in various situations and they will be trained to understand the history, purpose, and use of the questionnaire. They will be prepared to easily answer any questions or address any concerns an employer might have. In addition, when explaining the purpose of the survey to employers, interviewers will be trained to clearly distinguish the survey from enforcement efforts by the Department of Homeland Security, DOL and other Federal agencies, and assure employers that their information will be kept private.
Worker Response
The survey’s methodology has been adapted to maximize response from this hard-to-survey population. Interviewers will pitch the survey to workers in English or Spanish, as necessary. All interviewers are bilingual. In addition, interviewers will make sure that potential respondents know that they are not associated with any enforcement agency (e.g., Immigration and Customs Enforcement). Interviewers will explain the survey to workers and obtain their informed consent verbally.
Crop workers receive a $20 honorarium, which helps the survey achieve an estimated worker response rate above 90 percent. Research indicates incentives increase response rates in social research (Ryu, Couper, & Marans, 2006). According to the National Science Foundation, monetary incentives improve study participation and offset the costs of follow-up and recruitment of non-respondents (Zhang, 2010).
b) Addressing Nonresponse
Possible worker nonresponse bias
Based on previous years of NAWS data collection, it is anticipated that the worker response rate will be above 80 percent. This level of worker response exceeds the threshold below which a nonresponse bias analysis would be required. The NAWS weights nonetheless include a worker nonresponse adjustment, since worker nonresponse may be high within a given employer.
Possible employer nonresponse bias
High rates of employer nonresponse are a concern. It is important to determine whether nonresponse is random or whether systematic differences between respondents and nonrespondents may introduce bias. NAWS staff conducted several analyses of sampling frame data and paradata to examine the potential for employer nonresponse bias due to lower-than-recommended employer response rates (see the supplemental report Summary of Nonresponse and Design Studies included in this submission). The results of these analyses did not support the need for nonresponse bias adjustment beyond what is already incorporated in the nonresponse weights.
NAWS staff will continue to assess possible nonresponse bias using the different analyses listed below. These analyses will utilize NAWS sampling frame data, paradata, 2017 Census of Agriculture data, and data from the QCEW.
The first analysis will assess NAWS nonresponse bias by comparing information in the sampling frame on eligible respondents and nonrespondents. While the sampling frame data are somewhat sparse for nonrespondents, three pieces of information are useful: geographic location, NAICS code, and the source used to obtain employer names. The NAWS will use three sources of employer names: a) the BLS UI list, b) marketing lists, and c) internet searches and contacts with knowledgeable local individuals. Geographic area and source lists are available for all employers, while NAICS codes are available for all employers who pay UI taxes, marketing list employers, and some additional employers. Employers without a NAICS code will be analyzed as a distinct group if there are sufficient numbers of these employers.
Using all three variables (source, NAICS, and geography), we will make the following comparisons:
Employers allowing interviews compared to sampled employers that refused or could not be screened (i.e., excluding the ineligible),
Employers allowing interviews compared to eligible employers who refused, and
Eligible employers compared to unscreened sample members (employers whose eligibility could not be determined).
Nonresponse bias will be calculated using the bias calculation formula from OMB’s Standards and Guidelines for Statistical Surveys (2006). The formula defines the bias of a respondent-based estimate, yr, as the following:

bias(yr) = yr − yt = (nnr/n)(yr − ynr),
where:
yt = the mean based on all sample cases;
yr = the mean based only on respondent cases;
ynr = the mean based only on the nonresponding cases;
n = the number of cases in the sample; and
nnr = the number of nonresponding cases.
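As a concrete illustration, the bias calculation described above can be computed directly. The following is a minimal Python sketch; all sample sizes and means are hypothetical values chosen for illustration only.

```python
# OMB (2006) nonresponse bias formula:
#   bias(yr) = yr - yt = (nnr / n) * (yr - ynr)
# yr = respondent mean, ynr = nonrespondent mean,
# n = total sample size, nnr = number of nonrespondents.

def nonresponse_bias(y_r, y_nr, n, n_nr):
    """Bias of the respondent mean relative to the full-sample mean."""
    return (n_nr / n) * (y_r - y_nr)

# Hypothetical example: 100 sampled cases, 20 nonrespondents,
# respondent mean 10.0, estimated nonrespondent mean 8.0.
bias = nonresponse_bias(y_r=10.0, y_nr=8.0, n=100, n_nr=20)
print(bias)  # 0.4: the respondent mean overstates the full-sample mean by 0.4
```

As a check, the full-sample mean here is (80 × 10.0 + 20 × 8.0)/100 = 9.6, and 10.0 − 9.6 = 0.4, matching the formula.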
The second analysis will examine employers who are not successfully screened during the data collection cycle. After interviewers leave a location, NAWS staff will continue contact attempts with unscreened employers to determine their eligibility status. Some growers may be unavailable during the interview period, either because they are too busy or because they are out of season. The goal will be to determine whether further contact attempts or lengthening the onsite data collection period would result in finding eligible employers who are characteristically different from the responding employers, and thus the extent of any nonresponse bias.
The third analysis will use a Markov chain analysis to incorporate information from prior data periods about growers’ states – whether eligible, ineligible, or unable to be determined – and will look at the impact on response rates. A small number of agricultural employers appear on the survey’s sampling list in multiple administrations of the survey. Attempts to contact these employers may have had different outcomes at different time periods. Predicted states from the model can be used to examine possible bias.
While the first three studies look at employer nonresponse, the question remains whether non-responding employers have workers with different characteristics than responding employers. Previous Markov chain analyses have shown that employers do change state from survey refusal to participation and vice versa. The fourth study will compare the survey responses of workers whose employers always participate in the NAWS to the responses of workers whose employers sometimes participate and sometimes refuse.
Possible item nonresponse bias
As discussed previously, the NAWS has item response rates that exceed 90 percent, eliminating the need for a nonresponse bias analysis for specific items. Due to the low rates of missing data, NAWS data analysis generally uses case-wise deletion. No imputations are included in the public access data.
c) Reliability
A probability sampling methodology will be used and estimates of the sampling errors will be calculated from the survey data.
d) Estimation Procedure
At the highest level of the sampling design, the region/cycle level, stratified sampling is used. Sampling is then carried out at the lower levels, independently within each stratum.
The following description is excerpted from Obenauf (2003):
The stratified sampling technique divides the entire population into relatively homogenous groups that are mutually exclusive and exhaustive. Samples are then drawn from each of these groups (strata) by simple random sampling or an alternate method. The entire sample is a compilation of these independent samples from each of the strata. In stratified sampling, an estimate of the population mean can be made for each of the strata.
Estimate of the population mean:

ȳst = (1/N) Σk Nk ȳk,   k = 1, ..., L,

where Nk is the population size of stratum k, ȳk is the sample mean within stratum k, N = Σk Nk is the total population size, and L is the number of strata into which the population is divided.
If a simple random sample is taken within each stratum (recall that other schemes can be used to draw a sample from each of the strata), the following represents an unbiased estimate of the variance of ȳst:

v̂(ȳst) = Σk (Nk/N)² (1 − nk/Nk) (sk²/nk),

where nk is the sample size and sk² is the sample variance within stratum k.
The standard error of the estimator is the square root of this estimated variance, or

SE(ȳst) = √[Σk (Nk/N)² (1 − nk/Nk) (sk²/nk)].
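The stratified estimators described in the excerpt above can be illustrated numerically. The following is a minimal Python sketch; the strata, their population sizes, and the sample values are hypothetical.

```python
import math

# Hypothetical strata: population size N_k and a simple random sample
# of measurements drawn from each stratum.
strata = {
    "A": {"N": 500, "sample": [10.0, 12.0, 11.0, 9.0]},
    "B": {"N": 300, "sample": [20.0, 22.0, 21.0]},
}

N = sum(s["N"] for s in strata.values())  # total population size

# Stratified estimate of the population mean: (1/N) * sum(N_k * ybar_k).
ybar_st = sum(s["N"] * (sum(s["sample"]) / len(s["sample"]))
              for s in strata.values()) / N

# Unbiased variance estimate with the finite population correction:
# sum over strata of (N_k/N)^2 * (1 - n_k/N_k) * s_k^2 / n_k,
# where s_k^2 is the within-stratum sample variance (denominator n_k - 1).
var_st = 0.0
for s in strata.values():
    n_k = len(s["sample"])
    ybar_k = sum(s["sample"]) / n_k
    s2_k = sum((y - ybar_k) ** 2 for y in s["sample"]) / (n_k - 1)
    var_st += (s["N"] / N) ** 2 * (1 - n_k / s["N"]) * s2_k / n_k

se_st = math.sqrt(var_st)  # standard error of the stratified mean
print(ybar_st, se_st)
```

With these hypothetical inputs the stratum means are 10.5 and 21.0, so the stratified mean is (500 × 10.5 + 300 × 21.0)/800 = 14.4375.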
At the second stage of the sampling design, within each stratum, counties (or groups of counties) are treated as clusters, called farm labor areas (FLAs). The following description is another excerpt from Obenauf (2003):
The population is again divided into exhaustive, mutually exclusive subgroups and samples are taken according to this grouping. Once the population has been appropriately divided into clusters, one or more clusters are selected … to comprise the sample. There are several methods of estimating the population mean for a cluster sample. The method most pertinent to this study is that involving cluster sampling proportional to size (PPS).
With PPS sampling, the probability (zj) that cluster j is chosen on a specific draw is given by zj = Mj/M, where Mj is the size of the jth cluster and M is the population size. An unbiased estimate of the population total is given by

τ̂ = (M/n) Σj (yj/Mj),

where yj is the sample total for y in the jth cluster and n is the number of clusters in the sample; the quantity (1/n) Σj (yj/Mj) is the average of the cluster means yj/Mj.
To estimate the population mean, this estimate must be divided by M, the population size.
The variance of the estimator of the population total is estimated by

v̂(τ̂) = M² (s²/n),

where s² is the sample variance of the yj/Mj values.
For an estimate of the population mean, μ̂ = τ̂/M and v̂(μ̂) = v̂(τ̂)/M² = s²/n.
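The PPS estimators in the excerpt can likewise be illustrated numerically. The following is a minimal Python sketch; the population size, cluster sizes, and cluster totals are hypothetical.

```python
import math

# Hypothetical PPS setup: M is the population size (sum of all cluster
# sizes); each tuple is (M_j, y_j), the size and sample total of cluster j.
M = 1000
clusters = [
    (100, 1050.0),
    (250, 2400.0),
    (150, 1650.0),
]
n = len(clusters)

cluster_means = [y_j / M_j for M_j, y_j in clusters]  # the y_j / M_j values
mean_of_means = sum(cluster_means) / n

tau_hat = M * mean_of_means  # unbiased estimate of the population total
mu_hat = tau_hat / M         # estimate of the population mean

# Estimated variance of tau_hat: M^2 * s^2 / n, where s^2 is the sample
# variance of the y_j / M_j values (denominator n - 1).
s2 = sum((m - mean_of_means) ** 2 for m in cluster_means) / (n - 1)
var_tau = M ** 2 * s2 / n
var_mu = s2 / n              # estimated variance of the mean estimate
print(tau_hat, mu_hat, math.sqrt(var_mu))
```

Here the cluster means are 10.5, 9.6, and 11.0, so the estimated population mean is their average, 31.1/3 ≈ 10.37.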
In two-stage cluster sampling, the estimated variance of the estimator is built up iteratively: a between-cluster component, computed from the variation among the estimated cluster totals as in the single-stage case, is added to a within-cluster component that accumulates the estimated sampling variance arising inside each sampled cluster.
This iterative formula is then generalized to compute the variance of the estimators in multi-stage sampling schemes with three or more levels. Exact formulas become intractable at this point, and the various statistical software packages rely upon either re-sampling methodology or linear approximations in order to estimate the variances and standard errors of the estimators.
The following is an excerpt from the SAS documentation for PROC SURVEYMEANS (SAS Institute Inc., 1999).
The SURVEYMEANS procedure produces estimates of survey population means and totals from sample survey data. The procedure also produces variance estimates, confidence limits, and other descriptive statistics. When computing these estimates, the procedure takes into account the sample design used to select the survey sample. The sample design can be a complex survey sample design with stratification, clustering, and unequal weighting.
PROC SURVEYMEANS uses the Taylor expansion method to estimate sampling errors of estimators based on complex sample designs. This method obtains a linear approximation for the estimator and then uses the variance estimate for this approximation to estimate the variance of the estimate itself (Woodruff, 1971; Fuller, 1975).
SAS (e.g., PROC SURVEYMEANS) allows the user to specify the details of the first two stages of a complex sampling plan. In the present case, the stratification and clustering at the first two levels are specified in PROC SURVEYMEANS (strata cycle and region; cluster FLA). At the lower levels of the sampling scheme, the design attempts to mimic, as closely as is practical, simple random sampling. The software cannot calculate exact standard errors, since it presumes true simple random sampling beyond the first two levels. The sampling weights correct for differences in selection probabilities so that the estimators remain unbiased.
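The first-stage portion of the Taylor-linearization calculation can be sketched outside of SAS. The following Python illustration, with entirely hypothetical strata, clusters, weights, and values, linearizes a weighted mean and computes the between-PSU variance within each stratum, ignoring the finite population correction; it is a simplified sketch of the approach, not a reproduction of PROC SURVEYMEANS.

```python
import math
from collections import defaultdict

# Hypothetical records: (stratum, cluster/PSU, weight, y value).
records = [
    ("east", "fla1", 1.5, 10.0), ("east", "fla1", 1.5, 12.0),
    ("east", "fla2", 2.0, 11.0), ("east", "fla2", 2.0, 13.0),
    ("west", "fla3", 1.0, 20.0), ("west", "fla4", 1.2, 22.0),
]

wsum = sum(w for _, _, w, _ in records)
ybar_w = sum(w * y for _, _, w, y in records) / wsum  # weighted mean

# Linearized scores z_i = w_i * (y_i - ybar_w), summed within each PSU.
psu_totals = defaultdict(float)
strata_psus = defaultdict(set)
for h, psu, w, y in records:
    psu_totals[(h, psu)] += w * (y - ybar_w)
    strata_psus[h].add(psu)

# Between-PSU variance of the score totals within each stratum,
# summed over strata, then scaled by the squared weight total.
var = 0.0
for h, psus in strata_psus.items():
    n_h = len(psus)
    totals = [psu_totals[(h, p)] for p in psus]
    mean_t = sum(totals) / n_h
    var += n_h / (n_h - 1) * sum((t - mean_t) ** 2 for t in totals)
var /= wsum ** 2

print(ybar_w, math.sqrt(var))  # weighted mean and its approximate SE
```

This mirrors the design specification in the survey's setup: the stratum plays the role of the cycle/region combination and the PSU plays the role of the FLA.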
In the SURVEYMEANS procedure, the STRATA, CLUSTER, and WEIGHT statements are used to specify the variables containing the stratum identifiers, the cluster identifiers, and the variable containing the individual weights.
For the NAWS, the STRATA are defined as the cycle/region combinations used for the first level of sampling. The CLUSTER statement contains the primary sampling unit, which is the FLA.
The WEIGHT statement references a variable that is, for each observation i, the product of the sampling weight and the nonresponse weight. This variable is called PWTYCRD for historical reasons: PWT refers to a weight for the population (and thus includes the season weight), and YCRD indicates that the weight includes year, cycle, region, and day components.
The SURVEYMEANS procedure also allows for a finite population correction. This option is selected using the TOTAL option on the PROC statement. The TOTAL statement allows for the inclusion of the total number of PSUs in each stratum. SAS then determines the number of PSUs selected per region from the data and calculates the sampling rate. In cases such as the NAWS, where the sampling rate is different for each stratum, the TOTAL option includes a reference to a data set that contains information on all the strata and a variable _TOTAL_ that contains the total number of PSUs in each stratum.
We include here the sample code for PROC SURVEYMEANS to calculate the standard errors for our key estimator WAGET1.
proc surveymeans data=naws.crtdvars total=naws.regioninfo;
   strata region12 cycle;
   cluster FLA;
   var waget1;
   weight pwtycrd;
run;
e) Precision of Key Estimators
The NAWS is primarily a surveillance survey that provides descriptive statistics about the United States crop worker population. Periodic reports posted to the website and presentations at conferences and stakeholder meetings are used to disseminate the survey results. In addition, the data are used by researchers, policy analysts, and service program staff, primarily for program planning and policy analysis. Two key variables of interest to these groups are FWRDAYS, the number of days worked per year by a respondent, and WAGET1, the average hourly wage of a respondent. Based on data collected in fiscal years 2015 and 2016 (a combined sample of 5,342 respondents) and the current NAWS weights, the 2-standard-error confidence interval for FWRDAYS was 192 days ± 11.4 days. That is, with approximately 95 percent confidence, the average number of days annually worked, per person, lies between 180 and 203 days. This constitutes a margin of error of ±5.9 percent of the estimated value.
For average wage (WAGET1), the 2-standard-error confidence interval was $10.60 ± $0.25. With approximately 95 percent confidence, the average wage lies between $10.35 and $10.85. This yields a margin of error of ±2.4 percent of the estimated value.
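The quoted margins of error follow directly from the interval half-widths. A quick Python check, using the estimates and half-widths stated above:

```python
# Margin of error as a percentage of the point estimate:
# 100 * (half-width of the 2-standard-error interval) / estimate.
def margin_pct(estimate, half_width):
    return 100 * half_width / estimate

print(round(margin_pct(192, 11.4), 1))    # FWRDAYS: 5.9 percent
print(round(margin_pct(10.60, 0.25), 1))  # WAGET1: 2.4 percent
```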
There are numerous other variables of interest, whose standard errors vary greatly. These two are offered as examples that show some of the range of possible precisions obtained.
The questionnaire to be used in the survey was developed by the DOL with input from various Federal agencies over many years. The survey questions will be unchanged from the version that OMB approved in the last submission. The majority of these questions have been used for over 20 years, are well understood by the sampled respondents, and the data they provide are of high quality.
This submission seeks approval to continue administering previously-approved Health Resources and Services Administration and National Institute for Occupational Safety and Health-sponsored questions on preventive health and mental health, respectively, as well as previously-approved ETA questions on education and training and access to and use of digital information devices. As discussed in the previous submission, these questions were developed in concert with the sponsoring agency and experts in the question domains.
6. Statistical Consultation
The following individuals have been consulted on statistical aspects of the survey design:
Stephen Reder and Robert Fountain, Professors, Portland State University, (503) 725-3999 and (503) 725-5204; Phillip Martin, Professor, University of California at Davis, (916) 752-1530; Jeff Perloff, Professor, University of California at Berkeley, (510) 642-9574; John Eltinge, Bureau of Labor Statistics (BLS), (202) 691-7404; Daniel Kasprzyk, Frank Potter, Eric Grau, Steve Williams, Amang Sukasih, Raquel af Ursin, and Yuhong Zheng, Mathematica Policy Research, (609) 799-3535; China Layne, Summit Consulting, LLC, (202) 407-8328; and Richard Valliant, Joint Program for Survey Methodology, University of Maryland, (301) 405-0932.
The data will be collected under contract to JBS International, Inc. (650) 373-4900. Analyses of the data will be conducted by Daniel Carroll, ETA (202) 693-2795, and JBS International, Inc.
REFERENCES
Fuller, W.A. (1975). Regression Analysis for Sample Survey, Sankhyā, 37, Series C, Pt. 3, 117-132.
Obenauf, W. (2003). An Application of Sampling Theory to a Large Federal Survey. Portland State University, Department of Mathematics and Statistics.
Ryu, E., Couper, M., & Marans, R. (2006). Survey incentives: Cash vs. in-kind; Face-to-face vs. mail; Response rate vs. nonresponse error. International Journal of Public Opinion Research, 18(1), 89-106.
SAS Institute Inc., SAS/STAT® User’s Guide, Version 8, Cary, NC: SAS Institute Inc., 1999, 61, 3.
Woodruff, R. S. (1971). A Simple Method for Approximating the Variance of a Complicated Estimate, Journal of the American Statistical Association, 66, 411–414.
Zhang, F. (2010). Incentive experiments: NSF experiences. (Working Paper SRS 11-200). Arlington, VA: National Science Foundation, Division of Science Resources Statistics.