Statistical Methodology for the Conservation Effects Assessment Project Cropland Farmer National Surveys

OMB: 0535-0245


Patrick E. Flanagan, Ph.D. 1
National Statistician
Natural Resources Conservation Service
U. S. Department of Agriculture
February 2021

1 Most of the CEAP Sampling Overview and the CEAP1 Survey Design sections were written by J. Jeffrey Goebel (2009).

I. Introduction

A. Purpose
The purpose of the National Assessment for Cropland (CEAP-Cropland) is to estimate the
environmental benefits and effects of conservation practices applied to cultivated cropland and
cropland enrolled in long-term conserving cover (e.g., the Conservation Reserve Program).
The CEAP-Cropland Component of the National Assessment has three specific goals:
• Estimate the effects of conservation practices currently present on the landscape.
• Estimate the need for conservation practices and the potential benefits of additional
  conservation treatment.
• Simulate alternative options for implementing conservation programs on cropland in
  the future.

The ultimate goal of CEAP-Cropland is to report conservation effects in terms that represent
recognizable outcomes, such as cleaner water and soil quality enhancements that will result in
more sustainable and profitable production over time.

B. Background
The Conservation Effects Assessment Project (CEAP) was initiated in 2002 as a means by which
to analyze societal and environmental benefits gained from the 2002 Farm Bill’s substantial
increase in conservation program funding. Though the National Resources Inventory (NRI)
collected data on agricultural land, including erosion and conservation practices, links between
those data were not directly discernable. In addition, many details about management
practices, use of chemicals, and other aspects that affect the environment were not collected.
The CEAP Cropland Farmer Surveys were designed to collect that data and link them via physical
process models to environmental effects necessary to achieve the CEAP goals.

II. Survey Sample Designs

A. Overview
The objective of the NRI-CEAP Cropland Survey was to obtain additional site specific data
needed to utilize the field-level process model APEX to estimate field-level effects of
conservation practices. The process model was run for a sub-sample of NRI sample points;
inputs for a sample point included historical NRI site specific data, data obtained from the
NRI-CEAP Cropland Survey for the agricultural field where the sample point is located, additional
information on conservation practices from Field Office records, soil properties and
characteristics associated with the particular soil at the sample point location, and climate data
associated with the sample point location. The input data associated with a particular point
describe a “representative field;” outputs from the process model runs include losses of
materials (such as sediment and chemicals) from this field and changes in condition (such as
accumulation of carbon). These outputs are used to estimate both on-site and off-site effects.
The APEX model outputs can be treated like other NRI variables; the site specific results for each
sample point can be aggregated or averaged for some meaningful portion of the landscape
using statistical weights. The statistical (survey) weight for an NRI sample point is the acreage
value assigned to that sampling unit based upon the sampling design and certain control figures
[derivation of weights for the NRI-CEAP Cropland Survey is discussed in Section VI, Estimation
Procedure]. The APEX model outputs also serve as inputs into hydrologic models that simulate
transport of water, sediment, and chemicals from the land into and through stream networks
and eventually into estuaries and oceans. The NRI-CEAP data and the models can then be used
to estimate changes in in-stream concentration of sediment and chemicals that result from
changes in land management.
The sampling strategy utilized for the NRI-CEAP Cropland Survey was to select a sub-sample of
NRI sampling units from the NRI Foundation Sample; in particular, a subset of sample points was
selected from those sampling units used for the 2002 and 2003 Annual NRI surveys. Sampling
strategies for the NRI Foundation Sample, Annual NRI surveys, and the NRI-CEAP survey are
discussed below. The NRI sampling structure provided a natural framework for the data
collection and modeling activities needed to support the CEAP national cropland assessment; it
also provided efficiency to the process because sample locations were already identified and
significant data already existed for these sites. The full collection of NRI sample sites provides a
statistically credible representation of the diversity of soils, climate, cropping systems, and
natural resource issues for the Nation’s agricultural lands. Data collection activities were spread
over a four-year period because of financial constraints and operational considerations. A
different set of sample points was selected for each year. The goal was to develop a data base
that supported statistical analysis of the benefits of conservation practices at the national and
regional levels.

B. 2003 – 2006 Survey (CEAP1)
The target population for the NRI-CEAP Cropland Survey was all land in the 48 contiguous states
that is classified by NRI as having a land cover/use of “cultivated cropland” or “land in CRP.”
Cultivated cropland is defined by NRI as “land in row or close-grown crops, including hayland
and pastureland in rotation with row or close-grown crops;” land in CRP is “land that was under
a Conservation Reserve Program (CRP) contract.”
The sampling approach utilized for the NRI-CEAP Cropland Survey was to select a sub-sample of
Annual NRI sample points. In particular, the sample comes from sampling units selected initially
for the 2002 and 2003 Annual NRI surveys. The sampling strategy developed for the farmer
surveys included:

o Collect data for 20,000 sample sites over a four year period, in order to obtain a full
  representation of the diversity of cropping systems, resource concerns, farming activities,
  conservation practices, soils, climate, and other natural resource conditions on cultivated
  cropland; and to obtain insight into implementation of conservation systems associated
  with the 2002 Farm Bill. [sample sites are cropland fields associated with NRI sample points;
  the Foundation NRI sample contains about 200,000 cropland points].
o Sampling and data collection for 2003 and 2004 were to focus on developing a good
  baseline for the most predominant cropping and conservation systems, to make sure that
  credible statistical analyses could be made on a national basis for all U. S. cultivated
  cropland.
o Sampling and data collection for 2005 and 2006 were to have a complementary focus: (a) to
  obtain data for areas and systems that are less extensive but usually more environmentally
  sensitive (vulnerable); and (b) to obtain data on actual changes in conservation systems and
  practices that occurred due to implementation of 2002 Farm Bill provisions. Data collection
  in 2005 and 2006 provided a fuller and broader perspective, since some practices were not
  installed until after 2003.

An NRI sample point is used to identify a field in order to determine land cover/use and
management systems; similar protocols are used to determine the natural or inherent features,
such as soil type or erosion equation factors. The NRI utilizes points as the sampling units rather
than farms or fields; land use and land unit boundaries change frequently in some parts of the
country, and factors such as soil type do not follow human-induced boundaries such as land unit
boundaries. Sample point coordinates are known based upon Digital Ortho-Photo Quadrangle
(DOQ) base maps and standards. The temporal nature of desired results was handled in several
ways: (i) the NRI-CEAP farmer survey collected site specific data for several years, and historical
NRI data are available for each sample point; (ii) conservation practices, other agricultural
management systems, and acts of nature have long-term effects upon the environment – the
process models used to quantify effects produce results by year and season; (iii) the Annual NRI
utilizes a supplemented panel survey design, wherein each year’s sample includes a Core Panel
(sampling units observed each year) and a Supplemental (or rotating) Panel – this provides the
flexibility to revisit sample units over the course of time.

Sample for 2003 Survey
The sample for the 2003 NRI-CEAP Farmer Survey was selected from the 2002 Annual NRI
sample points classified as having a land cover/use of either cultivated cropland or land in CRP
for the 2002 growing season. In particular, the samples were selected from the supplemental
panel P02, as follows:
(a) Any sample point in P02 classified as “land in CRP” for 2002 was included.
(b) Sample points classified as “cultivated cropland” were selected as follows:
o it was determined which segments in P02 contained at least one point classified as
“cultivated cropland” for 2002
o within each of those segments, one point classified as “cultivated cropland” in 2002 was
selected randomly.
(c) For South Dakota and North Dakota, one-half of these points were not sampled; systematic
sampling was used to select half of the points. The sampling rate was reduced due to lack of
available interviewers within these two states.

(d) An additional 333 points were removed from the sample because they represented farm
operators that had also been selected for the ARMS-II survey. These samples were removed
from the survey so that respondent burden for ARMS-II would not be affected. An initial
examination of these overlap samples indicated that no bias should be expected; the
samples were distributed across the country in proportion to cropland occurrence. This will
be verified as part of a post-survey statistical evaluation of non-response, which will utilize
historical NRI information and operator information collected from NRCS field offices.
Sample sizes by state are presented in Table 6. The sample included 2,236 CRP sample points
and 9,580 cultivated cropland points.
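
The selection rules (a) and (b) above can be illustrated with a minimal Python sketch. It is not the production procedure: the record layout (`point_id`, `segment_id`, `lcu`) and the function name are hypothetical, and the state-level reductions in (c) and the ARMS-II overlap removal in (d) are left out.

```python
import random
from collections import defaultdict

def select_annual_sample(points, seed=0):
    """Select all CRP points plus one random cultivated-cropland point per segment.

    `points` is a list of dicts with hypothetical keys:
    'point_id', 'segment_id', and 'lcu' ('CRP', 'cultivated', or anything else).
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible

    # Rule (a): every point classified as land in CRP is included.
    selected = [p for p in points if p["lcu"] == "CRP"]

    # Rule (b): group cultivated-cropland points by segment ...
    by_segment = defaultdict(list)
    for p in points:
        if p["lcu"] == "cultivated":
            by_segment[p["segment_id"]].append(p)

    # ... and draw exactly one point at random from each such segment.
    for segment_id in sorted(by_segment):
        selected.append(rng.choice(by_segment[segment_id]))
    return selected

# Toy frame with made-up identifiers: three segments, six points.
frame = [
    {"point_id": "s1-1", "segment_id": "s1", "lcu": "cultivated"},
    {"point_id": "s1-2", "segment_id": "s1", "lcu": "cultivated"},
    {"point_id": "s1-3", "segment_id": "s1", "lcu": "CRP"},
    {"point_id": "s2-1", "segment_id": "s2", "lcu": "forest"},
    {"point_id": "s2-2", "segment_id": "s2", "lcu": "cultivated"},
    {"point_id": "s3-1", "segment_id": "s3", "lcu": "pasture"},
]
sample = select_annual_sample(frame)
```

Selecting a single cultivated point per segment spreads the sample geographically and lowers the chance that one operator is interviewed twice in a year.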

Sample for 2004 Survey
The sample for the 2004 NRI-CEAP Cropland Survey was selected from the 2003 Annual NRI
sample points classified as having a land cover/use of either cultivated cropland or land in CRP
for the 2003 growing season. In particular, the samples were selected from the supplemental
panel P03, as follows:
(a) Any sample point in P03 classified as “land in CRP” for 2003 was included.
(b) Sample points classified as “cultivated cropland” were selected as follows:
o it was determined which segments in P03 contained at least one point classified as
“cultivated cropland” for 2003
o within each of those segments, one point classified as “cultivated cropland” in 2003 was
selected randomly.
The sample included 2,268 CRP sample points and 10,148 cultivated cropland points.

Sample for 2005 Survey
The sample for the 2005 NRI-CEAP Cropland Survey was selected from the 2003 Annual NRI
sample points classified as having a land cover/use of either cultivated cropland or land in CRP
for the 2003 growing season. In particular, the samples were selected from the Core Panel P00,
as follows:
(a) Any sample point in P00 classified as “land in CRP” for 2003 was included.
(b) Sample points classified as “cultivated cropland” were selected as follows:
o it was determined which segments in P00 contained at least one point classified as
“cultivated cropland” for 2003
o within each of those segments, one point classified as “cultivated cropland” in 2003 was
selected randomly.
(c) The following randomization process was used to eliminate all cropland sample points in 10
states:
o Minnesota and Wisconsin were paired [placed in Stratum A]; each was given an equal
chance of selection. Minnesota was kept in the sample and Wisconsin was selected for
elimination.
o North Dakota and South Dakota were paired [placed in Stratum B]; each was given an
equal chance of selection. South Dakota was kept in the sample and North Dakota was
selected for elimination.
o The states of Maine, New Hampshire, Vermont, Massachusetts, Rhode Island, and
Connecticut were combined into a New England Grouping. New York and the New
England Grouping were paired [placed in Stratum C]; each was given equal chance of
selection. New York was kept in the sample and the New England Grouping was
selected for elimination.
o The states of Montana, Colorado, Wyoming, Utah, and New Mexico were grouped
[placed in Stratum D]; each was given an equal chance of selection. Colorado, Montana,
and Utah were kept in the sample; Wyoming and New Mexico were selected for
elimination.
(d) Sample sizes for cultivated cropland were reduced in 11 states, as follows:
o randomization techniques were utilized that reduced the sample by one-third in four
states: Kansas; Minnesota; North Carolina; Ohio
o randomization techniques were utilized that reduced the sample by one-half in two
states: South Dakota; Texas
o randomization techniques were utilized that reduced the sample by two-thirds in five
states: Illinois; Indiana; Iowa; Missouri; Nebraska
(e) No cropland points in Florida, Nevada, and West Virginia were included for the 2005 survey;
problems had been encountered in the 2003 and 2004 surveys. These three states were
included for the 2006 survey.
Sample sizes by state are presented in Table 6. The sample included 3,893 CRP sample points
and 7,489 cultivated cropland points. The sample size for cultivated cropland was about 25%
less than for each of the earlier years; less funding was available for conducting farmer
interviews.

Sample for 2006 Survey
The primary objective for sampling in 2006 was to provide a greater ability to make
regional-level assessments (rather than just national), particularly by Major River Basin. Stratified
sampling techniques were used to concentrate on fields in the most environmentally sensitive
(or vulnerable) areas in order to provide more precise estimates of the effects of conservation in
areas where the impacts of conservation are the greatest; sampling in 2003, 2004, and 2005
provided appropriate representation for predominant situations that covered 90% of the
cropland base. Funding existed to conduct approximately 6,000 farmer interviews for cultivated
cropland fields; no additional tracts of CRP land were selected.
Each county was ranked relative to its potential for soil and nutrient loss from cropland, by using
the National Nutrient Loss and Soil Carbon (NNLSC) database which contains estimates based
upon EPIC model runs for 1997 NRI cropland sample points [see Potter et al (2006)]. The NNLSC
database used general information on farming practices that was imputed onto the NRI
cropland sample points. County level estimates were derived for: wind erosion, waterborne
sediment, nitrogen loss in sediment, phosphorus loss dissolved in runoff, nitrogen loss dissolved
in runoff, and nitrogen loss dissolved in leachate. County vulnerability rankings were derived
using these seven factors as follows:
o A county was classified with vulnerability rank 1 if it had an estimated value for at least one
  factor in the top 10%; for wind erosion, the factor needed to be in the top 3% of all counties
  because 85% of all counties do not have significant cropland wind erosion. This category
  contained 658 counties.
o A county was classified with vulnerability rank 2 if it was not classified as vulnerability rank 1
  but had an estimated value for at least one factor in the top 20% [top 5% for wind erosion].
  This category contained 385 counties.
o A county was given a vulnerability rank 3 if its vulnerability could not be estimated from the
  NNLSC database and it contained at least 20,000 acres of cultivated cropland. This category
  included 70 counties.
o Counties with low and very low vulnerability according to these seven factors were given
  vulnerability ranks 4 and 5 respectively. There were 736 counties with rank 4 and 1,255
  counties with rank 5.
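
The rules for ranks 1 through 3 can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used for the survey: the data layout, the function names, and the tie handling at the percentile cutoffs are hypothetical, and because the text gives no thresholds separating ranks 4 and 5, the sketch returns `None` for those counties.

```python
def cutoff(values, fraction):
    """Smallest value that still falls in the top `fraction` of `values`."""
    ordered = sorted(values, reverse=True)
    k = max(1, round(len(ordered) * fraction))
    return ordered[k - 1]

def rank_counties(counties, factors, wind_factor="wind_erosion"):
    """Assign vulnerability ranks 1-3; other counties map to None.

    `counties` maps a county id to a dict with hypothetical keys:
    'estimates' (dict factor -> value, or None if not estimable from NNLSC)
    and 'cropland_acres'.
    """
    estimable = [d["estimates"] for d in counties.values()
                 if d["estimates"] is not None]

    def cutoffs(general, wind):
        # Wind erosion uses a stricter percentile than the other factors.
        return {f: cutoff([e[f] for e in estimable],
                          wind if f == wind_factor else general)
                for f in factors}

    top1 = cutoffs(0.10, 0.03)   # rank 1: top 10% (top 3% for wind erosion)
    top2 = cutoffs(0.20, 0.05)   # rank 2: top 20% (top 5% for wind erosion)

    ranks = {}
    for county, d in counties.items():
        est = d["estimates"]
        if est is None:
            # Rank 3: not estimable, but at least 20,000 cultivated acres.
            ranks[county] = 3 if d["cropland_acres"] >= 20000 else None
        elif any(est[f] >= top1[f] for f in factors):
            ranks[county] = 1
        elif any(est[f] >= top2[f] for f in factors):
            ranks[county] = 2
        else:
            ranks[county] = None  # rank 4 or 5; thresholds not given in the text
    return ranks
```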

The sample for the 2006 NRI-CEAP Farmer Survey came from 2003 Annual NRI sample points
that had not been selected for previous farmer surveys. Each state and county had a different
assortment of available cultivated cropland sample points relative to the county vulnerability
rankings described above. The 2006 sample is not a stand-alone sample as are the samples for
the three previous years. Some areas had no probability of selection for the 2006 survey; the
2006 results can only be used in conjunction with data collected for previous survey years.
For the 2003, 2004, and 2005 NRI-CEAP Farmer Surveys, sample points were spread out across
states and counties as much as possible given the nature of the 2002 and 2003 Annual NRI
samples. For example, only one cultivated cropland point per sample segment was selected for
the farmer surveys; this spread out the sample and also greatly reduced the chance that the
same farmer or operator was included in the sample more than one time in a given year. This
was a restriction put in place following discussions with USDA-NASS and the Office of
Management and Budget (OMB) in an effort to reduce respondent burden. For the 2006
sample, it was necessary to select some sample points in sample segments that had been used
for the 2004 or 2005 sample.
One of the basic methods of sample selection for 2006 was as follows:
o determine which segments in P00 and P03 had at least two points classified as cultivated
  cropland in 2003
o if the segment had two points classified as cultivated cropland in 2003 and the county had
  vulnerability rank less than 4, select the sample point not used for either the 2004 or 2005
  survey
o if the segment had three points classified as cultivated cropland in 2003 and the county had
  vulnerability rank less than 4, randomly select one of the two sample points not used for
  either the 2004 or 2005 survey
o no sample points were selected in counties with vulnerability rank 4 or 5.

This procedure was used for Alabama, Arizona, California, Colorado, Kentucky, Michigan,
Mississippi, New Jersey, North Carolina, Oklahoma, Oregon, Tennessee, Utah, Virginia, and
Washington. The modified procedure used for Arkansas, Georgia, Idaho, Louisiana, Maryland,
Pennsylvania, and South Carolina was that only sample points from P03 were used.
Florida, Maine, Massachusetts, Nevada, New Mexico, Vermont, and West Virginia used sample
points in all P00 segments not used for the 2005 survey. For Indiana, Iowa, and Nebraska,
sample points were selected from all P00 segments not used for 2005 for counties with rank 1,
and half in counties with rank 2; for Delaware, Missouri, North Dakota, Wisconsin, and
Wyoming, all eligible P00 points were selected except only half in counties with rank > 3. For
Kansas and Texas, sample points were selected from all P00 segments not used for 2005 for
counties with rank < 4; and sample points were selected from all eligible P03 segments in
counties with rank 1, and half were selected for counties with rank 2 or 3. For Minnesota and
South Dakota, all eligible sample points in counties with rank 1 or 2 were selected, and half of
the P00 rank 4 or 5 sample points. For Connecticut, half of the P00 points were selected. For
Illinois, sample points in P00 segments not used for 2005 were used in counties with rank 1;
sample points were selected for half of the segments in counties with rank > 1. For Montana,
all eligible sample points in counties with rank 3 were selected; sample points were selected
from segments in half of the eligible P03 counties with rank 4 or 5. For Ohio, all eligible points in
segments in counties with rank 1 and 2 were selected, except for half of the P03 segments with
rank 2. For New York, all eligible points in segments in counties with rank 1 and 2 were
selected, except for P00 segments with rank 2. No sample points were selected in New
Hampshire and Rhode Island.

C. 2015 – 2016 Survey (CEAP2)
Frame
A point is included in the frame if the most recent collected land cover/use (LCU)
satisfies one of the following conditions:
• LCU > 0 and LCU < 200 and LCU ≠ 7
• LCU in {211, 212, 213}
• LCU = 200 and not range
• LCU = 410

The states included in the frame are the coterminous 48 states (not including DC).
Aquaculture (171) is treated just like other cropland LCUs.
Points that are classified as urban or roads in 1997 and as 200-213 in the most
recent collected year are not eligible. The last set of points removed contains 376 points,
one of which is 213 in the most recent year and the rest of which are 200. These 376 points
are removed because NRI editing procedures change a collected LCU of 200-213 to
urban.
Each point is classified into one of five mutually exclusive and exhaustive LCU
groups. With "t" representing the most recent year, the LCU categories obtained from
this program in item 2 are defined as follows:
• 1 = LCU(t) in 1 – 20 (high value specialty crops)
• 2 = LCU(t) in 141-144 and LCU(t-1), LCU(t-2), LCU(t-3) not in the set 11 – 116
• 3 = LCU(t) in 200-213 and LCU(t-1), LCU(t-2), LCU(t-3) not in set 11-116
• 4 = LCU(t) in 21-116, 170, 171, 180, or LCU(t) in 141-144 and at least one of LCU(t-1),
  LCU(t-2), LCU(t-3) in the set 11-116, or LCU(t) in 200-213 and at least one of LCU(t-1),
  LCU(t-2), LCU(t-3) in the set 11-116
• 5 = LCU 410 (CRP)

The combination of groups 1, 2, and 4 below approximates the NRI definition of cropland
(cultivated and non-cultivated combined). Category 3 approximates the NRI definition of
pasture. (A perfect classification of NRI points into broad uses is not possible with the data
available because of the 2004 protocol change.) The five LCU categories are aggregated to the
following three groups.
• Cropland (1): LCU category of 1, 2, 4
• Pasture (3): LCU category of 3
• CRP (4): LCU category of 5

The CRP category is labeled 4 for consistency with the original code of 410.
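
The five-category rule and its aggregation to three groups can be expressed as a short Python sketch. The function and constant names are hypothetical, the code ranges are taken literally from the list above, and the points are assumed to have already passed the frame eligibility conditions.

```python
def lcu_category(lcu_t, prior):
    """Return the LCU category (1-5) for a frame point, or None if no rule applies.

    `lcu_t` is the most recent LCU code; `prior` holds the codes for
    years t-1, t-2, and t-3 (hypothetical argument layout).
    """
    # Was the point in the 11-116 code range in any of the three prior years?
    recently_cropped = any(11 <= code <= 116 for code in prior)

    if lcu_t == 410:
        return 5                                   # CRP
    if 1 <= lcu_t <= 20:
        return 1                                   # high value specialty crops
    if 21 <= lcu_t <= 116 or lcu_t in (170, 171, 180):
        return 4
    if 141 <= lcu_t <= 144:
        return 4 if recently_cropped else 2
    if 200 <= lcu_t <= 213:
        return 4 if recently_cropped else 3
    return None

# Aggregation of the five categories to the three groups (group codes 1, 3, 4).
GROUP_OF_CATEGORY = {1: 1, 2: 1, 4: 1,   # Cropland
                     3: 3,               # Pasture
                     5: 4}               # CRP
```

For example, a point coded 211 in the latest year is assigned to Pasture unless one of its three prior codes falls in 11-116, in which case it moves to category 4 and is treated as Cropland.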
Ten CEAP production regions are defined for the 2015-2016 National survey. Figure 1 shows a
map of the ten production regions. The specifications for the regions are outlined in Table 1.

Figure 1: CEAP Production Regions
No. | Working Name | LRR | Refinements | Concept or Rationale
1 | Pacific Coast | A, C | None | High value irrigated crops. Water quantity issues
2 | Irrigated West | B, D, E | None | Water quantity resource concern; note the non-irrigated portions
3 | Northern Plains Wheat Belt | F, G | Dryland wheat predominant. Wind erosion | Dominated by wheat and other small grains
4 | Southern Plains Wheat/Cotton | H, I | Aquifer driven | Wheat cotton region, wind erosion and aquifer depletion
5 | Corn Belt | K, L, M | Less 109 from LRR L | Corn soy dominated region
6 | South Central Pasture and Crop | J, P, and western N | P west of MS River | Animal ag and manure on crop/pasture; hay common
7 | East Central Pasture and Crop | Eastern portion N | Plus MLRA 136 | Animal ag and manure on crop/pasture; hay common
8 | Southeast Mid Atlantic Coastal Plain | T, P, U | East of MLRA 134 | Split from other for less rain intensity and coarser soil in general
9 | Lower MS River and Texas Gulf | O, T, and P | West with MLRA 134 | Silty soils with cotton, rice, and cane. Intense rain
10 | Northeast | R, S, + 101 from L | |
Table 1: Description of 10 CEAP Production Regions for Sampling

Sample Size by CEAP Region
Let $\hat{A}_h$ be the estimate of the area in CEAP-eligible categories in CEAP region $h$ in the year 2010.
The estimated areas are obtained from the 2010 pointgen. Due to topological errors in the CEAP
region shapefile, not all NRI points are in a CEAP region. The total area corresponding to points
classified in eligible LCUs that are not in a CEAP region is 83,800 acres. The total estimated area
in CEAP-eligible categories is 480,436,800 acres. Because the area associated with missing CEAP
regions is only 0.017% of the estimated area based on points located in a CEAP region polygon,
the area with missing CEAP regions is ignored for the sample size calculation. Define a target
sample size for CEAP region $h$ by $n_h$, the result of accumulating and rounding the $\tilde{n}_h$, where

$$\tilde{n}_h = \frac{N \hat{A}_h^{0.5}}{\sum_{h=1}^{H} \hat{A}_h^{0.5}}$$

and $N = 45{,}000$. This yields the sample sizes in Table 3 below. The square root
allocation is often used as a compromise between equal allocation, optimal for individual
area estimates, and proportional allocation, optimal for the total of the regions combined
(Bankier, 1988).
CEAP.Reg | R1 | R2 | R3 | R4 | R5 | R6 | R7 | R8 | R9 | R10
nh | 2420 | 3881 | 5733 | 5651 | 8701 | 4070 | 4120 | 3611 | 3700 | 3113

Table 3: Target sample sizes by CEAP region

A more detailed discussion of the sampling, adjustment to the totals, and analysis of results is
provided in Appendix B.
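
A minimal Python sketch of the square root allocation follows. It assumes that "accumulating and rounding" means cumulative rounding (round the running total of the unrounded targets, then take differences), which guarantees that the rounded sizes sum to N; that reading, the function name, and the example areas are assumptions and are not taken from the survey documentation.

```python
import math

def sqrt_allocation(areas, total_n):
    """Allocate `total_n` sample units across regions in proportion to sqrt(area)."""
    roots = [math.sqrt(a) for a in areas]
    root_sum = sum(roots)
    targets = [total_n * r / root_sum for r in roots]  # unrounded targets

    # Cumulative rounding (assumed interpretation): round the running total
    # and difference it, so the integer sizes add up to total_n.
    sizes, running, previous = [], 0.0, 0
    for target in targets:
        running += target
        cumulative = round(running)
        sizes.append(cumulative - previous)
        previous = cumulative
    return sizes

# Hypothetical regional areas in acres (not the CEAP estimates).
example_sizes = sqrt_allocation([1_000_000, 4_000_000, 9_000_000], 600)
```

With these made-up areas the square roots are in the ratio 1:2:3, so a region with nine times the area of the smallest receives only three times its sample, which is the compromise between equal and proportional allocation described above.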

Change to the 2016 CEAP Sample
A point is included in the initial frame if the point's latest observed LCU satisfies at least
one of the following:
• > 0, < 200, not 7
• 211 – 212
• 200 and not range
• 410

We refer to the above set of LCUs as the eligible set. We add several flags to use at
various stages of the processing. We call the following LCU-ineligible:
• BDB-ineligible: !(collect14=0 or (collect14=200 and range14 = 0)). Points ineligible
  according to the information available in collect for 2014.
• REMOVED
• FEDERAL
• Urban Change (latest LCU = 200 and 1997 = urban)
• Latest LCU not in the eligible set (This can occur if new data were collected or if data were
  edited.)

We call the following NRI-ineligible:
• LA no 1997 data
• In 2001 not 2014
• Killed
• Salvaged

We replace points that are in the current 2016 sample and are ineligible. We define a
point to be ineligible if it is either LCU-ineligible or NRI-ineligible. Points in the 2013 or
2014 sample are not replaced and are not used as replacements in this step. The
removed points are replaced using the following algorithm:
• For psus containing removed points that also contain at least one other eligible
  point, randomly select one of the eligible points from the psu.
• Remaining psus do not contain at least one other eligible point. Define a stratum
  to be an intersection of a CEAP region, segment LCU group (defined for the
  original 2015-2016 sample), sample class, and state. Tabulate the number of
  psus to replace in the stratum. Select one psu at random from the psus
  containing at least one eligible point in the stratum. Randomly select one point
  from the eligible points in the selected psu. Prior CEAP points are selected with
  priority when selecting points from segments.

We maintain the one point per segment rule. Segments containing points in the 2013,
2014, or 2015 samples are not eligible for selection as replacements. Likewise, segments
containing at least one usable point for 2016 are not eligible as replacements.
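
The two replacement steps described above might be sketched in Python as follows. This is a simplified illustration: the data structures and the function name are hypothetical, donor psus are drawn without replacement to respect the one point per segment rule, and the priority given to prior CEAP points is not modeled.

```python
import random

def replace_points(removed_psus, eligible_by_psu, stratum_of, seed=0):
    """Return {removed psu: (donor psu, replacement point)}.

    `eligible_by_psu` maps each psu to its eligible candidate points
    (for a removed psu, the other eligible points it contains);
    `stratum_of` maps each psu to its stratum label. Both are hypothetical layouts.
    """
    rng = random.Random(seed)
    replacements = {}
    used = set(removed_psus)     # psus that may not serve as donors
    unresolved = []

    # Step 1: replace within the same psu when it has another eligible point.
    for psu in removed_psus:
        candidates = eligible_by_psu.get(psu, [])
        if candidates:
            replacements[psu] = (psu, rng.choice(candidates))
        else:
            unresolved.append(psu)

    # Step 2: otherwise draw a donor psu at random from the same stratum,
    # then draw one eligible point from that donor.
    for psu in unresolved:
        donors = sorted(d for d, pts in eligible_by_psu.items()
                        if pts and d not in used
                        and stratum_of[d] == stratum_of[psu])
        if not donors:
            continue             # left unreplaced, as happened for some points
        donor = rng.choice(donors)
        used.add(donor)
        replacements[psu] = (donor, rng.choice(eligible_by_psu[donor]))
    return replacements
```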

A total of 789 points are replaced in this fashion. This procedure replaces all but 44 of
the points that need to be replaced. We select 44 additional crop points in the next step
to compensate for these 44 points.
Selection of Additional Crop Points
To reach 25,000 total points, we select 1376 additional crop points. Two segments containing
eligible crop points are selected in region 6, and randomly selected points in these two
segments are added. The crop sample sizes for the remaining CEAP regions are approximately in
proportion to the estimated crop area in the CEAP region. Within a CEAP region, we define
strata by intersections of sample classes and states for segment group 1 (crop). The number of
segments to select from segment group 1 is proportional to the number of segments in the
stratum. Segments such that all points are LCU-ineligible are removed from the counts for this
allocation. A simple random sample of segments of the specified size is selected from each
stratum. The CEAP region sizes are (37, 94, 238, 227, 513, 2, 59, 65, 89, 53) for regions (1, 2, 3, 4,
5, 6, 7, 8, 9, 10). For the original sample, we used a compromise of one-per-stratum and
two-per-stratum sampling within the strata. We use simple random sampling here for simplicity. We
randomly select one point from the eligible points in the selected psu, with prior CEAP points
selected with priority.

III. Estimation

A. Overview
In both CEAP1 and CEAP2, points were selected from the NRI using stratification, which gives
each point a different probability of selection. In addition, the NRI itself used a similar process.
To produce unbiased estimates from such a multistage sample, each point is given a weight. Its
base weight is computed from the inverse of its probability of selection taking all stages into
account. Then, weights are adjusted for in-scope nonresponse within classification cells, in a
manner that mitigates nonresponse bias. Finally, weights are adjusted to known (or
nearly known) totals such as total cultivated cropland by region.
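
The first two weighting steps can be illustrated with a short Python sketch. It assumes a simple weighting-class adjustment in which respondents in a cell absorb the weight of in-scope nonrespondents in the same cell; the cell definitions, record layout, and function names are hypothetical and are not taken from the CEAP procedures.

```python
from collections import defaultdict

def base_weight(stage_probabilities):
    """Inverse of the overall selection probability across all sampling stages."""
    overall = 1.0
    for p in stage_probabilities:
        overall *= p
    return 1.0 / overall

def nonresponse_adjust(records):
    """Weighting-class adjustment (assumed form).

    `records` is a list of dicts with hypothetical keys 'cell', 'weight',
    and 'responded'. Returns adjusted weights for respondents only, in order.
    """
    total = defaultdict(float)       # weight of all in-scope points per cell
    responding = defaultdict(float)  # weight of respondents per cell
    for r in records:
        total[r["cell"]] += r["weight"]
        if r["responded"]:
            responding[r["cell"]] += r["weight"]

    # Each respondent's weight is inflated by its cell's ratio, so the
    # weighted total of every cell is preserved.
    return [r["weight"] * total[r["cell"]] / responding[r["cell"]]
            for r in records if r["responded"]]
```

The final step, adjustment to control totals, is a ratio or raking adjustment of the kind described for CEAP1 in the next subsection.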
After the sample selection for CEAP2, the CEAP regions were redefined as shown below. Since
estimates were to be created by these regions, both CEAP1 and CEAP2 weights were adjusted
based on these regions.

B. CEAP1
Introduction
The Annual NRI estimation procedure combines information from several sources to produce a
final data set composed of records containing information for the years 1982, 1987, 1992, 1997,
2000, and annually thereafter. Each record represents data elements for a sample point; an
estimation weight is attached to each record. For each NRI survey year, data are collected at
both the segment level and at the point level. The areas measured for small water features,
roads and railroads, and urban and built-up lands are converted to point data during the
estimation process. Each of these created points is given an initial weight based on the area in
the segment and the probability that the segment is included in the sample; imputation is used
for unobserved data elements in order to complete the data record for these created points.
Initial weights for created points and for observed points are adjusted during the estimation
process using ratio adjustments and small area estimation. Control totals for surface area,
federal land, and large water areas, derived from GIS databases, are maintained throughout the
process. Finally, the weights are adjusted using iterative proportional scaling (raking) so that the
new data base produces acreage estimates for broad cover/use categories for historical years
that closely match previously published estimates [see Fuller (1999)].

Development of Estimation Weights for NRI-CEAP
Estimation weights for the NRI-CEAP1 cultivated cropland sample points were developed in a
manner consistent with development of weights for the Annual NRI. Weights for other river
basins will be developed in a similar fashion although some additional ratio adjustment
procedures may be utilized, for example, for irrigated conditions. Estimation weights for points
identified as “land in CRP” were basically those derived for the Annual NRI data base.
The procedure for points identified as cultivated cropland follows:
o Calculate initial weights, where W_{Init,q,k,j} is the initial weight for point j, which falls
within 6-digit hydrologic unit q and has cropping system k:
W_{Init,q,k,j} = A_{q,k,j} / (p_{q,k,j} * m_{q,k,j}), where:
A_{q,k,j} = size of segment (q,k,j) in acres,
p_{q,k,j} = probability that segment (q,k,j) is in the sample,
m_{q,k,j} = number of sample points in segment (q,k,j)

o Make the first adjustment to the initial weights:
W_{Adj1,q,k,j} = W_{Init,q,k,j} * (Y_k / X_k), where:
Y_k = estimated acres of cultivated cropland in cropping system k,
based upon 2003 Annual NRI
X_k = ∑_{q,j} W_{Init,q,k,j}

o Make the second adjustment to the initial weights:
W_{Adj2,q,k,j} = W_{Adj1,q,k,j} * (T_q / Z_{1,q}), where:
T_q = estimated acres of cultivated cropland in 6-digit
hydrologic unit q, based upon 2003 Annual NRI
Z_{1,q} = ∑_{k,j} W_{Adj1,q,k,j}

o Make the third adjustment to the initial weights:
W_{Adj3,q,k,j} = W_{Adj2,q,k,j} * (Y_k / X_{2,k}), where:
X_{2,k} = ∑_{q,j} W_{Adj2,q,k,j}

o Make the fourth adjustment to the initial weights:
W_{Adj4,q,k,j} = W_{Adj3,q,k,j} * (T_q / Z_{3,q}), where:
Z_{3,q} = ∑_{k,j} W_{Adj3,q,k,j}

o Designate the final adjusted weight for point (q,k,j) to be the estimation weight, W_{0,q,k,j}
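
The four adjustments above alternate between the cropping-system controls and the hydrologic-unit controls, which amounts to two passes of raking. A minimal Python sketch follows, assuming each point is a dictionary with hypothetical keys "A", "p", "m", "q", and "k", and that the controls Y and T are supplied as dictionaries. The function names are illustrative and are not taken from the NRI production code.

```python
from collections import defaultdict

def ratio_adjust(weights, keys, controls):
    """Scale weights so that their sum within each key equals controls[key]."""
    totals = defaultdict(float)
    for w, key in zip(weights, keys):
        totals[key] += w
    return [w * controls[key] / totals[key] for w, key in zip(weights, keys)]

def estimation_weights(points, Y, T, passes=2):
    """Return W_0 for each point after alternating Y_k and T_q adjustments.

    points: list of dicts with hypothetical keys 'A', 'p', 'm', 'q', 'k'.
    Y: control totals by cropping system k; T: control totals by unit q.
    """
    # Initial weight: segment acres / (selection probability * points in segment)
    w = [pt["A"] / (pt["p"] * pt["m"]) for pt in points]
    ks = [pt["k"] for pt in points]
    qs = [pt["q"] for pt in points]
    for _ in range(passes):
        w = ratio_adjust(w, ks, Y)  # first and third adjustments
        w = ratio_adjust(w, qs, T)  # second and fourth adjustments
    return w
```

Because the last adjustment is to the hydrologic-unit controls, the final weights reproduce each T_q exactly, while the Y_k totals are matched only approximately.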

Development of Replicate Weights for Estimating Variances
A form of jackknife variance estimation is utilized for the Annual NRI because of the rather
complex nature of the estimation procedure. The Annual NRI survey process is a type of two
phase sampling, since the samples represent a subsample of segments selected from the 1997
NRI sample. The replication method used for the NRI is a form of the “delete-a-group jackknife”
[see Kott (2001)]. The goal of the variance estimation procedure for an Annual NRI data set is to
construct a set of H modified weights for each observation, which allows computation of H
replicate estimates for a variable y. A variance estimate can then be calculated for an NRI
estimate, say Ŷ, as follows:
var(Ŷ) = ∑_{h} c_h * (Ŷ_h − Ÿ)^2, where
c_h is a constant determined by the replication procedure,
Ŷ_h is the h-th replicate estimate for Y, and
Ÿ = H^{-1} ∑_{h} Ŷ_h.
For the 2003 Annual NRI and the NRI-CEAP cropland survey, H = 29 is used. To define the
replicates, a form of systematic sampling was used with the 1997 NRI sample units to create 29
groups of samples of approximately equal size. The same set of replicates is used for both the
2003 Annual NRI and the NRI-CEAP cropland database. This means that an estimation process
can be established so that variance estimates based upon the larger sample can be retained
within the smaller data base, if certain regression and/or ratio techniques are utilized.
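
As an illustration of the variance formula, the sketch below computes var(Ŷ) from a list of H replicate estimates. The text leaves c_h unspecified; the default used here, (H − 1)/H for every replicate, is the usual choice for a delete-a-group jackknife with groups of roughly equal size and should be read as an assumption.

```python
def jackknife_variance(replicate_estimates, c=None):
    """Sum of c_h * (Y_h - mean of replicates)^2 over the H replicates."""
    H = len(replicate_estimates)
    if c is None:
        # Assumed default; the document only says c_h depends on the procedure.
        c = [(H - 1) / H] * H
    mean = sum(replicate_estimates) / H
    return sum(ch * (y - mean) ** 2 for ch, y in zip(c, replicate_estimates))
```

Passing an explicit list for c allows other replication constants to be tried without changing the function.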
The first set of replicate weights for the NRI-CEAP data set is derived as follows:
o Calculate initial weights for the point (q,k,j) by modifying the estimation weight, W_{0,q,k,j}, as
follows:
W_{Init,1,q,k,j} = 0, if point (q,k,j) is in replicate #1
W_{Init,1,q,k,j} = (29/28) * W_{0,q,k,j}, otherwise

o Make the first adjustment to the initial weights:
W_{Adj1,1,q,k,j} = W_{Init,1,q,k,j} * (Y_k / X_{1,k}), where:
Y_k = estimated acres of cultivated cropland in cropping system k,
based upon 2003 Annual NRI
X_{1,k} = ∑_{q,j} W_{Init,1,q,k,j}

o Make the second adjustment to the initial weights:
W_{Adj2,1,q,k,j} = W_{Adj1,1,q,k,j} * (T_q / Z_{1,1,q}), where:
T_q = estimated acres of cultivated cropland in 6-digit
hydrologic unit q, based upon 2003 Annual NRI
Z_{1,1,q} = ∑_{k,j} W_{Adj1,1,q,k,j}

o Make the third adjustment to the initial weights:
W_{Adj3,1,q,k,j} = W_{Adj2,1,q,k,j} * (Y_k / X_{1,2,k}), where:
X_{1,2,k} = ∑_{q,j} W_{Adj2,1,q,k,j}

o Make the fourth adjustment to the initial weights:
W_{Adj4,1,q,k,j} = W_{Adj3,1,q,k,j} * (T_q / Z_{1,3,q}), where:
Z_{1,3,q} = ∑_{k,j} W_{Adj3,1,q,k,j}

o Designate the final adjusted value for point (q,k,j) to be the first replicate weight, W_{1,q,k,j}

A similar process is used for each of the remaining 28 replicates. Each point (q,k,j) then has an
estimation weight, W0, q,k,j , and a set of 29 replicate weights, { Wh, q,k,j, : h=1,2, …, 29 }, that are
used for variance estimation.
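
The starting point for each replicate can be sketched as below, assuming the estimation weights and the replicate group of each point are held in parallel lists (a hypothetical data layout). Only the initial replicate weights are produced here; the four ratio adjustments described above would then be applied to them.

```python
def initial_replicate_weights(w0, group, h, H=29):
    """Initial weights for replicate h.

    w0: estimation weights W_0; group: replicate group (1..H) of each point.
    Points in group h get weight 0; all others are scaled up by H / (H - 1).
    """
    factor = H / (H - 1)  # 29/28 when H = 29
    return [0.0 if g == h else factor * w for w, g in zip(w0, group)]
```

Calling the function once for each h from 1 to 29 yields the 29 sets of initial replicate weights.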

2020 Weight Adjustments
For comparisons between CEAP1 and CEAP2, the CEAP1 weights were given a final ratio
adjustment to the 2003 totals for cultivated cropland in each of the 12 current CEAP regions
taken from the 2017 NRI database.

C. CEAP2
The CEAP2 weighting procedure preserves state controls exactly for states with ratio adjustment
factors in a specified interval.
1. Compute the base weight as follows:
For point j in segment i, define the base weight by
w_{0,i,j} = p_{f,i}^{-1} * A_i * ( N_{f,g(i)} / n_{s,g(i)} ) * ( m_{cc,i} / (m_i * m̃_{cc,i}) ),
where
o p_{f,i} is the foundation probability of segment i;
o A_i is the area of segment i;
o N_{f,g(i)} and n_{s,g(i)}, respectively, are the number of segments in the frame and sample in the
group g containing segment i;
o m_{cc,i} is the number of CEAP2 eligible (real) points in segment i;
o m_i is the number of real points in segment i; and
o m̃_{cc,i} is the number of points in the CEAP2 sample in segment i.

The number of CEAP2 points in a segment (m̃_{cc,i}) is 1 except for segments in the 2013 or 2014
regional CEAP surveys, where the number can exceed 1.
The groups are defined as intersections of the 2015 CEAP2 regions (see sample design discussion
about regions), the 7 aggregated sample classes, and states.
The frame is the union of the frame used for the 2015 sample and the revised frame used for
the revised 2016 sample.
2. Nonresponse adjustment. Bound ratios by 4. Cells defined as intersections of…
o Two broaduse groups (based on pgen)
   - Cropland = Cultivated, noncultivated
   - Pasture
o HUC 4 (≈200 HUC 4’s)
   - PGEN HUC 4 definition
o 2015 CEAP Regions
   - Five erosion categories “quintiles”
   - National quintiles instead (unweighted, year 2012 from 2012 pgen):
3. Combine Rhode Island cultivated with Connecticut cultivated, and combine Nevada cultivated
with Nevada non-cultivated
o No CEAP point is classified as Rhode Island cultivated
o One CEAP point is classified as Nevada cultivated, and this point has broaduse noncultivated cropland through 2008. (The point is a new rotation point that changes to
cultivated cropland when it is sampled in 2009.)
4. Ratio adjustment at state level x 3 NRI broaduses (cultivated, non-cultivated, pasture). NO
BOUNDS on ratios. With no bounds on ratios and combining states as in step 3, national level
estimates are preserved.
5. Truncate weights from step 4 to remain in [Median/4, Median*4] by CEAP region
6. Repeat the ratio adjustment at state level x 3 NRI broaduses (cultivated, non-cultivated, pasture),
this time bounding the ratios to remain in [0.75, 1.25]. Call the weights that result from this
step 6 “W2.”
7. Use raking (successive ratio adjustments) to control to 2015 broaduse estimates (cultivated,
non-cultivated, pasture separately) by CEAP region and HUC-2. We control to CEAP region and
HUC-2 margins, not intersections. The total number of controls is 3(C + H), where C is the
number of CEAP regions and H is the number of HUC-2’s.
o Use the broaduse designation for the year 2015 from the 2017 pointgen
o Hold weights for points not classified as crop or pasture fixed at W2 from step 6
o Only ratio adjust weights for points classified as cultivated crop, non-cultivated crop, or
pasture in 2015 based on the 2017 pointgen.
8. Final adjustment was made after correcting some HUC8 designations to the 12 CEAP Regions.
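
Steps 5 and 6 can be illustrated with the short sketch below. It assumes weights, region labels, and adjustment cells are held in parallel lists, and that the bound in step 6 is applied by clipping the ratio itself; both are assumptions made for illustration, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import median

def truncate_by_region(weights, regions, factor=4.0):
    """Step 5: clip each weight to [median/factor, median*factor] in its region."""
    by_region = defaultdict(list)
    for w, r in zip(weights, regions):
        by_region[r].append(w)
    med = {r: median(ws) for r, ws in by_region.items()}
    return [min(max(w, med[r] / factor), med[r] * factor)
            for w, r in zip(weights, regions)]

def bounded_ratio_adjust(weights, cells, controls, lo=0.75, hi=1.25):
    """Step 6: ratio-adjust to cell controls, clipping each ratio to [lo, hi]."""
    totals = defaultdict(float)
    for w, c in zip(weights, cells):
        totals[c] += w
    ratio = {c: min(max(controls[c] / totals[c], lo), hi) for c in totals}
    return [w * ratio[c] for w, c in zip(weights, cells)]
```

A cell whose unclipped ratio lies inside the interval reproduces its control exactly, which is the sense in which state controls are preserved only for states with factors in the specified interval.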

Replicate variance estimation:
The replicate weight procedure starts with the weights from step 6. Sort all points (note that the CEAP
sample has 1 point per segment) by state and by geoorder within state. The point in position j is
assigned the replicate number r = (j − 1) mod 29 + 1. We set the r-th replicate weight for a
point equal to 0 if the point is assigned replicate number r. Otherwise, we set the weight to W2
from step 6. We repeat step 7 with the replicates.
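
The replicate assignment can be sketched as follows, assuming each point is a dictionary with hypothetical keys "state", "geoorder", and "w2" (the W2 weight from step 6). The raking of step 7 that is then repeated for each replicate is not shown.

```python
def assign_replicates(points, R=29):
    """Sort points, assign replicate numbers, and build starting replicate weights.

    points: list of dicts with hypothetical keys 'state', 'geoorder', 'w2'.
    """
    ordered = sorted(points, key=lambda pt: (pt["state"], pt["geoorder"]))
    out = []
    for j, pt in enumerate(ordered, start=1):
        r = (j - 1) % R + 1  # replicate number for the point in position j
        # The r-th replicate weight is 0; every other replicate keeps W2.
        rep = [0.0 if k == r else pt["w2"] for k in range(1, R + 1)]
        out.append({**pt, "replicate": r, "rep_weights": rep})
    return out
```

Because the positions cycle through 1 to 29, neighboring points in the sort order fall into different replicate groups.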

D. Change Between Surveys
An important analysis of CEAP2 vs. CEAP1 is whether a characteristic measured in a specified
geographic area shows a statistically significant change. Since most of these measures are sums,
or averages constructed from a nontrivial number of sample points, the Central Limit Theorem
applies. The difference between the two estimates, ∆_12, is regarded as statistically significant if
|∆_12| > z_α σ̂_12, where
• z_α is the standard normal z score for the chosen type I error (often z_α = 1.96, the upper
0.025 point of the standard normal distribution, which gives a two-sided test at the 0.05 level), and
• σ̂_12 is the standard error of the difference, further discussed below.

An estimate of the standard error of the difference discussed above is computed using the formula
σ̂_12 = sqrt( σ̂_1^2 + σ̂_2^2 − 2 ρ̂_12 σ̂_1 σ̂_2 ), where
• σ̂_1^2 is an estimate of the variance of the CEAP1 estimate, using replicated weights (see
below) and σ̂_1 is the square root of that variance;
• σ̂_2^2 is an estimate of the variance of the CEAP2 estimate, using replicated weights (see
below) and σ̂_2 is the square root of that variance; and
• ρ̂_12 is an estimate of the covariance of the two estimates (see below).

Given a characteristic estimate or observation yi at each of the points of interest in a geographic
area (yi could be a dichotomous [0,1] variable indicating in or not in a category), the estimates
above are calculated as follows using the R = 29 replicate weights:
• σ̂_1^2 = ∑_{r=1}^{R} (t_{1(r)} − t̂_1)^2, where t_{1(r)} = ∑_{S} w_{1i(r)} y_{1i}, for r = 1..R, and
t̂_1 = (1/R) ∑_{r=1}^{R} t_{1(r)};
• σ̂_2^2 = ∑_{r=1}^{R} (t_{2(r)} − t̂_2)^2, where t_{2(r)} = ∑_{S} w_{2i(r)} y_{2i}, for r = 1..R, and
t̂_2 = (1/R) ∑_{r=1}^{R} t_{2(r)};
• ρ̂_12 = ∑_{r=1}^{R} (t_{1(r)} − t̂_1)(t_{2(r)} − t̂_2), where
o t_{1(r)} = ∑_{S*} w_{1i(r)} y_{1i}, for r = 1..R, and t̂_1 = (1/R) ∑_{r=1}^{R} t_{1(r)},
o t_{2(r)} = ∑_{S*} w_{2i(r)} y_{2i}, for r = 1..R, and t̂_2 = (1/R) ∑_{r=1}^{R} t_{2(r)}, and
o S* is the subset of the sample points in both CEAP1 and CEAP2.
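
A compact sketch of the change test is given below. It computes a replicate total and a replicate variance as written above, then evaluates the standard error of the difference and the significance rule. The quantity rho12 is taken as an input and used exactly as it appears in the printed formula for the standard error; how it should be scaled is not decided here. The function names and the list-based inputs are assumptions.

```python
from math import sqrt

def replicate_total(rep_weights, y):
    """t_(r): weighted sum of y using one replicate's weights."""
    return sum(w * v for w, v in zip(rep_weights, y))

def replicate_variance(t_reps):
    """Sum of squared deviations of replicate totals from their mean."""
    tbar = sum(t_reps) / len(t_reps)
    return sum((t - tbar) ** 2 for t in t_reps)

def se_difference(var1, var2, rho12):
    """sqrt(var1 + var2 - 2 * rho12 * sigma1 * sigma2), floored at zero."""
    s1, s2 = sqrt(var1), sqrt(var2)
    return sqrt(max(var1 + var2 - 2.0 * rho12 * s1 * s2, 0.0))

def is_significant(delta, se, z=1.96):
    """True when |delta| exceeds z times the standard error of the difference."""
    return abs(delta) > z * se
```

With rho12 set to zero the standard error reduces to the usual formula for two independent estimates, which is a convenient check on any implementation.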

References
Bankier, M. D. (1988). Power allocations: determining sample sizes for subnational areas. The
American Statistician, 42(3), 174-177.
Breidt, F.J. & W.A. Fuller (1999) Design of supplemented panel surveys with application to the
National Resources Inventory, Journal of Agricultural, Biological, and Environmental Statistics,
4(4): 391 – 403.
Fuller, W.A. (1999) Estimation procedures for the United States National Resources Inventory,
Proceedings of the Survey Methods Section of the Statistical Society of Canada, 39 – 44.
Goebel, J.J. (2009). Statistical Methodology for the NRI-CEAP Cropland Survey, Natural Resource
Conservation Service, Washington, D.C.
Kott, P.S. (2001) The delete-a-group jackknife, Journal of Official Statistics, 17: 521 – 526.
Nusser, S.M. & J.J. Goebel (1997) The National Resources Inventory: a long-term multi-resource
monitoring programme, Environmental and Ecological Statistics, 4(3):181- 204.
Potter, S.R., S. Andrews, J.D. Atwood, R.L. Kellogg, J. Lemunyon, L. Norfleet, D. Oman (2006)
Model Simulation of Soil Loss, Nutrient Loss, and Change in Soil Organic Carbon Associated with
Crop Production, Natural Resources Conservation Service, USDA, Washington, D.C.
Schnepf, M. (2016). A History of Natural Resource Inventories Conducted by the USDA’s Soil
Conservation Service and Natural Resources Conservation Service, Natural Resources Conservation
Service, Washington, D.C.

Appendix A. The National Resources Inventory (NRI)
Introduction to the NRI
The current National Resources Inventory evolved from a need for information to guide decisions about
resources conservation after the “Dust Bowl” of the 1930s. After evolving through several iterations,
the National Resources Inventory was formally mandated in the 1972 Rural Development Act and its
current design began in 1982. In 2000, it converted from a 5-year collection to an annual design.
Throughout that time period its scope expanded from a heavy focus on cropland erosion to a much
wider assessment of resources described herein. A detailed account of the history of the NRI can be
found on the NRI Website,
http://www.nrcs.usda.gov/wps/portal/nrcs/main/national/technical/nra/nri/, at the “History of the
NRI” link. That contains the report, “A History of Natural Resource Inventories Conducted by the USDA’s
Soil Conservation Service and Natural Resources Conservation Service” compiled by Max Schnepf for the
Soil and Water Conservation Society in 2008 and updated by Dr. Patrick Flanagan in 2016.
As of FY 2021, the National Resources Inventory (NRI) program houses a database of surface-level
information about the non-Federal natural resources of the United States of America and provides the
infrastructure and overall process to collect updated information about those resources. The
information describes the characteristics of the land, what covers it (including water), and how it is used.
The database is a longitudinal data set containing variables from 1982, 1987, 1992, 1997, and annually
from 2000 through 2017. The variables consist of raw collected data, data derived from the raw data,
estimates, and administrative data for a two-stage sample of geographic areas, called segments, and
sample points on the ground within those segments. At this point, the NRI covers the 48 conterminous
States, Hawaii, Puerto Rico, and the Virgin Islands for all of the aforementioned years and Alaska for
2007.

NRI Goals and Objectives
The primary goal of the NRI is to comply with the initial mandate from the Rural Development Act
of 1972 that directed the Secretary of Agriculture “to carry out a land inventory and monitoring program
to include, but not be limited to, studies and surveys of erosion and sediment damages, flood plain
identification and utilization, land use changes and trends, and degradation of the environment resulting
from improper use of soil, water, and related resource conditions.”
The primary objective of the NRI is to provide natural resource managers, policy makers, and the public
with scientifically valid, timely, and relevant information on natural resources and the environment. The
NRI is unique because of its established linkages to NRCS soil survey data. Information about specific
properties and characteristics of the soil and surrounding landscapes is utilized to develop NRI data
elements and interpretations.
NRCS operates the NRI program on the basis of rigorous, scientifically developed sample survey
(statistical) principles and protocols. To that end, the NRI –








• utilizes the independent, objective expertise of internationally recognized experts in survey
statistics via a cooperative agreement with the Center for Survey Statistics and Methodology
(Iowa State University)
• utilizes probability sampling techniques to ensure that results are scientifically credible
• follows strict quality assurance protocols
• protects the integrity and confidentiality of the data collection
• provides databases and statistical summaries that allow data users to make statistically valid
analyses and inferences

The NRI Sample Design and Selection
Target Universe
The NRI target universe is the land area of the United States of America and its territories, where land
area includes land covered by anything including water. The exception is coastal territorial water.
Portions of water along the coast are included in the target universe, but only to the extent that they
have the potential to change to land area or become part of the estuarine system. Many large bays are
included that are primarily interior to the coastline, e.g., Chesapeake, Delaware, San Francisco, and
Mobile bays. Most gulfs are not included, e.g., Gulf of Maine. Islands off of the coast are included, but
the water areas surrounding them are not. The Great Lakes and Saint Lawrence Seaway are treated the
same way as the oceans. Since the NRI is a longitudinal data set, the Universe is the above over time
from 1982 to the present at specific time intervals: 1982, 1987, 1992, 1997, and yearly 2000 – 2017.

NRI Foundation Sample
The Foundation NRI sample is a two-stage stratified area sample of all States, Puerto Rico, and the Virgin
Islands. The primary sampling units (PSUs) are areas of land called “segments.” The segments in the
sample were selected from a collection of grids covering all land and water area in the target universe.
Within the sample segments, points were selected in the geographically balanced random process
described below. For most segments, three points are selected, but that varies to some degree
dependent on the segment size. The foundation sample for 1997 contained 300,000 segments and
about 800,000 points. See Nusser and Goebel (1997) for a more complete description of the survey. The
samples each year from 2000 to 2017 are core and rotation subsamples of about 72,000 segments
selected from the 1997 “foundation” sample. The annual sampling process is further described below.

Selection of Sample Primary Sampling Units (PSUs)
The NRI evolved into a longitudinal data collection that returns to the same sources of data repeatedly,
both to obtain cross-sectional data for each release and to compare the data over time to
assess change at local levels. The sources of data for the 2017 NRI were almost entirely selected for the
1982 NRI, so the sampling details below reflect sample selection in 1982.

The Sampling Frame
The surveys from which the NRI data are collected are based entirely on area frames with a two-stage
selection, with data collection intended to take place at each stage. The first stage is a selection of primary
sampling units (also called segments), which are subsets of each of the 3,100+ counties in the 48
contiguous states, Hawaii, Puerto Rico, and the Virgin Islands. To construct the first stage frame, each
county is first divided up into non-overlapping portions ranging in size from 40 to 640 acres.

Defining the PSUs and PSU Strata
The PSUs were defined as follows:
Standard County. For those parts of the country defined by the Public Land Survey System (PLSS) and
for a standard county that is square and 24 miles on each side, the county would be divided into 16
square townships, each 6 miles on a side. Each township is then divided into 36 sections, each one mile
on a side. The sections are numbered from 1 to 36 starting in the Northeast corner and proceeding back
and forth horizontally in a serpentine manner. For sampling, 3 strata of 12 sections are then formed in
each township, with the two top rows being one stratum, the second two rows as the second stratum
and the last two rows as the third stratum. Each of the sections is then divided into four PSUs, each ½
mile on a side. See diagrams below.

[Diagrams: a township, its three strata, and a ¼-section PSU]

PLSS Non-Standard Counties. In irregularly shaped (non-square) PLSS counties, as many regular (6 mile
by 2 mile) strata as possible are formed, and then the remaining sections or partial sections are formed
into 12 section groups.
PLSS Counties with Varying PSU/Segment Sizes. Due to the heterogeneity in some irrigated land and
homogeneity in forest, range, and barren land in the west, some strata were constructed with differing
PSU/segment sizes of as small as 40 acres up to 640 acres, though only 3 sizes were used (40, 160, and
640) beyond some variation due to non-square county borders.
Non-PLSS Counties in Ohio & Southern States. In the non-PLSS areas of Ohio, Louisiana, and Arkansas, a grid
pattern was superimposed on the county maps and then sampled similarly to PLSS counties.
Non-PLSS Counties in the 13 Northeast States. The strata in the 13 northeastern states are areas of land
two minutes of latitude by four minutes of longitude in size. The PSUs are rectangular areas of land 20
seconds of latitude by 30 seconds of longitude. The PSUs range in size from 96 acres in northern Maine
to 113 acres in southern Virginia.

Original PSU Sample Selection Methods
Within each PSU stratum, PSUs (segments) were selected using a simple random sample without
replacement for strata with equal sized PSUs, or with probability proportional to size for strata
whose PSUs differ in size. Initially, 2, 3, and 4 percent samples were selected. This
was done to facilitate choices in sample reduction in some PSUs before making the final sample choices.
• In the simplistic case of a stratum with 48 equal sized PSUs, a 2 percent sample would be the
selection of one PSU, while a 4 percent sample would be the selection of 2 PSUs.
• Within a township, a 3 percent sample was also selected by selecting 2 PSUs in one of the three
strata, and 1 PSU from each of the other two.
• Other schemes were employed for non-standard counties to choose 2, 3, and 4 percent
samples.
• The final sample of 300,000 PSUs for the 48 coterminous States, Hawaii, and the Caribbean
territories (Puerto Rico and the Virgin Islands) was determined by a fixed budget estimate and
sample choices that would minimize variance of key variables.
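
The nominal 2, 3, and 4 percent rates can be checked against the standard township layout described earlier (3 strata of 12 sections, each section split into four PSUs). The short calculation below is only a worked check of that arithmetic; the variable names are illustrative.

```python
SECTIONS_PER_STRATUM = 12
PSUS_PER_SECTION = 4
STRATA_PER_TOWNSHIP = 3

psus_per_stratum = SECTIONS_PER_STRATUM * PSUS_PER_SECTION   # 48
psus_per_township = psus_per_stratum * STRATA_PER_TOWNSHIP   # 144

rate_one_psu = 1 / psus_per_stratum               # nominal 2 percent sample
rate_two_psus = 2 / psus_per_stratum              # nominal 4 percent sample
rate_township = (2 + 1 + 1) / psus_per_township   # nominal 3 percent sample

print(f"{rate_one_psu:.2%} {rate_two_psus:.2%} {rate_township:.2%}")
# prints: 2.08% 4.17% 2.78%
```

The exact rates therefore differ slightly from the round percentages used to name the samples.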

Sample Changes and Sub-Sampling over Time
Between 1982 and 1992, the original 1982 sample of 320,000 PSUs was reduced to 300,000 PSUs,
with some augmentation in selected counties where analysis showed a need for additional sample size.

Selection of Sample Points in the Sample PSUs
The last step in selecting the sample was to locate three sample points within each
PSU. There were exceptions: two points were selected from 40-acre PSUs, and only one
point was selected per PSU in Louisiana and northwestern Maine.
The procedure for selecting the points within a PSU was as follows:

1. A grid consisting of squares formed with three rows and three columns was
superimposed on the PSU. Each square was subdivided into four equal blocks. The
numbers 1 to 12 were assigned to the blocks in each row with a number appearing
once in each row and once in each column. No adjoining blocks had the same
number.
2. Two numbers between 1 and X were selected at random, where X is the width of the
side of the PSU in feet. These two numbers determine the coordinates of sample
point #1 in feet north and east from the PSU’s southwest corner.
3. Points #2 and #3 were located in the blocks with the same label as the block for
point #1. They were positioned in the same relative position within the blocks as
point #1. Steps for selection of two sample points within a PSU were similar, except
the PSU was divided into 4 blocks instead of 36.

[Figure: block labels for the grid of 36 blocks used to position sample points]
 1   2   3   4   5   6
 7   8   9  10  11  12
 4   6  12   7   1   3
11  10   2   5   9   8
 5   3   1   6   4   2
 9  12   8  11   7  10
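
The three-point procedure can be sketched as below. The block labels are read in the order in which they appear above and arranged as six rows of six, with the first row taken to be the northern edge of the PSU; both the layout and the orientation are assumptions. The sketch also draws continuous coordinates rather than whole feet, and the function name is hypothetical.

```python
import random

# Assumed layout: labels in source order, six rows of six, first row = north.
GRID = [
    [1, 2, 3, 4, 5, 6],
    [7, 8, 9, 10, 11, 12],
    [4, 6, 12, 7, 1, 3],
    [11, 10, 2, 5, 9, 8],
    [5, 3, 1, 6, 4, 2],
    [9, 12, 8, 11, 7, 10],
]

def select_points(side_ft, rng=random):
    """Return (label, [(east, north), ...]) for the three points in a PSU."""
    block = side_ft / 6.0
    east = rng.uniform(0.0, side_ft)    # feet east of the southwest corner
    north = rng.uniform(0.0, side_ft)   # feet north of the southwest corner
    col = min(int(east // block), 5)
    row_from_south = min(int(north // block), 5)
    label = GRID[5 - row_from_south][col]
    # Offset of point #1 inside its own block
    off_e = east - col * block
    off_n = north - row_from_south * block
    points = []
    for r in range(6):
        for c in range(6):
            if GRID[r][c] == label:
                # Same relative position inside every block with this label
                points.append((c * block + off_e, (5 - r) * block + off_n))
    return label, points
```

Each label occurs in exactly three blocks of the grid, so the function always returns three points, one of which is point #1 itself.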

Sample Results
The resulting PSU selection is a random area sample of segments in every county across the country.
The localized idea is depicted below in the first illustration. Within each segment, the points are
selected in a balanced fashion resulting in points as depicted in the second illustration.

The distribution of segments throughout the country is as follows:

The 2000 and Later Annual Samples
Prior to 2000, in 1982, 1987, 1992, and 1997, the entire sample of about 300,000 segments and over
800,000 points were collected in one year. That presented a huge resource impact both in use of
personnel and in the cost of the collection. In addition, such a large collection and the strain it puts on
resources tends to have a negative impact on data quality. For all of those reasons, starting in 2000, the
NRI Program changed to an annual sample approach. After extensive research documented in Breidt
and Fuller (1999) a rotating panel design was shown to produce the best results, consisting of a fixed
sample of “core” segments that are included in the collection every year, combined with a sample of
“rotating” segments which rotate in and out of the annual sample over time.
The core sample of segments consists of just over 41,000 segments. To construct the core sample,
segments were selected in every county using a stratified selection from the following strata:
• Wetland (contains one or more wetland points)
• CRP (contains one or more CRP points and no wetland points)
• Developed Land Change (not in above)
• Urban (Urban in segment, not in above)
• High Erosion (not in above, but has high erosion cropland point)
• Cropland (not in above and has one or more cropland points)
• Pasture (not in above and has one or more pasture points)
• Range (not in above and has one or more range points)
• Forest (not in above and has one or more forest points)
• 100% Urban
• 100% Federal or Water
• Remainder

A similar approach is used annually to select the rotating panel of around 31,000 segments. Details of
this entire process are provided in Fuller (2003).
The annual design was implemented as indicated from 2000 to 2003. After that, some variations were
implemented by using some repeated rotation panels in their entirety.

Appendix B

Sampling Procedure for 2015-2016 National CEAP Survey

1 Frame
1.1 Eligible Land Cover/Uses
A point is included in the frame if the most recent collected land cover/use (LCU)
satisfies one of the following conditions:
• LCU>0 and LCU<200 and LCU ≠ 7
• LCU in {211, 212, 213}
• LCU = 200 and not range
• LCU = 410
The states included in the frame are the coterminous 48 states (not including DC).
Aquaculture (171) is treated just like other cropland LCUs.
Points that are classified as urban or roads in 1997 and as 200-213 in the most
recent collected year are not eligible. The last set of points removed contains 376 points,
one of which is 213 in the most recent year and the rest of which are 200. These 376 points
are removed because NRI editing procedures change a collected LCU of 200-213 to
urban.
Each point is classified into one of five mutually exclusive and exhaustive LCU
groups. With “t” representing the most recent year, the LCU categories obtained from
this program in item 2 are defined as follows:
• 1 = LCU(t) in 1-20 (high value specialty crops)
• 2 = LCU(t) in 141-144 and LCU(t-1), LCU(t-2), LCU(t-3) not in the set 11-116
• 3 = LCU(t) in 200-213 and LCU(t-1), LCU(t-2), LCU(t-3) not in the set 11-116
• 4 = LCU(t) in 21-116, 170, 171, 180, or LCU(t) in 141-144 and at least one of
LCU(t-1), LCU(t-2), LCU(t-3) in the set 11-116, or LCU(t) in 200-213 and at least
one of LCU(t-1), LCU(t-2), LCU(t-3) in the set 11-116
• 5 = LCU 410 (CRP)
The combination of groups 1, 2, and 4 below approximates the NRI definition of cropland
(cultivated and non-cultivated combined). Category 3 approximates the NRI definition
of pasture. (A perfect classification of NRI points into broaduses is not possible with
the data available because of the 2004 protocol change.) The five LCU categories are
aggregated to the following three groups.
• Cropland (1): LCU category of 1, 2, 4


• Pasture (3): LCU category of 3
• CRP (4): LCU category of 5
The CRP category is labeled 4 for consistency with the original code of 410.
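
The five LCU categories and their aggregation into three groups can be expressed as a small classifier. The sketch below follows the rules as listed, assumes LCU codes are integers with None for a missing year, and does not apply the frame eligibility conditions. The function and variable names are hypothetical and are not taken from the R programs.

```python
def in_range(lcu, lo, hi):
    """True when lcu is present and lo <= lcu <= hi."""
    return lcu is not None and lo <= lcu <= hi

def lcu_category(lcu_t, history):
    """Return the LCU category (1-5), or None if no rule matches.

    lcu_t: most recent LCU; history: LCU(t-1), LCU(t-2), LCU(t-3).
    """
    cropped_before = any(in_range(h, 11, 116) for h in history)
    if lcu_t == 410:
        return 5                      # CRP
    if in_range(lcu_t, 1, 20):
        return 1                      # high value specialty crops
    if in_range(lcu_t, 21, 116) or lcu_t in (170, 171, 180):
        return 4
    if in_range(lcu_t, 141, 144):
        return 4 if cropped_before else 2
    if in_range(lcu_t, 200, 213):
        return 4 if cropped_before else 3
    return None

# Aggregation of the five categories into the three LCU groups
LCU_GROUP = {1: 1, 2: 1, 4: 1, 3: 3, 5: 4}
```

For example, a point with LCU 142 in the most recent year and a crop code of 11 in any of the three prior years falls in category 4 and therefore in the cropland group.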

1.2 CEAP Regions
Ten CEAP production regions are defined for the 2015-2016 National survey. Figure 1
shows a map of the ten production regions. The specifications for the regions are outlined
in Table 1.

Figure 1: CEAP Production Regions


# | Working Name | LRR | Refinements | Concept or Rationale
1 | Pacific Coast | A,C | None | High value irrigated crops. Water quantity issues
2 | Irrigated West | B,D,E | None | Water quantity resource concern note the non Irr portions
3 | Northern Plains Wheat Belt | F,G | Dryland wheat predominant Wind ero. | Dominated by Wheat and other small grains
4 | Southern Plains Wheat/Cotton | H,I | Aquifer driven | Wheat cotton region, wind erosion and aquifer depletion
5 | Corn Belt | K,L,M | Less 109 from LRR L | Corn soy dominated region
6 | South Central Pasture and Crop | J,P, and western N | P west of MS River | Animal ag and manure on crop/pasture hay common
7 | East Central Pasture and Crop | eastern portion N | plus MLRA 136 | Animal ag and manure on crop/pasture hay common
8 | Southeast Mid Atlantic Coastal Plain | T, P, U | East of MLRA 134 | Split from other for less rain intensity and coarser soil in general
9 | Lower MS River and Texas Gulf | O, T and P | West with MLRA 134 | Silty soils with cotton rice and cane. Intense rain
10 | Northeast | R, S, + 101 from L | |

Table 1: Description of 10 CEAP Production Regions.
Table 2 contains the average variance of APEX output variables from the 2003-2006 CEAP survey by CEAP region, where the average is across states for a particular
region. For most APEX output variables, CEAP region 1 has the largest variance, which
may not be surprising because this region is defined by high value irrigated crops in the
Pacific Coast.
Variable (s^2) | 7 | 8 | 6 | 9 | 1 | 2 | 3 | 4 | 10 | 5
Precip | 2.275 | 1.962 | 2.954 | 2.369 | 10.553 | 5.015 | 1.775 | 2.829 | 1.621 | 2.239
Runoff | 2.335 | 2.055 | 2.324 | 2.702 | 8.379 | 2.419 | 0.409 | 0.508 | 1.525 | 1.223
Perc | 4.216 | 7.168 | 4.675 | 5.362 | 6.654 | 3.079 | 2.260 | 2.468 | 3.352 | 3.583
WaterYld | 3.665 | 4.680 | 4.182 | 4.726 | 10.755 | 4.495 | 2.302 | 2.540 | 2.843 | 2.684
RUSLE2 | 0.826 | 0.837 | 1.596 | 2.711 | 0.498 | 0.144 | 0.036 | 0.222 | 0.436 | 0.463
Sediment | 3.199 | 1.887 | 1.590 | 4.205 | 6.466 | 3.086 | 0.130 | 0.226 | 2.142 | 1.207
WindEro | 0.019 | 0.099 | 0.616 | 0.988 | 0.211 | 4.399 | 3.117 | 5.034 | 0.066 | 0.571
TNloss | 50.901 | 53.836 | 33.010 | 43.503 | 150.861 | 95.705 | 24.433 | 50.796 | 90.358 | 29.160
Nrunoff | 2.859 | 1.385 | 1.782 | 2.555 | 9.442 | 1.840 | 0.468 | 0.244 | 1.072 | 1.165
Nsed | 13.379 | 4.194 | 5.187 | 10.978 | 13.102 | 9.335 | 9.317 | 10.536 | 10.415 | 8.410
PercN | 25.675 | 30.894 | 19.855 | 23.172 | 80.382 | 51.439 | 11.385 | 28.414 | 47.680 | 15.609
Ploss | 6.196 | 3.013 | 2.016 | 4.879 | 13.544 | 8.850 | 2.865 | 4.424 | 7.907 | 3.086
solubleP | 3.476 | 2.143 | 0.885 | 2.915 | 6.348 | 0.680 | 0.346 | 0.175 | 4.393 | 1.458
sed P | 3.809 | 1.464 | 1.635 | 3.321 | 9.037 | 8.602 | 2.771 | 4.387 | 4.466 | 2.222
C start | 8.168 | 10.352 | 7.571 | 12.101 | 49.092 | 11.152 | 8.347 | 8.854 | 11.516 | 25.907

Table 2: Average variance of APEX output variables by CEAP region (columns are CEAP regions).

1.3 Programs
Four programs construct the frame:

1. “extractlatestlcu3check1.R” obtains the most recent collected landuse and cropping
history and checks the range indicator for points with latest LCU 200. The program
also adds segment-specific information from PSU, SAMPLED.SEGMENT, and
SDE including foundation selection probabilities, a core indicator, and location.
2. “addcolstoframecheckrev2.R” adds additional columns including indicators for whether
the point was in the 03-06 CEAP or the 2013-2014 CEAP surveys. The program
also adds an LCU category and obtains CEAP regions using an R overlay operation.
3. “mergeinceapregionsnewRev” obtains CEAP regions from an ArcGIS overlay operation. These CEAP regions are used for the sample selection.
4. “createcrdatRev” reorganizes the data by the CEAP regions obtained in step 3
instead of by state. This program also removes points in LA that are not in the
pointgen, removes points that have been removed, and removes points that are
classified as urban or roads in 1997 and as 200-213 in the most recent collected
year. This program also aggregates the five LCU categories into the three LCU
groups.

2 Strata Definitions and Sample Sizes
2.1 Sample Sizes by CEAP Region
Let Â_h be the estimate of the area in CEAP-eligible categories in CEAP region
h in the year 2010. The estimated areas are obtained from the 2010 pointgen. Due
to topological errors in the CEAP region shapefile, not all NRI points are in a CEAP
region. The total area corresponding to points classified in eligible LCUs that are not
in a CEAP region is 83,800 acres. The total estimated area in CEAP-eligible categories
is 480,436,800 acres. Because the area associated with missing CEAP regions is only
0.017% of the estimated area based on points located in a CEAP region polygon, the
area with missing CEAP regions is ignored for the sample size calculation.
Define a target sample size for CEAP region h by n_h, the result of accumulating
and rounding the ñ_h, where

ñ_h = N * Â_h^{0.5} / ∑_{h=1}^{H} Â_h^{0.5}, (1)

and N = 45,000. This yields the sample sizes in Table 3 below. The square root
allocation is often used as a compromise between equal allocation, optimal for individual
area estimates, and proportional allocation, optimal for the total of the regions combined
(Bankier, 1988).
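
A minimal sketch of the square root allocation in equation (1) follows. The phrase "accumulating and rounding" is interpreted here as rounding the running cumulative total, so that the integer sample sizes sum exactly to N; that interpretation, the function name, and the areas used in the usage line are assumptions.

```python
def sqrt_allocation(areas, N, power=0.5):
    """Allocate N sample units in proportion to area ** power.

    areas: dict mapping region -> estimated eligible area.
    Returns a dict mapping region -> integer target sample size n_h.
    """
    denom = sum(a ** power for a in areas.values())
    raw = {h: N * a ** power / denom for h, a in areas.items()}
    # Round the running total so the integer sizes add up to N.
    alloc, running, prev = {}, 0.0, 0
    for h, x in raw.items():
        running += x
        cum = round(running)
        alloc[h] = cum - prev
        prev = cum
    return alloc

# Hypothetical areas, for illustration only
print(sqrt_allocation({"R1": 100.0, "R2": 400.0, "R3": 900.0}, 60))
```

Setting power to 1 gives proportional allocation and setting it to 0 gives equal allocation, the two extremes between which the square root allocation is a compromise.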


CEAP.Reg | R1 | R10 | R2 | R3 | R4 | R5 | R6 | R7 | R8 | R9
n_h | 2420 | 3113 | 3881 | 5733 | 5651 | 8701 | 4070 | 4120 | 3611 | 3700

Table 3: Target sample sizes by CEAP region.
The program to calculate the estimated areas and the target sample sizes is
“estimatedareafrompgenoverlay1rev.R.”
2.1.1 Increasing Sample Size to Account for CRP

We obtain a revised sample size to account for differential eligibility rates across crop,
pasture, and CRP. Assume the eligibility rate for CRP is 15%, and the eligibility rate
for crop and pasture is 85%. Assume the response rate is 70% for all three domains. If
the sample size is 45000, and the sample is drawn exclusively from pasture and cropland,
then the expected realized sample size is

ñ = 45000(0.85)(0.7) = 26775. (2)

In the frame, approximately 6% of the segments contain at least one CRP point. If a
sample of size 45000 is drawn and 6% of the points are CRP, then the expected realized
sample size is

ñ = 45000(0.7)[0.06(0.15) + 0.94(0.85)] = 25452. (3)

If the sample size is increased from 45000 to 47500, then the expected realized sample
size is

ñ = 47500(0.7)[0.06(0.15) + 0.94(0.85)] = 26866. (4)

The CEAP region sample sizes for N = 47, 500 are provided in Table 4 below.
CEAP.Reg    R1   R10    R2    R3    R4    R5    R6    R7    R8    R9
n_h       2554  3286  4097  6051  5965  9185  4296  4349  3811  3906

Table 4: Target sample sizes by CEAP region.
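
The expected realized sample sizes in equations (2) to (4) can be re-derived with a few lines of arithmetic. The sketch below uses only the rates stated in this section (15% and 85% eligibility, 70% response, 6% CRP share); the function name and argument layout are illustrative choices, not part of the original programs.

```python
def expected_realized(n_drawn, response_rate, domains):
    """Expected number of usable responses.

    `domains` is a list of (share of sample, eligibility rate) pairs.
    """
    eligible_share = sum(share * eligibility for share, eligibility in domains)
    return n_drawn * response_rate * eligible_share

crop_pasture_only = round(expected_realized(45000, 0.7, [(1.0, 0.85)]))
with_crp_45000 = round(expected_realized(45000, 0.7, [(0.06, 0.15), (0.94, 0.85)]))
with_crp_47500 = round(expected_realized(47500, 0.7, [(0.06, 0.15), (0.94, 0.85)]))

print(crop_pasture_only, with_crp_45000, with_crp_47500)
```

The product 45000(0.85)(0.7) evaluates to 26775, so raising the drawn sample to 47500 restores roughly the realized size that a crop-and-pasture-only sample of 45000 would have produced.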
2.1.2 Modifications for points in 2013 or 2014 CEAP surveys

It was decided that the data from the 2013 and 2014 CEAP surveys are to be used
for points that are in the previous two regional survey samples and are selected for the
national survey. Because this reduces the overall workload, cost constraints permit a
larger total sample size. To determine the number of additional points to sample, the
basic sampling procedure, described below, was implemented 30 times. The number of
points in either the 2013 or 2014 sample was recorded in each implementation. The total
sample size was increased by the median number of points obtained in a selected sample
that are also in either the 2013 or 2014 sample. The median was 1166, and the modified
sample sizes are given in Table 5 below.
CEAP.Reg     1    10     2     3     4     5     6     7     8     9
n_h       2617  3366  4198  6199  6112  9410  4402  4456  3904  4002

Table 5: Modified CEAP region sample sizes.

2.2 Point Sample Sizes
The target sample size at the level of CEAP region and broaduse is converted to a
target sample size for the intersection of CEAP region, sample class, LCU group, and
state on the basis of the number of points in the frame. A point is classified into one of
the three “LCU groups” defined in item 4 of Section 1 on the basis of the most recent
collected land cover/use and associated cropping history.
We let $N_{hbk\ell}$ be the number of points in the frame in CEAP region h, LCU
group b, sample class k, and state $\ell$, and let $N_{hb} = \sum_{k,\ell} N_{hbk\ell}$. The sample classes 8-12 are aggregated to a single
sample class denoted by 0. The sample size $n_h$ is split among the domains defined by
intersections of $(h, b, k, \ell)$ according to the $N_{hbk\ell}$. For CEAP regions other than regions
6 and 7, the allocation is in proportion to $N_{hbk\ell}$. For each of CEAP regions 6 and 7, the
sample size is first distributed to LCU groups in proportion to the square root of $N_{hb}$
and then to states and sample classes in proportion to $N_{hbk\ell}$. The allocation for regions
6 and 7 is square root instead of proportional at the first level because these are the only
two regions where the estimated area in pasture exceeds the estimated area in cropland.
(Estimates are based on the 2010 pointgen for the year 2010.) Specifically, let

$$\tilde{n}_{hbk\ell} = \frac{n_h N_{hbk\ell}}{\sum_{b,k,\ell} N_{hbk\ell}}, \qquad h \neq 6, 7, \qquad (5)$$

$$\tilde{n}_{hbk\ell} = \frac{n_h N_{hb}^{0.5}}{\sum_{b} N_{hb}^{0.5}} \cdot \frac{N_{hbk\ell}}{\sum_{k,\ell} N_{hbk\ell}}, \qquad h = 6, 7. \qquad (6)$$

The target sample size $n_{hbk\ell}$ is obtained by applying accumulate and round to the
$\tilde{n}_{hbk\ell}$ within a CEAP region.
2.2.1 Modifications to Point Sample Sizes for Regions 6 and 7

For the CRP category for regions 6 and 7, the target point sample size exceeds the
number of available segments. The target point sample sizes were modified for regions
6 and 7 as follows. The modified $n_{hbk\ell}$ is obtained by applying accumulate and round
to $\tilde{n}_{hbk\ell}$ defined by

$$\tilde{n}_{hbk\ell} = \frac{n^*_{hb} N_{hbk\ell}}{\sum_{k,\ell} N_{hbk\ell}}, \qquad (7)$$

where $c_6 = 165$, $c_7 = 61$,

$$n^*_{hb} = n^{(0)}_{hb} \left( 1 + \frac{c_h}{n^{(0)}_{h1} + n^{(0)}_{h3}} \right), \qquad b = 1, 3,$$

$$n^*_{hb} = n^{(0)}_{hb} - c_h, \qquad b = 4, \qquad (8)$$

and

$$n^{(0)}_{hb} = \frac{n_h N_{hb}^{0.5}}{\sum_{b} N_{hb}^{0.5}}. \qquad (9)$$

2.2.2 Resulting Frame Counts and Target Point Sample Sizes

BU  SC     R1    R10     R2     R3     R4     R5     R6     R7     R8     R9
 1   0     50     11    208     98     76     45     13     23     18     51
 1   1    444   1716    890   3292    746   7352    126    611   2595   2425
 1   2     10     83    641   1819   1519   5823     82    354    313    451
 1   3    534   2227    562    156    489   3662    182   1640   1113    509
 1   4    258   1607    443    322    564   5098    134    839    708    560
 1   5    141   1092   3840   6185   4791  13606    338   1024    579   1324
 1   6   3282   2457   7930  12060  12536  34364   1253   3370   3258   6078
 1   7     60     49    110     66      9    140     82    185     74     32
 3   0     47     42     90     50     35     84    136    190    130     77
 3   1    343   1197   1117    531    110   2735    870    752   1605    821
 3   2     10     26    283    503    903   1722    132    278    245    162
 3   3    291   1187    340     70    169   1209    905   3010    855    303
 3   4    147    813    194     61    142   1072    786   1375    423    275
 3   5     27    209    180    293    384   1368    241    519    136    164
 3   6    348    899   1660   1247   1113   4380   1933   2914   1010    809
 3   7    256    367    746    356    294   1368   3830   2181    700    347
 4   0      0      0      4      8      1      0      0      1      1      2
 4   1      2     20     17    433     80    351      3     17    150    141
 4   2     12     20    742   1278   1900   2224     36    172    320    263
 4   3      0     18      9      0      9     59      1     17     22      6
 4   4      0      4      3     10     19     54      1     18     20     12
 4   5      0     12    239    369    354    377      1     19     35     45
 4   6      3     21    247    493    539    427     12     59     81     75
 4   7      0      0      4      2      0      8      1      1      2      2

Table 6: Frame point counts by CEAP region, LCU Group (BU), and sample class (SC).

BU  SC     R1    R10     R2     R3     R4     R5     R6     R7     R8     R9
 1   0     21      2     42     21     16      3      9      5      5     14
 1   1    186    409    185    687    169    789     83    145    706    648
 1   2      4     21    131    379    349    625     54     82     84    122
 1   3    223    533    115     32    109    393    118    389    302    136
 1   4    107    386     90     67    128    546     88    198    190    152
 1   5     59    262    787   1292   1093   1463    223    243    155    355
 1   6   1371    586   1625   2516   2862   3696    821    799    881   1626
 1   7     24     11     24     14      3     15     53     43     19      9
 3   0     20     11     18     10      9      9     43     41     34     22
 3   1    142    285    229    110     26    296    286    147    436    220
 3   2      5      5     58    105    204    186     43     57     66     44
 3   3    122    283     68     16     39    133    298    603    231     82
 3   4     61    192     38     13     33    117    258    276    115     74
 3   5     12     51     38     59     87    147     78    103     39     44
 3   6    145    216    338    261    254    471    633    586    276    219
 3   7    107     88    152     75     66    147   1259    435    190     93
 4   0      0      0      0      3      0      0      0      1      0      1
 4   1      1      5      3     90     19     38      3     17     39     37
 4   2      5      6    152    267    435    238     36    172     87     70
 4   3      0      5      4      0      2      5      1     17      9      0
 4   4      0      1      1      1      5      5      1     18      6      2
 4   5      0      3     49     79     81     40      1     19     10     13
 4   6      2      5     50    102    123     46     12     59     24     19
 4   7      0      0      1      0      0      2      1      1      0      0

Table 7: Target point sample sizes by CEAP region, LCU Group (BU), and sample class
(SC).
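
As a check on the regions 6 and 7 adjustment in Section 2.2.1, the sketch below recomputes the LCU-group targets from the frame counts in Table 6 (summed over sample classes), the modified region sizes 4402 and 4456, and the constants 165 and 61. It assumes that the denominator of the inflation factor is the sum of the initial crop and pasture targets, which is the reading under which the group totals add back to the region total. The function name is illustrative.

```python
import math

def adjusted_group_targets(n_h, frame_counts, c_h):
    """Group-level targets for regions 6 and 7 (sketch of equations (8)-(9)).

    frame_counts maps LCU group (1 = crop, 3 = pasture, 4 = CRP) to N_hb.
    """
    roots = {b: math.sqrt(count) for b, count in frame_counts.items()}
    root_sum = sum(roots.values())
    initial = {b: n_h * roots[b] / root_sum for b in frame_counts}
    # Assumed reading: CRP gives up c_h points, crop and pasture absorb them
    # in proportion to their initial targets.
    scale = 1 + c_h / (initial[1] + initial[3])
    return {1: initial[1] * scale, 3: initial[3] * scale, 4: initial[4] - c_h}

# N_hb from Table 6, summed over the eight sample classes.
region6 = adjusted_group_targets(4402, {1: 2210, 3: 8833, 4: 55}, 165)
region7 = adjusted_group_targets(4456, {1: 8046, 3: 11219, 4: 304}, 61)

rounded6 = [round(region6[b]) for b in (1, 3, 4)]
rounded7 = [round(region7[b]) for b in (1, 3, 4)]
print(rounded6, rounded7)
```

Under these assumptions the rounded values agree with the column sums of Table 7 for R6 and R7, and the CRP targets equal the CRP frame counts (55 and 304), consistent with the statement that the unadjusted CRP targets exceeded what the frame could supply.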

2.3 Segment Sample Sizes
The target sample size refers to a sample of points, but the selection procedure
obtains a sample of segments. One point is to be selected from each sampled segment.
The segments are classified into groups on the basis of the LCUs of the in-scope points
in a segment. Table 8 gives the possible combinations. Table 9 gives the number of
segments in each category S by CEAP region. No segment has more than three eligible
points.

 S  CRP  Crop  Pasture  Group
 1    3     0        0      1
 2    2     1        0      1
 3    2     0        1      1
 4    0     3        0      2
 5    1     2        0      1
 6    0     2        1      2
 7    0     0        3      3
 8    1     0        2      1
 9    0     1        2      3
10    1     1        1      1
11    2     0        0      1
12    1     1        0      1
13    1     0        1      1
14    0     2        0      2
15    0     1        1      3
16    0     0        2      3
17    1     0        0      1
18    0     1        0      2
19    0     0        1      3

Table 8: Segment group based on number of points classified as crop, pasture, or CRP

 S     R1    R10     R2     R3     R4     R5     R6     R7     R8     R9
 1      2      1    194    444    460    176      5      9     35     38
 2      1      3     46    111    112    250      3      8     10     35
 3      2      1     22     50     43    124      2      8     12      7
 4    983    774   2340   5155   4428  14016    209    536    815   2358
 5      1     11     65    183    225    541      4     15     37     63
 6    117    395    407    449    379   2791    176    476    216    232
 7     87    100    368    228    240    644    893    563    283    251
 8      1      1     10     16     28    117      5     10     15     11
 9     56    242    206    146    167   1013    201    523    102    122
10      *     10      9     23     25    244      4     17     14     13
11      *      4    137    203    274    198      2     29     59     55
12      *     16     47    102     97    444      3     37     83     44
13      *      9     17     30     32    192      6     39     48     34
14    500   1388   2166   2404   2045   6711    218    871   1369   1287
15    143    844    451    351    289   2110    258   1164    448    225
16    185    478    611    308    336   1052   1576   1572    702    381
17      1     29    125    179    257    290      4     69    167     73
18    392   2217   1569   1728   1458   3895    318   1965   2312    753
19    461   1739    946    584    600   2181   2144   3616   1879    666

Table 9: Number of segments in each group S by CEAP region
* The source listing gives only two counts (1 and 1) for R1 across S = 10-13 and does
not indicate which two of those four rows they belong to.

3 Segment Sample Sizes
To define a procedure for selecting a sample of segments, let d denote the segment
group. Let $M_{hdk\ell}$ and $m_{hdk\ell}$, respectively, be the number of segments in the frame and
sample for CEAP region h, segment group d, aggregated sample class k, and state $\ell$, where

$$\tilde{m}_{hdk\ell} = \frac{m_h M_{hdk\ell}}{\sum_{d,k,\ell} M_{hdk\ell}}, \qquad h \neq 6, 7, \qquad (10)$$

$$\tilde{m}_{hdk\ell} = \frac{m^*_{hd} M_{hdk\ell}}{\sum_{k,\ell} M_{hdk\ell}}, \qquad h = 6, 7, \qquad (11)$$

where $(m^*_{h1}, m^*_{h3}, m^*_{h4}) = (921, 3443, 38)$ for h = 6, and $(m^*_{h1}, m^*_{h3}, m^*_{h4}) = (1763, 2452, 241)$
for h = 7. The target sample size $m_{hdk\ell}$ is obtained by applying accumulate and round
to the $\tilde{m}_{hdk\ell}$ within a CEAP region.

BU  SC     R1    R10     R2     R3     R4     R5     R6     R7     R8     R9
 1   0     37      7    116     58     43     25      8     20     12     31
 1   1    201    970    426   1440    346   3496     49    310   1774   1106
 1   2      4     39    216    630    460   1800     26    131    145    143
 1   3    267   1187    295     74    236   1750     88    860    613    223
 1   4    125    858    220    151    254   2279     64    430    373    247
 1   5     59    485   1538   2299   1827   5034    135    451    259    500
 1   6   1273   1199   3616   5048   5138  12968    518   1546   1505   2365
 1   7     26     29     55     36      6     61     33    100     31     15
 3   0     30     39     68     30     24     63    118    150     95     63
 3   1    229    892    602    282     64   1724    531    540   1192    471
 3   2      2     13    116    213    380    547     54    143    130     65
 3   3    214    912    240     44    120    846    596   2149    625    200
 3   4    109    599    127     39     95    687    478    928    290    166
 3   5     15    106     75    108    169    433    118    297     70     80
 3   6    173    577    909    677    592   1934   1001   1738    597    397
 3   7    160    265    445    224    188    766   2176   1493    415    203
 4   0      0      0      2      5      1      0      0      1      1      2
 4   1      2     19     13    267     42    294      2     17    125     92
 4   2      6     18    388    617   1001   1555     24    130    220    176
 4   3      0     17      5      0      4     49      1     14     18      6
 4   4      0      4      2      7     14     48      1     17     19     10
 4   5      0     11    125    183    194    279      1     14     28     34
 4   6      2     16    133    260    297    345      8     47     68     51
 4   7      0      0      4      2      0      6      1      1      1      2

Table 10: Frame segment counts by Group (BU), sample class (SC), and CEAP region
(R1-R10)

BU  SC     R1    R10     R2     R3     R4     R5     R6     R7     R8     R9
 1   0     33      3     50     29     24      6      8      9      6     19
 1   1    180    393    184    703    183    890     49    144    807    664
 1   2      3     14     92    307    244    458     26     58     67     86
 1   3    238    486    126     36    125    444     88    397    280    134
 1   4    112    351     96     73    134    578     64    195    169    148
 1   5     53    197    662   1123    972   1279    135    208    120    301
 1   6   1135    488   1559   2466   2732   3298    518    708    684   1426
 1   7     23     14     24     18      3     15     33     44     14      8
 3   0     26     16     29     15     13     15     81     50     43     39
 3   1    204    364    259    137     34    438    359    179    540    285
 3   2      2      6     49    105    202    137     37     47     58     40
 3   3    191    373    104     22     63    216    405    708    284    120
 3   4     96    242     54     19     52    175    325    306    132    100
 3   5     13     44     36     53     92    109     81     97     31     48
 3   6    155    234    392    330    315    494    678    574    270    238
 3   7    143    109    191    109     99    194   1477    491    187    123
 4   0      0      0      0      2      0      0      0      1      0      1
 4   1      2      8      6    132     24     77      2     17     56     54
 4   2      6      6    170    301    533    399     24    130     99    106
 4   3      0      6      1      0      2     13      1     14      6      4
 4   4      0      2      1      3      6     14      1     17      8      6
 4   5      0      4     52     90    101     72      1     14     12     19
 4   6      2      6     58    126    159     88      8     47     31     32
 4   7      0      0      3      0      0      1      1      1      0      1

Table 11: Segment sample sizes by Group (BU), sample class (SC), and CEAP region
(R1-R10)

4 Selection of Segments and Points
This section describes the procedure to select a sample of segments and then select
one point per segment. The procedure is guided by the following principles:
1. Attempt to attain geographic spread.
2. Retain points that provided usable data for the 2003-2006 CEAP survey.
3. Select one point per segment.
4. Avoid high variation in weights within groups defined above.
5. Avoid selecting core segments where possible.

4.1 Combination of One Per Stratum and Two Per Stratum Sampling
Each stratum (defined by the combination (h, b, k, `)) is subdivided into smaller
groups on the basis of geoorder. A combination of one per stratum and two per stratum
sampling is used to select a sample of PSUs from each group. This sampling procedure
is intended to achieve geographic spread, similar to systematic sampling, but reduce
the probability of undesirable samples that may arise if the geoorder reflects a grid
pattern in the underlying population. Selecting two per stratum for a sub-sample of the
strata provides degrees of freedom for variance estimation. We first describe the general
procedure and then explain an application to the 2015/2016 CEAP sample.
4.1.1 General Methodology

Let g denote the $(h, b, k, \ell)$ used at the second stage of the allocation. Let $N_g$ and
$n_g$ be the number of segments in the frame and sample, respectively, for group g. Let
$s_1 < \cdots < s_{N_g}$ be the ranks of the geoorders of the elements in the frame for group g.
(This maps the geoorders onto a line, where the segments in the frame are separated
by equal distances.) Let $N_g n_g^{-1} = k$. Randomly select a starting point $r \in [1, N_g]$. For
any segment j in the frame such that $s_j < r$, define $s^*_j = s_j + s_{N_g}$. For segments j
($j = 1, \ldots, N_g$) such that $s_j \geq r$, let $s^*_j = s_j$. Define stratum d by

$$U_d = \{ j : r + k(d - 1) \leq s^*_j < r + kd \}, \qquad d = 1, \ldots, n_g. \qquad (12)$$

Randomly select $d^*$ from $d = 1, \ldots, n_g$.
• If $d^* > 1$, randomly select two elements from $U_{d^*}$. Randomly select 1 element from
$U_d$ for $d \notin \{d^*, d^* - 1\}$.
• If $d^* = 1$, randomly select two elements from $U_{d^*}$. Randomly select 1 element from
$U_d$ for $d \notin \{d^*, d^* + 1\}$.
In both cases no element is selected from the stratum adjacent to $U_{d^*}$, so the total
number of selected elements remains $n_g$.
This procedure is called the one-two sampling procedure below.
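
The one-two procedure above can be sketched in a few lines. This is a minimal illustration, not the project's selection program: it works directly on ranks 1 to N_g, uses 0-based stratum indices, and assumes n_g is at least 2 and N_g is at least 2 n_g so that every stratum holds at least two ranks. The function name and the use of Python's `random.Random` are choices made for the example.

```python
import random

def one_two_sample(n_frame, n_sample, rng):
    """Select n_sample ranks from 1..n_frame with the one-two procedure (sketch)."""
    if n_sample < 2 or n_frame < 2 * n_sample:
        raise ValueError("sketch assumes n_sample >= 2 and n_frame >= 2 * n_sample")
    k = n_frame / n_sample            # stratum width N_g / n_g
    r = rng.uniform(1, n_frame)       # random starting point r in [1, N_g]
    strata = [[] for _ in range(n_sample)]
    for s in range(1, n_frame + 1):
        s_star = s + n_frame if s < r else s              # wrap ranks below r
        d = min(int((s_star - r) // k), n_sample - 1)     # guard float edge case
        strata[d].append(s)
    d_star = rng.randrange(n_sample)  # stratum that receives two draws
    skipped = d_star - 1 if d_star > 0 else d_star + 1    # neighbor with no draw
    chosen = []
    for d, members in enumerate(strata):
        if d == d_star:
            chosen.extend(rng.sample(members, 2))
        elif d != skipped:
            chosen.extend(rng.sample(members, 1))
    return sorted(chosen)

sample = one_two_sample(100, 10, random.Random(0))
print(sample)
```

Two draws in one stratum and none in its neighbor keep the sample size at n_g while giving one stratum a within-stratum pair, which is what supplies a degree of freedom for variance estimation.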
4.1.2 Application to CEAP

Let $N_g$ and $n_g$ be defined as above. Let $N_{g,pc}$ be the number of segments in group
g containing at least one prior CEAP point. Let $N_{g,r}$ and $N_{g,c}$, respectively, be the
number of rotation and core segments in group g. Consider several cases:
• Case 1: $n_g - N_{g,pc} \leq 0$. Use one-two sampling to select $n_g$ of the $N_{g,pc}$ segments
in the sample for group g. Move to the next group.
• Case 2: $m_g = n_g - N_{g,pc} > 0$ and $N_{g,r} \geq m_g$. Use one-two sampling to select $m_g$
segments from the $N_{g,r}$ rotation segments in group g.
• Case 3: $m_g = n_g - N_{g,pc} > 0$ and $N_{g,r} < m_g$. Include all $N_{g,r}$ segments in group
g. If $N_{g,c} \geq m_g - N_{g,r}$, then use the one-two sampling procedure to select a sample of
size $m_g - N_{g,r}$ from the $N_{g,c}$ core segments. Otherwise, include all $N_{g,c}$ segments,
and print a warning message.
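
The three cases reduce to a small decision rule on counts. The sketch below returns how many segments would be taken from each pool for one group; it assumes that in Cases 2 and 3 every prior CEAP segment in the group is retained (the text implies this through the definition of m_g but does not state it), and the function name and returned keys are hypothetical.

```python
def plan_group_selection(n_g, n_prior, n_rotation, n_core):
    """Number of segments to take from each pool for one group (sketch)."""
    if n_g - n_prior <= 0:
        # Case 1: the prior CEAP segments alone cover the target.
        return {"prior": n_g, "rotation": 0, "core": 0, "shortfall": 0}
    m_g = n_g - n_prior
    if n_rotation >= m_g:
        # Case 2: fill the remainder from rotation segments.
        return {"prior": n_prior, "rotation": m_g, "core": 0, "shortfall": 0}
    # Case 3: take every rotation segment, then draw on the core.
    needed_from_core = m_g - n_rotation
    from_core = min(needed_from_core, n_core)
    shortfall = needed_from_core - from_core  # > 0 corresponds to the warning
    return {"prior": n_prior, "rotation": n_rotation,
            "core": from_core, "shortfall": shortfall}

print(plan_group_selection(10, 4, 3, 5))
```

A positive `shortfall` marks the situation in which the original procedure includes all core segments and prints a warning.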
4.1.3 Selection of Points within Segments

A simple procedure is used to select points within segments. If a segment contains
a prior CEAP point, then one point is selected at random from the prior CEAP points
in the segment. If the segment contains no prior CEAP points, one point is selected at
random from the set of eligible points.
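
A minimal sketch of this rule follows. The point identifiers and the function name are hypothetical, and the sketch assumes the prior CEAP points of a segment are a subset of its eligible points.

```python
import random

def select_point(eligible_points, prior_ceap_points, rng):
    """Pick one point from a segment, preferring prior CEAP points (sketch)."""
    prior_in_segment = [p for p in eligible_points if p in prior_ceap_points]
    pool = prior_in_segment if prior_in_segment else list(eligible_points)
    return rng.choice(pool)

rng = random.Random(1)
with_prior = select_point(["a", "b", "c"], {"b"}, rng)
without_prior = select_point(["a", "b", "c"], set(), rng)
print(with_prior, without_prior)
```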

5 Summary of Realized Sample
Figure 2 compares realized point sample sizes (vertical axis) to target point sample sizes (horizontal
axis) for intersections of CEAP regions, sample classes, states, and LCU groups. Each
CEAP region is plotted separately. Table 12 contains realized and target point sample
sizes at the level of CEAP region and LCU group. The total sample size is 48666, and
1228 points are in either the 2013 or 2014 CEAP surveys, leaving 47438 points that
require data collection.

Figure 2: Realized (y-axis) and target (x-axis) point sample sizes for 10 CEAP regions.
Blue = CRP, green = pasture, and black = cropland.

CEAP Region  LCU Group  Realized  Target
          1          1      1837    1995
          1          3       775     614
          1          4         5       8
         10          1      2208    2210
         10          3      1140    1131
         10          4        18      25
          2          1      2986    2999
          2          3      1001     939
          2          4       211     260
          3          1      4988    5008
          3          3       712     649
          3          4       499     542
          4          1      4757    4729
          4          3       774     718
          4          4       581     665
          5          1      8135    7530
          5          3      1081    1506
          5          4       194     374
          6          1      1048    1449
          6          3      3333    2898
          6          4        21      55
          7          1      2105    1904
          7          3      2189    2248
          7          4       162     304
          8          1      2319    2342
          8          3      1432    1387
          8          4       153     175
          9          1      2969    3062
          9          3       889     798
          9          4       144     142

Table 12: Realized and target point sample sizes for combinations of CEAP region and
LCU group (1=crop, 3=pasture, 4=CRP).
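
The figures in Table 12 can be cross-checked against the modified region sizes in Table 5 and the totals quoted in this section. The short script below copies both tables and confirms that the realized and target counts each sum to the Table 5 size within every region, and that the overall total less the 1228 points already covered by the 2013 or 2014 surveys gives the data collection workload. The variable names are illustrative.

```python
# (realized, target) for LCU groups 1 = crop, 3 = pasture, 4 = CRP, from Table 12.
table12 = {
    "1":  [(1837, 1995), (775, 614), (5, 8)],
    "10": [(2208, 2210), (1140, 1131), (18, 25)],
    "2":  [(2986, 2999), (1001, 939), (211, 260)],
    "3":  [(4988, 5008), (712, 649), (499, 542)],
    "4":  [(4757, 4729), (774, 718), (581, 665)],
    "5":  [(8135, 7530), (1081, 1506), (194, 374)],
    "6":  [(1048, 1449), (3333, 2898), (21, 55)],
    "7":  [(2105, 1904), (2189, 2248), (162, 304)],
    "8":  [(2319, 2342), (1432, 1387), (153, 175)],
    "9":  [(2969, 3062), (889, 798), (144, 142)],
}
# Modified CEAP region sample sizes from Table 5.
table5 = {"1": 2617, "10": 3366, "2": 4198, "3": 6199, "4": 6112,
          "5": 9410, "6": 4402, "7": 4456, "8": 3904, "9": 4002}

realized_by_region = {reg: sum(r for r, _ in rows) for reg, rows in table12.items()}
target_by_region = {reg: sum(t for _, t in rows) for reg, rows in table12.items()}

total_sample = sum(realized_by_region.values())
requires_collection = total_sample - 1228  # points already in 2013/2014 surveys
print(total_sample, requires_collection)
```

The region totals are met exactly, so the differences between realized and target sizes in Table 12 reflect shifts among crop, pasture, and CRP within a region rather than any shortfall in the overall sample.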
The core can be used in two cases. The first case is if a core segment is a prior
CEAP segment. The second case is if there are not enough rotation segments to cover
the target sample size. The table below summarizes the use of the core for the selected
sample:

CEAP Region  Prior CEAP  Count
          1           0    200
          1           1     83
         10           0      4
         10           1    337
          2           0      1
          2           1    375
          3           0      5
          3           1    665
          4           0      2
          4           1    861
          5           0      1
          5           1   2518
          6           0    114
          6           1     71
          7           0     50
          7           1    298
          8           0      6
          8           1    472
          9           0     16
          9           1    711

Table 13: Counts of points in selected sample that are in the core. Prior CEAP = 1 if
point is a prior CEAP point and zero otherwise.

6 Division of Sample between Data Collection Years 2015 and 2016
• Begin by assigning the 2015 sample to be the subset of the selected sample with
most recent observed year of 2008 or 2010-2012 that are not in the core.
• The remaining points in the sample are assigned to be sampled in 2016.
• Re-adjust the 2015 and 2016 sample sizes so that the sample sizes are approximately equal for the two years for each state. This is accomplished by completing the following two steps for each state. In the following, the notation [a]
is the closest integer to a. If the original 2015 sample size is equal to $n_{16} + k$,
where k > 0, then randomly select [k/2] points from the 2015 sample for the state
and assign the [k/2] points to the 2016 sample. If the original 2016 sample size is
equal to $n_{15} + k$, where k > 0, then let $r = \min\{N_{16,03}, [k/2]\}$, where $N_{16,03}$ is the
number of points in the 2016 sample for the state last observed in 2003. Randomly select r
points from the subset of the 2016 sample for the state that are not last observed
in 2003, and assign the r points to the 2015 sample.
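
The rebalancing step for a single state can be sketched as below, following the rule as literally as possible. Several details are assumptions made for the example: ties in [k/2] are rounded up, the number of points moved from 2016 is additionally capped by the number of candidates available, and the point identifiers and the `last_observed` lookup are hypothetical.

```python
import random

def closest_integer(x):
    """[a] in the text; halves are rounded up here (an assumption)."""
    return int(x + 0.5)

def rebalance_state(sample_2015, sample_2016, last_observed, rng):
    """Move points between years so the two state samples are about equal (sketch)."""
    s15, s16 = list(sample_2015), list(sample_2016)
    k = len(s15) - len(s16)
    if k > 0:
        moved = rng.sample(s15, closest_integer(k / 2))
        s15 = [p for p in s15 if p not in moved]
        s16 = s16 + moved
    elif k < 0:
        n_16_03 = sum(1 for p in s16 if last_observed[p] == 2003)
        candidates = [p for p in s16 if last_observed[p] != 2003]
        r = min(n_16_03, closest_integer(-k / 2), len(candidates))
        moved = rng.sample(candidates, r)
        s16 = [p for p in s16 if p not in moved]
        s15 = s15 + moved
    return s15, s16

rng = random.Random(0)
years = {i: (2003 if i >= 15 else 2008) for i in range(20)}
a15, a16 = rebalance_state(range(0, 10), range(10, 14), years, rng)
b15, b16 = rebalance_state(range(0, 4), range(10, 20), years, rng)
print(len(a15), len(a16), len(b15), len(b16))
```

In the second call only points not last observed in 2003 are eligible to move, so points last observed in 2003 stay in the 2016 sample.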

7 Programs to Compute Target Sample Sizes and Select Sample
1. estimatedareafrompgenoverlay1rev.R: estimated area of CEAP-eligible cropland
by CEAP region
2. SampleSelectCheck1.R: Compute target sample sizes by CEAP region and point-level target sample sizes. Prepare data sets at the PSU level.
3. SampleSelectCheck2.R: Complete sample selection.
4. evaluatesample1Rev.R: Tabular comparisons and maps.

Reference
• Bankier, M. D. (1988). Power allocations: determining sample sizes for subnational
areas. The American Statistician, 42(3), 174-177.

File Type: application/pdf
Author: Flanagan, Patrick - NRCS, Beltsville, MD
File Modified: 2023-12-29
File Created: 2022-07-28
