2009 ASA Paper--An Estimation Procedure for the New Public Employment Survey Design

OMB: 0607-0452

Attachment 8

GOVERNMENTS DIVISION REPORT SERIES
(Research Report #2009-9)

An Estimation Procedure for the New Public Employment Survey Design

Yang Cheng

Casey Corcoran

Joe Barth

Carma Hogue

U.S. Census Bureau

Washington, DC 20233

CITATION: Cheng, Yang, Casey Corcoran, Joe Barth, Carma Hogue. 2009. An Estimation
Procedure for the New Public Employment Survey Design. Governments Division Report Series,
Research Report #2009-9

____________________________________
Report Completed: September 25, 2009
Report Issued: October 2, 2009

Disclaimer: This report is released to inform interested parties of research and to encourage discussion of work in
progress. The views expressed are those of the authors and not necessarily those of the U.S. Census Bureau.

Abstract
The 2009 Public Employment Survey uses a new multi-stage sample method which combines
cut-off sampling based on unit size with stratified sampling to reduce the sample size, save
resources, and improve the precision of the estimates. In this paper, we propose fitting either two
separate linear models within size-based strata or one overall, based on the results of a hypothesis
test of equality of the model coefficients. We will study the properties and variance of this
estimation method. Data from the 2007 Census of Government Employment will be used to
compare our method to the previous regression method.
Key Words: survey design, model-based estimator, regression, stratification, and optimum
allocation
1. Introduction
The Annual Public Employment Survey provides current estimates for full-time and part-time
state and local government employment and payroll by government function (i.e., elementary and
secondary education, higher education, police protection, fire protection, financial administration,
judicial and legal, etc.). This survey covers all state and local governments in the United States,
which include counties, cities, townships, special districts, and school districts. The first three
types of governments are referred to as general-purpose governments as they generally cover
several governmental functions. School districts cover only the education function. Special
districts generally cover one, but sometimes two, functions. These are the only sources of public
employment data by program function and selected job category. Data on employment include
number of full-time and part-time employees, gross pay, and hours paid for part-time employees.
Reported data are for each government’s pay period that includes March 12. Data collection
began in March and continued for about seven months.
The methodology, questionnaires, full set of governmental functions, and classification manual
are available on www.census.gov/govs/index.html.
In 2007, the Committee on National Statistics, National Research Council, released the findings
of a two-year study of the U.S. Census Bureau’s surveys of state and local governments. In
response to the recommendations given in the study and also to concerns about small units
expressed by Census Bureau survey analysts, we decided to look into ways to modify the
sampling method.
Currently, a stratified, modified probability proportional-to-size sample method is used to obtain
annual national and state level estimates. The current sample design yields a large number of
small townships and special districts. These units only account for a very small part of the final
estimates, and have a poor response rate. Within a geographic area, there is usually very little
variability in the responses from small units for the same type of government. The objective of
this research was to design a sample that would reduce the number of small units in certain
problematic areas of the country.
After exploring possible cut-off sample methods for the Annual Public Employment Survey, we
suggested an alternative sample method based on the current stratified sample design to reduce
the sample size, save resources, and improve the precision of the estimates. We introduced a
modified cut-off sample method, which was achieved in two stages. We first stratified the sample
by state and government type. Later, we applied a cumulative square root frequency method to
determine the cut-off point with respect to the size of unit in the problematic special districts and
sub-counties (cities and townships) with two constraints: 1) sample size in the stratum is more
than 50; and 2) sample size below the cut-off point is more than 20. The cut-off point serves as a
decision point for distinguishing small and large governmental units in the stratum. In the second
stage, we applied a sub-sampling method with a fixed rate for these special districts and sub-counties satisfying the two conditions given above. For more information on the methodology,
see Barth et al. (2009).
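The cut-off step described above can be sketched in Python. This is only an illustration under stated assumptions: the paper does not say how unit sizes were binned, so the equal-width bins, the `n_bins` parameter, and the function names `cum_sqrt_f_cutoff` and `meets_constraints` are all hypothetical. The sketch applies the cumulative square root frequency rule for two strata and checks the two sample size conditions quoted in the text.

```python
import math

def cum_sqrt_f_cutoff(sizes, n_bins=10):
    """Two-stratum boundary by the cumulative square-root-of-frequency rule.

    Units whose size is below the returned value form the "small" group.
    Equal-width bins are an assumption made for this sketch.
    """
    if n_bins < 2:
        raise ValueError("need at least two bins")
    lo, hi = min(sizes), max(sizes)
    width = (hi - lo) / n_bins
    if width == 0:
        raise ValueError("all sizes are identical; no cut-off can be found")

    # Frequency of units in each equal-width size bin.
    counts = [0] * n_bins
    for v in sizes:
        k = min(int((v - lo) / width), n_bins - 1)  # the maximum goes in the last bin
        counts[k] += 1

    # Cumulate sqrt(frequency) across bins.
    cum, running = [], 0.0
    for f in counts:
        running += math.sqrt(f)
        cum.append(running)

    # Two strata: pick the interior bin edge whose cumulative value is
    # closest to half of the total.
    target = cum[-1] / 2
    best = min(range(n_bins - 1), key=lambda j: abs(cum[j] - target))
    return lo + (best + 1) * width

def meets_constraints(n_stratum, n_below_cutoff):
    """Conditions from the text: more than 50 units in the stratum sample
    and more than 20 below the cut-off point."""
    return n_stratum > 50 and n_below_cutoff > 20

# Made-up sizes: bin frequencies 16, 9, 4, 1 over [0, 40] with four bins.
sizes = [0] + [5] * 15 + [15] * 9 + [25] * 4 + [40]
cutoff = cum_sqrt_f_cutoff(sizes, n_bins=4)
```

With these made-up sizes the cumulative square roots are 4, 7, 9, and 10, so the boundary closest to the halfway value of 5 is the first bin edge.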
2. Standard Approaches to Estimation
Before we investigate a new estimation procedure that corresponds to the new modified cut-off
sample method, let us first introduce some notation and definitions. Let $U$ be a finite population of
all local governments. There are 89,526 units in our universe,

$$U = \bigcup_{g=1}^{G} \bigcup_{h=1}^{H} U_{gh},$$

where $g$ is the state index, and $h$ represents the government type ($h=1$ for county governments, $h=2$ for sub-county governments, which include municipal governments and township governments, $h=3$ for
special districts, and $h=4$ for school districts).
Let X denote the variables of interest from the 2002 Census of Government Employment, such as
full-time employment, full-time payroll, part-time employment, and part-time payroll. Let Y
denote the corresponding variables we measured in 2007. Also, we define a new variable, Total
Pay, for each government unit by combining the full-time and part-time payrolls. We used Total
Pay as the size of each government unit when we applied the proportional-to-size sampling
method.
Before we introduce our proposal for estimation, we demonstrate how the standard estimation
approaches work for the modified cut-off sample method.
2.1 Design-Based Approach
The first standard approach is a design-based method. We apply the Horvitz-Thompson (H-T)
estimator (Cochran, 1977, p. 259). For given state $g$ and government type $h$, a sample of
$n_{gh}$ units is selected without replacement. Let $\pi_{ghi}$ be the first-order inclusion probability for
the $i$th unit in the sample, and $\pi_{ghi,j}$ be the second-order inclusion probability that the $i$th and
$j$th units are both in the sample. The Horvitz-Thompson (1952) estimator of the population total,

$$Y = \sum_{g=1}^{G} \sum_{h=1}^{H} \sum_{i=1}^{N_{gh}} y_{ghi},$$

is

$$\hat{Y}_{HT} = \sum_{g=1}^{G} \sum_{h=1}^{H} \sum_{i=1}^{n_{gh}} \frac{y_{ghi}}{\pi_{ghi}},$$

where $y_{ghi}$ is the measurement of the variable of interest for the $i$th sampled unit in state $g$ and
government type $h$.
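The H-T estimator, $\hat{Y}_{HT}$, is an unbiased estimator of $Y$, with theoretical variance

$$V(\hat{Y}_{HT}) = \sum_{g=1}^{G} \sum_{h=1}^{H} \sum_{i=1}^{N_{gh}} \frac{1 - \pi_{ghi}}{\pi_{ghi}}\, y_{ghi}^2 + 2 \sum_{g=1}^{G} \sum_{h=1}^{H} \sum_{i=1}^{N_{gh}} \sum_{j>i}^{N_{gh}} \frac{\pi_{ghi,j} - \pi_{ghi}\pi_{ghj}}{\pi_{ghi}\pi_{ghj}}\, y_{ghi}\, y_{ghj}.$$

An unbiased sample estimator of $V(\hat{Y}_{HT})$ is

$$\hat{V}(\hat{Y}_{HT}) = \sum_{g=1}^{G} \sum_{h=1}^{H} \sum_{i=1}^{n_{gh}} \frac{1 - \pi_{ghi}}{\pi_{ghi}^2}\, y_{ghi}^2 + 2 \sum_{g=1}^{G} \sum_{h=1}^{H} \sum_{i=1}^{n_{gh}} \sum_{j>i}^{n_{gh}} \frac{\pi_{ghi,j} - \pi_{ghi}\pi_{ghj}}{\pi_{ghi}\pi_{ghj}\pi_{ghi,j}}\, y_{ghi}\, y_{ghj}.$$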
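To make the H-T estimator and its variance estimator concrete, here is a minimal Python sketch for a single stratum. The function names and the toy data are hypothetical. Simple random sampling without replacement is used only as a convenient illustration, because its inclusion probabilities are known in closed form; it is not the design used in the survey.

```python
from itertools import combinations

def ht_total(y, pi):
    """Horvitz-Thompson estimate of a total: sum of y_i / pi_i over the sample."""
    return sum(yi / p for yi, p in zip(y, pi))

def ht_variance_estimate(y, pi, pi_joint):
    """Sample estimator of the variance of the H-T total.

    pi_joint(i, j) must return the second-order inclusion probability of
    sampled units i and j (positions in y).
    """
    single = sum((1 - p) / p ** 2 * yi ** 2 for yi, p in zip(y, pi))
    pairs = 0.0
    for i, j in combinations(range(len(y)), 2):
        pij = pi_joint(i, j)
        pairs += (pij - pi[i] * pi[j]) / (pi[i] * pi[j] * pij) * y[i] * y[j]
    return single + 2 * pairs

# Illustration with made-up data: n = 3 units drawn from N = 10 by simple
# random sampling without replacement.
N, y = 10, [3.0, 5.0, 10.0]
n = len(y)
pi = [n / N] * n

def pij_srs(i, j):
    return n * (n - 1) / (N * (N - 1))

total = ht_total(y, pi)
var_hat = ht_variance_estimate(y, pi, pij_srs)
```

Under simple random sampling the variance estimator reduces algebraically to the textbook form $N(N-n)s^2/n$, which gives a convenient check on the implementation. For the stratified design in the text, the per-stratum totals and variance estimates would be summed over $g$ and $h$.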

For this paper, we tested the estimator and variance estimator only for selected states: Alabama,
California, Pennsylvania, and Wisconsin. These states were selected because they represent
small states, large states, township states with fully functioning township governments
(Pennsylvania) and townships that have limited government functions (Wisconsin). Also, we
reduced the sample size in sub-county government units in Alabama, reduced the sample size in
special districts in California, and reduced the sample size in both sub-counties and special
districts in Pennsylvania and Wisconsin. Thus, we end up with five strata in Alabama
(counties, small sub-counties, large sub-counties, special districts, and school districts), five strata
in California (counties, sub-counties, small special districts, large special districts, and school
districts), and six strata each in Pennsylvania and Wisconsin (counties, small sub-counties, large sub-counties, small special districts, large special districts, and school districts). Table 1 displays
the actual values from the 2007 Census of Government Employment alongside the H-T
estimates, the difference between the H-T estimate and the true value, the relative difference, and
the coefficient of variation (CV) of the H-T estimator.
Table 1: Comparison of the H-T Estimates based on PES to 2007 Census of Government
Employment
Source: U.S. Census Bureau, 2007 Census of Government Employment. Payroll is in $1,000s.

| State | Variable | 2007 Census of Government Employment | H-T Estimate | Difference | Relative Difference (%) | CV (%) |
|---|---|---|---|---|---|---|
| Alabama | Full-time Employment | 183,506 | 188,539 | 5,033 | 2.74 | 1.97 |
| Alabama | Full-time Payroll | 552,926 | 563,886 | 10,960 | 1.98 | 1.90 |
| Alabama | Part-time Employment | 28,281 | 25,538 | -2,743 | -9.70 | 4.63 |
| Alabama | Part-time Payroll | 24,747 | 24,198 | -549 | -2.22 | 2.84 |
| California | Full-time Employment | 1,228,513 | 1,214,564 | -13,949 | -1.14 | 0.68 |
| California | Full-time Payroll | 6,626,856 | 6,583,673 | -43,183 | -0.65 | 0.48 |
| California | Part-time Employment | 509,494 | 499,394 | -10,100 | -1.98 | 1.97 |
| California | Part-time Payroll | 715,268 | 699,111 | -16,157 | -2.26 | 1.42 |
| Pennsylvania | Full-time Employment | 384,145 | 388,250 | 4,105 | 1.07 | 1.96 |
| Pennsylvania | Full-time Payroll | 1,493,150 | 1,525,592 | 32,443 | 2.17 | 1.66 |
| Pennsylvania | Part-time Employment | 111,050 | 128,262 | 17,212 | 15.50 | 8.69 |
| Pennsylvania | Part-time Payroll | 98,620 | 108,772 | 10,152 | 10.29 | 8.53 |
| Wisconsin | Full-time Employment | 181,370 | 177,861 | -3,509 | -1.93 | 1.76 |
| Wisconsin | Full-time Payroll | 702,900 | 706,111 | 3,212 | 0.46 | 1.06 |
| Wisconsin | Part-time Employment | 91,103 | 103,087 | 11,984 | 13.15 | 15.04 |
| Wisconsin | Part-time Payroll | 82,044 | 82,554 | 510 | 0.62 | 6.01 |
In examining these data, we conclude that the weights among the sample units are properly
distributed. At the 95 percent confidence level, all true values fall within the confidence
intervals of the H-T estimates. The CVs for full-time employment and full-time payroll by state
are very small. CVs for part-time employment and part-time payroll by state are reasonable and
stable, except for part-time employment in Wisconsin. These CVs show that the variability of
the full-time variables is much smaller than that of the part-time variables.
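The columns of Table 1 are related by simple arithmetic, which the following Python sketch reproduces for the Alabama full-time employment row. Two points are assumptions of this sketch rather than statements from the paper: the CV is taken to be the standard error divided by the estimate, and the interval uses the normal approximation with $z = 1.96$. The function names are hypothetical.

```python
def relative_difference(estimate, census):
    """Relative difference in percent: (estimate - census) / census * 100."""
    return (estimate - census) / census * 100

def confidence_interval(estimate, cv_percent, z=1.96):
    """Normal-approximation interval, estimate +/- z * SE, with SE = CV * estimate."""
    se = cv_percent / 100 * estimate
    return estimate - z * se, estimate + z * se

# Alabama full-time employment row of Table 1.
census, ht_estimate, cv = 183_506, 188_539, 1.97
rel_diff = relative_difference(ht_estimate, census)
low, high = confidence_interval(ht_estimate, cv)
covered = low <= census <= high
```

For this row the relative difference rounds to the 2.74 percent shown in the table, and the census value lies inside the computed interval.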
2.2 Model-Based Approach
The second standard approach is a model-dependent method. When using a probability sampling
design, some still prefer inferences that are model-dependent. Linear regression estimation
increases precision by using an auxiliary variable, $x_i$, the information from the
2002 Census of Government Employment, which is correlated with the same information, $y_i$, in
2007. In model-dependent inference, no matter how the sampling plan and estimator are
obtained, inference is made on the basis of the model. Those using model-dependent methods
have asserted that once the sample is drawn, the probabilities of selection are irrelevant. They
regard the assumption of a model and the use of a best estimator under the model as essential.
Model-dependent design and inference may have substantial advantages if the model is
appropriate. Based on prior experience or from scatter plots on the variables of interest from the
individual sampled local government units, the relationship observed between 2002 and 2007
could be represented approximately by a straight line, and the variability around the regression
line increases as $x$ increases. When the relation between $x_i$ and $y_i$ is examined, it may be found
that although the relation is approximately linear, the line does not need to go through the origin. Thus,
we propose a simple linear regression model:

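$$y_{ghi} = a_{gh} + b_{gh} x_{ghi} + \varepsilon_{ghi}, \quad \forall g = 1,\dots,G;\ h = 1,\dots,H;\ i = 1,\dots,n_{gh},$$

where $y_{ghi}$ and $x_{ghi}$ are obtained for every sampled government unit in state $g$ and type of
government $h$. We can estimate $a_{gh}$ and $b_{gh}$ using only the sample data. Therefore, the linear
regression estimate of the population total, $Y$, is

$$\hat{y}_r = \sum_{g=1}^{G} \sum_{h=1}^{H} \sum_{i=1}^{N_{gh}} \left( \hat{a}_{gh} + \hat{b}_{gh} x_{ghi} \right) = \sum_{g=1}^{G} \sum_{h=1}^{H} N_{gh} \left[ \bar{y}_{gh} + \hat{b}_{gh} \left( \bar{X}_{gh} - \bar{x}_{gh} \right) \right],$$

where $\bar{X}_{gh}$ is the population mean of $x_{ghi}$ and $\hat{b}_{gh}$, computed from the sample, estimates the linear regression coefficient of $y_{ghi}$ on $x_{ghi}$ in the finite population. The least squares estimate of $b_{gh}$ is

$$\hat{b}_{gh} = \frac{\sum_{i=1}^{n_{gh}} (x_{ghi} - \bar{x}_{gh})(y_{ghi} - \bar{y}_{gh})}{\sum_{i=1}^{n_{gh}} (x_{ghi} - \bar{x}_{gh})^2},$$

where $\hat{a}_{gh} = \bar{y}_{gh} - \hat{b}_{gh} \bar{x}_{gh}$, $\bar{x}_{gh} = \frac{1}{n_{gh}} \sum_{i=1}^{n_{gh}} x_{ghi}$, and $\bar{y}_{gh} = \frac{1}{n_{gh}} \sum_{i=1}^{n_{gh}} y_{ghi}$.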
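As a minimal sketch of the regression estimator for a single stratum, the Python function below computes the least squares slope and intercept from the sample and then the estimated total $N_{gh}[\bar{y}_{gh} + \hat{b}_{gh}(\bar{X}_{gh} - \bar{x}_{gh})]$. The function name, argument names, and data are hypothetical. The toy data lie exactly on a line, so the expected total can be worked out by hand.

```python
def regression_total(x_sample, y_sample, X_bar, N):
    """Linear regression estimate of a stratum total: N * [ybar + b * (X_bar - xbar)].

    X_bar is the known population mean of the auxiliary variable and N is
    the number of units in the stratum.
    """
    n = len(x_sample)
    xbar = sum(x_sample) / n
    ybar = sum(y_sample) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(x_sample, y_sample))
    sxx = sum((x - xbar) ** 2 for x in x_sample)
    b = sxy / sxx           # least squares slope
    a = ybar - b * xbar     # least squares intercept
    return N * (ybar + b * (X_bar - xbar)), a, b

# Made-up stratum in which y = 2 + 1.5 x exactly, so the estimate should
# equal N * (2 + 1.5 * X_bar) = 100 * 47 = 4700.
x = [10.0, 20.0, 40.0, 80.0]
y = [2 + 1.5 * v for v in x]
total, a_hat, b_hat = regression_total(x, y, X_bar=30.0, N=100)
```

The state or national estimate is then the sum of these stratum totals over $g$ and $h$.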

If the model holds, the variance of the model-dependent estimator is reduced regardless of the
procedure used for sample selection. We only need to estimate the parameters for slope and
intercept from the sample. In most cases, the regression line goes through the origin when we
observe the sample scatter plots. If the straight line goes through the origin, then the best
estimator of the slope, $b_{gh}$, is simply the ratio of the sample means, i.e., $\hat{b}_{gh} = \bar{y}_{gh} / \bar{x}_{gh}$.
Based on the assumptions of the model-based approach, we can obtain the approximate
theoretical variance of the regression estimator $\hat{y}_r$ as

$$V(\hat{y}_r) \approx \sum_{g=1}^{G} \sum_{h=1}^{H} \frac{N_{gh}(N_{gh} - n_{gh})}{n_{gh}}\, S_{gh,y}^2 \left( 1 - \rho_{gh}^2 \right),$$

where $\rho_{gh} = S_{gh,xy} / (S_{gh,x} S_{gh,y})$ is the population correlation between the variables $x_{gh}$ and $y_{gh}$,

$$S_{gh,x}^2 = \frac{1}{N_{gh} - 1} \sum_{i=1}^{N_{gh}} (x_{ghi} - \bar{X}_{gh})^2 \quad \text{and} \quad S_{gh,y}^2 = \frac{1}{N_{gh} - 1} \sum_{i=1}^{N_{gh}} (y_{ghi} - \bar{Y}_{gh})^2$$

are the population variances, and

$$S_{gh,xy} = \frac{1}{N_{gh} - 1} \sum_{i=1}^{N_{gh}} (x_{ghi} - \bar{X}_{gh})(y_{ghi} - \bar{Y}_{gh})$$

is the population covariance. The unbiased sample estimator of $V(\hat{y}_r)$ is

$$\hat{V}(\hat{y}_r) \approx \sum_{g=1}^{G} \sum_{h=1}^{H} \frac{N_{gh}(N_{gh} - n_{gh})}{n_{gh}(n_{gh} - 2)} \sum_{i=1}^{n_{gh}} \left[ (y_{ghi} - \bar{y}_{gh}) - \hat{b}_{gh} (x_{ghi} - \bar{x}_{gh}) \right]^2.$$
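The sample variance estimator for one stratum can be sketched as follows. The function name and the four data points are hypothetical; the numbers are chosen so the residual sum of squares is easy to verify by hand (slope 0.8, residual sum of squares 1.8).

```python
def regression_variance_estimate(x_sample, y_sample, N):
    """N (N - n) / (n (n - 2)) times the residual sum of squares about the fitted line."""
    n = len(x_sample)
    if n <= 2:
        raise ValueError("need more than two sampled units")
    xbar = sum(x_sample) / n
    ybar = sum(y_sample) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(x_sample, y_sample))
    sxx = sum((x - xbar) ** 2 for x in x_sample)
    b = sxy / sxx
    ss_res = sum(((y - ybar) - b * (x - xbar)) ** 2
                 for x, y in zip(x_sample, y_sample))
    return N * (N - n) / (n * (n - 2)) * ss_res

# Made-up stratum of N = 20 units with n = 4 sampled.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 4.0]
v_hat = regression_variance_estimate(x, y, N=20)
```

The factor $N_{gh} - n_{gh}$ is the finite population correction: if every unit in the stratum were sampled, the estimate would be zero.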

Again, we apply the linear regression model and calculate the model-dependent estimator and its
variance estimator based on the same sample data from Alabama, California, Pennsylvania, and
Wisconsin for full-time employment, full-time payroll, part-time employment, and part-time
payroll. The model-dependent estimates for full-time employment and full-time payroll are as
good as the H-T estimates when compared to the marginal survey totals from the 2007 Census of
Government Employment. Also, the CVs of the survey total estimates from the model-dependent
approach are smaller than those from the design-based approach. For part-time employment and part-time payroll, the model-based approach gives better estimates with smaller variation. Therefore,
we conclude that the estimates from the model-dependent method have better precision than those
from the Horvitz-Thompson method.
2.3 Model-Assisted Approach
The model-assisted methodology is commonly used in the Census Bureau, and is a method
between the design-based approach and the model-based approach. We assume the model fits the
population reasonably well. However, we cannot make the assumption that the population was
really generated by the model. Ultimately, the model serves as the vehicle for finding an
appropriate regression coefficient to put into the regression estimator formula. The efficiency of
the regression estimator, as compared to the design-based estimator, will depend on the goodness
of the fit. The basic properties (approximate unbiasedness, validity of the variance formula, etc.)
are not dependent on whether the model holds or not. This procedure is called model-assisted,
not model-dependent.
For a simple linear model, we could determine the parameters $a_{gh}$ and $b_{gh}$ from the whole
population. Because we do not observe the whole population, we use data from the sample to
calculate the statistics that are used to estimate the slopes and intercepts in the universe.
Therefore, model-assisted estimates are determined by both model selection and sample design.
Given state $g$ and government type $h$, with $g = 1,\dots,G$ and $h = 1,\dots,H$, we can write the population
least squares line relating $x_{ghi}$ and $y_{ghi}$ as

$$y_{ghi} = a_{gh} + b_{gh} x_{ghi} + \varepsilon_{ghi}, \quad \forall g = 1,\dots,G;\ h = 1,\dots,H;\ i = 1,\dots,N_{gh}.$$

The parameters $a_{gh}$ and $b_{gh}$ are defined in terms of population first and second moments as
follows:

$$a_{gh} = \bar{Y}_{gh} - b_{gh} \bar{X}_{gh} \quad \text{and} \quad b_{gh} = \frac{S_{gh,xy}}{S_{gh,x}^2},$$

where $S_{gh,xy}$ is the population covariance, $S_{gh,x}^2$ is the population variance of $x$, and $\bar{X}_{gh}$ and $\bar{Y}_{gh}$ are
the population means. To obtain estimators, we replace population moments in the above
formulas with design-weighted sample moments. We have

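$$\hat{b}_{gh} = \frac{\sum_{i=1}^{n_{gh}} (x_{ghi} - \bar{x}_{gh})(y_{ghi} - \bar{y}_{gh}) / \pi_{ghi}}{\sum_{i=1}^{n_{gh}} (x_{ghi} - \bar{x}_{gh})^2 / \pi_{ghi}},$$

where $\bar{x}_{gh}$ and $\bar{y}_{gh}$ are the Horvitz-Thompson estimators of $\bar{X}_{gh}$ and $\bar{Y}_{gh}$.
The approximately unbiased sample variance estimator is

$$\hat{V}(\hat{y}_r) \approx \sum_{g=1}^{G} \sum_{h=1}^{H} \frac{1}{n_{gh} - 2} \sum_{i=1}^{n_{gh}} \frac{\left[ (y_{ghi} - \bar{y}_{gh}) - \hat{b}_{gh} (x_{ghi} - \bar{x}_{gh}) \right]^2}{\pi_{ghi}}.$$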
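A short Python sketch of the design-weighted slope for one stratum follows. It assumes the stratum size $N_{gh}$ is known, so that the Horvitz-Thompson means are the H-T totals divided by $N_{gh}$; the function name and the data are hypothetical. With equal inclusion probabilities the weights cancel and the result matches the ordinary least squares slope, which gives a simple check.

```python
def weighted_slope(x, y, pi, N):
    """Design-weighted least squares slope and intercept for one stratum.

    Each sampled unit is weighted by 1 / pi_i. The means are
    Horvitz-Thompson estimates of the population means (H-T total / N).
    """
    w = [1 / p for p in pi]
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / N
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / N
    num = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    den = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = num / den
    return b, ybar - b * xbar

# Equal inclusion probabilities (4 of 20 units): same answer as unweighted
# least squares on these made-up points.
b_equal, a_equal = weighted_slope([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0],
                                  [0.2] * 4, N=20)

# Unequal probabilities: larger units sampled with higher probability.
b_pps, a_pps = weighted_slope([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0],
                              [0.1, 0.2, 0.3, 0.4], N=20)
```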

From the same data set in Alabama, California, Pennsylvania, and Wisconsin, we can calculate
the survey total estimates using a model-assisted approach, and compare the results to the 2007
Census of Government Employment. We then calculate the variance estimators of survey total
and CVs for four states and four variables. All results are listed in Table 2 below. We draw
the following conclusions: 1) all CVs under the model-assisted approach are much smaller than
those of the H-T estimates; 2) the absolute relative differences between the model-assisted estimates and
the totals from the 2007 Census of Government Employment range from 0.02 percent to 7.65 percent,
compared with 0.46 percent to 15.50 percent for the H-T estimates, and in
most cases the relative difference improves under the model-assisted method; and 3)
model-assisted estimation significantly improves the precision of survey total estimates for part-time employment and part-time payroll. When the data follow the model, the model-assisted and model-dependent methods give very similar
estimates. When the models do not fit, the model-assisted estimates are much better than the
model-dependent ones. Compared with
Table 1, we find that the model-assisted method is much better than the H-T estimator for this
sample design.
Table 2: Comparison of Model-Assisted Estimates on PES to 2007 Census of Government
Employment
Source: U.S. Census Bureau, 2007 Census of Government Employment. Payroll is in $1,000s.

| State | Variable | 2007 Census of Government Employment | Model-Assisted Estimate | Difference | Relative Difference (%) | CV (%) |
|---|---|---|---|---|---|---|
| Alabama | Full-time Employment | 183,506 | 194,894 | 11,388 | 6.21 | 0.16 |
| Alabama | Full-time Payroll | 552,926 | 578,307 | 25,381 | 4.59 | 0.16 |
| Alabama | Part-time Employment | 28,281 | 30,445 | 2,164 | 7.65 | 0.56 |
| Alabama | Part-time Payroll | 24,747 | 25,909 | 1,162 | 4.70 | 0.39 |
| California | Full-time Employment | 1,228,513 | 1,220,421 | -8,092 | -0.66 | 0.07 |
| California | Full-time Payroll | 6,626,856 | 6,594,102 | -32,754 | -0.49 | 0.09 |
| California | Part-time Employment | 509,494 | 509,377 | -117 | -0.02 | 0.10 |
| California | Part-time Payroll | 715,268 | 716,422 | 1,154 | 0.16 | 0.14 |
| Pennsylvania | Full-time Employment | 384,145 | 391,407 | 7,262 | 1.89 | 0.11 |
| Pennsylvania | Full-time Payroll | 1,493,150 | 1,542,324 | 49,174 | 3.29 | 0.10 |
| Pennsylvania | Part-time Employment | 111,050 | 117,341 | 6,291 | 5.66 | 0.18 |
| Pennsylvania | Part-time Payroll | 98,620 | 98,568 | -52 | -0.05 | 2.81 |
| Wisconsin | Full-time Employment | 181,370 | 182,707 | 1,337 | 0.74 | 0.10 |
| Wisconsin | Full-time Payroll | 702,900 | 716,115 | 13,215 | 1.88 | 0.10 |
| Wisconsin | Part-time Employment | 91,103 | 97,070 | 6,272 | 6.91 | 0.22 |
| Wisconsin | Part-time Payroll | 82,044 | 81,328 | -716 | -0.87 | 0.30 |

2.4 Summary
Using the 2002 Census of Government Employment as the new universe sample frame, we
applied the modified cut-off sample method to select mock samples from the 2007 Census of
Government Employment. We calculate the survey total estimates for four variables of interest:
full-time employment, full-time payroll, part-time employment, and part-time payroll in
Alabama, California, Pennsylvania and Wisconsin. Later, we compare these estimates using
three standard methods (the design-based approach, the model-dependent approach, and the
model-assisted approach) with the true values we get from the 2007 Census of Government
Employment. For full-time employment and full-time payroll, the estimates from design-based,
model-dependent, and model-assisted all look good. Estimates from the design-based and the
model-assisted approaches are slightly better. For part-time employment and part-time payroll,
we conclude that the estimates from the model-assisted method and the model-dependent
approach are better than the design-based approach. Additionally, we find that when the data fit a
model very well, the model-dependent approach appears to have better estimates and variance
estimators. Because we sampled with probability proportional to size, with total pay as the size of a government,
we find that the design-based estimates work well for full-time payroll and full-time employment. However, if the model fit is not perfect and the sample design is problematic,
the model-assisted method works better than the design-based and model-dependent methods. In
most of the cases, we find that the model-assisted estimators are better than H-T estimators and
model-based estimators for part-time employment and part-time payroll.
3. Decision-Based Approach
Now, we introduce a decision-based approach in order to improve the precision of estimates and
reduce the mean square error for the survey total estimate. The idea is to test the equality of
linear regression lines to determine whether we can combine data in different strata. Let us start
with the following Lemma.
Lemma: When we fit two linear models for two separate data sets, if $a_1 = a_2$ and $b_1 = b_2$, then
the variance of the coefficient estimates is smaller for the combined model fit than for two
separate stratum models when the combined model is correct.
For some sub-counties and special districts that satisfy the sample size constraints described in Section 1, we
apply a cumulative square root frequency method and create two strata within the same type of
government: a small units group and a large units group. Data from small and large government
units are drawn from the same government type. Should we estimate the survey total of key
variables by combining small and large unit data, or should we keep them separate? To answer
this question, we first test the equality of the two linear regression lines for small versus large government
units within sub-counties and special districts (where sub-sampling has occurred in the small
government units stratum). Second, we evaluate the linear regressions among all four types of
governments within any given state to determine whether we can combine data across
government types.
3.1 Test of Two Regression Lines
We have the following procedure to test two linear regression lines. First, we compare the slopes
by testing the null hypothesis that the slopes are identical (the lines are parallel). The test statistic
is

$$t_{gh} = \frac{\hat{b}_{gh,1} - \hat{b}_{gh,2}}{s_{b_{gh,1} - b_{gh,2}}},$$

where the standard error of the difference between the regression coefficients is

$$s_{b_{gh,1} - b_{gh,2}} = \sqrt{ \frac{\left( s_{gh,xy}^2 \right)_p}{\left( \sum_{i \in S_{gh,1}} x_{gh,i}^2 \right)_1} + \frac{\left( s_{gh,xy}^2 \right)_p}{\left( \sum_{i \in S_{gh,2}} x_{gh,i}^2 \right)_2} }$$

and the pooled residual mean square is calculated as

$$\left( s_{gh,xy}^2 \right)_p = \frac{\sum_{i \in S_{gh,1}} (y_{gh,i} - \hat{y}_{gh,i})^2 + \sum_{i \in S_{gh,2}} (y_{gh,i} - \hat{y}_{gh,i})^2}{n_1 + n_2 - 4},$$

where the subscripts 1 and 2 refer to the two regression lines being compared. The critical value
of $t_{gh}$ for the test has $(n_1 - 2) + (n_2 - 2)$ degrees of freedom, namely, $v_{gh} = n_{gh,1} + n_{gh,2} - 4$.
If the P value is less than 0.05, we reject the null hypothesis and conclude that the slopes of the regression lines
are significantly different. In this case, there is no reason to compare the intercepts. If the P
value for comparing slopes is greater than 0.05, we cannot reject the null hypothesis, and
we conclude that the slopes are not significantly different. We then calculate a single slope by
combining the two data sets. Our next question is whether the two regression lines are merely parallel or
identical.
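The slope comparison can be sketched in Python as below. One interpretation is assumed here and should be checked against the authors' implementation: the sums of squares of $x$ in the standard error are taken as corrected sums, $\sum (x - \bar{x})^2$, as in the usual textbook form of this test. The function names and data are hypothetical. The sketch returns the t statistic and its degrees of freedom; the P value would come from a t distribution table or a statistics library.

```python
import math

def _sums(x, y):
    """Sample size, corrected sum of squares of x, cross-product sum,
    and residual sum of squares about the fitted line."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((v - xbar) ** 2 for v in x)
    syy = sum((v - ybar) ** 2 for v in y)
    sxy = sum((u - xbar) * (v - ybar) for u, v in zip(x, y))
    return n, sxx, sxy, syy - sxy ** 2 / sxx

def slope_test(x1, y1, x2, y2):
    """t statistic and degrees of freedom for H0: the two slopes are equal."""
    n1, sxx1, sxy1, rss1 = _sums(x1, y1)
    n2, sxx2, sxy2, rss2 = _sums(x2, y2)
    df = n1 + n2 - 4
    pooled_ms = (rss1 + rss2) / df                  # pooled residual mean square
    se = math.sqrt(pooled_ms / sxx1 + pooled_ms / sxx2)
    return (sxy1 / sxx1 - sxy2 / sxx2) / se, df

# Made-up data: the second group has twice the slope of the first.
x = [1.0, 2.0, 3.0, 4.0]
y_a = [1.0, 3.0, 2.0, 4.0]
y_b = [2 * v for v in y_a]
t_same, df_same = slope_test(x, y_a, x, y_a)
t_diff, df_diff = slope_test(x, y_a, x, y_b)
```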
We test whether two regression lines are parallel or identical by checking whether the two
regression lines have the same intercept. To do this, we need to calculate the slope and intercept
for the two combined data sets. Also, we need to develop an appropriate test statistic as follows:

$$t_{gh} = \frac{(\bar{y}_{gh,1} - \bar{y}_{gh,2}) - \hat{b}_{gh,c} (\bar{x}_{gh,1} - \bar{x}_{gh,2})}{\sqrt{ \left( s_{gh,xy}^2 \right)_c \left[ \dfrac{1}{n_{gh,1}} + \dfrac{1}{n_{gh,2}} + \dfrac{(\bar{x}_{gh,1} - \bar{x}_{gh,2})^2}{\sum_{i \in S_{gh}} x_{gh,i}^2} \right] }},$$

where $\hat{b}_{gh,c}$ is the slope for the two combined data sets and $\left( s_{gh,xy}^2 \right)_c$ is the residual mean square for
the combined regression, which equals

$$\left( s_{gh,xy}^2 \right)_c = \frac{\sum_{i \in S_{gh}} y_{gh,i}^2 - \left( \sum_{i \in S_{gh}} x_{gh,i}\, y_{gh,i} \right)^2 \Big/ \sum_{i \in S_{gh}} x_{gh,i}^2}{n_{gh} - 3}.$$

If the P value is low, we reject the null hypothesis and conclude that the regression lines are not identical. If
the P value is high, we cannot reject the null hypothesis, and must conclude that there is no
compelling evidence that the regression lines are different.
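A matching Python sketch of the intercept comparison follows. As an assumption of this sketch, the sums of squares and cross-products are corrected within each group and then added across the two groups to obtain the common slope and the common residual mean square with $n_1 + n_2 - 3$ degrees of freedom, which is the usual textbook form of the test. The function name and data are hypothetical.

```python
import math

def intercept_test(x1, y1, x2, y2):
    """t statistic and degrees of freedom for H0: equal intercepts,
    given that the two lines share a common slope."""
    def parts(x, y):
        n = len(x)
        xbar, ybar = sum(x) / n, sum(y) / n
        a = sum((v - xbar) ** 2 for v in x)
        b = sum((u - xbar) * (v - ybar) for u, v in zip(x, y))
        c = sum((v - ybar) ** 2 for v in y)
        return n, xbar, ybar, a, b, c

    n1, xb1, yb1, a1, b1, c1 = parts(x1, y1)
    n2, xb2, yb2, a2, b2, c2 = parts(x2, y2)
    a_c, b_c, c_c = a1 + a2, b1 + b2, c1 + c2
    slope_c = b_c / a_c                              # common slope
    df = n1 + n2 - 3
    ms_c = (c_c - b_c ** 2 / a_c) / df               # common residual mean square
    se = math.sqrt(ms_c * (1 / n1 + 1 / n2 + (xb1 - xb2) ** 2 / a_c))
    return ((yb1 - yb2) - slope_c * (xb1 - xb2)) / se, df

# Made-up data: the second line is the first shifted up by 5 units.
x = [1.0, 2.0, 3.0, 4.0]
y_a = [1.0, 3.0, 2.0, 4.0]
y_b = [v + 5 for v in y_a]
t_same, df_same = intercept_test(x, y_a, x, y_a)
t_shift, df_shift = intercept_test(x, y_a, x, y_b)
```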
3.2 Test of More Than Two Regression Lines
Similarly, we test slopes first, and then test intercepts. To compare more than two slopes, we can
test $H_0: b_1 = b_2 = \dots = b_k$ with $k > 2$ against the alternative hypothesis that the $k$ regression
lines were not derived from samples estimating populations among which the slopes were all
equal. To compare $k$ regression lines, we need to calculate the sample variance of $x$ and $y$, the
sample covariance of $x$ and $y$, and the sums of squares of the residuals and the degrees of freedom
for each regression line. The pooled residual sum of squares, $SS_p$, is the sum of the $k$ individual residual sums of squares. The common residual sum of squares, $SS_c$, is

$$SS_c = \sum_{i=1}^{k} (y_i - \bar{y})^2 - \frac{\left[ \sum_{i=1}^{k} (x_i - \bar{x})(y_i - \bar{y}) \right]^2}{\sum_{i=1}^{k} (x_i - \bar{x})^2},$$

where each sum over $i = 1, \dots, k$ adds the corresponding within-group sum of squares or cross-products across the $k$ regression lines. To test $H_0: b_1 = b_2 = \dots = b_k$, we calculate the F statistic

$$F = \frac{(SS_c - SS_p) / (k - 1)}{SS_p \Big/ \left( \sum_{i=1}^{k} n_i - 2k \right)},$$

with numerator and denominator degrees of freedom of $k - 1$ and $\sum_{i=1}^{k} n_i - 2k$, respectively.

If we reject the null hypothesis $H_0$, the $k$ regressions do not all share the same slope. Next, we
can test the $k$ groups of $k-1$ regression lines. If $k-1=2$, we can apply the method in Section 3.1.
If we cannot reject the null hypothesis, we conclude that the population slopes underlying our $k$
samples of data are equal. In this situation, it is reasonable to ask whether all $k$ population
regression lines are identical.
To test the null hypothesis of equality of intercepts, we combine the data from all k samples, and
compute a residual sum of squares, $SS_t$. The null hypothesis is tested with the test statistic

$$F = \frac{(SS_t - SS_c) / (k - 1)}{SS_c \Big/ \left( \sum_{i=1}^{k} n_i - k - 1 \right)},$$

with $k - 1$ and $\sum_{i=1}^{k} n_i - k - 1$ degrees of freedom.

If the null hypothesis is rejected, we can employ multiple comparisons to determine the location
of significant differences among the elevations. If it is not rejected, then all k sample regression
lines are an approximation of the same population regression line, and the best estimate of
underlying population regression is given by the Lemma.
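The two F tests of this section can be sketched together in Python. The sketch computes the pooled, common, and total residual sums of squares from within-group corrected sums of squares and cross-products, which is an assumption about how the quantities above are computed in practice. The function name and data are hypothetical. It returns the F statistics with their degrees of freedom; P values would come from an F distribution table or a statistics library.

```python
def compare_k_regressions(groups):
    """F statistics for equal slopes and, given equal slopes, equal intercepts.

    groups is a list of (x, y) pairs of equal-length lists, one per line.
    Returns ((F_slopes, df1, df2), (F_intercepts, df1, df2)).
    """
    def sums(x, y):
        n = len(x)
        xbar, ybar = sum(x) / n, sum(y) / n
        a = sum((v - xbar) ** 2 for v in x)
        b = sum((u - xbar) * (v - ybar) for u, v in zip(x, y))
        c = sum((v - ybar) ** 2 for v in y)
        return a, b, c

    k = len(groups)
    n_total = sum(len(x) for x, _ in groups)
    parts = [sums(x, y) for x, y in groups]

    ss_p = sum(c - b ** 2 / a for a, b, c in parts)   # k separate lines
    a_c = sum(p[0] for p in parts)
    b_c = sum(p[1] for p in parts)
    c_c = sum(p[2] for p in parts)
    ss_c = c_c - b_c ** 2 / a_c                       # common slope, separate intercepts
    all_x = [v for x, _ in groups for v in x]
    all_y = [v for _, y in groups for v in y]
    a_t, b_t, c_t = sums(all_x, all_y)
    ss_t = c_t - b_t ** 2 / a_t                       # one line for all data

    df_p, df_c = n_total - 2 * k, n_total - k - 1
    f_slopes = ((ss_c - ss_p) / (k - 1)) / (ss_p / df_p)
    f_intercepts = ((ss_t - ss_c) / (k - 1)) / (ss_c / df_c)
    return (f_slopes, k - 1, df_p), (f_intercepts, k - 1, df_c)

# Made-up data: three parallel lines shifted by 0, 5, and 10 units.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 4.0]
shifted = [(x, [v + s for v in y]) for s in (0, 5, 10)]
slopes_result, intercepts_result = compare_k_regressions(shifted)
```

In this toy example the slope statistic is essentially zero because the lines are parallel by construction, while the intercept statistic is large.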
In the modified cut-off sample design, we plan to test the relationships of up to four different
government types within the state. If we cannot reject the hypotheses of equality, we will combine some
government types to obtain an estimator with lower variance and thereby improve
the precision of the estimates.
3.3 Decision-Based Method
The decision-based method first combines the data from different strata of the sample design
according to hypothesis tests of the equality of the model coefficients, and then applies the model-assisted method to estimate the annual survey totals and their related variances. When we apply
the decision-based method to the Public Employment Survey (PES), we follow these
specific steps: 1) fit a simple linear regression model for each stratum based on our new two-stage sampling method; 2) perform a hypothesis test on small versus large government units, and
determine whether to combine the two strata or keep them separate; 3) perform a hypothesis test
on different government types for any given state; 4) fit a simple linear model for each newly defined
data group; and 5) apply the model-assisted method to compute the survey totals and their CVs.
In the next section, we will demonstrate some numerical results from applying the decision-based
method.
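The pooling decision in steps 2 and 3 can be written as a small rule. The sketch below is hypothetical: the function name and the P values are invented for illustration, and applying the 0.05 threshold to the intercept test is an assumption, since the text gives that threshold explicitly only for the slope test.

```python
def pooling_decision(p_slope, p_intercept, alpha=0.05):
    """Decide whether two strata can share one regression line.

    p_slope and p_intercept are the P values of the slope and intercept
    tests. The intercept test matters only if the slopes are not found
    to differ.
    """
    if p_slope < alpha:
        return "separate"    # slopes differ; intercepts need not be compared
    if p_intercept < alpha:
        return "separate"    # parallel but not identical lines
    return "combine"         # no evidence against a single line

# Hypothetical P values for three stratum pairs (not results from the paper).
decisions = {
    "small vs large sub-counties": pooling_decision(0.40, 0.62),
    "small vs large special districts": pooling_decision(0.01, 0.80),
    "counties vs school districts": pooling_decision(0.30, 0.02),
}
```

Strata marked "combine" would be merged into one data group before the model-assisted estimator of step 5 is applied.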
4. Numerical Results
Based on the 2002 Census of Government Employment, Alabama, California, Pennsylvania, and
Wisconsin were ranked as the twenty-eighth, fourth, second, and eleventh, respectively, among
the states with respect to the number of local governments. Table 3 summarizes the overall
frame by government type for those states.
Table 3: Government Organization for Studied States in 2002
Source: U.S. Census Bureau, 2002 Census of Government Organization

| State | Counties (1) | Cities and Townships (3) | Special Districts (4) | School Districts (5) | Subtotal |
|---|---|---|---|---|---|
| Alabama | 67 | 458 | 529 | 131 | 1,185 |
| California | 57 | 478 | 2,765 | 1,044 | 4,344 |
| Pennsylvania | 66 | 2,562 | 1,728 | 515 | 4,871 |
| Wisconsin | 72 | 1,851 | 756 | 441 | 3,120 |
| Subtotal | 262 | 5,349 | 5,778 | 2,131 | 13,520 |

Our modified two-stage cut-off sample design is equivalent to a stratified sampling with four to
six strata for each state. In Alabama, we have five strata: county (labeled as 1), small sub-county
(labeled as 31), large sub-county (labeled as 32), special district (labeled as 4), and school district
(labeled as 5). In California, we also have five strata: county (labeled as 1), sub-county (labeled
as 3), small special district (labeled as 41), large special district (labeled as 42), and school
district (labeled as 5). For Pennsylvania and Wisconsin, we have six strata each: county (labeled
as 1), small sub-county (labeled as 31), large sub-county (labeled as 32), small special district
(labeled as 41), large special district (labeled as 42), and school district (labeled as 5).
The first step of our decision-based approach estimation procedure is to test small sub-counties
(31) versus large sub-counties (32) in Alabama, Pennsylvania, and Wisconsin as well as to test
small special districts (41) versus large special districts (42) in California, Pennsylvania, and
Wisconsin regarding variables: full-time employment, full-time payroll, part-time employment,
and part-time payroll. Applying the method described in Section 3.1, we reject the null
hypothesis for full-time employment in California’s small special districts as compared with large
special districts. For Wisconsin, we reject all null hypotheses except for part-time
employment in special districts. We cannot reject the null hypothesis for any other comparison of small government units
and large government units. Thus, we combine the sample data from small government units
and large government units in those cases for a better estimate.
Table 4: Combine strata based on the results of hypothesis tests of equality of the model
coefficient

| State | Full-Time Employment | Full-Time Payroll | Part-Time Employment | Part-Time Payroll |
|---|---|---|---|---|
| Alabama | (1,3,4), (5) | (1,3,4,5) | (1,3,4), (5) | (1,3,4), (5) |
| California | (1,3,5), (40), (41) | (1,3,4,5) | (1), (3,4,5) | (1), (3,4,5) |
| Pennsylvania | (1), (3,4,5) | (1,5), (3,4) | (1), (3,4), (5) | (1), (3,4,5) |
| Wisconsin | (1), (30), (31), (40), (41), (5) | (1,5), (30), (31), (40), (41) | (1,4), (30), (31), (5) | (1,5), (30), (31), (40), (41) |
If we cannot combine the small and large government units within the sub-county or
special district category, then we test only three government types: county, special district, and
school district, or county, sub-county, and school district. Otherwise, we test four government
types: county, sub-county, special district, and school district. After a series of tests, we conclude
the following: 1) we should combine all data for full-time payroll in Alabama and California, and
thus fit a single regression line; 2) for full-time employment, part-time employment, and part-time
payroll in Alabama, we should combine data from counties, cities and townships, and special
districts, and thus fit two separate regression lines; 3) for full-time employment and part-time
payroll in Pennsylvania, and for part-time employment and part-time payroll in California, we
should combine data from cities and townships, special districts, and school districts, and thus
again fit two separate regression lines; and 4) for full-time payroll and part-time payroll in
Wisconsin, we are only able to combine data from counties and school districts, and therefore
fit five different regression lines.
Figure 1: Linear fits for small and large special districts in California regarding full-time
payroll versus linear fit for data combining small and large special districts
Table 4 displays the groupings that result from the hypothesis tests of equality of the
model coefficients. The symbol (1,3,4) means that we can group data from counties, cities and
townships, and special districts, and then fit one simple linear model. The symbol (5) means that
we model the data from school districts separately from the other government types. For Alabama,
California, and Pennsylvania, we are able to combine data from different government types. For
Wisconsin, however, we cannot combine data across categories, except for counties and school
districts for full-time payroll and part-time payroll, and counties and special districts for
part-time employment.
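The grouping notation can be read mechanically: each parenthesized group shares one fitted line, so the number of groups in a cell is the number of regression lines for that state and variable. The short Python sketch below illustrates this reading. The function names are our own, and the stratum codes are simply those printed in Table 4.

```python
import re


def parse_groups(cell):
    """Parse a Table 4 cell such as '(1,5), (30), (31)' into a list of tuples."""
    return [tuple(int(code) for code in group.split(","))
            for group in re.findall(r"\(([^)]*)\)", cell)]


def model_for_stratum(groups):
    """Map each stratum code to the index of the regression line it shares."""
    return {code: i for i, group in enumerate(groups) for code in group}


wisconsin_ft_pay = parse_groups("(1,5), (30), (31), (40), (41)")
alabama_pt_emp = parse_groups("(1,3,4), (5)")

print(len(wisconsin_ft_pay))              # number of regression lines for Wisconsin
print(model_for_stratum(alabama_pt_emp))  # which line each Alabama stratum uses
```

For the Wisconsin full-time payroll cell this gives five groups, and for the Alabama part-time employment cell it gives two, with strata 1, 3, and 4 sharing one line and stratum 5 using its own.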
Figure 1 illustrates how the decision-based approach works for small government units as
compared with large government units, using full-time payroll in California as an example.
The two solid lines are the linear regression fits for small and large special districts. They are
not identical, but they have very similar slopes and only a small difference between their
intercepts. Since we cannot reject the null hypothesis of equality of the model coefficients, we
cannot claim that the coefficients are significantly different. We therefore combine the small and
large government units and apply the model fitted to the combined data in place of separate
stratum models, which reduces model error. The dotted line is the best linear fit for the
combined data.
Figure 2: Linear fit for individual government type vs. linear fit for data combining all
government types

Again, we use full-time payroll in California as an example to demonstrate how the hypothesis
tests of equality work among the four government types. In Figure 2, the solid lines are
the best linear fits for the individual government types, and the dotted line is the best linear fit
for the data combining all government types. We can see from Figure 2 that the difference between
the two lines is very small for every type of government, and that for county governments the
lines are almost identical.
Finally, we compare the coefficients of variation (CVs) of the H-T estimator, the
model-assisted estimator, and our proposed decision-based estimator. We calculated the CVs for
full-time employment, full-time payroll, part-time employment, and part-time payroll in
Alabama, California, Pennsylvania, and Wisconsin.
Table 5 displays the CV comparison of the three estimation methods for the four variables of
interest in the four states. All CVs from the H-T method are less than 2 percent for the full-time
variables. Some CVs are quite large for the part-time variables, especially for Pennsylvania and
Wisconsin. All CVs from the model-assisted approach are substantially smaller than those from
the H-T method. Only one, for part-time employment in Alabama, is more than 0.5 percent, and
most are less than 0.2 percent. No CV from the decision-based approach is larger than the
corresponding CV from the model-assisted approach; the two are equal only for full-time
employment in Wisconsin (0.096 percent). In most cases the improvement is small. For
example, the CV for the part-time employment estimate in Wisconsin is 0.220 percent under
the model-assisted method as compared with 0.215 percent under the decision-based method.
In some cases, however, the decision-based approach yields a much larger improvement. For
example, the CVs for the full-time payroll estimates in Alabama and California, and for the
full-time employment estimate in California, fall by more than 50 percent: from 0.159 percent,
0.086 percent, and 0.069 percent to 0.070 percent, 0.035 percent, and 0.022 percent, respectively.
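To make the size of these improvements concrete, the relative reduction in CV can be computed directly from the figures quoted above. In the Python sketch below the helper name is our own, and the inputs are the model-assisted and decision-based CVs (in percent) as reported in the text.

```python
def relative_reduction(cv_model_assisted, cv_decision_based):
    """Share of the model-assisted CV removed by the decision-based estimator."""
    return (cv_model_assisted - cv_decision_based) / cv_model_assisted


# (model-assisted CV, decision-based CV), both in percent, as quoted in the text.
cases = {
    "Alabama ft_pay": (0.159, 0.070),
    "California ft_pay": (0.086, 0.035),
    "California ft_emp": (0.069, 0.022),
    "Wisconsin pt_emp": (0.220, 0.215),
}

for label, (ma, db) in cases.items():
    print(f"{label}: CV lower by {100 * relative_reduction(ma, db):.1f} percent")
```

The three highlighted cases work out to reductions of roughly 56, 59, and 68 percent, against about 2 percent for part-time employment in Wisconsin.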
Table 5: CV comparison among H-T, model-assisted, and decision-based estimators

State         Variable  H-T       Model-Assisted  Decision-Based
Alabama       ft_emp    1.971%    0.158%          0.136%
Alabama       ft_pay    1.898%    0.159%          0.070%
Alabama       pt_emp    4.628%    0.564%          0.531%
Alabama       pt_pay    2.839%    0.385%          0.363%
California    ft_emp    0.675%    0.069%          0.022%
California    ft_pay    0.484%    0.086%          0.035%
California    pt_emp    1.966%    0.102%          0.088%
California    pt_pay    1.420%    0.140%          0.129%
Pennsylvania  ft_emp    1.963%    0.111%          0.087%
Pennsylvania  ft_pay    1.655%    0.101%          0.082%
Pennsylvania  pt_emp    8.690%    0.179%          0.172%
Pennsylvania  pt_pay    8.528%    0.281%          0.163%
Wisconsin     ft_emp    1.762%    0.096%          0.096%
Wisconsin     ft_pay    1.065%    0.100%          0.077%
Wisconsin     pt_emp    15.045%   0.220%          0.215%
Wisconsin     pt_pay    6.014%    0.301%          0.233%

5. Future Plans
In the future, we plan to investigate models more complicated than simple linear
regression by exploring additional variables, such as population size, that may affect the
variables of interest: full-time employment, full-time payroll, part-time employment, and
part-time payroll. We may also explore nonlinear models or nonlinear estimators.
Second, we will address the accuracy of the variance estimator in the decision-based
approach. A simple standard variance formula may not be suited to our complicated survey
design. In addition, data collection contributes further sources of variation, such as missing
data and nonresponse error. We are exploring variance estimators based on replication
methods such as random groups, balanced half-samples, and the jackknife. A bias study should be
conducted as well.
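As one illustration of the replication idea, the sketch below implements a plain delete-one jackknife in Python. It assumes an unstratified, equally weighted sample and a generic estimator function. Both are simplifications, and the variance estimator eventually adopted for the decision-based approach may differ. All names are hypothetical.

```python
def jackknife_variance(units, estimator):
    """Delete-one jackknife variance of estimator(units).

    `units` is a list of sample values and `estimator` maps a list to a number.
    No strata or survey weights are handled here (simplifying assumption).
    """
    n = len(units)
    # Recompute the estimate n times, leaving out one unit each time.
    replicates = [estimator(units[:i] + units[i + 1:]) for i in range(n)]
    mean_rep = sum(replicates) / n
    return (n - 1) / n * sum((r - mean_rep) ** 2 for r in replicates)


def sample_mean(values):
    return sum(values) / len(values)


toy_sample = [2.0, 4.0, 6.0, 8.0]  # assumed values, not survey data
print(jackknife_variance(toy_sample, sample_mean))
```

For the sample mean, the delete-one jackknife reproduces the familiar s²/n, which gives a quick check on the implementation before applying it to a more complicated estimator.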
Finally, we need to consider a simulation study to quantify the variance due to the decision
(group merging) process. The key to decision-based estimation is to group data from different
categories by testing a series of hypotheses of equality of model coefficients. We plan to
study whether these hypothesis tests introduce additional variation and, if so, how much the
variance increases.
Acknowledgements
We acknowledge the contributions of Dr. Eric Slud from the Statistical Research Division of the
U.S. Census Bureau and from the University of Maryland at College Park, and Dr. Patrick
Cantwell from the Decennial Statistical Studies Division of the U.S. Census Bureau. Also, we
are indebted to our reviewers, Dr. Eric Slud, Ms. Rita Petroni, and Ms. Lisa Blumerman for their
helpful suggestions, which have improved the original paper.
References
Barth, J., Cheng, Y., and Hogue, C. (2009), Reducing the Public Employment Survey Sample Size,
JSM 2009.
Cochran, W.G. (1977), Sampling Techniques. Third Edition. New York: John Wiley & Sons, Inc.
Hansen, M.H., Madow, W.G., and Tepping, B.J. (1983), An Evaluation of Model-Dependent and
Probability-Sampling Inference in Sample Surveys, Jour. Amer. Stat. Assoc., 78, 776-793.
Horvitz, D.G., and Thompson, D.J. (1952), A Generalization of Sampling Without Replacement
from a Finite Universe, Jour. Amer. Stat. Assoc., 47, 663-685.
Pardoe, I. (2006), Applied Regression Modeling – A Business Approach. New York: John Wiley
& Sons, Inc.
Särndal, C.-E., Swensson, B., and Wretman, J. (1992), Model Assisted Survey Sampling,
Springer-Verlag.
Zar, J.H. (1999), Biostatistical Analysis. Third Edition. New Jersey: Prentice-Hall.
