Weighting and Imputation for Missing Data in Fisheries Economic and Social Surveys


Alaska Halibut Catch Sharing Plan Survey


OMB: 0648-0705


July 22, 2014

Daniel K. Lew, Amber Himes-Cornell, and Jean Lee

Proposed running head:
“Missing Data in Economic and Social Surveys”
JEL Classification Codes
Q22, C8

This article and its findings are those of the authors and do not necessarily reflect the views of
the National Marine Fisheries Service or the U.S. Department of Commerce. We thank Brian
Garber-Yonts, Geana Tyler, Scott Prose, and the Pacific States Marine Fisheries Commission for
their assistance with the collection of the data, and Ron Felthoven, Chang Seung, and
participants at the 2014 IIFET biennial forum for useful comments. All remaining errors are the
authors’.


National Oceanic and Atmospheric Administration, National Marine Fisheries Service, Alaska Fisheries Science
Center, Seattle, WA 98115 USA

Department of Environmental Science and Policy, University of California, Davis, CA 95616 USA

Pacific States Marine Fisheries Commission, Portland, OR 97202 USA

Corresponding author. E-mail: [email protected], Phone: (530) 752-1746, Mailing address: Department of
Environmental Science and Policy, University of California, One Shields Ave., Davis, CA 95616 USA


ABSTRACT
Surveys of fishery participants are often voluntary and, as a result, commonly have missing data
associated with them. The two primary causes of missing data that generate concern are unit
non-response and item non-response. Unit non-response occurs when a potential respondent
does not complete and return a survey, resulting in a missing respondent from those who had
been contacted to participate in the survey. Item non-response occurs in returned surveys when
an individual question is unanswered. Both types of missing data may lead to issues with
extrapolating results to the population. Numerous approaches have been developed to address
both types of missing data, and two of the principal ones, weighting and data imputation, are
discussed in this paper. We explain how to adjust data to estimate population parameters from
surveys and illustrate the effects of different weighting and data imputation approaches on
estimates of costs and earnings in the Alaska charter boat sector using data from a recent survey.
The results suggest that ignoring missing data will lead to markedly different results than those
estimated when controlling for the missing data.

Keywords: Alaska, charter boat fishing, data imputation, missing data, non-response bias,
sample weighting, survey methods


Surveys are commonly used in fisheries research to understand economic and social conditions
of specific populations of fishery participants, such as anglers, fishing communities, commercial
fishermen, and charter operators. To this end, surveys involve selecting a subset of the
population of interest (the sample), gathering information from the sample about variables of
importance, and generating sample estimates from which to make inferences about the characteristics
of the population. Probability-based samples (e.g., simple random samples and stratified random
samples) are generally used to ensure sample estimates have known statistical properties and
avoid selection bias, which can lead to samples that are not representative of the population if not
controlled for (e.g., Lohr 2010, Rea and Parker 2005).1 Provided the sample is representative of
the population and every element in the sample provides all the requested information, sample
estimates are generally accepted as good estimates of population parameters.2 However, in this
context, missing data can be problematic, as the representativeness of the sample is brought into
question, which undermines the ability of the survey results to be extrapolated to the population
and thus the overall utility of the information.
Survey researchers are generally concerned with two types of non-response that result in
missing data (e.g., Lohr 2010; Groves et al. 2004). The first type of non-response, unit non-response, refers to sampled individuals or entities (i.e., the targeted respondents contacted to
participate) that do not respond to any component of the survey. In the case of mail surveys, for
instance, this manifests as individuals, households, or businesses who receive the survey in the
mail, but do not complete and return the survey questionnaire. For voluntary surveys, some level
of unit non-response is expected, particularly in recent years as response rates in traditional
survey modes (i.e., mail and telephone) have declined (Dillman, Smyth, and Christian 2009;
Connelly, Brown, and Decker 2003; de Leeuw and de Heer 2002).
In fisheries-related economic and social surveys, it is common for researchers achieving
“good” response rates to assume representativeness of random samples, or alternatively, the
absence of non-response bias (i.e., the systematic difference between respondents and non-respondents). Several benchmark response rate levels have been put forth as sufficient, or
“good.” For example, the results of Dolsen and Machlis (1991) have often been used to justify
ignoring potential non-response bias when response rates exceed 65% (e.g., Margenau and
Petchenik 2004). However, in a meta-analysis of survey response rates, Groves (2006) found
that response rates may not be a good predictor of the presence of non-response bias. This
suggests that it is generally insufficient to rely solely on a “good” response rate to evaluate the
potential presence of non-response bias.
In the broader survey literature, weighting methods have a long history of use in adjusting the
influence of individual sample respondents on estimates of population characteristics (Brick
and Kalton 1996; Bethlehem 2002). Fisher (1996) appears to be the first to discuss the use of
weighting methods in fisheries surveys with an application to an angler survey. Given this, it is
surprising how few fisheries studies actually employ formal methods, such as weighting, to
adjust for non-response bias when non-response occurs. Among fishery-related studies, those
that use weighting methods to adjust for unit non-response have almost exclusively been in the
domain of recreational fishing surveys (Fisher 1996; Hunt and Ditton 2002; Tseng, Huang, and
Ditton 2012), although Knapp (1996; 1997) does apply non-response weighting for surveys of
commercial fishery participants in the Pacific halibut fishery in Alaska.

A second type of non-response also common in voluntary surveys is item non-response.
Item non-response refers to cases where individual questions in the survey are left unanswered.
For survey types in which answering each question is not compulsory, and especially for
questions that may be cognitively difficult to answer or viewed as too intrusive, item non-response tends to be pervasive. A variety of data imputation methods have been developed that
allow the use of incomplete surveys by replacing the missing values with imputed values so that
both item respondents and item non-respondents can be included in the analysis (Brick and
Kalton 1996; Little and Rubin 1989; Durrant 2009). The use of data imputation methods to
adjust for item non-response in fisheries studies is even rarer than the already small number
of cases where unit non-response is addressed. In fact, we were unable to find any study in the
published literature that explicitly uses formal data imputation methods for dealing with item
non-response. Instead, the most common strategy used in the fisheries literature appears to be to
remove surveys with missing data for one or more variables—meaning that frequently only
surveys with completed responses are used for the analysis (e.g., Fisher 1997; Beardmore et al.
2011; Bacalso, Juario, and Armada 2013). More striking is that a far greater number do not even
acknowledge item non-response in their data.
Given the pervasiveness of unit and item non-response, it is surprising how little attention
has been given to ways of handling both types of non-response in fisheries economic and social
science surveys.3 In this paper we illustrate the use of weighting and data imputation methods to
adjust for missing data in an economic survey of charter boat fishing businesses in Alaska. We
restrict our efforts to a few commonly employed adjustment methods and illustrate the difference
between the methods in terms of the estimated population totals and associated standard errors.
To our knowledge, this is the first study to explicitly adjust for both unit and item non-response
in a fisheries survey.
The remainder of the paper is organized as follows: We present an overview of the
weighting and data imputation methods used to deal with missing data in the next section. This
is followed by a description of the data used to illustrate the application of several of these
methods and a presentation of the specific weighting and data imputation methods applied to the
data. Next, we present the results as applied to a survey of the Alaskan charter fishing sector.
The paper concludes with a discussion of the comparison of results across missing data methods
and future directions for research.

Weighting and Data Imputation Methods
There are several common ways of dealing with missing data in surveys. In this paper,
we focus on weighting methods, typically used to correct for a lack of representativeness in the sample due
in part to unit non-response, and on data imputation methods used to address item non-response
among respondent data.

Weighting
Unit non-response is one of several types of missing data for which survey researchers
often attempt to compensate, such that the sample data can be used in analyses without concerns
over the missing data. The compensation mechanism often employed involves applying weights
to individuals in the responding sample that adjust for missing data associated with unreturned
questionnaires (Brick and Kalton 1996; Little and Rubin 1989). Weighting is also employed to
adjust for other sources of non-representativeness of the sample relative to the population, such
as when the sampling methods used result in unequal probabilities of any individual in the
population being selected for the sample. In data analysis, responses by an individual with a
weight greater than one will count more than those with a weight equal to or less than one. The
individual weight given to the ith respondent in a sample is denoted wi, which is commonly
represented as the product of several potential weights (e.g., Brick and Kalton 1996):

Individual weight for i (wi) = wi1 wi2 wi3.

(1)

In equation (1), there are three weights that make different kinds of adjustments.4 In statistical
sampling, samples are often adjusted to correct for features of the sample selection procedure,
such as when the sampling procedure employed under-samples one or
more population segments (Brick and Kalton 1996). We denote the weight that adjusts the
sample for sample selection as w1. This weight, often called the “base weight,” is equal to the
inverse of the probability of being selected for the sample (e.g., Little and Vartivarian 2003;
Brick and Kalton 1996).5 In a simple random sample of size n drawn from a population of size
N, the base weight is equal to N/n for everyone in the sample. In a population census, where the
sample equals the population, w1 equals 1 since N = n. For cluster samples, w1 will be the same
for each individual within a cluster, but can differ across clusters.
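As a small illustration (a sketch, not code from the study), the base weight is simply the reciprocal of a unit's selection probability. The function names below are hypothetical, the numbers other than the census size are made up, and the stratified helper assumes simple random sampling within each stratum.

```python
from fractions import Fraction


def base_weight(population_size: int, sample_size: int) -> Fraction:
    """Base weight w1 = 1 / P(selected) under simple random sampling.

    Each unit is selected with probability n / N, so w1 = N / n.
    """
    if not 0 < sample_size <= population_size:
        raise ValueError("need 0 < n <= N")
    selection_prob = Fraction(sample_size, population_size)
    return 1 / selection_prob


def stratified_base_weights(pop_sizes: dict, sample_sizes: dict) -> dict:
    """Stratum-specific base weights N_h / n_h (assumes SRS within each stratum)."""
    return {h: base_weight(pop_sizes[h], sample_sizes[h]) for h in pop_sizes}


# Hypothetical examples.
census_w1 = base_weight(650, 650)   # a census: N == n
srs_w1 = base_weight(1000, 200)     # SRS of 200 units from 1,000
strat_w1 = stratified_base_weights({"A": 400, "B": 600}, {"A": 40, "B": 20})
```

Exact fractions are used so that, for instance, the base weights of a simple random sample add up exactly to N. Under a census every base weight equals one.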
The second weight, denoted as w2, represents the non-response adjustment weight. This
weight is applied to account for the potential differences between those who responded and those
who did not (from among all of the individuals contacted to participate). Generally, calculating
this weight requires information about both respondents and non-respondents.6 For example, the
most common approach to calculate non-response weights is to select one or more variables from
an external data source and calculate weighting classes, or adjustment cells (Brick and Kalton
1996; Little 1986). The weighting classes are discrete partitions of the data over one or more
variables for which there is information on both respondents and non-respondents. Weights are
calculated as the inverse of the response rate in each class, that is, the number of sampled units in the class divided by the number of respondents in it. Alternatively, in cases in which there are
multiple variables known about both respondents and non-respondents that are believed to
influence the decision to respond or not respond, regression-based approaches can be employed
that estimate the probability of responding to the survey explicitly (e.g., Iannacchione, Milne,
and Folsom 1991; Micklewright, Schnepf, and Skinner 2012). To determine which variables
may distinguish respondents from non-respondents, a logit or probit model regressing response
or non-response on candidate variables with available data for both respondents and non-respondents may be used (e.g., Moore and Tarnai 2002).
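The regression-based alternative can be sketched as follows, assuming Python with NumPy. A logit of the response indicator is fit by Newton-Raphson on auxiliary variables known for everyone contacted, and each respondent's non-response weight is taken as the inverse of its estimated response propensity. The function names and the inverse-propensity form of w2 are illustrative assumptions, and the data are hypothetical; in practice a tested statistics package would normally be used to fit the model.

```python
import numpy as np


def fit_logit(X, y, max_iter=50, tol=1e-10):
    """Fit a logit model P(respond = 1 | x) by Newton-Raphson.

    X: (n, k) design matrix; include a column of ones for the intercept.
    y: (n,) array of 0/1 response indicators (1 = returned the survey).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        gradient = X.T @ (y - p)
        # Fisher information X' W X with W = diag(p (1 - p)).
        information = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(information, gradient)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def propensity_weights(X, y):
    """Inverse estimated response propensity for each respondent (y == 1)."""
    X = np.asarray(X, dtype=float)
    beta = fit_logit(X, y)
    p_hat = 1.0 / (1.0 + np.exp(-X @ beta))
    return 1.0 / p_hat[np.asarray(y) == 1]


# Hypothetical example: 20 contacted units and one binary auxiliary variable.
group = np.array([0] * 10 + [1] * 10)
responded = np.array([1] * 5 + [0] * 5 + [1] * 2 + [0] * 8)
X = np.column_stack([np.ones(len(group)), group])
w2 = propensity_weights(X, responded)
```

With a single categorical covariate the logit is saturated, so the fitted propensities equal the observed response rates (0.5 and 0.2 here) and the weights coincide with weighting-class weights; the regression form earns its keep when several variables, including continuous ones, drive response.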
The final weight, denoted as w3, is the post-stratification weight. It represents a further
adjustment of the respondent sample data to ensure that the sample conforms to one or more
known population totals. Thus, post-stratification reduces the potential bias due to incomplete
coverage of the population (Brick and Kalton 1996). It is important to note that calculations of
w2 and w3 are distinguished by the information upon which they are based. The non-response
adjustment weight (w2) is based on differences in sample characteristics between respondents in
the sample and non-respondents in the sample. The post-stratification weight (w3) is based on
differences between population characteristics and those of the respondent sample, which are typically evaluated
from external data sources (e.g., U.S. Census demographic data for general household surveys).
In post-stratification, the weight of each respondent in a class (defined with respect to a specific
variable known for the population) is multiplied by a factor so that the weights of the class
respondents sum to the population total for that class. More formally, suppose that for the
variable of concern the total size of the population is X and the population totals for the classes
c = 1, 2, …, C are denoted Xc, such that

$$X_1 + X_2 + \cdots + X_C = X.$$

Furthermore, suppose that the corresponding class totals for the respondent sample are denoted
Xcs. Then the post-stratification weight for a respondent in class c is w3 = Xc/Xcs.7 See
Holt and Smith (1979) for details.
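A minimal sketch of this calculation follows; the names and numbers are hypothetical. Whether Xcs should be a raw respondent count or the sum of earlier weights within the class depends on how the adjustments are chained, so the optional prior_weights argument reflects our assumption that, when supplied, the earlier weights are summed within each class.

```python
from collections import defaultdict


def poststrat_weights(pop_totals, resp_classes, prior_weights=None):
    """Post-stratification weight w3 = X_c / X_cs for each respondent.

    pop_totals:    dict mapping class c -> known population total X_c.
    resp_classes:  class label of each respondent, in respondent order.
    prior_weights: optional earlier weights (assumption: e.g. w1 * w2); if
                   given, X_cs is their sum within class c, else the count.
    """
    if prior_weights is None:
        prior_weights = [1.0] * len(resp_classes)
    sample_totals = defaultdict(float)
    for c, w in zip(resp_classes, prior_weights):
        sample_totals[c] += w
    empty = [c for c in pop_totals if sample_totals[c] == 0]
    if empty:
        raise ValueError(f"post-strata with no respondents: {empty}")
    return [pop_totals[c] / sample_totals[c] for c in resp_classes]


# Hypothetical: 300 small and 100 large businesses in the population,
# with 100 small and 50 large respondents.
classes = ["small"] * 100 + ["large"] * 50
w3 = poststrat_weights({"small": 300, "large": 100}, classes)
```

After this adjustment the weighted respondents in each class sum exactly to the population total Xc for that class.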
As is clear from the discussion above, calculating w2 and w3 requires having at least some
information about non-respondents and the population, respectively, that can be used to compare
with the sample of respondents. In cases where there are no external sources of data on the
sample of respondents and non-respondents, researchers sometimes conduct follow-up surveys
of non-respondents to collect some basic information that can be used to generate the non-response adjustment weights, w2 (e.g., Arlinghaus, Bork, and Fladung 2008; Sutton and Ditton
2005). For recreational angler survey samples drawn from fishing license registries, there is
often some basic information, such as the location of residence, which may be utilized for
comparing the sample to the total population and developing the post-stratification weights, w3.
For surveys of commercial fishermen, auxiliary information about the population is generally
more abundant (though access varies), and often includes information about fishing vessels,
licenses and permits issued, and other information collected by state and federal regulators. For
more diffuse populations, such as stakeholder groups with indiscernible boundaries, information
for calculating these weights is more challenging to procure.

Data Imputation
In order to address item non-response, data imputation methods are employed to fill in
missing data with appropriate responses for specific questions that are not answered by
respondents. Brick and Kalton (1996), Little and Rubin (1989), and Durrant (2009) provide
useful summaries of several data imputation methods. Brick and Kalton (1996) note that
imputation methods can generally be thought of within a multiple regression framework.
Following this, suppose y is the variable of interest with item non-response. We let yr be the
value of y when reported and ym the value when missing because a respondent chose not to
provide an answer. In addition, suppose z is a vector of auxiliary variables that are
used to impute values for y. Thus, for the ith observation, the regression

ymi = f(zmi) + mi

(2)

can be used to explain the differences between imputation methods. From this perspective,
imputation methods can be distinguished along two primary dimensions: (a) stochastic assumptions
(εmi) and (b) the auxiliary variables used (zmi). The vector of auxiliary variables may include
data external to the survey, other variables from within the survey, or item responses for the
variable of interest (yr). Data imputation methods that allow for stochasticity are called
stochastic (or random) imputation methods. Those that do not are called deterministic
imputation methods. When auxiliary variables are used to explain variation in responses, the
approach is referred to as a regression imputation method. A special case occurs when all the
auxiliary variables in the regression are categorical, which results in imputation class methods.
In a common approach, researchers assume that auxiliary variables do not have an effect
and ignore potential effects of stochasticity. This results in a specific value being used to replace
missing values, such as the mean or median of item responses. When stochasticity is accounted
for in this approach, a residual term is added to the specified value. However, these simple,
single-value imputation approaches are less desirable for imputation of variables when there are
auxiliary variables available that are correlated with the variable and can better explain some of
its variation.
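The single-value approach just described can be sketched as follows; the function name is hypothetical and item non-response is coded as None. The deterministic version fills every gap with the item-respondent mean. The stochastic version adds a residual drawn at random from the respondents' deviations about that mean, which is one simple way, assumed here for illustration, of adding the residual term.

```python
import random
from statistics import mean


def mean_impute(values, stochastic=False, rng=None):
    """Replace None entries by the item-respondent mean.

    If stochastic=True, add a residual drawn at random (with replacement)
    from the respondents' deviations about their mean (an assumed choice).
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no item respondents to impute from")
    m = mean(observed)
    residuals = [v - m for v in observed]
    rng = rng or random.Random()
    out = []
    for v in values:
        if v is not None:
            out.append(v)
        elif stochastic:
            out.append(m + rng.choice(residuals))
        else:
            out.append(m)
    return out


# Hypothetical item responses with two missing values.
costs = [10, 20, None, 30, None]
deterministic = mean_impute(costs)
stochastic = mean_impute(costs, stochastic=True, rng=random.Random(0))
```

The deterministic version piles every imputed value at a single point and so understates the spread of y; neither version draws on auxiliary variables.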
In imputation class approaches, a small number of auxiliary variables are used to classify
respondents. Simple imputation methods (assigning the mean of a class, for example) or
regression-based methods can then be used to assign values within each class of respondents.
Hot deck imputation is one type of imputation class approach (Andridge and Little 2010). In hot
deck imputation, the value from an item respondent (the donor) is assigned to a non-respondent.
The donor is generally selected from the group of item respondents that are most similar to the
respondent with the missing value. As Brick and Kalton (1996) note, the number of imputation
classes must be selected carefully since there needs to be at least one donor in each class.
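A random hot deck within imputation classes can be sketched as follows. The function name and the None coding for missing items are assumptions; the function refuses to impute when a class has no donor, which is the constraint on choosing classes noted above.

```python
import random
from collections import defaultdict


def random_hot_deck(values, classes, rng=None):
    """Random hot deck: each missing value (None) receives the reported value
    of a donor drawn at random from item respondents in the same class."""
    rng = rng or random.Random()
    donors = defaultdict(list)
    for v, c in zip(values, classes):
        if v is not None:
            donors[c].append(v)
    out = []
    for v, c in zip(values, classes):
        if v is not None:
            out.append(v)
        elif donors[c]:
            out.append(rng.choice(donors[c]))
        else:
            raise ValueError(f"imputation class {c!r} has no donor")
    return out


# Hypothetical revenues and imputation classes.
revenue = [100, None, 300, None, 500]
cls = ["A", "A", "B", "B", "B"]
completed = random_hot_deck(revenue, cls, random.Random(2014))
```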
Another hot deck method uses a distance function-based approach (Chen and Shao 2000). In this
approach, a distance function is minimized to identify the “nearest neighbor” from the set of item
respondents. That is, for the jth item non-respondent, the researcher finds the item respondent i,
among all Nr item respondents, whose auxiliary variables are closest to those of j:

$$D_j = \min_{i = 1, \dots, N_r} \left\lVert \mathbf{x}_i - \mathbf{x}_j \right\rVert \qquad (3)$$

where the distance is defined over a set of auxiliary variables (x) assumed to be related to the
variable of interest.
The “nearest neighbor” provides the donor value for the missing value.
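The deterministic nearest-neighbor rule can be sketched as below. The choice of Euclidean distance (math.dist, Python 3.8+) and the tie-breaking rule are our assumptions; in practice, auxiliary variables measured on different scales would usually be standardized first so that no single variable dominates the distance.

```python
import math


def nearest_neighbor_impute(y, X):
    """Deterministic nearest-neighbor hot deck.

    y: reported values, None where missing.
    X: auxiliary-variable tuples, one per unit.
    Each item non-respondent j gets the value of the item respondent i that
    minimizes the Euclidean distance ||x_i - x_j|| (assumed metric); ties go
    to the first respondent in list order.
    """
    donors = [i for i, v in enumerate(y) if v is not None]
    if not donors:
        raise ValueError("no item respondents")
    out = list(y)
    for j, v in enumerate(y):
        if v is None:
            i_star = min(donors, key=lambda i: math.dist(X[i], X[j]))
            out[j] = y[i_star]
    return out


# Hypothetical: one auxiliary variable (e.g., number of client trips).
revenue = [5000, None, 20000, None]
trips = [(10,), (12,), (80,), (75,)]
completed = nearest_neighbor_impute(revenue, trips)
```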
Regression-based imputation models involve estimating equation (2) for the item
responses, then using the estimated function to predict the missing values (Durrant 2009). In
deterministic regression imputation, the predicted values are used as the imputed values. In
stochastic regression imputation, an error residual is added to the predicted values to allow for
randomization and uncertainty. The residual term can be drawn from a standard zero-centered
distribution (e.g., the normal distribution) with the appropriate standard deviation from the
model, or by drawing from computed residuals from the fitted values for the item responses,
either randomly or for a respondent with similar characteristics. Regardless of the method used
to select residuals, the stochastic approach is generally seen as preferred to the deterministic one
since it better preserves the distribution of y. However, this parametric approach is susceptible to
model misspecification and poor fit.
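Deterministic and stochastic regression imputation differ only in whether a residual is added to the prediction. The sketch below assumes a single auxiliary variable z and an ordinary least squares line fit to item respondents, with residuals drawn at random from the fitted residuals (one of the options described above); the function name and data are hypothetical.

```python
import random


def regression_impute(y, z, stochastic=False, rng=None):
    """Impute missing y (None) from an OLS line y = a + b*z fit on item respondents.

    Deterministic: use the predicted value a + b*z.
    Stochastic: add a residual drawn at random from the fitted residuals.
    """
    pairs = [(zi, yi) for zi, yi in zip(z, y) if yi is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two item respondents")
    z_bar = sum(zi for zi, _ in pairs) / n
    y_bar = sum(yi for _, yi in pairs) / n
    sxx = sum((zi - z_bar) ** 2 for zi, _ in pairs)
    if sxx == 0:
        raise ValueError("auxiliary variable has no variation among respondents")
    b = sum((zi - z_bar) * (yi - y_bar) for zi, yi in pairs) / sxx
    a = y_bar - b * z_bar
    residuals = [yi - (a + b * zi) for zi, yi in pairs]
    rng = rng or random.Random()
    out = []
    for zi, yi in zip(z, y):
        if yi is not None:
            out.append(yi)
        else:
            pred = a + b * zi
            out.append(pred + rng.choice(residuals) if stochastic else pred)
    return out


# Hypothetical data lying exactly on y = 2z, so residuals are (numerically) zero.
det = regression_impute([2, 4, None, 8], [1, 2, 3, 4])
sto = regression_impute([2, 4, None, 8], [1, 2, 3, 4], stochastic=True, rng=random.Random(0))
```

With noisy data the stochastic version scatters imputed values around the fitted line instead of placing them on it, which is what preserves more of the distribution of y.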
For this reason, in this paper we focus instead on two simple data imputation methods
and three imputation class approaches. The two simple data imputation approaches involve
replacing missing values with either zero or the mean of item responses (zero imputation and
mean imputation); stochasticity is ignored. These simple approaches are likely to be the most
commonly used in filling in missing data from incomplete questionnaires, and population
estimates with imputed values from these approaches will be compared against those that use
each of three different hot deck imputation approaches. The first hot deck imputation approach
considered uses a small number of auxiliary variables to define respondent classes from which
random donor values are taken to replace those in the same class. We refer to this approach as
the random hot deck imputation. The two other hot deck imputation approaches described use a
nearest neighbor approach. In the deterministic nearest neighbor imputation, the item
respondent corresponding to the minimum value of the distance function associated with a set of
auxiliary variables (i.e., the nearest neighbor) provides the donor value. In the K-nearest
neighbor imputation, a donor value is randomly selected from among the top K nearest
neighbors.
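A K-nearest-neighbor variant only changes the donor step: rank item respondents by distance and draw the donor at random from the top K. As before, the Euclidean distance, function name, and data are assumptions; with k = 1 the function reduces to the deterministic nearest-neighbor rule.

```python
import math
import random


def knn_impute(y, X, k=3, rng=None):
    """K-nearest-neighbor hot deck: for each missing y_j, draw the donor at
    random from the k item respondents closest to j (Euclidean distance on
    the auxiliary variables X, an assumed metric)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    donors = [i for i, v in enumerate(y) if v is not None]
    if not donors:
        raise ValueError("no item respondents")
    rng = rng or random.Random()
    out = list(y)
    for j, v in enumerate(y):
        if v is None:
            ranked = sorted(donors, key=lambda i: math.dist(X[i], X[j]))
            out[j] = y[rng.choice(ranked[:k])]
    return out


# Hypothetical data: the last unit is missing its value.
values = [1, 2, 3, 100, None]
aux = [(0,), (1,), (2,), (50,), (1,)]
completed_k3 = knn_impute(values, aux, k=3, rng=random.Random(0))
completed_k1 = knn_impute(values, aux, k=1)
```

Drawing from several close donors adds some randomness to the imputed values while still tying them to respondents with similar auxiliary characteristics.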

An important consideration in adopting an imputation approach is variance estimation
(e.g., Lohr 2010).8 It is well known that standard variance estimation procedures (e.g., Taylor-series approximation, jackknife, and simulation methods) applied to imputed data will generally
underestimate the true variance. For example, Rao and Shao (1992) discuss how the jackknife
resampling approach to estimating variance yields a naïve estimator when applied to imputed data,
because the standard (delete-1) jackknife method does not account for the
variance due to the imputation itself. To remedy this shortcoming, they propose a general
approach for adjusting the jackknife variance estimator so that it does incorporate the imputation
method in the variance calculation. The procedure involves replicating the imputation of values
in each jackknife-replicated dataset. Shao (2002) discusses how the procedure can be extended
to any imputation method. We employ this approach to estimate the variance in this study.
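The difference between the naive and the re-imputing jackknife can be seen in a toy case. The sketch below assumes mean imputation and the sample mean as the estimator, with hypothetical data; it follows the replicate-and-reimpute procedure described above and is not a reproduction of Rao and Shao's (1992) general adjusted estimator.

```python
from statistics import mean


def impute_mean(values):
    """Mean imputation: replace None by the mean of the reported values."""
    observed = [v for v in values if v is not None]
    m = mean(observed)
    return [m if v is None else v for v in values]


def jackknife_var(values, estimator):
    """Delete-1 jackknife variance of estimator(values)."""
    n = len(values)
    reps = [estimator(values[:k] + values[k + 1:]) for k in range(n)]
    rbar = sum(reps) / n
    return (n - 1) / n * sum((r - rbar) ** 2 for r in reps)


# Hypothetical item responses (None = item non-response).
data = [12.0, 15.0, None, 20.0, None, 9.0, 14.0, None]

# Naive: impute once, then treat imputed values as if they were reported.
naive = jackknife_var(impute_mean(data), mean)

# Re-imputing: redo the imputation inside every jackknife replicate.
adjusted = jackknife_var(data, lambda d: mean(impute_mean(d)))
```

For these data the naive estimate is about 1.18 and the re-imputing estimate about 3.61: treating the five imputed points as genuine observations makes the mean look roughly three times more precise than it is.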

An Empirical Application
To illustrate weighting and data imputation methods, we generate estimates of
population-level totals and means for costs and earnings from data collected in a survey of
saltwater sport fishing charter businesses in Alaska. The Alaska charter boat sector has
undergone significant change in recent years due, at least in part, to regulatory changes in the
management of the Pacific halibut (Hippoglossus stenolepis) sport fishery. To control growth of
the charter sector in the primary recreational charter boat fishing areas off Alaska, a limited entry
program was implemented in 2010 (75 Federal Register 554). In addition, in the past several
years, charter vessel operators in Southeast Alaska (International Pacific Halibut Commission
[IPHC] Area 2C) have been subject to harvest controls that impose both size and bag limits on
the catch of Pacific halibut on guided fishing trips, with these limits being more restrictive than
the regulations for non-guided trips (e.g., 78 Federal Register 16425).9 Moreover, a Catch
Sharing Plan (CSP) will be implemented during 2014 that formalizes the process of allocating
catch between the commercial and charter sector and for evaluating changes to harvest
restrictions (78 Federal Register 75843). The CSP allows leasing of commercial halibut individual fishing
quota (IFQ) by eligible charter businesses. Leased halibut IFQ could then be used by charter
businesses to relax harvest restrictions for their angler clients, since the fish caught under the
leased IFQ would not be subject to the charter sector-specific size and bag limits that may be
imposed—though the non-charter sector size and bag limit restrictions (currently two fish of any
size per day) would still apply to charter anglers individually.
The Alaska Saltwater Sport Fishing Charter Business Survey was developed by the U.S.
National Marine Fisheries Service (NMFS) to collect baseline economic information about the
charter sector for use in evaluating the effects of the changing management landscape on the
charter sector and economy. It was developed after extensive input from numerous charter boat
business operators (the target population) in focus groups, in-depth interviews, and meetings
with charter boat associations. The 12-page survey included questions about employment,
services offered to clients, revenues, costs, types of clients served, and other information useful
for classifying responses.
The survey was administered in the first half of 2012 as a repeated mail survey to the
entire population of saltwater sport fishing charter boat businesses actively offering charter
fishing experiences in Alaska during 2011 (650 businesses).10 Thus, statistical sampling
methods were not employed to determine which businesses would receive the survey; instead, a
complete census was conducted in which all 650 eligible businesses were contacted to
participate.11

Like many voluntary cost and earnings surveys conducted in the fishing sector, this survey
is a good candidate for adjusting for missing data. Despite following a Dillman tailored design
survey administration approach (Dillman, Smyth, and Christian 2009) involving multiple
contacts by mail and telephone, the survey achieved a low overall response rate of approximately
27 percent, or 174 respondents. Thus, 73 percent of the population did not respond to the survey.
The low unit response rate is not a rare outcome among voluntary cost and earnings commercial
fishery surveys (e.g., Fisheries and Oceans Canada 2007; Holland et al. 2012), and is low enough
to trigger concerns about non-response bias. In addition, there were numerous questions with
low item-response rates, with an average item non-response rate of 32 percent across all
questions. That is, the average question in the survey had about 32 percent of respondents leave
the question blank. The low unit response rate and the pervasive item non-response rate suggest
that adjustments must be made for missing data for population-level estimates to be considered
valid.12 Moreover, there was a rich set of auxiliary information available about all charter
businesses in the population that could be utilized to construct weights and impute data.
Our focus here is on the revenue and cost information collected in the survey.
Respondents were asked to provide information on the total revenue earned during the 2011
fishing season across all sources, including direct payments from client fishing trips, payments
received from a booking agent or other service (i.e., broker) for client fishing trips, payments for
non-fishing activities (such as transportation, eco-tours, etc.), and commissions from referrals.
In addition, respondents were asked for the revenue they received from leasing or selling a
charter halibut permit (CHP), a federal permit issued to charter businesses and required for
Pacific halibut fishing under the limited entry program (75 Federal Register 554). For these
revenue categories, the number of item respondents and descriptive statistics are presented in
Table 1.
The survey also included several questions that collected detailed information about
annual expenditures incurred during 2011, including those associated with providing charter boat
services (charter trip operating expenses, such as vessel fuel, fish processing and shipping,
broker fees, vessel cleaning, and supplies); general overhead expenses (non-wage payroll costs,
utilities, repair and maintenance, business insurance, office supplies, etc.); expenses incurred for
vehicles, machinery, and equipment; and payments for buildings, land, and other real estate.
Table 2 presents the descriptive statistics for these expenditure categories for the item
respondents.
This survey was conducted as a census and did not exclude any eligible member of the
population; as such, the base weight (w1) for all individuals in the sample is 1. Importantly, since
the survey in this study is a census of the population, the two other weights considered in this
study are both based on population-level data. Fortunately, in this case a wealth of external
auxiliary information about respondents and non-respondents (and generally the population of
charter boat operators targeted in the survey) is available in the form of saltwater charter logbook
data mandated by the Alaska Department of Fish and Game (ADF&G).13 These data include
information on when, where, and how much charter boat fishing occurred during the year,
including details on the number of clients and trips, fish targeted and harvested, and the
residency of charter clients. The availability of these effort data and the likelihood that they are
correlated with costs and revenues allows us to explore the effects of different weighting and data
imputation methods on population-level estimates of total costs and total revenues.

For this paper, we construct weights to account for non-representativeness of the unit
respondents, then apply the five different data imputation methods discussed above to evaluate
differences in population-level total cost and total revenue estimates. Total cost and total
revenue are calculated by summing over the weighted cost and revenue categories, respectively,
after missing data have been imputed.
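Combining equation (1) with the imputed data, a population total is a weighted sum over respondents. The record layout, category names, and numbers below are hypothetical, and the sketch assumes every category value has already been imputed; w1 is set to 1 because the survey is a census.

```python
def final_weight(rec):
    """Equation (1): w_i = w_i1 * w_i2 * w_i3."""
    return rec["w1"] * rec["w2"] * rec["w3"]


def weighted_total(records, categories):
    """Estimated population total of the summed expenditure categories."""
    return sum(final_weight(r) * sum(r[c] for c in categories) for r in records)


# Hypothetical category names and respondent records (imputation already done).
COST_CATEGORIES = ["trip_operating", "overhead", "vehicles_equipment", "real_estate"]

respondents = [
    {"w1": 1, "w2": 2.0, "w3": 1.5,
     "trip_operating": 100.0, "overhead": 50.0,
     "vehicles_equipment": 0.0, "real_estate": 0.0},
    {"w1": 1, "w2": 0.5, "w3": 1.0,
     "trip_operating": 200.0, "overhead": 0.0,
     "vehicles_equipment": 0.0, "real_estate": 0.0},
]
total_cost = weighted_total(respondents, COST_CATEGORIES)
```

Swapping in a different imputation method changes only the category values fed to weighted_total, which is what allows population totals to be compared across methods.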

Results
Table 3 presents a comparison of responding and non-responding charter businesses with
respect to several variables created from the charter logbook data. These auxiliary variables
were selected to capture characteristics of charter businesses in Alaska that varied across the
sector, mainly related to when fishing occurred, the size of the operations, the fish targeted, and
the types of clients. Across the variables, respondents and non-respondents appear fairly similar,
with minor discrepancies in several instances. However, given the number of variables available
for comparing respondents and non-respondents, and to conduct a more in-depth evaluation, we
took an approach similar to Moore and Tarnai (2002) and estimated a logit model to formally
assess differences between respondents and non-respondents. Variables from Table 3 formed the
independent variables, and an alternative-specific constant (ASC) associated with respondents14
was added to capture unmodeled respondent effects. Table 4 presents the model results, which
indicate that the only two variables for which there is a significant difference between
respondents and non-respondents, when holding all else constant, are dummy variables
indicating no fishing was done in the late season (mid-Aug – September) and no fishing was
done in the off-season (October – March): more non-respondents tended to fish in the late and
off-seasons than respondents. Otherwise, there were no statistically significant effects from
other variables that may represent differences between respondents and non-respondents. These
results were robust across the various specifications we tried. Consequently, these two dummy
variables formed the basis for calculating w2, the non-response adjustment weights. We
constructed cross-tabulated frequency tables for the respondents and for the total sample
(respondents and non-respondents) and, from these, computed each cell's weight as the ratio of
the number of total population elements to the number of responding sample elements in that
cell (Table 5).15 The non-response
adjustment weights range from 0.53 to 2.30. The responses provided by those with a weight of
2.30 (businesses not fishing in the late shoulder season, but fishing in the off-season) have over
four times as much weight in the calculation of population estimates as the responses assigned
0.53 (businesses fishing in both the late shoulder and off-season) since the latter group was
overrepresented in the responding sample relative to the former.
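The cell-based weighting described above can be sketched as follows. Because in a census a raw ratio of population to respondents cannot fall below one, the reported weights (0.53 to 2.30) appear to be rescaled; this sketch assumes a rescaling by n/N so the weights average to one over respondents. The function name and the cell counts are hypothetical.

```python
from collections import Counter


def nonresponse_weights(cells_all, cells_resp):
    """Cell-based non-response adjustment weights (w2).

    cells_all:  weighting-cell label for every population element (a census)
    cells_resp: weighting-cell label for every unit respondent
    Returns {cell: weight}. The raw ratio N_c / n_c is rescaled by n / N
    (an assumption) so that the weights average to one over respondents.
    """
    pop_counts = Counter(cells_all)
    resp_counts = Counter(cells_resp)
    scale = len(cells_resp) / len(cells_all)
    return {c: pop_counts[c] / resp_counts[c] * scale for c in resp_counts}


# Hypothetical cells keyed by (fished late shoulder season, fished off-season).
population = [(False, False)] * 6 + [(True, True)] * 4
respondents = [(False, False)] * 2 + [(True, True)] * 3
weights = nonresponse_weights(population, respondents)
```

Here the underrepresented cell receives a weight of 1.5 and the overrepresented cell about 0.67, the same qualitative pattern as in Table 5.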
The post-stratification weight (w3) addresses non-coverage bias that may arise when the
sample does not adequately represent the population with respect to one or more key variables.
In this case, the principal dimension to control for in post-stratification is the size of the charter
business, which we defined as the number of client fishing trips reported. Another potential
post-stratification dimension would be the region in which charter-based fishing for halibut
occurred (IPHC Area 2C or Area 3A). In Table 6, w3 is calculated as a simple post-stratification
weight based only on the number of client fishing trips (denoted weight A) and, alternatively, on
both the fishing region and the number of client fishing trips (denoted weight B). Note that the
range of the post-stratification weights, under either assumption, is much smaller than that of the
non-response adjustment weights, ranging from 0.78 to 1.20 for weight A and from 0.73 to 1.45
for weight B. This suggests that, at most, some observations will contribute about
twice as much weight as others. The total weight for each respondent was determined using
equation (1).
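A minimal sketch of post-stratification and weight combination follows. It assumes that w3 is the population share of a stratum divided by that stratum's share of the already-weighted respondents, and that equation (1) multiplies the component weights (w1 = 1 in a census). Both are assumptions, as are the stratum labels and counts.

```python
from collections import defaultdict


def poststrat_weights(pop_counts, resp_strata, prior_weights):
    """Post-stratification weights (w3): population share of each stratum
    divided by its share of the already-weighted respondents.

    pop_counts:    {stratum: N_h} from the auxiliary (logbook) data
    resp_strata:   stratum label of each respondent
    prior_weights: weight already carried by each respondent (e.g. w1 * w2)
    """
    n_total = sum(pop_counts.values())
    w_total = sum(prior_weights)
    resp_share = defaultdict(float)
    for stratum, w in zip(resp_strata, prior_weights):
        resp_share[stratum] += w / w_total
    return {h: (pop_counts[h] / n_total) / resp_share[h] for h in resp_share}


def total_weights(w1, w2, resp_strata, w3):
    """Combine weights, assuming equation (1) is the product w1 * w2 * w3."""
    return [a * b * w3[h] for a, b, h in zip(w1, w2, resp_strata)]


# Hypothetical census (w1 = 1) with strata defined by client-trip counts.
pop = {"<=200 trips": 60, ">200 trips": 40}
strata = ["<=200 trips", ">200 trips", ">200 trips", ">200 trips"]
w2 = [1.0, 1.0, 1.0, 1.0]
w3 = poststrat_weights(pop, strata, w2)
w = total_weights([1.0] * 4, w2, strata, w3)
```

After weighting, the small-business stratum makes up 60% of the weighted respondents, which matches its population share.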
For the hot deck imputation methods, we again rely on the charter logbook data to
provide the auxiliary information necessary for these imputation approaches. In the random hot
deck imputation, we set up three respondent classes based on the size of the charter business
(which is likely linked closely to the revenues and costs), proxied by the total number of client
trips in 2011. The respondent classes were (a) 200 or fewer trips, (b) 201 to 400
trips, and (c) more than 400 trips.16 Within these classes, donor values were randomly selected
from among the item respondents. For the two nearest neighbor hot deck imputations, eight
variables from the charter logbook data were used in equation (3) to evaluate the closeness of
item respondents to each item non-respondent to determine the best candidate to provide the
donor value. These eight variables were the following: a dummy variable indicating whether
fishing occurred in Southcentral Alaska (IPHC Area 3A), the number of guides used, the number
of calendar days fished, the total client fishing trips, a dummy variable indicating crew fishing
trips were taken, a dummy variable indicating some unpaid fishing trips were taken during the
season, the number of hours spent fishing for Pacific salmon, and the number of hours spent
fishing for bottomfish (including Pacific halibut).17 For the K-nearest neighbor algorithm, we set
K = 3.
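The three hot deck variants can be sketched as below. Equation (3) is not reproduced in this section, so the sketch assumes a Euclidean distance on standardized auxiliary variables; the function names, class labels, and data are hypothetical. With k = 1 the nearest neighbor routine is deterministic; with k = 3 it corresponds to drawing a donor at random from the three closest item respondents.

```python
import math
import random


def class_hot_deck(values, classes, rng):
    """Random hot deck within imputation classes.

    values:  item values, None where the item is missing
    classes: imputation class of each unit (e.g. a client-trip size class)
    rng:     random.Random instance, so draws are reproducible
    Assumes every class contains at least one item respondent.
    """
    donors = {}
    for v, c in zip(values, classes):
        if v is not None:
            donors.setdefault(c, []).append(v)
    return [v if v is not None else rng.choice(donors[c])
            for v, c in zip(values, classes)]


def knn_hot_deck(values, aux, k, rng):
    """Nearest neighbor hot deck on auxiliary variables.

    aux[i] is a tuple of auxiliary variables for unit i. With k=1 the closest
    item respondent is the donor (deterministic nearest neighbor); with k>1 the
    donor is drawn at random from the k closest item respondents.
    """
    # Standardize each auxiliary variable (an assumed distance; equation (3)
    # in the paper may weight variables differently).
    cols = list(zip(*aux))
    means = [sum(col) / len(col) for col in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in col) / len(col)) or 1.0
           for col, m in zip(cols, means)]
    z = [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in aux]

    donors = [i for i, v in enumerate(values) if v is not None]
    imputed = list(values)
    for i, v in enumerate(values):
        if v is None:
            ranked = sorted(donors, key=lambda j: math.dist(z[i], z[j]))
            imputed[i] = values[rng.choice(ranked[:k])]
    return imputed


# Usage with hypothetical data: the last unit is missing its value.
example = knn_hot_deck([100.0, 200.0, 900.0, None],
                       [(1.0, 5.0), (2.0, 6.0), (50.0, 90.0), (1.1, 5.1)],
                       k=1, rng=random.Random(2011))
```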
For each weighting assumption (no weighting, weight A, and weight B) and data
imputation method (zero imputation, mean imputation, random hot deck imputation,
deterministic nearest neighbor imputation, and K-nearest neighbor imputation), the
population-level total expenditures and total revenues are calculated. These estimates are the
weighted sums over all the expenditure and revenue categories, respectively, and are presented in Table 7.
Standard errors of these totals are calculated according to the adjusted jackknife variance
estimation procedure (Rao and Shao 1992).
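The following is a simplified sketch in the spirit of the Rao and Shao (1992) adjusted jackknife, for a single item under a delete-one design: when unit j is deleted, each imputed value is shifted by the resulting change in the weighted respondent mean before the replicate total is computed. It is an illustration under these assumptions, not the code used for Table 7, and the function name is hypothetical.

```python
def adjusted_jackknife_total(y, imputed, w):
    """Weighted total and delete-one jackknife variance with an adjustment of
    imputed values in each replicate.

    y:       observed item values, None where the item is missing
    imputed: imputed value for each unit (only used where y is None); assumed
             to come from mean or hot deck imputation using respondent values
    w:       final analysis weight for each unit
    Requires at least two item respondents.
    """
    n = len(y)
    resp = [i for i in range(n) if y[i] is not None]

    def resp_mean(drop=None):
        # Weighted mean of item respondents, optionally excluding unit `drop`.
        num = sum(w[i] * y[i] for i in resp if i != drop)
        den = sum(w[i] for i in resp if i != drop)
        return num / den

    def total(drop=None, shift=0.0, scale=1.0):
        # Weighted total; imputed values are shifted by `shift`.
        return scale * sum(
            w[i] * (y[i] if y[i] is not None else imputed[i] + shift)
            for i in range(n) if i != drop
        )

    ybar = resp_mean()
    t_hat = total()
    replicates = []
    for j in range(n):
        # Shift imputed values by the change in the respondent mean when unit j
        # is deleted, then rescale the remaining weights by n / (n - 1).
        shift = resp_mean(drop=j) - ybar
        replicates.append(total(drop=j, shift=shift, scale=n / (n - 1)))
    t_bar = sum(replicates) / n
    variance = (n - 1) / n * sum((t - t_bar) ** 2 for t in replicates)
    return t_hat, variance
```

With no missing data, the routine reduces to the ordinary delete-one jackknife for a total. The square root of the returned variance is the standard error.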
The results indicate that regardless of the weighting approach used, the zero imputation
method always led to the lowest estimates and the mean imputation method always resulted in
the highest estimates across the imputation methods. When no weighting is applied to adjust for
non-response and post-stratification, total revenues range from a low of $101 million (s.e. =
$1.93 million) with the zero imputation method to a high of $155 million (s.e. = $2.62 million)
with the mean imputed values, while the total costs had a low of $118 million (s.e. = $1.79
million) and a high of $194 million (s.e. = $3.00 million), again associated with the zero and
mean imputation methods, respectively. Weighting only by the number of client trips (weight A)
led to lower estimates of total revenue, $90 million (s.e. = $1.71 million) for zero imputation
and $144 million (s.e. = $2.39 million) for mean imputation, compared to the estimates with no
weighting. Similarly, the total expenditure estimates of $110 million (s.e. = $1.62 million) under
zero imputation and $186 million (s.e. = $2.84 million) under mean imputation are smaller than the
corresponding unweighted estimates. When weighting by both the region where fishing
occurred and by the amount of fishing done (weight B), the estimates for the zero and mean
imputation approaches are, somewhat surprisingly, almost identical to those for the case without
weights.
Among the hot deck imputation method results, the random hot deck imputation
estimates are always lower than the nearest neighbor-based estimates. When no weighting is
applied, the total revenue estimate using the random hot deck imputation approach is $127
million (s.e. = $8.27 million) and the total expenditure estimate is $169 million (s.e. = $5.90
million). When weighting by weight A, the estimates are lower, with a total revenue estimate of
$114 million (s.e. = $7.05 million) and a total expenditure estimate of $155 million (s.e. = $5.16
million). Again, the total revenue and expenditure estimates under the weight B assumption are
very similar to the unweighted estimates.
The deterministic nearest neighbor and K-nearest neighbor imputation estimates are
similar to one another, regardless of the weighting assumption. With no weighting, the
deterministic revenue and expenditure estimates are $143 million (s.e. = $2.65 million) and
$174 million (s.e. = $2.58 million), while the stochastic (K-nearest neighbor) estimates are
almost identical, differing primarily in their standard errors, which are larger because of the
randomness incorporated into the procedure ($4.31 million and $6.93 million, respectively).
Under the weight A assumption, the total revenue and expenditure estimates from both nearest
neighbor methods (about $127 million and $162 million, respectively) are somewhat higher than
the corresponding random hot deck imputation estimates. Under the weight B assumption, total
revenue and expenditure estimates are $139 million and $174 million, respectively, with
standard errors that follow the same pattern as noted above.

Discussion
To formally assess differences between the estimates calculated under the different
weighting and data imputation methods, we calculate 90% confidence intervals for the
difference in estimates using the method of convolutions (Poe, Giraud, and Loomis 2005), a
computationally intensive approach that provides an exact measure of the distribution of the
difference between two independent empirical distributions. In our case, we use the jackknife
replications from the adjusted jackknife variance estimation to generate the empirical distributions
of pairwise differences. Confidence intervals containing zero indicate no statistically significant difference
between the totals. Comparing the estimates across data imputation methods while holding the
weighting method constant, the results indicate that, under all weighting assumptions, the zero
imputation estimates are significantly lower than all other estimates and the mean imputation
estimates are significantly larger than all other estimates (Table 8). Additionally, there are no
statistically significant differences among the random hot deck, deterministic nearest neighbor,
and K-nearest neighbor imputation estimates. These patterns were consistent between the total
revenue and total expenditure estimates.18
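The complete combinatorial form of the method of convolutions can be sketched with NumPy: form every pairwise difference between the two empirical distributions and read off the 5th and 95th percentiles for a 90% interval. The inputs stand for whatever empirical distributions are being compared (in the paper, derived from the jackknife replications). The function names and toy numbers are hypothetical, and NumPy's default linear interpolation of quantiles is an implementation choice.

```python
import numpy as np


def convolution_ci(dist1, dist2, level=0.90):
    """Confidence interval for the difference of two independent empirical
    distributions using all pairwise differences (complete combinatorial
    method of convolutions)."""
    a = np.asarray(dist1, dtype=float)
    b = np.asarray(dist2, dtype=float)
    diffs = np.subtract.outer(a, b).ravel()   # every a_i - b_j
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(diffs, [alpha, 1.0 - alpha])
    return float(lower), float(upper)


def differs(dist1, dist2, level=0.90):
    """True if the interval excludes zero (a statistically significant difference)."""
    lower, upper = convolution_ci(dist1, dist2, level)
    return lower > 0.0 or upper < 0.0
```

For example, `convolution_ci([10, 11, 12], [0, 1, 2])` gives approximately (8.4, 11.6), an interval that excludes zero.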
The first finding, that the zero-imputed totals are lowest and the mean-imputed totals are
largest, is unsurprising given that these simple imputation methods do not use additional
information to determine the best value to replace missing values and instead assign the same
value to all missing values. Zero imputation will therefore always lead to the lowest estimates
(assuming values cannot be negative), since the other methods assign at least some non-zero
values to item non-respondents. And if the distribution of item responses is right-skewed, with
the mean pulled upward by a few large values (as the gaps between the means and medians in
Tables 1 and 2 suggest), we would expect the mean imputation-based estimates to be larger than
those from the other methods.
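The ordering argument can be checked on a toy example. The values below are invented for illustration. Zero imputation gives a lower bound for nonnegative data, while a single large value pulls the mean well above most observed values. Donor-based methods draw from the observed values, three quarters of which lie below the mean here, so their totals will usually fall between the two.

```python
def total_with_fill(values, fill):
    """Sum item values after replacing each missing entry (None) with `fill`."""
    return sum(fill if v is None else v for v in values)


# Hypothetical right-skewed item responses: one large value, two missing items.
values = [1_000.0, 2_000.0, 3_000.0, 50_000.0, None, None]
observed = [v for v in values if v is not None]
mean = sum(observed) / len(observed)            # 14,000, pulled up by 50,000

zero_total = total_with_fill(values, 0.0)       # lower bound when values >= 0
mean_total = total_with_fill(values, mean)
# Share of observed values (the potential hot deck donors) below the mean:
share_below_mean = sum(v < mean for v in observed) / len(observed)
```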
The data imputation alternatives used in this application, namely the random hot deck,
deterministic nearest neighbor, and K-nearest neighbor methods, yield statistically similar
estimates. This in itself does not provide clear guidance on the best data imputation method to use
for these data. However, we argue that in this case, the K-nearest neighbor approach is the preferred
one. This is largely due to two factors, one specific to this application and another more
general. First, in this application, there is a wealth of auxiliary data likely to shed light on
several important characteristics of each charter business, for both the item respondents and
non-respondents. Since all variables of interest are likely to be correlated to some degree with the
size of the operation, where and when the business operates, and other information available in
the charter logbook data, we are able to draw from a number of candidate variables in identifying
good donor values from among item respondents for a given item non-respondent.
Given that the random hot deck imputation approach does not use the full set of auxiliary
information (due to the need to keep dimensionality low so imputation classes containing a
minimum number of donor values can be identified), the nearest neighbor imputation approaches
stand to identify better donor values since they use more auxiliary data in the imputation process.
In comparing the nearest neighbor imputation approaches, a second factor comes into play.
Recall that the difference between the two nearest neighbor methods used in this paper is that
one selects the donor value associated with the one item respondent that is closest to the one with
the missing value based on criteria embodied in the distance function in equation (3), and the
other randomly selects from the top K (in this case three) nearest neighbors. Selecting randomly
from among several nearest neighbors reduces the potential impact of an outlier being used as the
donor, which is one reason stochastic imputation methods, such as the K-nearest neighbor
imputation approach, are generally preferred over deterministic ones. For these reasons, the total
revenue and total expenditure estimates that used the K-nearest neighbor imputation to deal with
item non-response are likely to be the most appropriate estimates for use by policymakers.
The method of convolutions comparisons also were used to evaluate the effect of
weighting for a given data imputation approach (Table 9). Even though estimates of total
revenues and expenditures are similar between the unweighted and weight B estimates in our
empirical application across all data imputation methods, this is coincidental. The weight B
assumption is based on post-stratifying on both the region in which fishing occurred and the
number of client fishing trips during 2011 (and embodies the non-response adjustment weights
as well). Because the post-stratification weights span a fairly narrow range but still differ from
one, post-stratification appears to have a moderate marginal effect. Statistically significant
differences between total estimates under the no weighting and
weight A assumptions (except in the case of random hot deck imputation) provide further
evidence that weighting assumptions affect estimates. Similar differences were found between
estimates assuming weight A and those assuming weight B across the data imputation methods.
Thus, it is clear that weighting matters, and an argument can be made for preferring the weight B
estimates over the others because they ensure that the weighted sample matches the population
on several key variables.
In our application, the survey was conducted as a census of the population, where each
member of the population was included. This negated any need to adjust the sample for the
sampling methods used, thus removing one of the several factors that are often adjusted for with
weighting. As a result, the effect of weighting in our application was likely less pronounced than
it would be in surveys where the weights must also be adjusted for the sampling design.
In general, the selection of the weighting and data imputation methods used to adjust for
missing data in a given survey will depend upon the availability, quality, and completeness of
auxiliary data. In this application, we had a large amount of auxiliary information about the
survey population that enabled us to employ a variety of weighting and data imputation
approaches to deal with both unit and item non-response since the data reflected key
characteristics of the population that could be related to the variables in the survey with missing
data. This is not always the case. However, even for populations with no external sources of
information collected about them, follow-up surveys that gather information on a few key
characteristics (e.g., Arlinghaus, Bork, and Fladung 2008; Sutton and Ditton 2005) can be used
to assess non-response bias and to develop adjustment weights, if necessary.19 Moreover,
although not done in this paper, data imputation methods can utilize data from other questions in
the survey itself as auxiliary data instead of, or in addition to, auxiliary data from an external
source, provided there are questions in the survey that are likely related to the variable of interest
with missing data. In this case, the survey data used may itself have missing values, which raises
questions about how to utilize it in the data imputation procedure. One possible way to address
this problem is presented by Brick and Kalton (1996), who discuss multivariate imputation, an
iterative procedure of repeated data imputation across multiple questions with item non-response
that continues until a convergence criterion is met.
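A generic sketch of the iterate-until-convergence idea for two items that both have missing values follows. It is not necessarily the procedure Brick and Kalton describe. The respondent-mean starting values, the simple linear regression used to re-impute each item from the other, the tolerance, and all names are our assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares intercept and slope of ys on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return my - slope * mx, slope


def observed_mean(values):
    obs = [v for v in values if v is not None]
    return sum(obs) / len(obs)


def iterative_impute(x, y, tol=1e-8, max_iter=100):
    """Alternately re-impute two items from each other until imputed values
    stop changing by more than `tol` (a generic convergence criterion)."""
    miss_x = [v is None for v in x]
    miss_y = [v is None for v in y]
    # Start from respondent-mean imputation.
    mx0, my0 = observed_mean(x), observed_mean(y)
    xc = [mx0 if m else v for v, m in zip(x, miss_x)]
    yc = [my0 if m else v for v, m in zip(y, miss_y)]
    for _ in range(max_iter):
        previous = xc + yc
        # Regress y on x using units where y was actually observed.
        a, b = fit_line([xi for xi, m in zip(xc, miss_y) if not m],
                        [yi for yi, m in zip(yc, miss_y) if not m])
        yc = [a + b * xi if m else yi for xi, yi, m in zip(xc, yc, miss_y)]
        # Regress x on y using units where x was actually observed.
        a, b = fit_line([yi for yi, m in zip(yc, miss_x) if not m],
                        [xi for xi, m in zip(xc, miss_x) if not m])
        xc = [a + b * yi if m else xi for xi, yi, m in zip(xc, yc, miss_x)]
        if max(abs(p - q) for p, q in zip(previous, xc + yc)) < tol:
            break
    return xc, yc
```

When the two items are exactly linearly related, the iterations converge to the values on that line. For example, with x = [1, 2, 3, 4, None] and y = [2, 4, None, 8, 10], the imputed values approach 5 and 6.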
We have discussed the use of weighting and data imputation methods in the context of a
cost and earnings survey, but the methods are applicable to other types of economic and social
science surveys, including surveys of anglers, stakeholders, communities, and other fishery
participants, as well as the general public. It is important to stress that the methods are only
useful for surveys that are censuses of populations or that use probability samples, that is,
samples drawn from the population such that every element has a known, nonzero probability of being selected.
These types of surveys, with appropriate adjustments when necessary to address missing data or
sampling issues, can be used to draw inferences about the population. Convenience samples are
not uncommon among fisheries surveys, but by construction are not representative of the
population. As a result, the methods described in this paper cannot be used with those
surveys to adjust the sample so that it better represents the population and supports
population inferences.
Finally, from a policy perspective, our empirical results suggest that as a whole, the
Alaska charter sector operated at a net loss during 2011. This is consistent with anecdotal
evidence from charter boat operators and is supported by the fact that fewer charter businesses
were in operation during 2012 than in 2011.20 Since survey-based cost and revenue estimates
such as these are used as inputs in fishery regional economic impact models (e.g., Lew and
Seung 2010) and policy decisions, improving the accuracy of these estimates is important since
any biases they embody may be amplified in subsequent analyses.

Conclusion
Missing data are persistent in economic and social surveys in fisheries, but are rarely
accounted for when analyzing and presenting the results from these studies. From a
methodological perspective, it is clear that when auxiliary data are available, ignoring missing
data in fisheries surveys is unlikely to be an optimal strategy, and will often lead to results that
are biased. As shown in this paper, there are several straightforward methods researchers can
apply in the analysis of survey data that can adjust for these missing data and lead to improved
population estimates. We have described and illustrated the application of weighting to adjust a
sample of respondents to better reflect the population, and several data imputation approaches to
deal with missing data in individual questions, in a survey of charter fishing businesses in
Alaska. The use of these methods in fisheries research enables survey researchers to provide
useful information from surveys for which unit and item non-response are issues, as well as to
improve estimates based on these data that can be used by fishery managers.
Research on dealing with missing data continues to be an active area, and several other
recent methods that were not covered here have been proposed that may prove useful for
addressing missing data in fisheries surveys. For example, Rubin (1996) advocates multiple
imputation, an approach grounded in Bayesian reasoning in which the imputation process is
repeated several times and the resulting estimates are combined, so that point and variance
estimates reflect the uncertainty introduced by imputing the missing values. A comparison of this
and other recent methods with the comparatively simpler approaches presented here is left for
future research.
The focus of this paper has been on presenting several ways of dealing with missing data.
However, in closing, it should be emphasized that the best situation is one in which there are no
missing data, or at least minimal missing data. To this end, survey researchers should endeavor
to minimize the potential for missing data by maximizing response rates through best practices in
survey design, sampling, and implementation (e.g., Dillman, Smyth, and Christian 2009). In this
way, the need to employ the methods described in this paper may be minimized, though
researchers should nevertheless assess the need for these types of methods with their survey data
whenever response rates fall below
100%.

References
Andridge, R.R., and R.J.A. Little. 2010. A Review of Hot Deck Imputation for Survey Non-Response. International Statistical Review 78(1):40-64.
Arlinghaus, R., M. Bork, and E. Fladung. 2008. Understanding the Heterogeneity of
Recreational Anglers Across an Urban-Rural Gradient in a Metropolitan Area (Berlin,
Germany), with Implications for Fisheries Management. Fisheries Research 92:53-62.
Bacalso, R.T.M., J.V. Juario, and N.B. Armada. 2013. Fishers’ Choice of Alternative Management
Scenarios: A Case Study in the Danajon Bank, Central Philippines. Ocean and Coastal
Management 84:40-53.
Battaglia, M.P., D.C. Hoaglin, and M.R. Frankel. 2009. Practical Considerations in Raking
Survey Data. Survey Practice 2(5):1-10.
Beardmore, B., W. Haider, L.M. Hunt, and R. Arlinghaus. 2011. The Importance of Trip
Context for Determining Primary Angler Motivations: Are More Specialized Anglers More
Catch-Oriented Than Previously Believed? North American Journal of Fisheries Management
31:861-879.
Bethlehem, J.G. 2002. Weighting Nonresponse Adjustments Based on Auxiliary Information.
Chapter 18 in Groves, R.M., D.A. Dillman, J.L. Eltinge, and R.J.A. Little (eds.). Survey
Nonresponse. New York: John Wiley and Sons.
Brick, J.M., and G. Kalton. 1996. Handling Missing Data in Survey Research. Statistical
Methods in Medical Research 5:215-238.
Chen, J., and J. Shao. 2000. Nearest Neighbor Imputation for Survey Data. Journal of Official
Statistics 16(2):113-131.
Connelly, N.A., T.L. Brown, and D.J. Decker. 2003. Factors Affecting Response Rates to
Natural Resource-Focused Mail Surveys: Empirical Evidence of Declining Rates Over Time.
Society and Natural Resources 16(6):541-549.
de Leeuw, E., and W. de Heer. 2002. Trends in Household Survey Nonresponse: A
Longitudinal and International Comparison. Chapter 3 in Groves, R.M., D.A. Dillman, J.L.
Eltinge, and R.J.A. Little (eds.). Survey Nonresponse. New York: John Wiley and Sons.
Dillman, D.A., J.D. Smyth, and L.M. Christian. 2009. Internet, Mail, and Mixed-Mode Surveys:
The Tailored Design Method, 3rd edition. Hoboken, New Jersey: John Wiley and Sons.
Dolsen, D.E., and G.E. Machlis. 1991. Response Rates and Mail Recreation Surveys – How
Much is Enough. Journal of Leisure Research 23(3):272-277.
Durrant, G.B. 2009. Imputation Methods for Handling Item-Nonresponse in Practice:
Methodological Issues and Recent Debates. International Journal of Social Research
Methodology 12(4):293-304.
Filion, F.L. 1976. Exploring and Correcting for Nonresponse Bias Using Follow-Ups of Non-Respondents. Pacific Sociological Review 19(3):401-408.
Fisher, M.R. 1996. Estimating the Effect of Nonresponse Bias on Angler Surveys.
Transactions of the American Fisheries Society 125(1):118-126.
Fisher, M.R. 1997. Segmentation of the Angler Population by Catch Preference, Participation,
and Experience: A Management-Oriented Application of Recreation Specialization. North
American Journal of Fisheries Management 17(1):1-10.
Fisheries and Oceans Canada. 2007. Cost and Earnings Survey 2007: Atlantic Region Report.
Report of the Economic Analysis and Statistics, Policy Sector. Ottawa, Ontario. 72 pages.
Available at: http://www.dfo-mpo.gc.ca/stats/commercial/ces/CES2004_e.pdf.
Groves, R.M. 2006. Nonresponse Rates and Nonresponse Bias in Household Surveys. Public
Opinion Quarterly 70(5):646-675.
Groves, R.M., F.J. Fowler, Jr., M.P. Couper, J.M. Lepkowski, E. Singer, and R. Tourangeau.
2004. Survey Methodology. New York: John Wiley & Sons.
Hindsley, P., C.E. Landry, and B. Gentner. 2011. Addressing Onsite Sampling in Recreation
Demand Models. Journal of Environmental Economics and Management 62:95-110.
Holland, S.M., C-O. Oh, S.L. Larkin, and A.W. Hodges. 2012. For-Hire Fishing Fleets of the
South Atlantic States and the Atlantic Coast of Florida. University of Florida. Report to the
National Marine Fisheries Service. Grant Number NAO9NMF4330151.
Holt, D., and T.M.F. Smith. 1979. Post-stratification. Journal of the Royal Statistical Society
142(1):33-46.
Hunt, K.M., and R.B. Ditton. 2002. Freshwater Fishing Participation Patterns of Racial and
Ethnic Groups in Texas. North American Journal of Fisheries Management 22(1):52-65.
Iannacchione, V.G., J.G. Milne, and R.E. Folsom. 1991. Response Probability Weight
Adjustments Using Logistic Regression. Proceedings of the Section on Survey Research
Methods, American Statistical Association 1991:637-42.
Knapp, G. 1996. Alaska Halibut Captains’ Attitudes Toward IFQs. Marine Resource
Economics 11:43-55.
Knapp, G. 1997. Initial Effects of the Alaska Halibut IFQ Program: Survey Comments of
Alaska Fishermen. Marine Resource Economics 12:239-248.
Lee, H., E. Rancourt, and C.E. Sarndal. 2002. Variance Estimation from Survey Data Under
Single Imputation. Ch. 21 in Groves, R.M., D.A. Dillman, J.L. Eltinge, and R.J.A. Little (eds.).
Survey Nonresponse. New York: John Wiley and Sons.
Lew, D.K., and C.K. Seung. 2010. The Economic Impact of Saltwater Sportfishing Harvest
Restrictions in Alaska: An Empirical Analysis of Non-Resident Anglers. North American
Journal of Fisheries Management 30:538-551.
Little, R.J.A. 1986. Survey Nonresponse Adjustments for Estimates of Means. International
Statistical Review 54(2):139-157.
Little, R.J.A. 1988. Missing-Data Adjustments in Large Surveys. Journal of Business and
Economic Statistics 6(3):287-296.
Little, R.J.A., and D.B. Rubin. 1989. The Analysis of Social Science Data with Missing Values.
Sociological Methods and Research 18(2-3):292-326.
Little, R.J., and S. Vartivarian. 2003. On Weighting the Rates in Non-Response Weights.
Statistics in Medicine 22:1589-1599.
Lohr, S.L. 2010. Sampling: Design and Analysis, 2nd Edition. Boston: Brooks/Cole, Cengage
Learning.
Margenau, T.L., and J.B. Petchenik. 2004. Social Aspects of Muskellunge Management in
Wisconsin. North American Journal of Fisheries Management 24:82-93.
Micklewright, J., S.V. Schnepf, and C. Skinner. 2012. Non-response Biases in Surveys of
Schoolchildren: The Case of the English Programme for International Student Assessment
(PISA) Samples. Journal of the Royal Statistical Society A 175:915-938.
Moore, D.L., and J. Tarnai. 2002. Evaluating Nonresponse Error in Mail Surveys. Chapter 13
in Groves, R.M., D.A. Dillman, J.L. Eltinge, and R.J.A. Little (eds.). Survey Nonresponse. New
York: John Wiley and Sons. 197-211.
Poe, G.L., K.L. Giraud, and J.B. Loomis. 2005. Computational Methods for Measuring
Differences of Empirical Distributions. American Journal of Agricultural Economics 87(2):353-365.
Rao, J.N.K., and J. Shao. 1992. Jackknife Variance Estimation with Survey Data Under Hot
Deck Imputation. Biometrika 79(4):811-822.
Rea, L.M., and R.A. Parker. 2005. Designing and Conducting Survey Research: A
Comprehensive Guide. San Francisco: Jossey-Bass.
Rubin, D.B. 1996. Multiple Imputation After 18+ Years. Journal of the American Statistical
Association 91(434):473-489.
Schenker, N., T.E. Raghunathan, P. Chiu, D.M. Makuc, G. Zhang, and A.J. Cohen. 2006.
Multiple Imputation of Missing Income Data in the National Health Interview Survey. Journal
of the American Statistical Association 101(475):924-933.
Shao, J. 2002. Replication Methods for Variance Estimation in Complex Surveys with Imputed
Data. Chapter 20 in Groves, R.M., D.A. Dillman, J.L. Eltinge, and R.J.A. Little (eds.). Survey
Nonresponse. New York: John Wiley and Sons. 303-314.
Sutton, S.G., and R.B. Ditton. 2005. The Substitutability of One Type of Fishing for Another.
North American Journal of Fisheries Management 25(2):536-546.
Thomson, C.J. 1991. Effects of Avidity Bias on Survey Estimates of Fishing Effort and
Economic Value. American Fisheries Society Symposium 12:356-366.
Tseng, Y-P, Y-C. Huang, and R. Ditton. 2012. Developing a Longitudinal Perspective on the
Human Dimensions of Recreational Fisheries. Journal of Coastal Research 28(6):1418-1425.

Table 1.
2011 Annual Revenues ($) – Survey statistics for item respondents
Categories | Mean | Median | Std Dev | Item respondents
Charter fishing trips – direct payments from clients | 162,601 | 46,900 | 570,974 | 133
Charter fishing trips – payments from booking agent or service | 24,141 | 5,800 | 36,934 | 78
Non-fishing charter trips | 26,500 | 2,000 | 68,521 | 83
Client referrals/booking commissions | 7,796 | 0 | 24,637 | 61
Federal charter halibut permit sales income | 16,541 | 0 | 99,059 | 58

Table 2.
2011 Annual expenditures ($) – Survey statistics for item respondents
Categories | Mean | Median | Std Dev | Item respondents
Vessel fuel | 21,791 | 10,000 | 33,641 | 146
Fish processing and shipping | 4,592 | 200 | 16,193 | 100
Referral fees | 5,145 | 0 | 13,334 | 89
Vessel cleaning | 15,012 | 125 | 112,034 | 99
Supplies (e.g., ice, bait) | 13,123 | 3,000 | 75,153 | 139
Other vessel or trip operating expenses | 7,122 | 2,160 | 18,689 | 84
Non-wage payroll costs (e.g., health insurance) | 10,207 | 0 | 44,799 | 94
Utilities (e.g., telephone, internet) | 4,383 | 2,000 | 6,623 | 132
Repair and maintenance expenses | 11,677 | 4,650 | 23,655 | 136
Insurance (vessel, property & indemnity, liability) | 8,078 | 2,950 | 15,602 | 144
Travel, meals, and entertainment | 5,642 | 2,005 | 13,206 | 109
Office and general supplies | 2,503 | 692 | 5,784 | 126
Legal and professional services, accounting, and advertising | 5,974 | 1,190 | 22,683 | 126
Financial service fees and mortgage interest payments | 15,445 | 2,200 | 46,634 | 109
Taxes and licensing fees | 3,613 | 1,014 | 5,952 | 137
Vehicle fuel costs | 3,089 | 1,079 | 8,519 | 120
Other general overhead expenses | 20,715 | 2,483 | 62,889 | 92
Vessel(s) and vessel-related equipment | 23,888 | 5,000 | 60,347 | 88
Vehicles (car/truck) | 2,635 | 0 | 6,709 | 62
Fishing gear, tackle, safety equipment | 3,267 | 1,200 | 5,797 | 97
Other machinery and equipment | 2,107 | 300 | 4,731 | 66
Moorage/slip, boatyard and equipment storage space | 2,971 | 1,500 | 4,291 | 111
Office space, lodging, and other shore-side facilities | 13,942 | 614 | 49,967 | 68
Transferable fishing permits and licenses | 3,545 | 0 | 13,157 | 68
Other business-related property and assets | 32,952 | 0 | 144,061 | 60

Table 3.
Comparison of Respondents to Non-respondents
Variable | All | Respondents | Nonrespondents
Did not fish in Southeast Alaska | 50.2% | 51.7% | 49.7%
Only used a single guide | 58.9% | 59.2% | 58.8%
Only used a single vessel | 75.0% | 71.8% | 76.1%
Took 50 trips or less | 55.1% | 51.1% | 56.4%
Fished 50 calendar days or less | 58.3% | 55.7% | 59.2%
Did not fish in early shoulder season (April to mid-June) | 27.3% | 25.3% | 28.0%
Did not fish in late shoulder season (mid-August through September) | 21.9% | 16.1% | 23.9%
Did not fish in the off-season (October through March) | 93.8% | 90.2% | 95.1%
Did not report any crew fishing trips | 42.6% | 37.9% | 44.2%
Reported no Alaska resident clients | 22.0% | 19.0% | 23.1%
Proportion of clients that are Alaska residents | 13.9% | 14.5% | 13.7%
250 or fewer clients | 58.9% | 57.5% | 59.4%
1,000 or more clients | 5.7% | 6.3% | 5.5%
Did not report any non-paid trips | 47.4% | 43.1% | 48.9%
Did not report fishing for salmon | 7.9% | 7.5% | 8.1%
Did not report fishing for bottomfish (incl. Pacific halibut) | 10.3% | 8.6% | 10.8%

Table 4.
Logit Model Results to Evaluate Factors Affecting Response Propensity
Variable | Estimate | Asymptotic t-value
Alternative specific constant (respondent) | -0.1476 | -0.3450
Dummy: No fishing was done in SE Alaska | -0.1901 | -0.7466
Dummy: One guide only | 0.2637 | 1.1011
Dummy: One vessel only | -0.3034 | -1.0871
Dummy: Total trips fished 50 or less | -0.6132 | -1.1821
Dummy: Total days fished 50 or less | 0.4158 | 0.8049
Dummy: No trips in early season (April - mid-June) | 0.0000 | 0.0000
Dummy: No trips in late season (mid-Aug - September) | -0.5124* | -1.8574
Dummy: No trips in off-season (October - March) | -0.7710** | -2.1189
Dummy: No crew fishing trips taken | -0.1900 | -0.8679
Dummy: No resident clients | -0.0822 | -0.2932
Percent of clients that are Alaska residents | 0.4003 | 0.5724
Dummy: Total clients 250 or less | 0.4052 | 1.1594
Dummy: Total clients 1000 or more | -0.0196 | -0.0465
Dummy: No non-paid trips | -0.1127 | -0.4959
Dummy: No salmon fishing | 0.1934 | 0.5356
Dummy: No bottomfishing | -0.0778 | -0.2347
Mean log-likelihood value | -0.5567
Likelihood ratio index | 0.1969
Akaike’s information criterion (corrected) | 793.1019
Bayes information criterion | 869.0794
* = Statistically different from zero at the 10% level
** = Statistically different from zero at the 5% level

Table 5.
Non-response adjustment weights (w2)
Late shoulder season and off-season fishing | Weight (w2)
No late shoulder season or off-season fishing | 1.3248
No late shoulder season fishing, but some off-season fishing | 2.2996
Late shoulder season fishing, but no off-season fishing | 0.9808
Both late shoulder season and off-season fishing | 0.5270

Table 6.
Post-stratification weights (w3)

                      Weight A    Weight B
                                  Fish in Southcentral    Fish in Southeast
Total client trips    Any area    Alaska (Area 3A)        Alaska (Area 2C)
100 or less           1.0859      1.0977                  1.0749
101-200               1.1958      1.1400                  1.2562
201-300               0.7756      0.7836                  0.7665
301-400               0.9238      1.2009                  0.7506
401-500               0.9756      0.7985                  1.4479
501-1000              0.9920      0.7410                  1.3505
1001-7000             0.9059      0.7300                  1.2137
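Post-stratification weights like those in Table 6 rescale respondents so that their distribution across cells matches known population counts. Here the cells are defined by total client trips. The sketch below assumes w3 = (population share of a cell) / (respondent share of that cell), with shares taken over the whole population and the whole respondent pool. All counts are hypothetical. Under that definition, a weight computed after pooling areas within a trip bin is a respondent-weighted average of the area-specific weights, so it always lies between them.

```python
def poststrat_weights(pop_counts, resp_counts):
    """w3[cell] = (population share of cell) / (respondent share of cell)."""
    N = sum(pop_counts.values())
    n = sum(resp_counts.values())
    return {cell: (pop_counts[cell] / N) / (resp_counts[cell] / n) for cell in pop_counts}


def pool_over_areas(counts):
    """Collapse (area, trip bin) cells to trip-bin cells by summing over areas."""
    pooled = {}
    for (_area, trip_bin), count in counts.items():
        pooled[trip_bin] = pooled.get(trip_bin, 0) + count
    return pooled


# Hypothetical population and respondent counts by (IPHC area, total client trips).
pop = {("3A", "100 or less"): 60, ("2C", "100 or less"): 40,
       ("3A", "101-200"): 50, ("2C", "101-200"): 50}
resp = {("3A", "100 or less"): 10, ("2C", "100 or less"): 12,
        ("3A", "101-200"): 15, ("2C", "101-200"): 8}

by_area = poststrat_weights(pop, resp)
pooled = poststrat_weights(pool_over_areas(pop), pool_over_areas(resp))

for trip_bin, w in pooled.items():
    area_w = [by_area[("3A", trip_bin)], by_area[("2C", trip_bin)]]
    print(f"{trip_bin:12s} pooled={w:.4f} 3A={area_w[0]:.4f} 2C={area_w[1]:.4f}")

# Weighted respondent counts reproduce the unweighted respondent total (45).
print(sum(by_area[cell] * resp[cell] for cell in resp))
```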


Table 7.
Population Estimates of Total Annual Revenue and Expenditure (in $million)

No weighting
                                              Total revenue         Total expenditure
Imputation method                             Total     Std Err     Total     Std Err
Zero imputation                               101.47    1.93        118.30    1.79
Mean imputation                               154.64    2.62        193.94    3.00
Random class hot deck imputation              126.67    8.27        168.77    5.90
Deterministic nearest neighbor imputation     142.66    2.65        174.16    2.58
K-nearest neighbor hot deck imputation        142.81    4.31        176.64    6.93

Weight A
                                              Total revenue         Total expenditure
Imputation method                             Total     Std Err     Total     Std Err
Zero imputation                                90.17    1.71        109.87    1.62
Mean imputation                               144.19    2.39        186.11    2.84
Random class hot deck imputation              113.87    7.05        154.59    5.16
Deterministic nearest neighbor imputation     126.91    2.21        162.33    2.29
K-nearest neighbor hot deck imputation        128.32    3.96        162.60    6.03

Weight B
                                              Total revenue         Total expenditure
Imputation method                             Total     Std Err     Total     Std Err
Zero imputation                               101.27    2.26        119.76    2.05
Mean imputation                               155.86    2.92        196.71    3.26
Random class hot deck imputation              124.95    8.00        165.64    5.74
Deterministic nearest neighbor imputation     139.28    2.71        174.44    2.75
K-nearest neighbor hot deck imputation        139.33    4.22        174.66    6.87
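The five imputation methods compared in Table 7 can be illustrated on a toy dataset. This is a sketch under stated assumptions, not the paper's code:

- The records, class labels, and weights are hypothetical.
- Mean imputation uses the overall respondent mean.
- The neighbor methods use Euclidean distance on covariates scaled by their maximum observed values. Footnote 17 describes the max-normalization but not the distance metric.
- "K-nearest neighbor hot deck" is taken to mean drawing one donor at random from the k closest respondents.

```python
import math
import random

# Hypothetical records: (class label, client trips, share of Alaska-resident clients,
# revenue in $1,000 or None if the item is missing). Not survey data.
RECORDS = [
    ("small", 40, 0.10, 20.0),
    ("small", 60, 0.30, 35.0),
    ("small", 55, 0.25, None),
    ("large", 400, 0.05, 210.0),
    ("large", 650, 0.15, 330.0),
    ("large", 500, 0.10, None),
]
WEIGHTS = [1.2, 1.2, 1.2, 0.9, 0.9, 0.9]  # hypothetical final survey weights


def covariates(rec, max_trips, max_share):
    # Continuous covariates scaled by their maximum observed values.
    _, trips, share, _ = rec
    return (trips / max_trips, share / max_share)


def impute(records, method, k=2, rng=None):
    """Return the revenue column with missing items filled by the chosen method."""
    donors = [r for r in records if r[3] is not None]
    max_trips = max(r[1] for r in records)
    max_share = max(r[2] for r in records)
    donor_mean = sum(r[3] for r in donors) / len(donors)
    out = []
    for rec in records:
        if rec[3] is not None:
            out.append(rec[3])
        elif method == "zero":
            out.append(0.0)
        elif method == "mean":
            out.append(donor_mean)
        elif method == "class_hot_deck":
            # Random donor from respondents in the same class.
            out.append(rng.choice([d for d in donors if d[0] == rec[0]])[3])
        else:
            x = covariates(rec, max_trips, max_share)
            ranked = sorted(donors, key=lambda d: math.dist(x, covariates(d, max_trips, max_share)))
            if method == "nearest_neighbor":
                out.append(ranked[0][3])  # deterministic: closest donor
            elif method == "knn_hot_deck":
                out.append(rng.choice(ranked[:k])[3])  # random among k closest
            else:
                raise ValueError(f"unknown method: {method}")
    return out


def weighted_total(values, weights):
    return sum(v * w for v, w in zip(values, weights))


rng = random.Random(2014)
for m in ("zero", "mean", "class_hot_deck", "nearest_neighbor", "knn_hot_deck"):
    print(f"{m:17s} {weighted_total(impute(RECORDS, m, rng=rng), WEIGHTS):8.2f}")
```

As in Table 7, zero imputation gives the smallest total because every missing item contributes nothing. The two random hot deck variants differ from run to run unless the generator is seeded.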


Table 8.
90% Confidence Intervals for Difference in Totals (Value 1 – Value 2) for Weight B Estimates (in $million)

Value 1              Value 2              Total revenue                Total expenditure
imputation method    imputation method    Lower bound   Upper bound    Lower bound   Upper bound
Zero imputation      Mean imputation       -56.20        -52.18         -80.00        -73.42
Zero imputation      Random hot deck       -45.83        -23.05         -57.01        -38.52
Zero imputation      Nearest neighbor      -40.09        -33.65         -58.46        -49.49
Zero imputation      K-nearest neighbor    -40.73        -26.57         -65.04        -43.67
Mean imputation      Random hot deck         8.09         31.67          19.16         38.87
Mean imputation      Nearest neighbor       13.49         20.91          17.26         27.63
Mean imputation      K-nearest neighbor     13.64         27.85          11.33         33.44
Random hot deck      Nearest neighbor      -15.19          8.60         -16.88          3.18
Random hot deck      K-nearest neighbor    -14.03         13.56         -20.59          7.06
Nearest neighbor     K-nearest neighbor     -3.21         11.49         -10.88         11.38
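Footnote 18 states that these intervals come from the method of convolutions. The sketch below is a minimal version of that idea. It assumes each estimated total is represented by a set of simulated or bootstrap draws; how the draws were generated is not described in this section. The procedure forms every pairwise difference between the two sets of draws and reads the 90% interval off the 5th and 95th percentiles. An interval that excludes zero, such as zero versus mean imputation above, indicates a difference at the 10% level. The draws in the usage example are hypothetical.

```python
import itertools
import random


def percentile(sorted_vals, q):
    """Linear-interpolation percentile of an already sorted list (0 <= q <= 100)."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def convolution_ci(draws_1, draws_2, level=90.0):
    """CI for (value 1 - value 2) from all pairwise differences of two sets of draws."""
    diffs = sorted(a - b for a, b in itertools.product(draws_1, draws_2))
    tail = (100.0 - level) / 2.0
    return percentile(diffs, tail), percentile(diffs, 100.0 - tail)


# Hypothetical draws for two estimated totals (in $million).
rng = random.Random(7)
a = [rng.gauss(100.0, 2.0) for _ in range(500)]
b = [rng.gauss(140.0, 3.0) for _ in range(500)]
lo, hi = convolution_ci(a, b)
print(f"90% CI for difference: [{lo:.2f}, {hi:.2f}]; excludes zero: {hi < 0 or lo > 0}")
```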

Table 9.
90% Confidence Intervals for Difference in Totals (Value 1 – Value 2) Under Different Weighting Assumptions (in $million)

Imputation           Value 1      Value 2      Total revenue                Total expenditure
method               weighting    weighting    Lower bound   Upper bound    Lower bound   Upper bound
Zero imputation      None         Weight A        9.71          12.74           5.68          11.07
Zero imputation      None         Weight B       -1.46           1.36          -4.25           1.16
Zero imputation      Weight A     Weight B      -12.54         -10.01         -12.57          -7.29
Mean imputation      None         Weight A        8.07          12.71           4.11          11.57
Mean imputation      None         Weight B       -3.69           1.21          -6.58           1.12
Mean imputation      Weight A     Weight B      -14.00          -9.30         -14.39          -6.75
Random hot deck      None         Weight A       -4.14          33.81          -0.38          25.50
Random hot deck      None         Weight B      -16.40          22.34         -12.49          14.48
Random hot deck      Weight A     Weight B      -29.60           5.84         -24.25           1.02
Nearest neighbor     None         Weight A       12.04          19.44           6.31          17.34
Nearest neighbor     None         Weight B       -0.44           7.83          -5.87           6.12
Nearest neighbor     Weight A     Weight B      -16.07          -8.01         -17.62          -5.85
K-nearest neighbor   None         Weight A        4.65          23.44          -0.35          27.82
K-nearest neighbor   None         Weight B       -7.05          11.95         -14.45          15.09
K-nearest neighbor   Weight A     Weight B      -20.83          -2.37         -27.23           0.32


Footnotes

1 Other potential biases may also arise in the selection of the sample, such as coverage bias.

2 For this paper, we set aside the measurement bias that may arise from poor survey design.

3 In the broader economics literature, many economic surveys address missing data only selectively, particularly for key economic variables such as income or wages, which respondents frequently skip. See Little (1988) for an exception.

4 Other adjustment weights are possible, but the three discussed here are the most common.

5 A related weight sometimes seen in the recreational fishing literature corrects for avidity bias, the tendency of intercept sampling methods to yield a disproportionate number of avid anglers in the sample (Thomson 1991). Hindsley, Landry, and Gentner (2011) discuss weighting for avidity bias and for the endogenous stratification associated with the non-random on-site sampling employed in the NMFS Marine Recreational Fisheries Statistics Survey.

6 As an example of an alternative approach that does not use information about non-respondents, see Filion (1976), who assesses non-response by analyzing early and late responders.

7 When multiple variables are important, post-stratification weighting may not be desirable. An alternative method, called raking (Battaglia, Hoagland, and Frankel 2009) or sample balancing, can be used. In our case, however, post-stratification is sufficient because only one primary variable is selected for adjusting the sample.

8 See also Lee, Rancourt, and Sarndal (2002) and Chen and Shao (2000).

9 The other main area of Alaska in which saltwater fishing for Pacific halibut occurs is Southcentral Alaska (IPHC Area 3A), an area that includes the Cook Inlet region, Kodiak Island, and Prince William Sound. Harvest restrictions have not been imposed on charter fishing in this area to date.

10 The original population frame included 17 businesses that did not engage in any client-based fishing during 2011; these were excluded from the analysis.

11 NMFS plans to re-administer the survey to collect data for additional fishing seasons, which will enable an evaluation of changes in the charter sector over time.

12 To our knowledge, there is no consensus on a specific item or unit non-response rate that would trigger the need for weighting or imputation-based adjustments. It is therefore the responsibility of individual researchers to assess the extent of missing data in survey studies and to document the extent to which non-response bias may be a concern. In our empirical case, the low item and unit response rates warranted further investigation and adjustment.

13 Details about the program can be found at http://www.adfg.alaska.gov/index.cfm?adfg=prolicenses.logbook.

14 The alternative specific constant (ASC) is a dummy variable assigned to respondents only.

15 Predicted probabilities of responding could be estimated from the logit model and used to generate the w2 weights. However, the logit model, in which only two variables were found to influence response propensity, did not have a high likelihood ratio index (a pseudo-R² measure), so using predicted values of Pr(response) does not seem warranted. Had a large number of significant variables differed between respondents and non-respondents, using the logit model to predict non-response adjustment weights would have been appropriate.

16 Respondent classes were selected to ensure that each class contained a sufficient number of donor values across revenue and cost categories.

17 The non-dummy variables were normalized by the maximum values observed in the data.

18 Since these results are qualitatively invariant across weighting assumptions, Table 8 presents only the 90% confidence intervals, computed with the method of convolutions, for the difference in total revenues and total expenditures for each data imputation method under the weight B assumption.

19 As an example of an alternative approach that does not use information about non-respondents, see Filion (1976), who assesses non-response by analyzing early and late responders.

20 ADF&G charter logbook data indicate activity by 627 businesses in 2012, suggesting that some businesses active in 2011 were inactive or had exited the fishery in 2012.
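Footnote 7 names raking, or sample balancing, as the alternative to post-stratification when several adjustment variables matter. The sketch below implements raking as iterative proportional fitting. It assumes known population totals for each variable's categories and a starting weight for each respondent. The respondents and margins are hypothetical, and the paper itself post-stratifies on a single variable instead.

```python
def rake(weights, categories, targets, max_iter=100, tol=1e-10):
    """Raking (iterative proportional fitting) of respondent weights to known margins.

    weights:    initial weight for each respondent
    categories: {variable: [category of respondent 0, category of respondent 1, ...]}
    targets:    {variable: {category: population total}}; every variable's targets
                should sum to the same population size.
    """
    w = list(weights)
    for _ in range(max_iter):
        largest_adjustment = 0.0
        for var, cats in categories.items():
            # Current weighted total in each category of this variable.
            sums = {}
            for wi, c in zip(w, cats):
                sums[c] = sums.get(c, 0.0) + wi
            # Scale every respondent so this variable's margins hit their targets.
            factors = {c: targets[var][c] / s for c, s in sums.items()}
            w = [wi * factors[c] for wi, c in zip(w, cats)]
            largest_adjustment = max(largest_adjustment,
                                     max(abs(f - 1.0) for f in factors.values()))
        # Stop once a full pass over all variables leaves the weights (almost) unchanged.
        if largest_adjustment < tol:
            break
    return w


# Hypothetical respondents classified by area and business size, with made-up margins.
areas = ["3A", "3A", "3A", "2C", "2C", "2C"]
sizes = ["small", "large", "small", "small", "large", "small"]
w = rake([1.0] * 6,
         {"area": areas, "size": sizes},
         {"area": {"3A": 60, "2C": 40}, "size": {"small": 70, "large": 30}})
print([round(x, 2) for x in w])
```

Each pass adjusts one margin at a time, so earlier margins can drift. The loop therefore repeats until a full pass over every variable requires no further adjustment.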
