DRAFT: Nonresponse Bias in the American Housing Survey 2015-2019

Prepared for
Office of Policy Development and Research
U.S. Department of Housing and Urban Development

Prepared by
Office of Evaluation Sciences
U.S. General Services Administration

Last Updated: September 25, 2020

Contents

1 Introduction
  1.1 A note on terminology and method
  1.2 Informing experiments to reduce nonresponse bias
2 Evidence of Nonresponse Bias in the AHS
  2.1 Comparing 2015 AHS Sample Estimates to the 2010 Census: National-Level Analysis
  2.2 Chi-square tests of differences between responders and nonresponders
  2.3 Representativity Analysis
  2.4 Section Summary
3 Predicting Nonresponse and Refusal
  3.1 How well can we predict nonresponse and refusal?
  3.2 Top predictors of nonresponse and refusal
  3.3 Section Summary
4 Patterns of Partial Response
  4.1 Characterizing item-level missingness: item's content versus item's order
  4.2 Predicting panel attrition
  4.3 Section Summary
5 Consequences of Nonresponse
  5.1 How panel attrition affects correlational analysis
  5.2 How nonresponse affects metro-level estimates
  5.3 Section Summary
A Appendix
  A.1 Additional results from the chi-squared analysis
  A.2 Additional results from R indicator analysis
  A.3 Additional results from the predicting nonresponse and refusal analysis
  A.4 Item order effects: additional analyses
  A.5 Predicting panel attrition: additional analyses
  A.6 Attritor heterogeneity: additional analyses

Executive Summary
PLEASE NOTE: Estimates in this draft memorandum are subject to change. In particular, as we note below, the variance estimates used for statistical inference in analyses of nonresponders should be treated with care, as replicate weights for nonresponders are unavailable.

The American Housing Survey (AHS) is a biennial, longitudinal survey of housing units designed by the U.S. Department of Housing and Urban Development and administered by the U.S. Census Bureau. The purpose of this memorandum is to explore whether and to what extent nonresponse bias is present in the 2015, 2017, and 2019 national AHS.

Evidence of Nonresponse Bias in the AHS
In Section 2, we present two independent sources of evidence for nonresponse bias in the AHS. First, national-level population estimates derived from the 2015 AHS diverge significantly from comparable population quantities measured by the 2010 Census. Even when employing weights designed to correct for nonresponse bias, the results suggest the 2015 AHS overestimates the share of householders who own their house outright (no mortgage or loan), are white only, are 65 or older, and are members of smaller households.

Second, we present several forms of direct evidence illustrating that responding and nonresponding units have very different characteristics. Responders are ten percentage points more likely than nonresponders to receive rental subsidies, for example, and are more likely to rent than to own. Whether taking attributes one-by-one or as a whole, the divergences between the measurable traits of responders and nonresponders are much greater than we would expect to see due to sampling variability alone.

Predicting Nonresponse and Refusal
In Section 3, we use a set of machine learning methods to examine how well characteristics measured for all units in the sample, taken from the sampling frame and the area in which they live, predict any form of nonresponse and refusal more specifically. The analyses yield three main insights. First, the models fare very well by conventional standards used to score machine learning prediction accuracy, bolstering our confidence in our ability to predict nonresponse. This is important for the design of incentive delivery mechanisms that target potential nonresponders. Second, the results show that our models predict outcomes in 2017 better than in 2019 and that we are able to predict refusal better than nonresponse more generally. Finally, the most important predictors are prior-year response and levels of effort related to interviewing units (e.g., the number of contact attempts). Contextual features also help to predict nonresponse: it is more likely in areas with more frequent cold and cool days, for example.

Patterns of Partial Response
Section 4 goes beyond the binary distinction between response and nonresponse to look at why some questions are left unanswered by survey-takers and why some units answer in one wave of the AHS panel but not others. Respondents are least likely to answer questions that appear sensitive or are otherwise difficult to answer without more information, such as those pertaining to the level of crime in the neighborhood. While questions posed later in the survey are more likely to go unanswered, we do not uncover strong evidence in support of the idea that this arises due to the additional time elapsed (e.g., due to interview fatigue).

Our analysis of which kinds of units respond in 2015 but drop out due to refusal in 2017 reveals systematic patterns using a rich set of data, since we are able to draw on the 2015 AHS responses.
We find units with younger householders interviewed later in the 2015 survey were most likely to drop out in 2017. A host of other characteristics measured in the 2015 survey are also associated with the probability of dropping out, but no clear pattern emerges.

Consequences of Nonresponse
Section 5 discusses some potential consequences of nonresponse bias for researchers using the AHS data. We show how panel attrition could affect estimates of important relationships, such as how income relates to housing adequacy. Among units that responded in 2015, those who would go on to respond in 2017 exhibit a very different relationship between income and adequacy than those who would drop out. Any analysis of longitudinal trends restricted to units who respond in both 2015 and 2017 would thus overestimate the negative relationship between income and adequacy, even when employing weights. Similarly, metropolitan-level estimates from the 2015 AHS differ from the 2010 Decennial Census in ways that matter more for some regions and for some variables than for others. Whereas those who own a house with a mortgage or loan outstanding are consistently undercounted in all metropolitan areas, the proportion of non-white householders is most severely undercounted in metropolitan areas located in the states of California, Arizona, and Texas. These results suggest that without a better understanding of nonresponse bias relative to their planned analysis (including choice of sample composition, variable selection, and level of geography), researchers may draw misleading conclusions.

The analyses included in the memorandum, taken as a whole, provide several data points to demonstrate evidence of nonresponse bias in the AHS. The analyses also show that nonresponse can be predicted, which suggests that interventions targeted at encouraging higher response rates among units likely to be underrepresented in the group of responders could help to reduce nonresponse bias.

Note: The results used in this memorandum were approved under Census Bureau Disclosure Review Board (DRB) approval numbers CBDRB-FY20-373 and CBDRB-FY20-POP001-0179.

Nonresponse Bias in the AHS 2015-2019
Prepared by: Office of Evaluation Sciences,
U.S. General Services Administration

1 Introduction
The American Housing Survey (AHS) is a biennial, longitudinal survey of housing units designed by the U.S. Department of Housing and Urban Development and administered by the U.S. Census Bureau. The sample of housing units is drawn from residential units in the United States and is designed to provide statistics that represent both the country as a whole and its largest metropolitan areas. The AHS provides important information on key features of the U.S. housing stock: how many people rent versus own their homes? How many are evicted? What proportion of units have adequate conditions, and what are the demographics of those who live in inadequate units?

As with many federal surveys, the AHS has experienced declining response rates, requiring increasing amounts of time and effort to reach the 80 percent response rate preferred by the Office of Management and Budget.1 In particular, response rates have declined from approximately 85 percent in the 2015 wave to 80.4 percent in the 2017 wave to 73.3 percent in the 2019 wave.2

Within the context of a panel survey like the AHS, nonresponse is not only growing but is also a dynamic phenomenon. We refer to units where an interview does or does not take place as "responders" and "nonresponders" throughout this memo.3 As Table 1 shows, of the 67,775 occupied units introduced into the national AHS sample in 2015, only 70 percent responded in both 2015 and 2017. Thirteen percent of those units in which someone was interviewed in 2015 were not interviewed in 2017, while 8 percent of those not interviewed in 2015 were interviewed in 2017. Another 8 percent of the occupied units sampled were never interviewed.
If the features we want to, but cannot, measure for nonresponders differ systematically from those of responders, nonresponse can lead to bias. If not addressed in some way, the presence of bias implies that the sample estimates will not converge to the true, underlying quantity in the population, no matter how large the sample of responders.

To account for this risk, the AHS calculates a nonresponse adjustment factor (NRAF) that reweights for nonresponse within cells defined by metropolitan area, type of housing unit, block group median income, and area-level rural/urban status. In principle, adjustments such as this, along with raking, should reduce or even remove the inferential threats posed by nonresponse bias. However, there is no guarantee that the model used for bias-adjusted estimates contains all the information it needs.

1. See the OMB guidance at: https://www.whitehouse.gov/sites/whitehouse.gov/files/omb/assets/OMB/inforeg/statpolicy/standards_stat_surveys.pdf.
2. The response rates for the 2015 and 2017 waves are taken from the AHS public methodology reports. The response rate for the 2019 wave is taken from our analysis of the IUF with the below restrictions to the national sample and excluding the bridge sample, with values based on coding responders as STATUS == 1, 2, or 3 (n = 63,186) and nonresponders as STATUS == 4 (n = 22,965). These may differ from those in the published methodology report if there are different inclusion criteria for the published rates to remove ineligible households.
3. In Section 1.1, we discuss distinctions between different types of units in each category, namely, within responders, occupied units interviewed at their usual residence versus vacant units versus units interviewed elsewhere. Similarly, nonresponders contain not only respondents who are found and who actively refuse, but also other categories. We discuss below which types of responders and nonresponders we include in the different analyses.

Table 1: Non-response among occupied units added to the sample in 2015

  Category                                   N units
  Interviewed 2015-2017                      47,442 (70 percent)
  Interviewed 2015, Not interviewed 2017      8,872 (13 percent)
  Not interviewed 2015, Interviewed 2017      5,713 (8 percent)
  Not interviewed 2015-2017                   5,748 (8 percent)

The purpose of this memorandum is to understand whether and to what extent the 2015, 2017, and 2019 waves of the AHS exhibit nonresponse bias, with and without adjustment. Section 2 assesses the degree of bias both by comparing AHS estimates to the 2010 Census and by assessing whether the traits of responders differ from those of nonresponders. Section 3 delves more deeply into the sources of nonresponse, exploring how well we can predict nonresponse and which attributes of units and geographic areas are most predictive. In Section 4, we move from a binary measure (did the unit respond or not) to unpack the phenomenon of "partial" response: e.g., the fact that some units drop out of the panel in 2017 having had interviews in 2015, or the fact that respondents sometimes refuse to answer questions mid-survey. Finally, in Section 5 we give an overview of whether and how much nonresponse bias affects researchers' ability to estimate important relationships in the data and CBSA-level statistics.

1.1 A note on terminology and method
This section briefly reviews some key terminology covering the different samples, interview types, and weights in the AHS, before providing an overview of how our analyses use different samples and weighting choices.

There are two broad categories of AHS samples: the national and the metropolitan sample. The national sample is a nationally representative biennial panel, whereas the metropolitan sample is comprised of a rotating series of large metropolitan areas. We focus on the national AHS sample. In all analyses, we exclude the 6,000 units that are part of the break-in series or "bridge" sample, which are units that were part of the pre-2015 AHS panel left in to investigate the effect of changes to sampling introduced in 2015. We do not exclude other subsamples, such as the over-sampling of HUD-assisted units, as these are accounted for in the weights employed throughout.4
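As an illustration, the sample restrictions described in footnote 4 could be applied to the internal use file roughly as follows. This is a minimal pandas sketch rather than the production code used for the memorandum; the file name and data frame layout are hypothetical, while the variable logic follows footnote 4.

    import pandas as pd

    # Hypothetical extract of the AHS internal use file; column names follow footnote 4.
    iuf = pd.read_csv("ahs_2015_iuf.csv")

    # Drop the break-in series ("bridge") sample.
    not_bridge = iuf["BRGSMPFLG"] != 1

    # Keep the non-metro national sample or the top-15 metro portion of the national sample.
    national = (iuf["AHSCBSASUP"] == 6) | (
        (iuf["AHSCBSASUP"] == 7) & (iuf["TOP15FLG"] == 1)
    )

    analytic_sample = iuf[not_bridge & national].copy()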
The AHS national sample can be classified into four mutually exclusive categories: regular occupied interviews, in which the usual occupants of a unit are interviewed; vacant interviews, in which the owner, manager, janitor, or knowledgeable neighbor (if need be) of an empty building is interviewed; "usual residence elsewhere" (URE) interviews, for units whose occupants all usually reside elsewhere; and noninterviews.

Noninterviews are split into three types. Type A noninterviews occur when a regular occupied interview or usual residence elsewhere interview fails, usually because the respondent refuses, is temporarily absent, cannot be located, or presents other obstacles (such as language barriers the field staff are unable to overcome). Type B and Type C noninterviews both pertain to failures to interview someone about a vacant unit. If units are ineligible for a vacant interview during the attempt, but may be eligible later, they are classified as Type B noninterviews; for example, sites that are under or awaiting construction, are unoccupied and reserved for mobile homes, or are occupied in some prohibited manner. Type C noninterviews are ineligible for a vacant interview and will remain so, for example, because they have been demolished or removed from the sample. We clarify below how these different categorizations are employed in the analyses.

4. Put in terms of the AHS variable names, we exclude units with BRGSMPFLG == 1. We then include units if they either are part of the non-metro national sample (AHSCBSASUP == 6) or if they are part of the top 15 metros (AHSCBSASUP == 7 & TOP15FLG == 1).
Finally, we employ different kinds of weights in the different analyses. The AHS uses a four-stage weighting procedure. First, analysts calculate a "base weight" (BASEWGT) that adjusts for the inverse probability that a unit is selected into the sample. Second, analysts apply so-called "first stage factors" (FSFs) that calibrate the number of units selected in each primary sampling unit stratum to the number of housing units in these strata as measured using an independent Census Bureau estimate. The third stage involves a "noninterview adjustment factor" that uses five variables to define cells for noninterview adjustment: Census division; type of housing unit; type of CBSA; block group median income quartiles; and urban/rural status. The final step is applying what are called "ratio adjustment factors" (RAFs) to the weights through raking, which is designed to produce weights that lead to estimates with lower variance by calibrating weighted outputs to "known estimates of housing units and population from other data sources believed to be of superior quality of accuracy" (U.S. Census Bureau and Department of Housing and Urban Development 2018, 8).

The analyses in the present memorandum use two types of weights. For estimates that include only respondents, we employ the composite weight, WEIGHT, which is the final output of the above process, alongside the 160 corresponding replicate weights used to estimate the variance of sample statistics. We refer to this as the "composite weight" or "adjusted weight" throughout, as it adjusts not only for different probabilities of being sampled but also for potential nonresponse bias. In order to understand what nonresponse bias looks like when we do not try to explicitly adjust for it through the weighting scheme, we also employ what we refer to as the "base weight" throughout, which corresponds to the inverse sampling probability of each unit, or the first stage weight described above.5
Table 2 previews each of the analyses we report and the samples and weights used for each. In general, there were two forms of variation:

1. Which units are included in the analytic sample? Analyses that rely on characterizing demographic features of responders focus on (1) responders who are (2) classified by the STATUS variable as an "Occupied interview," or as a responder who is not a vacant interview or usual residence elsewhere. Analyses that rely on sampling frame features generally focus on (1) responders regardless of their classification (including URE and vacant interviews) and (2) nonresponders regardless of their reason for nonresponse (including not only refusals but also nonresponses due to other codes). Finally, other analyses focus specifically on contrasting occupied interview responders with refusers.

2. Are the estimates reweighted and, if so, how? We describe whether and how we reweight observations using the two types of weights described above: the base weights that only account for differential probabilities of being sampled and the composite weights that account for both those differential probabilities of selection and nonresponse adjustment factors.

5. Put in terms of the AHS variables, the composite weight refers to the combination of the WEIGHT variable and the replicate weight variables REPWGT.*. The base weight refers to the BASEWGT variable.

Table 2: Analyses, samples, and weights used

Evidence of Nonresponse Bias

  Benchmarking to Decennial Census (Section 2.1)
    Which sample(s)? 2015 AHS respondents (Occupied Interviews only)
    Reweight? Compares base weight to composite weight
    Rationale: 2015 since most proximate to Decennial. Occupied Interviews for comparability.

  Differences in Attributes (chi-squared; attribute by attribute) (Section 2.2)
    Which sample(s)? 2015, 2017, 2019 (analyzed separately); all respondents and nonrespondents
    Reweight? Compares unweighted to base weight
    Rationale: Examines sampling frame attributes relevant for all units rather than demographic attributes less relevant for URE/vacant interviews.

  Differences in Attributes (R-indicator; summary measure across attributes) (Section 2.3)
    Which sample(s)? 2015, 2017, 2019 (analyzed separately); all respondents and nonrespondents
    Reweight? Base weight
    Rationale: Examines sampling frame attributes relevant for all units rather than demographic attributes less relevant for URE/vacant interviews.

Predicting Nonresponse and Refusal

  Predicting nonresponse (Section 3)
    Which sample(s)? 2017 wave; 2019 wave; all types of nonresponse and interviews
    Reweight? None (Section 3 discusses)
    Rationale: General nonresponse.

  Predicting refusal (Section 3)
    Which sample(s)? 2017 wave; 2019 wave; refusals and occupied interviews only
    Reweight? None
    Rationale: Refusal as specific behavior.

Patterns of Partial Response

  Item order and partial completion (Section 4.1)
    Which sample(s)? 2019 wave; responders only
    Reweight? None

  Partial completion via attriting from panels (Section 4.2)
    Which sample(s)? 2015 is focal wave; 2017 refusal; focus on occupied interviews and refusals
    Reweight? Composite weight
    Rationale: Responders only.

Consequences of Nonresponse

  Attritor heterogeneity analysis (Section 5.1)
    Which sample(s)? 2015 is focal wave; 2017 attrition; focus on occupied interviews and refusals
    Reweight? Composite weight
    Rationale: Responders only.

  Metro-level benchmarking (Section 5.2)
    Which sample(s)? 2015 wave; responders only
    Reweight? Composite weight
    Rationale: Responders only.

1.2 Informing experiments to reduce nonresponse bias
A second goal of this memorandum, in addition to characterizing nonresponse bias in the AHS, is to explore possible predictors of and mechanisms for nonresponse bias. Understanding the predictors of nonresponse bias is useful for informing interventions to reduce nonresponse bias. Specifically, this memo informs an intervention designed to target incentives at units most likely to contribute to nonresponse bias, with the goal of differentially increasing responses among those units to achieve more accurate survey estimates. As we discuss in greater detail later, one important consideration in targeting any intervention is whether the unit (or more precisely a person who resides within it) is likely to be a "never responder" (that is, they never respond even if targeted with an intervention) or has characteristics that indicate amenability to interviews given the right approach. This might suggest modeling specific forms of nonresponse, such as refusal or attrition between panels, if we think these forms of nonresponse are more susceptible to intervention.

2 Evidence of Nonresponse Bias in the AHS

2.1 Comparing 2015 AHS Sample Estimates to the 2010 Census: National-Level Analysis

Background
A simple way to test whether the characteristics of a sample diverge systematically from the population from which it is drawn is to compare sample-based population estimates with known population-level quantities. Here, we leverage the fact that the American Housing Survey and the 2010 Census provide nationally representative statistics on adult householders to understand whether and to what extent the AHS sample estimates diverge from 2010 Census counts.6

The 2010 Census defines a "householder" in the following manner:

  One person in each household is designated as the householder. In most cases, this is the person, or one of the people, in whose name the home is owned, being bought, or rented and who is listed on line one of the questionnaire. If there is no such person in the household, any adult household member 15 years old and over could be designated as the householder.

The AHS definition of a "householder" parallels that used by the 2010 Census:

  The householder is the first household member listed on the questionnaire who is an owner or renter of the sample unit and is 15 years or older. An owner is a person whose name is on the deed, mortgage, or contract to purchase. A renter is a person whose name is on the lease. If there is no lease, a renter is a person responsible for paying the rent. If no one meets the full criteria, the age requirement is relaxed to 14 years or older before the owner/renter requirement. Where the respondent is one of several unrelated people who all could meet the criteria, the first listed eligible person is the householder. In cases where both an owner and a renter are present, the owner would get precedence for being the householder.

We focus on how well national estimates of householder characteristics from the 2015 AHS align with 2010 Census summaries of the same characteristics. Statistically significant differences may arise due to nonresponse bias, but also through subtle differences in the definitions or methods used to identify householders, or due to demographic changes during the five-year period between the 2010 Census and the 2015 AHS. This analysis therefore provides an exploratory assessment of how much nonresponse bias may exist in national-level estimates but does not conclusively establish that such differences are due to nonresponse bias.

Methods
To calculate national estimates from the AHS, we first subset the 2015 internal use file to non-vacant interviews7 that are not part of the bridge or metropolitan samples.

6. Note that the person who responds to the AHS survey and provides demographic information about the householder may not necessarily be the householder.
7. The Decennial Census focuses on "Occupied Housing units": a "housing unit is classified as occupied if it is the usual place of residence of the individual or group of individuals living in it on Census Day, or if the occupants are only temporarily absent, such as away on vacation, in the hospital for a short stay, or on a business trip, and will be returning." In the AHS, this is equivalent to focusing on non-vacant, usual residence occupied interviews.

We take two approaches to weighting national average estimates: the first weights responses only by the inverse of the probability the unit was sampled; the second weights responses by the composite weight used to account for differential nonresponse in the AHS (see Section 1.1 above). Comparison of the estimates derived from the two weighting schemes is informative about how well the nonresponse adjustment factors and raking schemes account for possible nonresponse bias.

To estimate the variance of the sample mean estimates, we employ the standard replicate weights contained in the internal use file.8 For each feature of interest, this procedure provides a weighted mean estimate from the AHS and its standard error estimate. We treat the 2010 Census measure of the characteristic as a known population mean (i.e., with variance of zero) and derive a p-value through a one-sample, two-sided t-test of the null hypothesis that the sample mean is equal to the population mean.
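To make the variance calculation concrete, the following sketch computes a weighted mean and its Fay's BRR standard error from the composite weight and the 160 replicate weights, and compares the estimate to a Census benchmark. It assumes a pandas data frame named ahs restricted as described above, with a hypothetical binary column own_outright, the WEIGHT column, and replicate weight columns REPWGT1 through REPWGT160; the ρ = 0.5 factor follows footnote 8.

    import numpy as np

    def weighted_mean(values, weights):
        return np.sum(values * weights) / np.sum(weights)

    def brr_estimate(values, full_weights, replicate_weights, rho=0.5):
        # Fay's BRR: Var = (1 / (R * (1 - rho)^2)) * sum over replicates of (theta_r - theta)^2
        theta = weighted_mean(values, full_weights)
        reps = np.array([weighted_mean(values, w) for w in replicate_weights])
        var = np.sum((reps - theta) ** 2) / (len(reps) * (1 - rho) ** 2)
        return theta, np.sqrt(var)

    y = ahs["own_outright"].to_numpy()
    rep_weights = [ahs[f"REPWGT{i}"].to_numpy() for i in range(1, 161)]
    estimate, se = brr_estimate(y, ahs["WEIGHT"].to_numpy(), rep_weights)

    # One-sample test statistic against the 2010 Census value (treated as known).
    z = (estimate - 0.19) / se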
Results
The results are reported in Figure 1. Points correspond to the difference between 2010 Census figures and sample-weighted (gray, BASEWGT) and bias-adjusted (green, WEIGHT) population mean estimates from the 2015 AHS, with positive numbers indicating possible overrepresentation. Numbers on the vertical line centered at 0 correspond to 2010 Census means. Horizontal lines indicate 95 percent confidence intervals derived from standard errors estimated through BRR replicate weighting. When these do not overlap the vertical line centered at 0, we interpret the difference to be statistically significant (i.e., highly unlikely to arise due to sampling variation alone).

The number centered at zero on the first row indicates that 19 percent of householders interviewed in the 2010 Census owned their house outright (without loan or mortgage). Taking the analysis at face value, the gray point on this row indicates that the 2015 AHS contains roughly eight percentage points "too many" such householders. Bias adjustment helps somewhat, bringing the sample estimate closer to the population statistic. However, even with bias adjustment, the analysis presents statistically significant evidence that the estimate of the proportion of householders who own their property outright is too high.

Similarly, the proportion of householders identified as "white alone" is five percentage points higher in the 2015 AHS than in the 2010 Census. Again, bias adjustment helps somewhat, but does not remove the discrepancy completely: whereas the Decennial Census indicates 76 percent of householders nationally are white alone, the bias-adjusted AHS estimate puts this number closer to 79 percent. Of course, it is possible that these divergences stem from demographic changes over time, so we should be careful in interpreting them as strong evidence of nonresponse bias. However, the direction of demographic change between 2010 and 2015 (a lower national proportion of non-Hispanic white alone) could also mean we are underestimating the degree of bias.

8. Specifically, we use Fay's Balanced Repeated Replication (BRR) method with ρ = 0.5, as described in Lewis (2015). This involves using both the WEIGHT variable and the 160 replicate weights.

Figure 1: Divergence between the 2010 Census and national estimates derived from the 2015 AHS. Points correspond to the difference between 2010 Census figures and sample-weighted (gray) and bias-adjusted (green) population mean estimates from the 2015 AHS. Numbers on the vertical line centered at 0 correspond to the 2010 Census. For example, the first row indicates that 19 percent of householders interviewed in the 2010 Census own their house outright (without loan or mortgage), while bias-adjusted estimates from the 2015 AHS put this proportion roughly 7 percentage points higher (26 percent). Horizontal lines indicate 95 percent confidence intervals derived from standard errors estimated through BRR replicate weighting.
[Figure 1 plots one row per attribute, with the 2010 Census baseline printed at zero: % own house (no mortgage/loan) 0.19; % white alone 0.76; % aged 65 or over 0.22; % Hispanic 0.13; % Asian alone 0.04; % renters 0.35; % black alone 0.12; % aged 45-64 0.40; % husband-wife households 0.48; % aged 15-44 0.39; % own house (mortgage/loan) 0.46; average household size 2.59. The x-axis is the difference between the 2010 Census and the 2015 AHS (positive indicates overrepresentation), weighting by the sampling weight only (BASEWGT) or by the bias-adjusted weights (WEIGHT).]

As we move down the plot from those two items, the attributes shown in the middle of the plot (e.g., percent Hispanic; percent husband-wife households) do not appear to diverge strongly from the 2010 Census. The weights produce a notable effect on the proportion of black householders: the unadjusted divergence suggests the AHS slightly undercounts this group, whereas the adjusted estimate overcorrects and suggests an overcount. Finally, the results suggest that people aged 15-44 and large households are underrepresented in the AHS sample.

2.2 Chi-square tests of differences between responders and nonresponders

Background
The previous section suggests that 2015 AHS estimates of population characteristics diverge significantly from counts in the 2010 Census. Divergences like this can arise due to nonresponse bias, but also due to actual demographic changes between the 2010 Census and the 2015 AHS or the methodology used to sample householders. To assess whether nonresponse itself may play a role, we can investigate whether units that respond to the survey are systematically different from those that do not. This section looks at which attributes differ between the two groups.

Methods
As Table 2 notes, the analytic sample (1) is comprised of all responders and nonresponders (regardless of whether the response was an occupied interview, URE interview, or vacant interview, and regardless of the reason for the nonresponse) and (2) includes each of the three waves, with the analysis conducted separately for each wave. "Unweighted" refers to estimates without any form of reweighting. The purpose of these estimates is to show how the differences in attributes in the raw sample tend to get smaller as weights are applied to adjust for certain forms of oversampling. "Weighted" refers to estimates reweighting only by the inverse probability of selection (BASEWGT).

Since all the sampling frame variables we examined are categorical, we use a Chi-square test of the null hypothesis that the frequencies of responders and nonresponders within each of the attribute levels are randomly and independently distributed. If the p-value indicates that the observed Chi-square statistic is highly unlikely given this null hypothesis (e.g., less than 5 percent), we interpret this as statistically significant evidence that the focal attribute is not independent of response status.9 Statistically significant evidence of divergences between responders and nonresponders constitutes suggestive evidence of nonresponse bias, insofar as these characteristics are correlated with other important measures in the AHS.
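As a sketch, the unweighted version of this test can be run with standard tools. The snippet below assumes an analytic data frame df containing a 0/1 response indicator responded and the sampling frame attribute columns; the weighted version would replace the raw counts with sums of BASEWGT within each cell, and the actual analysis may have been implemented differently.

    import pandas as pd
    from scipy.stats import chi2_contingency

    # Cross-tabulate response status against the levels of one sampling frame attribute.
    table = pd.crosstab(df["responded"], df["DIVISION"])

    # Chi-square test of independence between response status and the attribute.
    stat, p_value, dof, expected = chi2_contingency(table)
    print(f"chi-square = {stat:.1f}, df = {dof}, p = {p_value:.4g}")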
The graphs show the following differences in proportions:

1. Proportion of responders (r) that fall into a given category of some attribute (l): N_lr / N_r (e.g., the proportion of responders who fall into the "New England" category of the geographic division attribute);

2. Proportion of nonresponders (n) with that level of attribute (l): N_ln / N_n (e.g., the proportion of nonresponders who fall into the "New England" category of the geographic division attribute);

3. Difference between #1 and #2: a positive point estimate indicates that the attribute level is overrepresented among responders.
Results
Figure 2, which focuses on the 2019 wave, shows how responders differ from nonresponders for several sampling frame attributes. The gray bars represent the point estimates without weighting; the green bars represent the point estimates reweighting for unequal probabilities of selection (but without any noninterview adjustment factors applied). The figure shows that the probability-of-selection reweighting makes responders and nonresponders look much more similar by Census division, presence of a rental subsidy, and region. However, the graph also highlights the difficulty of balancing along many attributes. For instance, the reweighted estimates show more imbalance among self-representing versus non-self-representing units than the unweighted ones. Finally, even after this initial reweighting, the two groups still look significantly different.10 Results for the 2015 and 2017 waves look substantively similar, and all differences were statistically significant at the p < 0.001 level.

9. Note that this test does not take account of the clustering and stratification involved in the sampling design and makes an anti-conservative assumption of independent sampling.
10. Appendix Table 7 shows the p-values for each of the tests.

Figure 2: Differences between responders and nonresponders: 2019 wave. The figure shows the extent to which a level of an attribute is overrepresented in responders relative to nonresponders. Results for the 2015 and 2017 waves are similar and are found in Appendix Section A.1.
[Figure 2 contains one panel per sampling frame attribute (DIVISION, FL_SUBSIZ, HUDSAMP, METRO_2013, REGION, RENTSUB, RUCC_2013, SPSUTYPE, WPSUSTRAT), each plotting the difference in proportions between responders and nonresponders for every level of the attribute (positive = overrepresented among responders), shown both unweighted and weighted.]

Finally, Figure 3 focuses on the Census division in which the unit is located and uses the weighted proportions to examine whether the differences vary across waves. We focus on Census division because of its importance in later-stage adjustments for nonresponse bias. The figure shows that while regions tended to stay on the same side of the red line indicating equal representation (that is, they tended to be consistently over- (above the line) or under- (below the line) represented among respondents), some regions stayed fairly consistent in having similar proportions of responders and nonresponders and other regions fluctuated more; for instance, the Middle Atlantic region moved closer to equal representation. This analysis suggests not only that nonresponse bias is likely present but also that it is dynamic and can shift in magnitude and possibly direction over time.

Figure 3: Changes over time in over- versus underrepresentation. The figure focuses on the Census division variable and shows variation across waves in the extent of under- versus overrepresentation.

[Figure 3 plots, for each Census division (East North Central, East South Central, Middle Atlantic, Mountain, New England, Pacific, South Atlantic, West North Central, West South Central), the weighted difference in proportions between responders and nonresponders (positive = overrepresented in responders) across the 2015, 2017, and 2019 waves.]

2.3 Representativity Analysis

Background
In addition to the attribute-by-attribute analysis presented in the previous section, we can estimate an overall measure of how the observed attributes of responders differ from those of nonresponders. Schouten, Cobben, Bethlehem, et al. (2009) propose such a measure, which they call the "R-indicator." At its base, the R-indicator provides a standardized summary measure of whether observable characteristics of responders differ systematically from those of nonresponders.

Methods
The R-indicator is calculated as follows:
1. Estimate a binary regression predicting "interviewed" or not, based on attributes observed for both respondents and nonresponders (S);
2. Using the regression parameters from Step 1, predict each unit's propensity to respond, ŷ;
3. Find the standard deviation of the predicted response propensities, SD(ŷ);
4. To get a value between 0 and 1, re-parametrize so that R̂ = 1 − 2 × SD(ŷ).

Provided we have good measures of the attributes of people who do not answer the survey, higher values of R̂ indicate responders and nonresponders are similar; lower values indicate they are dissimilar. This approach relies on the availability of good measures observed for both kinds of units, such as area-level characteristics or administrative data from other sources. It also relies on a well-specified model to relate the observed attributes to response status.
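The steps above can be expressed compactly. The sketch below assumes a feature matrix X of sampling frame and area-level predictors and a 0/1 response indicator responded, and uses an unweighted logistic regression as the binary regression in Step 1; the analysis itself may weight the regression or specify the model differently.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def r_indicator(X, responded):
        # Steps 1-2: model response status and predict each unit's response propensity.
        model = LogisticRegression(max_iter=1000).fit(X, responded)
        propensities = model.predict_proba(X)[:, 1]
        # Steps 3-4: R-hat = 1 - 2 * SD of the predicted propensities.
        return 1 - 2 * np.std(propensities)

    r_hat = r_indicator(X, responded)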
To understand the intuition behind the measure, consider the following thought exercise. Suppose there is a response rate of 50 percent, but the model is unable to detect anything systematically different about the responders and nonresponders. In this case, the prediction for each unit in the sample will be the same: ŷ = 0.5. As such, SD(ŷ) = 0, which implies R̂ = 1 − 2 × 0 = 1. When R̂ = 1, our model is telling us that whether or not someone responds is as good as random, so those who respond provide a good representation of those who do not, even with a 50 percent response rate.

Suppose instead that we were to discover that everyone who answered the survey had a first name starting with J, and none of the nonresponders had a first name starting with J. If we include an indicator for having a first name starting with J in our model, it will perfectly predict response: Jill, Jamal, and Julia, for example, would be predicted to respond with probability ŷ = 1, while Robin, Shaun, and Sara would have probability ŷ = 0, implying SD(ŷ) ≈ 0.5 and thus R̂ = 1 − 2 × 0.5 = 0. So, conditional on having the right predictor for nonresponse, R̂ tells us how well responders represent nonresponders. Note that R̂ does not tell us how well responders and nonresponders represent the target population, only whether the two groups are similar.

Of course, perfect prediction hardly ever happens in practice: just by random chance, we might end up with a large number of people whose name starts with J who happen to respond, even if there is no true underlying correlation between these phenomena. Given the possibility that random sampling can produce meaningless correlations, the question is whether the correlations we observe in our model are greater than we would expect to observe just by chance alone. Values of R̂ that are really unlikely to occur just due to random chance, say less than 5 percent, are "statistically significant."

To infer the probability of getting the R̂ we observe, we need to estimate the variance of R̂. Schouten, Cobben, Bethlehem, et al. (2009) derive the standard error of R̂ through resample bootstrapping. In order to obtain confidence intervals, they assume that R̂ is normally distributed. However, our analyses suggest these standard errors are not amenable to the typical Z-score transformation used to obtain p-values in t-tests.

We therefore use a permutation test in order to make an inference about whether we would expect to see the observed R̂ simply by chance, or whether the observed R̂ is statistically significant. Specifically, we randomly shuffle the variable indicating response and re-estimate R̂ hundreds of times in order to obtain some of the R̂ values we might have estimated if there were truly no correlation at all between the predictors and the outcome. We compare this distribution to the observed R̂ to get a p-value corresponding to a one-sided test: the probability of observing just by chance an R̂ at least as low as the one we observed, supposing that there is no true relationship between nonresponse and our predictors. We calculate this probability by taking the proportion of permuted R-indicators at least as low as the observed one.
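A sketch of that permutation procedure, reusing the r_indicator helper above with the same X and responded arrays (the number of shuffles shown is illustrative):

    import numpy as np

    rng = np.random.default_rng(seed=0)
    observed = r_indicator(X, responded)

    # Re-estimate R-hat after breaking any link between the predictors and response status.
    permuted = np.array(
        [r_indicator(X, rng.permutation(responded)) for _ in range(500)]
    )

    # One-sided p-value: share of permuted R-indicators at least as low as the observed one.
    p_value = np.mean(permuted <= observed)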
We consider 73 predictors that are available for both responders and nonresponders sampled into the 2015, 2017, and 2019 AHS surveys. These include variables from the sample frame, such as whether the housing is HUD-assisted, as well as information about the Census tract in which the potential respondent is located, drawn from the American Community Survey (ACS), such as the proportion of units that are rented or the proportion of people who are white.11 These ACS features are important because we know little about the demographics of "never responders." Note that we are measuring characteristics of areas rather than characteristics of a particular household in that area.

11. Some predictors had to be dropped due to collinearity, which arises when two or more variables contain very similar or equivalent information on units in the analysis, and thereby "cancel out" each other's estimated influence on the outcome.
Results
Figure 4 plots the observed value for the R-indicator (thin vertical line) alongside the distribution of permuted R-indicator estimates (dark gray histogram). For all waves under analysis, the estimated R-indicator is below one and much lower than we would expect to see just due to random chance. In other words, we find statistically significant evidence that responders and nonresponders differ on a host of observable characteristics. Table 8 in Appendix Section 2.3 reports the numerical results: across the three waves, the R-indicator ranges from 0.90 to 0.92.

Figure 4: Evidence of systematic differences between responders and nonresponders across a range of predictors. The thin vertical line indicates the estimated R-indicator. The gray histogram represents the distribution of estimated R-indicators under the null hypothesis that response is independent of all predictors. The results are "statistically significant" insofar as the observed R-indicator is highly unlikely to arise due to chance alone under the null of independence.
[Figure 4 contains three panels (2015, 2017, 2019), each a histogram of permuted R-indicator estimates over roughly 0.85-1.00 with the observed R-indicator marked by a thin vertical line; the x-axis is labeled "Observed vs. Permuted R Indicator Estimates."]

One concern is that the permutation procedure does not faithfully describe the sampling variation, which might produce misleading p-values. Since the R-indicator analysis is simply another way of answering the question "how well does the model predict the data," we can also use a more conventional approach to hypothesis testing called a Likelihood Ratio Test (LRT). In essence, this test asks whether adding the predictors to an intercept-only model improves predictions more than we would expect by random chance alone. The results, also presented in Table 8 in the appendix, confirm the main finding: statistically significant evidence that nonresponders' attributes differ from those of responders.

2.4 Section Summary
This section presents strongly suggestive evidence of nonresponse bias in the AHS. The 2015 AHS national estimates depart from corresponding population-level counts in the 2010 Census in key areas such as householder race and ownership status. Of course, divergences such as this may arise for reasons unrelated to the systematic exclusion of certain groups from the sample. However, in an analysis of a host of attributes available for those who do and do not respond to the survey, such as their housing type and the demographic characteristics of their neighborhood, we find strong evidence that responders look different from nonresponders. Analyzed either one-by-one or taken as a whole, the attributes of responders systematically differ from those of nonresponders. Future analyses could explore how much of the gap remains when we adjust estimates with the nonresponse adjustment factor.

3 Predicting Nonresponse and Refusal

Background
The R-indicator analysis in the preceding section uses attributes available for both responders and nonresponders to predict where nonresponse is most likely to occur. It does so using a fairly limited predictive method: a parametric model where (1) attributes about units enter additively into the model and (2) the model does not perform variable selection, or regularization that "zeroes out" the influence of attributes that do a poor job of predicting nonresponse. Many better methods for predicting binary outcomes exist.

The goal of the present analysis is to use a series of more flexible classifiers for two purposes. First, we predict which units will be nonresponders or refusers in a given wave of the AHS. Second, we focus on the top-performing models to explore which features of units best predict nonresponse and refusal.

Methods
The analysis focuses on prediction of one of two binary outcomes:

1. General nonresponse:
   • 1 = nonresponder: for any reason (Types A, B, and C);
   • 0 = responder: this includes (1) occupied interviews, (2) vacant interviews, and (3) URE interviews.

2. Refusal:12
   • 1 = nonresponder: due to refusal (a subset of Type A nonresponse);
   • 0 = responder: occupied interview only. Since occupied interviews provide the most direct contrast with refusals, the analytic sample excludes nonresponders who are not refusers as well as vacant and URE interviews.
We fit a series of binary classifiers to predict these two outcomes.13 Table 3 outlines the classifiers, which fall into two general categories.

First are tree-based classifiers. At its core, a tree-based classifier is an algorithm that looks for combinations of attributes within which there are only responders or only nonresponders. Starting with the simplest version, a decision tree (dt.* in Table 3), imagine we start with two features: the Census region in which a unit is located and the percentage of households with a high school education or less. The classifier might first find that areas where fewer than 10 percent of households have a HS education or less have units that are more likely to respond, creating a split at that value. The "tree" has its first "branch," with one group of people at the end of the "fewer than 10 percent" fork and another group of people at the "greater than 10 percent" fork. Now suppose that, among the first group, one region had proportionally many more responders than the other, but among the second group, region does not seem to make a difference. In that case, there will be a second branch between high- and low-responding regions among those in areas where fewer than 10 percent of people have a HS diploma, but no such split among those who live in areas with more than 10 percent of people with HS diplomas. The maximum depth parameter constrains the number of splits and branches our tree can have.

12. This outcome is similar to the one used in the panel attrition analysis discussed below in Section 4.2. It differs in that it includes "never responders," whereas the panel attrition analysis is subset to those who responded in the 2015 wave.
13. We chose classifiers using a useful list for data science applications: https://github.com/rayidghani/magicloops.
Chance variation can lead to very idiosyncratic trees: the classifier tends to "overfit" to the data, meaning that its particular set of branches and splits will not do a good job of sorting responders from nonresponders in other samples. Random forest models (rf.*) are a solution to this problem that generalizes the idea of decision trees. The idea is to fit many hundreds of decision trees (a forest) using two sources of random variation. One is random samples of the data with replacement; another is random subsets of the features used for prediction. For instance, rather than including all ACS features in a particular tree, one tree might have percent renters and racial demographics; another, percent owners and racial demographics. The n_estimators argument changes the number of trees in the forest.

Finally, we employ gradient-boosting models (gb.*) and adaptive boosting (ada). These are two ensemble classifiers; each takes a series of shallow decision trees ("weak learners"). Adaptive boosting starts with a weak learner and then improves predictions over iterations by successively upweighting observations that were poorly predicted in iteration i − 1. Gradient boosting operates similarly, though instead of upweighting poorly predicted observations, it uses residuals from the previous iteration in the new model.

Overall, these tree-based classifiers aim to improve prediction by splitting and combining predictors. They generate what are called feature importances: measures of whether a predictor improves prediction of nonresponse. Importantly, feature importance metrics are directionless: that is, they measure how high up in a tree or how frequently an attribute is chosen, for example, irrespective of the sign or size of the coefficient.
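For example, with the smaller random forest from Table 3, the feature importances could be extracted as follows; this is a sketch that assumes a feature matrix X with named columns and a 0/1 nonresponse outcome y.

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    rf = RandomForestClassifier(n_estimators=100, max_depth=20, random_state=0)
    rf.fit(X, y)

    # Directionless importances: how much each attribute contributes to the splits overall.
    importances = pd.Series(rf.feature_importances_, index=X.columns)
    print(importances.sort_values(ascending=False).head(10))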
The second category of classifiers is regularization-based. We use different forms of the lasso procedure, which is designed to strike a middle ground between selecting too many and too few variables as the best predictors of nonresponse and refusal. The procedure employs "penalized" regression. Put simply, the algorithm tries to fit a model with a "good" score. As its predictions of nonresponse and refusal get more accurate by adding better predictors, its score improves. However, for each variable the algorithm adds to a model, the score decreases: there is a penalty for including more predictors. In theory, if the degree to which new predictors are penalized is calibrated correctly, the algorithm will include the minimal set of variables that do a good job of predicting the outcome, while excluding those that do not add to the predictive accuracy, either because they are redundant (collinear with already-included variables) or do a poor job of predicting.14

14. All of these classifiers were fit in python 3.6 using scikit-learn.

Table 3: Models used to predict nonresponse and refusal in full sample

Tree-based models
  dt_shallow  Shallow decision tree
              DecisionTreeClassifier(random_state=0, max_depth=5)
  dt_deep     Deeper decision tree
              DecisionTreeClassifier(random_state=0, max_depth=50)
  rf_few      Random forest with fewer trees
              RandomForestClassifier(n_estimators=100, max_depth=20)
  rf_many     Random forest with more trees
              RandomForestClassifier(n_estimators=1000, max_depth=20)
  gb_few      Gradient boosting with fewer trees
              GradientBoostingClassifier(criterion='friedman_mse', n_estimators=100)
  gb_many     Gradient boosting with many trees
              GradientBoostingClassifier(criterion='friedman_mse', n_estimators=1000)
  ada         AdaBoost
              AdaBoostClassifier()

Regularization-based models
  logit       Logit
              LogisticRegression()
  logitcv     Logit with penalty term selected via cross-validation
              LogisticRegressionCV()
  logitl1     Logit with L1 penalty
              LogisticRegression(penalty="l1")

We fit these models to two sets of features:

1. AHS-only features from two sources:
   (a) AHS sampling frame or master file variables. We use 48 binary indicators created from each categorical level of the following variables:
       i. DEGREE: a measure of area-level temperature, reflecting places with hot, cold, and mild temperatures based on the number of heating/cooling days.
       ii. HUDADMIN: a categorical variable based on HUD administrative data for the type of HUD subsidy, such as public housing or a voucher.
       iii. METRO: a categorical variable for the type of metropolitan area the unit is located in (e.g., metro versus micropolitan) based on OMB definitions for 2013 metro areas.
       iv. UASIZE: a categorical variable for different sizes of urban areas, when applicable.
       v. WPSUSTRAT: a categorical variable for the primary sampling unit strata.
   (b) Response and contact attempt variables from the previous waves. We exploit the longitudinal nature of the data and use the unit's past response-related outcome to predict its status in a focal wave:
       i. total prior contact attempts (a numeric measure);
       ii. the total number of interviews in the prior wave (capturing respondents who needed multiple interviews to complete participation);
       iii. whether the unit was a nonresponder in the previous wave (binary).

2. AHS + ACS adds the following to the previous list:
   (a) American Community Survey (ACS) 5-year estimates of characteristics of the unit's Census tract. We list these variables in Appendix Table 9. They were matched to waves as follows so that the predictor is measured temporally prior to the outcome: 2015 wave (ACS 5-year estimates 2009-2014); 2017 wave (ACS 5-year estimates 2011-2016); 2019 wave (ACS 5-year estimates 2013-2018). They reflect race/ethnicity, educational attainment, and different housing-related measures.
Finally, we evaluate the models using 5-fold cross-validation. The sample is randomly split into five evenly sized groups. Then, the model is fit to the data obtained by pooling four of the five groups (the training set). That model is used to generate predictions in the fifth, held-out group (the testing set). We use a set of evaluation metrics described below to measure how much those predictions in the fifth group deviate from the actual values the model is trying to predict. The process is repeated, using each fold as the held-out fold and calculating the scores each time. The results are averaged across the five folds.
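A sketch of this evaluation loop for a single classifier from Table 3, assuming the feature matrix X and binary nonresponse outcome y used above; scikit-learn's cross_validate handles the fold splitting and per-fold scoring.

    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import cross_validate

    scores = cross_validate(
        GradientBoostingClassifier(criterion="friedman_mse", n_estimators=100),
        X, y, cv=5, scoring=["precision", "recall", "f1"],
    )

    # Average each metric over the five held-out folds.
    for metric in ["precision", "recall", "f1"]:
        print(metric, scores[f"test_{metric}"].mean())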
We look at three different outcomes of the predictions to calculate three separate evaluation metrics in the held-out or test fold. These are based on comparing a unit's actual nonresponse status to its predicted nonresponse status. Units can fall into four mutually exclusive categories, and the evaluation metrics are different summary measures of the categories across the entire held-out fold:

1. TP: a nonresponder is correctly predicted to be a nonresponder;
2. FP: a responder is incorrectly predicted to be a nonresponder;
3. FN: a nonresponder is incorrectly predicted to be a responder.15

From there, we can construct three composite measures as ratios of the total number of units falling into each category:

1. Precision: Total TP / (Total TP + Total FP). Among predictions of nonresponders, what proportion are actually nonresponders;

2. Recall: Total TP / (Total TP + Total FN). Among actual nonresponders, what proportion do we correctly predict to be nonresponders, as opposed to erroneously predicting that they are responders;

3. F1 Score: 2 * (Precision * Recall) / (Precision + Recall). Explained below.

If we have a precision of 1, that means every time the model predicted a unit was a nonresponder, it actually was. For example, if there are 50 nonresponders and 50 responders, as long as the model predicts at least one nonresponder and no responders are falsely predicted to be nonresponders, it will have a precision of 1. If instead, every time the model predicts a nonresponder that unit is actually a responder, its precision will be 0.

15. We do not need the fourth possible outcome of true negatives (correctly predicted responders), since, as proportions of the fold, TN = 1 − TP − FP − FN.

For recall, we have to look at the subset of actual nonresponders. If there are two nonresponders in
a sample of 100 people, and the model predicts every single person in the sample is a nonresponder,
then 100 percent of nonresponders are correctly predicted to be nonresponders and the recall will
be 1. However, if the model does not predict any nonresponders to be nonresponders, its recall will
be 0.
We use the F1 Score as the main summary metric, since it balances finding all nonresponders (high recall) against ensuring that the model accurately separates responders from nonresponders (precision). Note that one measure may be more useful than another in other applications. For an intervention targeting nonresponse bias, where there could be a higher cost to failing to predict nonresponse (false negatives) than to wrongly predicting nonresponse (false positives), we may prioritize models with high recall.
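For concreteness, the three metrics can be computed directly from a vector of true statuses and a vector of binary predictions. The sketch below uses sklearn’s built-in metric functions with illustrative arrays that reproduce the Table 4 example discussed next.

# Sketch: computing precision, recall, and F1 for a held-out fold.
# y_true and y_pred are illustrative arrays in which 1 = nonresponder and 0 = responder;
# they mirror the 20-unit example in Table 4 (6 actual nonresponders, 4 flagged units).
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

print(precision_score(y_true, y_pred))  # 3 / (3 + 1) = 0.75
print(recall_score(y_true, y_pred))     # 3 / (3 + 3) = 0.50
print(f1_score(y_true, y_pred))         # 2 * 0.75 * 0.5 / (0.75 + 0.5) = 0.60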
While what counts as a “good” F1 Score varies based on the context, generally, scores above 0.7 are considered evidence of a high-performing model. To gain more intuition, consider the simplified example in Table 4 of predictions for 20 units, where we use 0.75 as the cutoff for translating a continuous predicted probability of nonresponse (NR) to a binary label of NR or respond (R).16 Our precision is 3 / (3 + 1) = 0.75, since we have three true positives and one false positive. We could increase our precision by raising the threshold for what counts as a predicted nonresponse to 0.8. However, doing so would hurt our recall, which in this example is 3 / (3 + 3) = 0.5 due to the presence of false negatives in the lower predicted probability range. The F1 Score is less interpretable than either of these since it combines the two, but in this case it would be 2 * (0.75 * 0.5) / (0.75 + 0.5) = 0.6, which is lower than what we observed in our real results. The example also shows that we can target our desired metric—for instance, capturing all nonresponders even if it leads to some false positives—by changing the threshold for translating a continuous value (e.g., ŷ = 0.8) into a binary prediction of nonresponse.
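The thresholding step itself is a simple transformation once a classifier produces predicted probabilities. The sketch below, on synthetic data, shows both cutoff-based labeling and flagging a fixed number of highest-risk units; all names and values are illustrative.

# Sketch: converting predicted probabilities of nonresponse into binary labels at
# different cutoffs, and flagging the k highest-risk units. Data here are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                          # illustrative features
y = (X[:, 0] + rng.normal(size=200) > 0).astype(int)   # illustrative nonresponse indicator

clf = LogisticRegression().fit(X, y)
prob_nr = clf.predict_proba(X)[:, 1]                   # predicted probability of nonresponse

labels_075 = (prob_nr >= 0.75).astype(int)             # cutoff used in the Table 4 example
labels_080 = (prob_nr >= 0.80).astype(int)             # stricter cutoff: higher precision, lower recall

k = 20                                                 # e.g., number of units to target
flagged = np.argsort(prob_nr)[::-1][:k]                # indices of the k highest-risk units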

16. The choice of threshold can be calibrated to balance precision with recall. The results presented use sklearn’s default threshold for each of the models, which is generally 0.5. Next steps might involve better calibrating the threshold to a value that corresponds to the number of units we can target in an incentive experiment targeting units likely to be underrepresented in the responses (e.g., the 10,000 units with the highest predicted probability of nonresponse).
Table 4: Illustration of the evaluation metrics: example predictions

ID     Pred. ŷ (continuous)   Pred. ŷ (binary)   True y   Error category
1537   0.99                   NR                 NR       True pos.
1177   0.93                   NR                 NR       True pos.
1879   0.84                   NR                 NR       True pos.
1005   0.78                   NR                 R        False pos.
1187   0.72                   R                  R        True neg.
1034   0.71                   R                  R        True neg.
1159   0.60                   R                  NR       False neg.
1181   0.52                   R                  NR       False neg.
1071   0.49                   R                  R        True neg.
1082   0.47                   R                  R        True neg.
1603   0.44                   R                  R        True neg.
1762   0.33                   R                  R        True neg.
1319   0.29                   R                  R        True neg.
1359   0.24                   R                  NR       False neg.
1238   0.21                   R                  R        True neg.
1490   0.17                   R                  R        True neg.
1465   0.17                   R                  R        True neg.
1338   0.11                   R                  R        True neg.
1766   0.07                   R                  R        True neg.
1807   0.04                   R                  R        True neg.

3.1 How well can we predict nonresponse and refusal?
Results
Figure 5 focuses on predicting general nonresponse in the 2019 wave and shows that we are able to predict nonresponse with a high degree of accuracy.17 Both types of approaches—regularization with the penalty chosen via cross-validation (logitcv) and the tree-based approaches—performed well. The one model that performed less well was the “deep” decision tree. It is possible that this classifier overfit to the data because it used a single tree without a high number of predictors. Appendix Figure 22 shows the results for the 2017 wave, where our ability to predict is substantially higher than in the 2019 wave (mean F1 score across models of 0.88 in the 2017 wave compared to a mean F1 score of 0.85 in the 2019 wave). As we discuss in the section summary, this could affect how well we think we are able to predict nonresponse in the 2021 wave that will be the target of the proposed incentives experiment.
Comparing the predictions from the two feature sets—features from the AHS only (including lagged response-related outcomes) versus those features plus ACS contextual features—the contextual features from the American Community Survey (1) improve predictions across all classifiers, but (2) the improvements are small, with only small increases in the F1 Scores for the models with ACS features compared to the models without. On the second point, as the next section shows, these ACS features are nonetheless important predictors when included. The small improvement is likely due to a combination of reasons. First, the most predictive features in all models were lagged response-related variables—this lessens the predictive power of either ACS contextual features or AHS sampling frame features like region.
17. After fitting, we ended up excluding two of the logistic-based models from evaluation—logitl1 and logit—because while they had F1 Scores in the 0.80-0.85 range, the penalty parameters zeroed out nearly all of the predictors.
Second, since the sampling frame variables are largely geography-based, they may capture similar information to the ACS contextual features.
Figure 5: Ability to predict nonresponse: 2019 wave. The figure shows F1 scores for two types of feature sets: AHS-only (which includes both sampling frame variables and lagged response/contact attempt variables) and those plus the ACS contextual features.
[Figure: horizontal bar chart of F1 scores (closer to 1 = better) by model (logitcv, gb_few, gb_many, rf_many, rf_few, dt_shallow, ada, dt_deep) for the AHS and AHS + ACS feature sets; scores cluster around 0.84-0.85 for all models except dt_deep, which scores roughly 0.76-0.81.]

Figure 6 focuses on the contrast between our ability to predict nonresponse (previous graph) and our ability to predict refusal. Each dot represents one of the final models. All models lie above the 45 degree line: while the predictions of nonresponse and refusal each have high F1 scores, the higher F1 scores for refusal indicate that we are better able to predict that outcome. Appendix Figure 23 shows the raw scores for 2017 and 2019 for the refusal models, for which we only estimated the models with combined AHS and ACS features. Similar to the results for nonresponse, the models show substantially better performance in the 2017 wave than in the 2019 wave.
As we explore further in the next section, our ability to predict refusal better than nonresponse could be driven by the fact that our most important predictor for nonresponse is whether the unit was a nonresponder in the previous wave, while for refusal our most important predictor is whether the unit was a refuser in the previous wave. In turn, since refusal is a narrower, more behaviorally rooted category than general nonresponse,18 we might be better able to leverage past refusal to predict refusal in a focal wave than past nonresponse to predict nonresponse in a focal wave.

18. As we discuss in Section 1.1, general nonresponse contains technical forms of nonresponse like Type C nonresponse (e.g., mobile home moved; permit abandoned) or other forms of Type A nonresponse like not being home.
Figure 6: Ability to predict refusal versus ability to predict nonresponse: 2019 wave. Each dot represents a model. The x axis shows that model’s performance in predicting nonresponse (relative to all types of response). The y axis shows that model’s performance in predicting refusal (relative to occupied interviews). We see that the deep decision tree performs much worse than other models for each type of outcome. For all models, we are significantly better at predicting refusal than nonresponse.
[Figure: scatterplot of the F1 score for the model predicting refusal (y axis, roughly 0.70-0.90) against the F1 score for the model predicting nonresponse (x axis, roughly 0.70-0.90); points above the line are better at predicting refusal. Models shown: ada, dt_deep, dt_shallow, gb_few, gb_many, logitcv, rf_few, rf_many.]

3.2 Top predictors of nonresponse and refusal
Results
The previous results show better, but not substantially better, performance when we include contextual features from the ACS. We can dig deeper into these patterns by focusing on two of the better-performing models that yield different types of “top predictors”: the random forest with many trees, which yields feature importances without a direction (sign), and the penalized logit, which yields more traditional coefficients with a positive (predicts nonresponse or refusal) or negative (predicts response or non-refusal) sign.
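A sketch of how these two kinds of rankings are extracted is below. The design matrix, outcome, and feature names are synthetic stand-ins for the assembled AHS + ACS features, and the penalized logit here uses an L2 penalty chosen by cross-validation purely for illustration.

# Sketch: the two ways of ranking predictors described above.
# X, y, and feature_names are synthetic stand-ins for the AHS + ACS design matrix.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegressionCV

rng = np.random.default_rng(0)
feature_names = [f"x{i}" for i in range(8)]
X = rng.normal(size=(500, 8))
y = (X[:, 0] - X[:, 3] + rng.normal(size=500) > 0).astype(int)

# Random forest: directionless importances (higher = more predictive, in either direction)
rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
rf_rank = pd.Series(rf.feature_importances_, index=feature_names).sort_values(ascending=False)

# Penalized logit with CV-chosen penalty: signed coefficients
# (positive = predicts nonresponse/refusal; negative = predicts response)
logitcv = LogisticRegressionCV(Cs=10, penalty="l2", cv=5, max_iter=5000).fit(X, y)
logit_rank = pd.Series(logitcv.coef_[0], index=feature_names).sort_values()

print(rf_rank.head(3))
print(logit_rank.head(3))
print(logit_rank.tail(3))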
Importantly, and as in the panel attrition analysis we discuss later, all features are predictive rather than causal. For instance, there might be unobserved characteristics of a unit that lead that unit to refuse in both the 2015 and 2017 waves. The “did not respond in 2015” feature in the model predicting 2017 nonresponse is thus a proxy for those unobserved characteristics, rather than someone’s nonresponse in a previous wave actively causing their nonresponse in a focal wave. Second, among predictive features, some yield more insight than others into the mechanisms behind nonresponse or refusal. For instance, knowing that someone needed to be contacted five times before a response in the previous wave rather than just once may be highly predictive of nonresponse in the focal wave. But we gain little insight into why they were both “reluctant responders” in the previous wave and nonresponders in the focal wave. In contrast, features like the ACS variables on the educational attainment of the local area, though possibly subject to ecological fallacy issues, could indicate more informative patterns.19 In other words, it is more informative to know that area-level educational attainment is predictive of nonresponse, because we may hypothesize that it relates to the level of trust in a government-sponsored survey, which can be addressed in an intervention, than to know only that a unit did not respond, without additional information.20
Figure 7 shows the attributes with the top 20 feature importances for predicting nonresponse in the 2019 wave in the models using the combined AHS and ACS features, with Appendix Figure 24 showing the ranking of the remaining features. The figure shows that, perhaps unsurprisingly, the most important predictor of nonresponse in 2019 is 2017 nonresponse. Similarly, regardless of response status, the number of overall contact attempts and in-person contact attempts in the prior wave is highly predictive. These predictors are useful if our goal is pure prediction, but they are arguably less informative for understanding mechanisms behind nonresponse. Contextual ACS features are perhaps more useful in generating hypotheses to explore. The local area’s unemployment rate, age distribution, and monthly housing costs are all highly predictive. Yet two limitations in interpretation remain. First, the graph reflects highly predictive features without direction—so, for instance, higher median age is highly predictive, but we do not know from the model alone whether it predicts response or nonresponse.

19. The ecological fallacy occurs when we use aggregate data—in this case, data about Census tract characteristics—to infer things about individuals that are part of that aggregate. In the present case, we observe a general correlation between an area having higher educational attainment and that area having a lower likelihood of nonresponse. However, it could be the case that within areas with higher educational attainment, lower educational attainment individuals are actually the most likely to respond.
20. The pattern—area-level lower SES is associated with a higher likelihood of unit-level nonresponse (e.g., Maitland et al. 2017)—has been observed in other social surveys. While trust is one mechanism, there might be many others like work schedules, time pressures, and more.
Figure 7: Most important predictors of nonresponse: random forest model; 2019 wave. The figure illustrates the top 20 features. The ACS “less than 100,” “800 to 899,” and “1500 to 1999” variables refer to the dollar amounts of monthly housing costs.
[Figure: horizontal bar chart of random forest feature importances (2019 wave; higher = more predictive, in either direction). The top features are NR LAST, CTATEMPT TOTAL PRIOR, and CTATEMPT TOTAL IP PRIOR, followed by ACS tract characteristics such as the share unemployed, educational attainment, race/ethnicity, monthly housing cost bands, median household income, median age, receipt of SSI/cash public assistance/SNAP, foreign-born noncitizen share, and renter cost burden.]

To address the shortcomings of non-directional feature importance, we turn to the top predictors from the penalized logit, which provides signed coefficients and has significant overlap in top predictors with the random forest model. Figure 8 shows the top 10 most highly positive (predictive of nonresponse) and most highly negative (predictive of response) features from the penalized logistic regression. The results show that most of the highly predictive features in the random forest were also highly predictive of nonresponse in the penalized logit—for instance, total contact attempts and prior nonresponse in 2017 are highly associated with 2019 nonresponse. In addition, features like DEGREE, which captures area-level temperature, show that areas with more cold and cool days (values 2 and 3, which represent areas where people need to use heat for a higher proportion of the year) have a higher likelihood of nonresponse, and areas with mild or mixed temperatures have a higher likelihood of response. While these predictors may reflect patterns like the ease of in-person enumerators reaching households, they could also be proxies for unobserved characteristics of areas. Meanwhile, some features associated with a higher response level, like having more interview attempts in the prior wave, likely also reflect that units that responded in the previous wave are likely to be responders again in the next wave.

Figure 8: Most important predictors of nonresponse: penalized logit; 2019 wave. The figure illustrates the top 10 positive and top 10 negative features.
[Figure: bar chart of penalized logit coefficient values (axis from 0.0 to 0.3). Features shown include NR LAST, CTATEMPT TOTAL IP PRIOR, CTATEMPT TOTAL PRIOR, several DEGREE, UASIZE, WPSUSTRAT, and METRO 2013 categories, HUDADMIN 3.0, INTNMBR PRIOR, and ACS variables such as Asian alone, some other race alone, and renter-occupied cost burden.]

The second caveat in interpreting the results from the main models is that the model focuses on all forms of nonresponse. But as the previous section showed, we are better able to predict refusal than nonresponse more generally. We next turn to the feature importances from the models predicting refusal to see which attributes remain important and which do not.
Figure 9 shows a scatterplot where each dot is a top feature from the random forest model predicting refusal. The x axis reflects its importance in the refusal model; the y axis its importance in the nonresponse model. Features above the 45 degree line are more predictive of nonresponse than refusal; features below the line are more predictive of refusal. We see some patterns, like the number of contact attempts in the previous wave being more predictive of refusal than of nonresponse. However, the generally high correlation shows that refusal and nonresponse are generally predicted by similar factors.

Figure 9: Top predictors of refusal versus top predictors of nonresponse: random forest; 2019 wave. Each dot is a top feature; the x axis shows its importance in the refusal model and the y axis its importance in the nonresponse model.
[Figure: scatterplot in two panels. The “ACS” panel shows tract-level features (unemployment, educational attainment, race/ethnicity, monthly housing cost bands, median income, median age, SSI/public assistance/SNAP receipt) with importances of roughly 0.021-0.023 on both axes. The “Other” panel shows REFUSE or NR LAST and the prior contact attempt counts (total and in person) with importances of roughly 0.04-0.08.]

3.3 Section Summary
The results from the predictive models yield three main findings. First, we are better at predicting both nonresponse and refusal in the 2017 wave than in the 2019 wave. Second, we are better at predicting refusal than at predicting general nonresponse. As it relates to the planned intervention discussed in Section 1.2, the better ability to predict refusal could suggest a way to try to improve the efficacy of targeting and thereby reduce nonresponse bias. Namely, refusal is a behavior that we can potentially modify, but other forms of nonresponse may stem from non-behavioral factors that are less likely to be affected by an intervention. For the intervention, we will consider carefully how the outcome we aim to predict—whether refusal or a more specific form of refusal like refusal over the phone but “yes” in person—corresponds to different study goals. Third, the most important predictors of both nonresponse and refusal are the relatively “black-box” factors of a unit’s status in the previous wave and its contact attempt history. These are arguably less useful for understanding mechanisms of nonresponse than some of the lower-ranked ACS contextual features.
As we approach the proposed incentives experiment, we plan to dig more deeply into why our ability to predict is higher in 2017 than in 2019. One source could be the higher rates of nonresponse in 2019 than in 2017, which could reflect that the nonresponse and refusal categories contain a more heterogeneous mix of units. Another source is that we did not leverage the full panel nature of the data when constructing the “prior waves” variables. In particular, for the 2019 wave, our models only used the response status and contact history information from the 2017 wave; a better approach would be to construct features based on both the 2015 and 2017 waves. For 2021, we will have three waves of prior data and would be able to leverage the richer history for better prediction.
Second, prior to the experiment, we will delve more deeply into the unit-level predictions that generate the overall accuracy measures. For instance, which units are consistently flagged by all classifiers as having a high risk of nonresponse or refusal, versus which units’ predictions are less stable across classifiers? How do the accuracy metrics vary by region? Questions like these can help pave the way for analytic decisions in the proposed experiment, like whether to use a single classifier or whether, for instance, to use classifiers for different regions that perform well in those regions.
Finally, due to the focus on prediction and analytic challenges with including weights in the classifiers’ estimation procedures,21 the present results do not reweight the data. We may want to weight the data so that observations weighted more heavily via the AHS’ weighting procedure are also weighted more heavily in the loss functions for each model.

4 Patterns of Partial Response
Beyond binary classifications of units as “responders” and “nonresponders” in a given wave of the AHS, we can also classify units according to how their response status changes over time, either between waves or within the survey itself. The present section focuses on two forms of “partial response.” First, our analysis of item-level missingness explores why, within a survey wave, some households complete enough of the survey to count as a responder but fail to complete many questions on the survey (Section 4.1). Second, our analysis of panel attrition analyzes why, between waves, units which respond one year drop out in subsequent waves (Section 4.2).

4.1 Characterizing item-level missingness: item’s content versus item’s order
Background
The AHS uses two methods to treat missing values:
1. The majority of variables for which there is item-level missingness have values imputed, with an ancillary variable then created—the “imputation flag” variable—that indicates which responders have imputed values for the respective variable. The main variable then contains these imputed values.
2. A smaller subset of variables is not imputed, and the main variable contains missing values.
Figure 10 shows the top 20 items with the most imputation.22 Figure 11 shows the top 20 items, among those not imputed, that have the highest rate of nonreport.
21. In particular, sklearn classifiers vary in whether they accept a sample_weight argument, making it more straightforward to first estimate a range of classifiers and then choose the top-performing one that also accepts survey weights.
22. This was calculated by (1) looking at variables that have the J prefix indicating an edit flag and (2) looking at the proportion of responses in the 2019 IUF file, for responders, that have a value of 2 for that edit flag variable.
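As a rough sketch of that calculation (the DataFrame, its column names, and the treatment of the flag value follow the footnote's description but are otherwise assumptions):

# Sketch of the imputation-rate calculation in footnote 22. `iuf` is assumed to be a
# DataFrame of 2019 IUF responders; columns starting with "J" are edit flags, and a
# value of 2 is treated as indicating an edited (imputed) response, per the footnote.
import pandas as pd

def edit_rates(iuf: pd.DataFrame) -> pd.Series:
    flag_cols = [c for c in iuf.columns if c.startswith("J")]
    rates = {c: (iuf[c] == 2).mean() for c in flag_cols}  # share of responders edited
    return pd.Series(rates).sort_values(ascending=False)

# edit_rates(iuf).head(20) would approximate the ranking plotted in Figure 10,
# after dropping items edited for over 50 percent of responders.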
Focusing on items with high rates of nonreport, we see some patterns, like potentially sensitive items about neighborhood safety or financial challenges.23 For instance, the following high-missingness items might be more sensitive:
1. NHQPCRIME: Agree or Disagree: This neighborhood has a lot of petty crime
2. NHQSCRIME: Agree or Disagree: This neighborhood has a lot of serious crime
3. NUMMEMRY: Number of persons living in this unit who have difficulty concentrating or remembering
Figure 10: Top 20 items with most missingness, as indicated by edit flag variables. The figure illustrates the top 20 items from the 2019 IUF with the highest rate of editing using the above criteria. We exclude items that are edited for over 50 percent of responders on the grounds that these items likely reflect constructed variables, where imputation is due to that construction process rather than to respondents not answering. Some variables—like YRBUILT and YRBUILT_IUF—represent the public use version of the variable and the IUF version (in this case, yrbuilt is a more aggregated categorical value of yrbuilt_iuf for disclosure reasons).
[Figure: bar chart of the proportion of responders with an edit (roughly 0.00 to 0.15) for, in descending order: FINCP, YRBUILT_IUF, YRBUILT, WATERAMT, TRASHAMT, MARKETVAL, REMODAMT, HHRACE, MILHH, RENT, MGRONSITE, HSHLDTYPE, HOTWATER, HHGRAD, ACPRIMARY, DRYER, HHMAR, ACSECNDRY, COOKFUEL, MVG1TEN.]

23. We also see others, like interview mode and language, that might be not reported for more survey-administration-based reasons.
Figure 11: Top 20 items with most missingness, as indicated by not reported values on actual variables. The figure illustrates the top 20 items from the 2019 IUF with the highest rate of “Not reported.”
[Figure: bar chart of the proportion of responders not reporting (roughly 0.00 to 0.10) for, in descending order: DWNPAYPCT, NHQSCHOOL, UNITSIZE, NHQPUBTRN, DWNPAYSRC, NHQPCRIME, INTLANG, INTMODE, NHQSCRIME, NHQRISK, NEARBARCL, LANDLINE, FS, FIRSTHOME, GATED, NEARABAND, DISHH, NUMMEMRY, HHMEMRY, HHERRND.]

Yet, while “sensitive” questions might have higher rates of responders choosing not to report, there may be confounding at work. Namely, if the survey is designed to place less essential or more sensitive questions at the end, and if survey-takers also get more fatigued and inclined to skip as the survey progresses, the correlations between item content and item missingness might be confounded by item placement. Put differently, among those who complete enough of the survey to count as responders, this missingness could stem from two sources:
• Missingness due to the item itself—for instance, sensitive questions having higher missingness; or
• Missingness due to the item appearing later in the survey, a point at which respondents may have more survey fatigue and may either be (1) more likely to stop the survey altogether, or (2) complete the survey but skip more items to reduce time.
To examine these two possible sources of nonreport, we conduct an analysis of the impact of an item’s order on nonresponse for that item. This analysis focuses on all items for which we can match the raw survey instrument names to the final analysis names, which includes some of the items discussed above as well as others we can match.
Methods
For these analyses, we use the 2019 trace file data. Each unit sampled has a text-based trace file that records the enumerator’s keystrokes as they contact the respondent and move through the survey items. We parsed the trace files to extract the following information:
1. The unit’s identifier, and
2. The “instrument item name.”
As we discuss below, the instrument name for items is sometimes distinct from the name the item is later given in the survey. Sometimes there is a 1:1 mapping between an instrument item and a survey item. Other times, multiple instrument items are combined to create a single survey item.

The parsing process was not perfect. In particular, for 18 units (less than 0.001 percent of the total occupied interviews included in the analysis), there were issues in how the timestamps were recorded that led us to exclude them from the analysis.
After parsing the files, we then created what we call the raw item duration. Broadly, this is the distance in time (minutes + seconds) between the focal item and the earliest timestamp for a particular day for that respondent (respondents can have interviews on multiple days if they start and stop the survey). More precisely, raw item duration is defined as follows, where i indexes a respondent, k indexes a particular item, and d indexes a calendar day:

Raw item duration_idk = timestamp_idk − min_k(timestamp_idk)
Figure 12 shows the distribution of durations using this raw measure. We see a bimodal distribution that stems from the fact that certain respondents have multiple interview sessions on the same day. This fact complicates measuring an item’s duration from the day-specific minimum.
Figure 12: Distribution of item-specific relative durations (raw). The second small peak at closer to 8-10 hours shows that some respondents had multiple distinct sessions in the same day.

Due to this challenge, we use a rough measure of the start of the survey—the keystroke indicating the initiation of a new survey.24 We then code what we call a cleaned item duration, where a focal item is matched to (1) the nearest start-of-survey keystroke that (2) is two hours or less away from that focal item. So if a respondent has two sessions in the same day, which have a mix of overlapping items (e.g., things asked twice to get a response) and different items, the items will be repeated within the respondent-day dyad based on the two session starts. Figure 13 shows the distribution of relative durations after this cleaning.
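A sketch of this duration construction is below. The parsed trace-file DataFrame, its column names, and the choice to match each item to the nearest prior session start are illustrative assumptions rather than the exact implementation.

# Sketch: computing the "cleaned item duration" from parsed trace-file records.
# `trace` is assumed to be a DataFrame with one row per keystroke event and columns
# unit_id, item (instrument item name), and timestamp (pandas datetime). Session starts
# are taken to be rows where item == "STARTCP" (the "Enter Field" keystroke in footnote 24).
import pandas as pd

def cleaned_durations(trace: pd.DataFrame, max_gap_hours: float = 2.0) -> pd.DataFrame:
    out = []
    for unit_id, grp in trace.sort_values("timestamp").groupby("unit_id"):
        starts = grp.loc[grp["item"] == "STARTCP", "timestamp"]
        items = grp.loc[grp["item"] != "STARTCP"]
        for _, row in items.iterrows():
            prior = starts[starts <= row["timestamp"]]   # nearest session start before the item
            if prior.empty:
                continue
            gap = row["timestamp"] - prior.max()
            if gap <= pd.Timedelta(hours=max_gap_hours):  # keep items within two hours of a start
                out.append({"unit_id": unit_id, "item": row["item"],
                            "duration_minutes": gap.total_seconds() / 60})
    return pd.DataFrame(out)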

24. The action of “Enter Field” on STARTCP.
Figure 13: Distribution of item-specific relative durations (clean)

While Figure 13 shows the distribution of durations across items and responders/nonresponders, our strategy for estimating the impact of item order on whether the item had a response relies on within-item variation in when the item is posed to a particular respondent. More specifically, we estimate the following model with linear regression, indexing respondents with i and items with k:

Do not respond to item (1 = yes)_ik = α + β1 · Relative duration_ik + γ_i + δ_k + ε_ik

Thus, in understanding how the time at which item k is presented to respondent i affects the probability of not responding to that item, we use the respondent-specific fixed effect, γ_i, to hold constant the average rate at which that respondent responds to any given item, and the item-specific fixed effect, δ_k, to hold constant the rate at which all respondents across the sample generally respond to that item.
The model, focusing on responders, thus exploits between-responder variation in when an item occurs relative to the start of the survey for different responders (e.g., due to different skip logic or whether the respondent completes the survey in one session or multiple sessions). In addition to the relative duration item, we construct the analytic sample of responder-item pairs as follows:
1. Restrict to responders: even though we have trace file data on both responders and some nonresponders, the distribution shows that, as expected, nonresponders lack a meaningful number of items with durations. In addition, since our outcome variable depends on the post-edit IUF file, we lack data on response status for those who might be classified as nonresponders due to completing very few items.
2. Match items between the trace file and the post-edit IUF file: since survey items differ from instrument items, we use the AHS data dictionary as a crosswalk between variable name and instrument variable name.
3. Code two versions of whether a person responds to a focal item: one version just contains “not reported”; another version counts nonresponse if the item is either “not reported” or imputed according to the edit flag variable. We also create a separate binary indicator for whether a responder is marked as not applicable to that item.
4. We then merge the information on the respondent’s survey item response status to the relative duration of that item for that particular respondent.25
5. Since we do not observe all items in the trace file for each responder, we create two versions of the duration variable: one with the values from above; the other imputing a respondent’s missing duration for a particular item to the mean duration for that item. We filter out item-respondent pairs for which the response was “not applicable,” under the logic that these items might have higher missingness of durations and that not applicable might reflect skip logic rather than affirmative responses or active decisions to skip.
We estimate the regression using the felm function in R’s lfe package, which helps with efficient estimation given the large number of respondent-specific fixed effects. For comparison, we also estimate models with responder fixed effects only.
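For readers more familiar with Python, the following is only a rough analogue of the two specifications on a small synthetic dataset; it uses statsmodels with explicit categorical fixed effects, which, unlike felm, would not scale to tens of thousands of respondent dummies, and all names and data are illustrative.

# Rough Python analogue of the fixed-effects regressions; the memo itself uses R's lfe::felm.
# `df` is an illustrative respondent-by-item DataFrame with columns no_response (0/1),
# duration (relative duration in minutes), respondent_id, and item.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "respondent_id": np.repeat(np.arange(30), 10),
    "item": np.tile([f"item{k}" for k in range(10)], 30),
    "duration": rng.uniform(0, 60, 300),
})
df["no_response"] = (rng.uniform(size=300) < 0.1).astype(int)

# Specification 1: respondent fixed effects only
fe_resp = smf.ols("no_response ~ duration + C(respondent_id)", data=df).fit()

# Specification 2: respondent + item fixed effects (identified off between-respondent
# variation in when the same item is reached)
fe_both = smf.ols("no_response ~ duration + C(respondent_id) + C(item)", data=df).fit()

print(fe_resp.params["duration"], fe_both.params["duration"])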
If the coefficient β1 is significant and positive, it means that respondents are less likely to respond to items reached later in the survey.26 Figure 14 shows that we have sufficient between-respondent variation in relative duration for the same item to identify an effect. Appendix Table 10 shows the 120 items included in the analysis and their mean duration. To validate a few with reference to the AHS item booklet:
• ENTRYSYS: whether a multifamily household has an entry system; ranked 2 in inferred item order and is on page 14 of the item booklet. This makes sense given that earlier items in the item booklet are labeled in the trace file in ways that make matching with the survey instrument difficult.
• GARAGE: presence of a garage; ranked 18 in inferred item order and is on page 51 of the item booklet.
• NHQSCRIME: measure of perceptions of serious crime discussed earlier; ranked 100 in inferred item order and is on page 224 of the item booklet.
Figure 14: Between-respondent variation in relative duration for the same item. The x axis contains each of the items used in the duration analysis, ordered by their mean duration across respondents. The box plot shows substantial between-respondent variation in when exactly the item was posed to the respondent.

25. Since a given survey item might be composed of multiple instrument items, where that occurs we take the maximum duration across instrument items for a given survey item.
26. Since our outcome variable is nonresponse to the item.
Overall, the parsing shows that some items that might be more sensitive, like the perceptions-of-neighborhood items, also appear towards the end of the survey instrument—but that we have sufficient between-respondent variation in item placement to look at the causal effect of ordering.
Results
We show results from two models, each with two specifications. First is a model that only includes fixed effects for the responder—this is meant to net out responder-specific propensities to skip certain items or to end the survey at a certain point. The estimate on duration for this model thus reflects a mixture of an item’s order and its intrinsic content. Second is the model specified earlier that supplements the responder fixed effects with item fixed effects—the causal effect of the item’s order in this model is identified solely off of an item’s relative duration for responder i compared to other responders. For instance, suppose two responders each receive the item about perceptions of serious crime in the neighborhood, but, based on their speed and skip logic, one responder receives the item 35 minutes into the survey and the other 39 minutes in. If the 39-minute respondent is less likely to respond than the 35-minute respondent, that would be evidence of an effect of duration net of the item’s content and general survey placement.
Table 5 shows the results. All models predict nonresponse, so a positive coefficient indicates that higher duration is associated with a higher likelihood of nonresponse. Results from the model with respondent fixed effects are in the expected direction—netting out general respondent propensities to respond, items with a higher relative duration are more likely to go unanswered. But the model with both item and respondent fixed effects, which identifies order effects only off of between-respondent variation in when a particular item was reached, does not show that pattern. Further investigation is needed, but the analysis generally suggests that the nonresponse associated with placing potentially sensitive items at the end of the survey stems from a mix of item content and item order, since the relative duration of those items among the same respondents, net of item content, does not produce results in the expected direction.
Table 5: Effect of item order on item-level nonresponse. All models (1) subset to only respondents in the 2019 wave and (2) exclude respondent-item dyads for which “not applicable” was the variable’s value. We see that imputing duration for items missing from a respondent’s trace file (either actually missing or potentially missing due to parsing challenges) does not substantially change the results. Instead, the main change comes from contrasting the respondent fixed effects model with the respondent + item fixed effects model.

Model                   Treatment of items missing duration   Coefficient      p value
Respondent FE           Listwise                              0.000334500      p < 0.001
Respondent FE           Impute to mean item duration          0.000542200      p < 0.001
Respondent + item FE    Listwise                              -0.000414800     p < 0.001
Respondent + item FE    Impute to mean item duration          -0.000274700     p < 0.001

4.2 Predicting panel attrition
Background
In this analysis, we leverage the longitudinal nature of the AHS to shed light on what kinds of units drop out of the panel. Specifically, we look at which variables in the 2015 AHS best predict respondent refusal in the 2017 AHS. We focus on refusal because it has a clear behavioral dimension and is the main reason for noninterviews in the 2017 AHS (70 percent of noninterviews were due to refusal).
Unlike the other analyses presented in this memo, we are able to predict refusal here using the full set of variables measured in the AHS, since we are interested in the 2017 behavior of people in units where an interview was conducted in 2015.27 There are hundreds of categorical and numeric variables to choose from in the 2015 AHS. We therefore rely on an automated procedure that identifies the best (linear) predictors of 2017 refusal: a lasso (penalized) regression (see Section 3 above for a longer description of this procedure).
Methods
We begin by restricting the sample to occupied interviews in the 2015 national survey.28 For categorical variables, we create one dummy variable corresponding to each level, including one that indicates whether the response was missing or inapplicable for that question. Missing responses pose a bigger problem for numeric variables. For those, we impute missing values using the average of the non-missing responses, and include in the set of potential predictors a dummy variable for each numeric variable indicating whether mean imputation was employed. Finally, we code a variable that indicates the proportion of all predictors that were flagged as edited in the AHS (i.e., the proportion of the so-called “J” variables that were not equal to zero for a given respondent).
With the cleaned set of predictors in hand, we have a total of 463 possible predictors of 2017 refusal. We weight observations by the composite weight variable in all analyses, which adjusts for nonresponse bias in a given wave but does not account for wave-on-wave patterns. We use the lasso variable selection procedure discussed in Section 3, though we implement it using glmnet in R. As with all such analyses, an issue that arises is what penalty to apply to the addition of each predictor in the model. Here, we simply show how the model changes as we change the penalty (λ), and locate the penalty in a range that contains a sharp discontinuity in the number of parameters included.
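The memo's implementation uses glmnet in R; the sketch below is only a rough Python analogue of tracing which predictors survive as the penalty grows, on synthetic data and with a linear, unweighted lasso for simplicity (the actual outcome is binary refusal and the analysis is weighted).

# Rough sketch of the lasso path used to select 2015 predictors of 2017 refusal.
# Data and variable names are synthetic stand-ins; the real analysis uses glmnet in R.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
names = [f"x{i}" for i in range(20)]
X = rng.normal(size=(1000, 20))
y = 0.3 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(size=1000)    # stand-in for 2017 refusal

kept = {}
for lam in [0.003, 0.005, 0.008, 0.014]:                      # candidate penalties (lambda)
    fit = Lasso(alpha=lam).fit(X, y)
    kept[lam] = [n for n, b in zip(names, fit.coef_) if b != 0]  # predictors not zeroed out

for lam, vars_kept in kept.items():
    print(lam, len(vars_kept), "predictors retained")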
Results
Figure 15 plots the different models that result from applying an increasingly stronger penalty, λ, for including additional predictors. For example, if we set λ = 0.014, the lasso regression drops all variables except INTMONTH8 (an indicator for whether the 2015 interview took place in August) and HHAGE (the age of the householder) in its search for the model that best predicts 2017 refusal. The vertical dotted line at 0.005 indicates the level of penalty chosen for this analysis, because this level appears to represent a sharp discontinuity in the number of variables selected.

27. Note that this is a simplified way to refer to people within units. Given that the AHS is a survey of housing units and not households, the people residing within a unit may change between waves. Even still, the characteristics of the people responding in one wave can be predictive of the unit responding in a separate wave.
28. For these analyses, we use the Public Use File (PUF) combined with the public case history file. This means that the universe of variables is the PUF-only variables rather than PUF variables + the IUF-only variables.
Figure 15: Predictors included in the lasso model as a function of the severity of the penalty for including additional predictors. The vertical axis lists candidate predictors of 2017 attrition contained in the 2015 AHS, in descending order of their probability of inclusion for a given penalty. The horizontal axis lists various levels of λ used to fit successive lasso regression models. As the penalty increases, so too does the likelihood that variables will be excluded or “zeroed out” from the model. Each point indicates that a predictor was included at a given level of λ. The vertical line indicates the level of λ used to fit the regression model discussed further below.
[Figure: dot plot of predictor inclusion across λ values from 0.003 to 0.014. Predictors retained longest include HHAGE, INTMONTH8, FIREPLACE3, INTMONTH5, DISHH1, INTMODE2, MOLDOTHER2, HHCARE2, MOLDBASEM2, DISHWASH2, HHNATVTY92, INTMONTH7, RODENT5, DIVISION2, and DIVISION7; others (e.g., PAINTPEEL2, NHQRISK2, NEARABAND1, MOLDBATH1) drop out at smaller penalties. The vertical line is at λ = 0.005.]

Using the variables selected at the penalty marked by the vertical line on Figure 15, we fit a weighted linear model predicting 2017 refusal. To estimate variance, we employ the standard replicate weights.
The first test described in our pre-analysis plan is an F-test of whether adding these variables produces a statistically significant improvement in our ability to predict 2017 refusal. Intuitively, the F-test answers the question: given that adding any variables to a model will improve its predictive accuracy just due to chance correlations, what is the probability that we would see an improvement in predictive accuracy as large as the one we do observe if none of the variables were actually related to 2017 refusal? The p-value indicates that this probability is very low (p < .0001). In other words, adding these 28 predictors produces a statistically significant improvement in our ability to predict refusal. This constitutes prima facie evidence that 2017 refusal is systematically related to characteristics of units measured in 2015.
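A minimal sketch of such a nested-model F-test is below, using statsmodels on synthetic data; the memo's actual test additionally incorporates the AHS composite weight and replicate-weight variance estimation, and the variable names here are purely illustrative.

# Sketch: F-test comparing an intercept-only model to one that adds selected predictors.
# Synthetic data; variable names (hhage, intmonth8, refused_2017) are illustrative.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(0)
df = pd.DataFrame({"hhage": rng.integers(20, 90, 2000),
                   "intmonth8": rng.integers(0, 2, 2000)})
df["refused_2017"] = ((df["hhage"] < 35).astype(int) * 0.1
                      + 0.05 * df["intmonth8"]
                      + rng.uniform(size=2000) < 0.15).astype(int)

restricted = smf.ols("refused_2017 ~ 1", data=df).fit()
full = smf.ols("refused_2017 ~ hhage + intmonth8", data=df).fit()
print(anova_lm(restricted, full))   # F statistic and p-value for adding the predictors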
When characterizing which variables do a good job of predicting whether 2015 responders drop out in 2017, some caveats are in order. First, we cannot be sure that the respondent who answered the survey in 2015 is the same person who refused the survey pertaining to that unit in 2017—it is possible that in many cases the respondent has changed, and we are predicting turnover between different residents of the same unit as much as we are predicting dropout of the same residents. Second, we must be careful in drawing causal inferences. Correlations between responses in 2015 and refusal in 2017 will in many cases arise due to unmeasured common causes.
Table 6 presents the 15 predictors that the lasso is least likely to drop as the penalty increases (i.e., those at the bottom of Figure 15).
Table 6: Fifteen predictors of 2017 survey refusal among 2015 respondents that are least likely to be dropped by the lasso. A subset of coefficients estimated through lasso regression. The model uses variables from the 2015 AHS to predict refusal in the 2017 wave. For a given penalty level, lasso regression selects the subset of predictors that trades off improvements in predictive accuracy against a penalty incurred by increasing the number of predictors in the model. In theory, if the penalty is set correctly, the algorithm will include the minimal subset of variables that do a good job of predicting the outcome, and will exclude those that do not add to the predictive accuracy, either because they are redundant (collinear or highly correlated with an already-included variable) or because they do a poor job of predicting. This model uses the λ penalty indicated with a vertical line on Figure 15 (0.005). Standard errors and p-values are derived from the composite replicate weights produced by the Census Bureau, and employ Fay’s BRR method (see footnote 8 above).

term          estimate   std.error   statistic   p.value
HHAGE         -0.001     0.000       -5.926      0.000
INTMONTH8      0.040     0.006        7.076      0.000
FIREPLACE3     0.153     0.044        3.516      0.001
INTMONTH5     -0.010     0.004       -2.450      0.016
DISHH1        -0.009     0.005       -1.896      0.060
INTMODE2      -0.011     0.004       -2.738      0.007
MOLDOTHER2    -0.028     0.021       -1.319      0.190
HHCARE2       -0.021     0.007       -3.149      0.002
MOLDBASEM2    -0.015     0.020       -0.779      0.437
DISHWASH2     -0.014     0.003       -4.026      0.000
HHNATVTY92    -0.010     0.008       -1.369      0.173
INTMONTH7      0.019     0.005        4.146      0.000
RODENT5        0.025     0.005        5.235      0.000
DIVISION2      0.029     0.006        5.161      0.000
DIVISION7     -0.019     0.005       -3.908      0.000

Turning to the first coefficient, HHAGE, we see that age is negatively correlated with the probability of refusal. As depicted in Figure 16, the relationship is approximately linear: as the age of the householder interviewed in 2015 increases, the probability of that household refusing to do the survey in 2017 decreases.

Figure 16: Households with young respondents in 2015 are much more likely to refuse in 2017. Each point indicates a weighted estimate of the proportion of 2017 refusers (vertical axis) for each year-of-age bin (horizontal axis) for householders in the 2015 AHS. The size of each point corresponds to the sample size of responders in 2015. The line is a linear least squares regression slope.
[Figure: average rate of refusal in 2017 (0% to 20%) by age of householder interviewed in 2015 (roughly 20 to 80), with point sizes reflecting the number interviewed in 2015 (roughly 500 to 2,000 per bin).]

As described above, the variables labeled INTMONTH in Table 6 are binary indicators for whether the 2015 survey was conducted in the month corresponding to the final integer. The bivariate linear relationship between 2015 interview month and 2017 refusal is depicted in Figure 17.
Figure 17: Units that were interviewed later in the 2015 round of surveying are much more likely to refuse in 2017. Each point indicates a weighted estimate of the proportion of 2017 refusers (vertical axis) for each 2015 month-of-interview bin (horizontal axis). The size of each point corresponds to the sample size of responders in 2015. The line is a linear least squares regression slope.
[Figure: average rate of refusal in 2017 (0% to 20%) by month interviewed in 2015 (April through October), with point sizes reflecting the number interviewed in 2015 (roughly 5,000 to 15,000 per month).]

By June, two-thirds of the 2015 sample had already been interviewed. Roughly 10 percent of those units would have refusing respondents two years later. The rate of refusal is higher for those interviewed in July, at 13 percent, but not substantially above average. Those remaining two percent of units whose respondents were interviewed in the final months of the 2015 survey, however, exhibit a very high likelihood of 2017 refusal. One obvious explanation is that the respondents who are interviewed late in the survey are those who are the most unavailable: it is then quite unsurprising that, when those same people are sought out two years later, they are still hard to contact or just refuse to be interviewed.29
The coefficients on DIVISION2 and DIVISION7 in Table 6 indicate that 2017 refusal rates also vary by the geographic area in which the AHS is conducted. Looking at the raw data, refusal rates are highest in the Mid-Atlantic (13 percent), New England (12 percent), and East North Central (12 percent) Census divisions, and lowest in the West South Central division (9 percent).
The other coefficients do not present relationships that are quite as clear, and some appear to be the result of sparse categories happening to capture many 2017 refusers or non-refusers.30 Briefly, though, the lasso suggests 2017 refusal is more likely in houses that, in 2015, had: no people with disabilities living in them (DISHH1 negative); mold (MOLDOTHER2 and MOLDBASEM2 negative); householders who have difficulty dressing themselves (HHCARE2 negative); no dishwasher (DISHWASH2 negative); and no problems with rodents (RODENT5 positive).
More investigation into the causes of panel attrition is encouraged. However, from this analysis it is clear that, even without a clear understanding of mechanisms, there are systematic patterns to units dropping out of the panel, which implies nonresponse bias.

4.3 Section Summary
This section explores how nonresponse bias can find its way into a sample of already-responding units. Questions that are particularly sensitive—such as those pertaining to the amount of crime in the neighborhood—are most likely to go unanswered by responders. We do not find strong evidence that the placement of questions later in the survey leads to a lower likelihood of answers. Turning our attention to the question of which 2015 responders drop out in 2017, we find that units with younger householders and units interviewed later in the 2015 survey were most likely to drop out in 2017. A host of other characteristics measured in the 2015 survey are also associated with the probability of dropping out, but no clear pattern emerges.

5 Consequences of Nonresponse
Stakeholders within and outside government use the AHS to generate insights that can feed into important regulatory and investment decisions. This section discusses some consequences of the patterns of nonresponse analyzed in this report for applied researchers using the AHS to investigate substantive questions.
29. However, there are other explanations for this trend that may be worth exploring. One explanation could be a shared scheduling structure between waves—if interviewers, for instance, schedule interviews for the “inner core” of a metropolitan area first and schedule interviews for the “outer suburbs” later, it might be that units are both interviewed later in the first wave and then refuse in the later wave because they are scheduled for a time when there is less time for follow-up before the end of the closeout period. Figure 25 in the appendix investigates this hypothesis, focusing on whether there is between-region variation in interview timing that might point to this form of confounding. The figure shows no clear differences in the distribution of 2015 interviews across months by region, which goes against the idea that respondents in certain regions are both more likely to refuse and are scheduled later. Future analyses could investigate within-region variation in scheduling as an explanation.
Alternatively, refusal rates may be driven by some kind of interviewer selection, whereby interviewers put “harder” cases lower on their list of places to visit, so respondents in these units are perhaps not harder to find but were less likely to be targeted. We do not have a level-of-effort measure in these data and so leave this as a topic for future exploration.
30. For example, the coefficient on FIREPLACE3 indicates that the approximately 1 percent of households whose usable fireplaces may or may not be heating equipment in 2015 are 15 percentage points more likely to refuse in 2017.
5.1 How panel attrition affects correlational analysis
Background
If attributes of both the householder (e.g., age) and housing unit (e.g., mold, rodent infestations) in 2015 can help us predict whether or not a household refuses to be interviewed in the 2017 wave (see Section 4.2), what consequences does this entail for analyses?
One way to address this question is to investigate how attrition changes correlations that researchers might be interested in examining. For our working example, suppose a researcher is interested in examining the relationship between household income and housing inadequacy. They have a hypothesis that more affluent households are less likely to live in inadequate housing conditions. The researcher might be interested in using the multi-wave structure of the AHS to assess this relationship, either to (1) increase their power to examine a relationship by pooling multiple waves, or (2) explore how the relationship changes over time (e.g., whether improved oversight of rental housing conditions might be associated with a flatter income-adequacy relationship).
Focusing on the second, if households with a certain combination of attributes are more likely to attrit than others—e.g., low-income households living in adequate housing being more likely to attrit than low-income households living in inadequate housing—this nonrandom attrition causes particular bias for investigating longitudinal trends.
Methods
To assess this form of bias, we use the Becketti, Gould, Lillard and Welch (BGLW) pooling test to explore potential bias caused by attrition between the two panels. In the main text analysis, we focus on exploring variation between two groups: respondents interviewed in 2015 who respond in 2017 and respondents interviewed in 2015 who refuse an interview in 2017.31 We examine the relationship between total household income (HINCP) and whether the respondent lives in inadequate housing.32 We also control for the respondent’s region, which the previous section showed is a significant predictor of refusal rates.
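In essence, the pooling test asks whether interacting an attrition indicator with the 2015 covariates significantly improves a model fit to the 2015 data. A minimal sketch on synthetic data is below; the memo's version uses the AHS composite weights, region controls, and replicate-weight variance estimation, and all names here are illustrative.

# Sketch of the BGLW-style pooling test: interact a "refused in 2017" indicator with the
# 2015 covariate and test whether the interaction significantly improves fit.
# Synthetic data; the real analysis adds region controls and survey weights.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(0)
n = 3000
df = pd.DataFrame({"income_10k": rng.gamma(4, 1.5, n),        # household income / $10,000
                   "refused_2017": rng.integers(0, 2, n)})
base_prob = 0.12 - 0.008 * df["income_10k"]
base_prob[df["refused_2017"] == 1] = 0.08                      # flatter slope for attritors
df["inadequate"] = (rng.uniform(size=n) < base_prob.clip(0, 1)).astype(int)

main_only = smf.ols("inadequate ~ income_10k + refused_2017", data=df).fit()
with_interaction = smf.ols("inadequate ~ income_10k * refused_2017", data=df).fit()
print(with_interaction.params["income_10k:refused_2017"])      # differential slope
print(anova_lm(main_only, with_interaction))                    # joint test of the interaction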
Results
Appendix Table 13 shows the results. As expected, households with higher income are significantly less likely to live in inadequate housing conditions (negative and statistically significant coefficient). Appendix Figure 26 shows the unconditional relationship between housing adequacy and refusal, showing that a slightly higher proportion of refusers live in adequate housing. Yet more important for this test are the interaction terms. We see that, in 2015, respondents who go on to refuse participation in 2017 have a significantly different relationship between income and housing adequacy than those who remain in the survey (significant interaction terms). In other words, an analyst looking at this relationship in the 2017 data may get a different result than they would have if all of the units that responded in 2015 had also responded in 2017.33
How might significant interactions between refusal and income affect the inferences an analyst makes about the relationship between income and housing adequacy? Figure 18 shows the 2015 relationship between housing adequacy (vertical axis) and income (horizontal axis) in the Middle Atlantic and South Atlantic divisions.
31. These are based on the NOINT variable in the case history file.
32. More specifically, we used the ADEQUACY variable and constructed a binary measure of the unit not being adequate if the response was either moderately or severely inadequate. The regression presents income scaled by $10,000 for the purposes of interpreting coefficients; the predicted values present the non-scaled version.
33. An F-test shows that adding the interaction terms between attrition and each variable produces significant improvements over a model with main effects for each variable and no interactions with attrition (p = 0.03).
Blue lines correspond to units that responded in both 2015 and 2017, while yellow lines indicate those that responded in 2015 but refused in 2017. The main takeaway from this graph is that, in the Middle Atlantic division, the 2015 relationship between housing adequacy and income looks very different among those who respond in both waves compared to those who respond in 2015 only, whereas in the South Atlantic the relationship is much more similar. In the Middle Atlantic region, among those who did not attrit from the survey, there is a clear negative relationship: those with higher incomes are less likely to live in inadequate housing. Among those who attrit from the survey, the relationship is much flatter. In the South Atlantic region, there are no clear differences in adequacy between attritors and non-attritors with similar income levels. Researchers who restrict an analysis of longitudinal trends to households that appear in both waves would essentially estimate only the blue line, ignoring the yellow. This would overstate the relationship between income and adequacy. The composite AHS survey weights would not necessarily correct for this bias, as they do not include information on income in the reweighting scheme, and likely do not reweight for partial attrition between panels.
Figure 18: Predicted inadequacy by income in 2015: respondents who then refuse in 2017 versus respondents who respond in both waves. The analysis constructs a binary measure of inadequacy from the broader three-level adequacy variable. It is restricted to occupied interviews and refusers.
[Figure: two panels (Middle Atlantic; South Atlantic) of predicted inadequate housing (higher = inadequate; roughly 0.00 to 0.15) by household income (scaled by $10,000; 10th to 90th percentile in region), with separate lines for units interviewed in 2015 and 2017 versus units interviewed in 2015 that refused in 2017.]
5.2 How nonresponse affects metro-level estimates
Background
Finally, one of the core uses of the AHS is to derive accurate metropolitan-level estimates of certain important housing stock features. Here, we investigate the extent to which 2015 AHS estimates diverge from the 2010 Decennial population count at the metropolitan area level.
Methods
See Section 2.1 above for the methods employed in the national-level benchmarking analysis. Here, we apply the same method at the metropolitan area level. For illustrative purposes, we restrict attention to two variables that appeared to diverge strongly in the national-level analysis: the proportion of householders estimated to own their house while owing a loan or mortgage and the proportion of householders who identify as white alone.
Results
The comparison of metropolitan-level divergences reveals an interesting pattern. Estimates of the proportion of householders who own their house while owing a loan or mortgage consistently fall below the Census count across metropolitan areas. When it comes to the count of white householders, however, the divergences vary by state. In Arizona, California, and Texas, white people are overrepresented in the AHS relative to the Decennial Census—in some cases by up to 15 percentage points—whereas in most other areas there is no statistically significant divergence. As with the prior analyses, we caution that we may be misstating the true magnitude of bias due to differential demographic changes across regions.
Figure 19: Metro-level divergences between 2015 AHS estimates and 2010 Decennial Census counts of the proportion of householders who own their house with a mortgage or loan owing and of the proportion of householders who identify as white alone. See note on Figure 1.
[Figure: two panels, % own house (mortgage / loan) and % white alone, with one row per metro area: Phoenix-Mesa-Glendale, AZ; Los Angeles-Long Beach-Santa Ana, CA; Riverside-San Bernardino-Ontario, CA; San Francisco-Oakland-Fremont, CA; Washington-Arlington-Alexandria, DC-VA-MD-WV; Miami-Fort Lauderdale-Pompano Beach, FL; Atlanta-Sandy Springs-Marietta, GA; Chicago-Joliet-Naperville, IL-IN-WI; Boston-Cambridge-Quincy, MA-NH; Detroit-Warren-Livonia, MI; New York-Northern New Jersey-Long Island, NY-NJ-PA; Philadelphia-Camden-Wilmington, PA-NJ-DE-MD; Dallas-Fort Worth-Arlington, TX; Houston-Sugar Land-Baytown, TX; and Seattle-Tacoma-Bellevue, WA. X-axis: difference between the 2010 Census and the 2015 AHS (positive indicates overrepresentation). Point size: AHS sample size (roughly 2,100 to 2,400). Estimates are shown using the sampling weight only (BASEWGT) and the bias-adjusted weights (WEIGHT).]

5.3 Section Summary
The results in this section suggest that nonresponse bias in the AHS may affect key statistics, even when weights designed to address nonresponse bias are used. We examined how panel attrition affects estimates of important correlations, such as that between income and housing adequacy. Among units that responded in 2015, those that also responded in 2017 exhibit a very different relationship between income and adequacy than those that dropped out, a distinction that is particularly sharp in the Middle Atlantic. Any analysis of longitudinal trends restricted to units that respond in all waves of the panel would consequently overestimate the negative relationship between income and adequacy, even when employing weights. Similarly, metropolitan-level estimates from the 2015 AHS differ from the 2010 Census in ways that matter more for some regions and for some variables than for others. Whereas householders who own a house with a mortgage or loan owing are consistently undercounted in all metropolitan areas, the proportion of non-white respondents is most severely undercounted in metropolitan areas located in California, Arizona, and Texas.


Conclusion
This memorandum has described several methods for characterizing nonresponse bias. Among the conclusions are that the AHS fails to reproduce population features from the 2010 Census and that the characteristics of responding and nonresponding units differ to an extent that cannot be explained by chance. Taken as a whole, the analyses documented in this memorandum provide strong evidence that nonresponse is systematically related to the characteristics of housing units and the respondents living within them, which is evidence of nonresponse bias. Our analysis suggests that the nonresponse adjustment factors used to produce population estimates help to correct for nonresponse bias but do not completely mitigate the problem. The evidence produced in this document suggests the AHS could be strengthened by efforts designed to increase the representativeness of the responding units. This does not call for an increase in overall response rates, but rather for efforts to raise response rates specifically among units that are currently underrepresented. It is encouraging that our models for predicting nonresponse perform well. This suggests that interventions can be designed to target specific units, induce a higher response rate among those units, and ultimately create a stronger, more reliable survey product.


References
Lewis, Taylor. 2015. "Replication Techniques for Variance Approximation." SAS Support Paper, nos. 2601-2015.
Maitland, Aaron, Amy Lin, David Cantor, Mike Jones, Richard P. Moser, Bradford W. Hesse, Terisa Davis, and Kelly D. Blake. 2017. "A nonresponse bias analysis of the Health Information National Trends Survey (HINTS)." Journal of Health Communication 22 (7): 545-553.
Schouten, Barry, Fannie Cobben, Jelke Bethlehem, et al. 2009. "Indicators for the representativeness of survey response." Survey Methodology 35 (1): 101-113.
U.S. Census Bureau and Department of Housing and Urban Development. 2018. 2015 AHS Integrated National Sample: Sample Design, Weighting, and Error Estimation. Technical report. https://www2.census.gov/programs-surveys/ahs/2015/.


A Appendix
A.1 Additional results from the chi-squared analysis
Table 7: P-values from chi-squared analysis of differences between responders and nonresponders. All differences are significant at the p < 0.001 level.

Variable     Level                              Diff. (responders - nonresponders), 2019   p-value
DIVISION     East North Central                 0.0060     p < 0.001
DIVISION     East South Central                 0.0060     p < 0.001
DIVISION     Middle Atlantic                    -0.0169    p < 0.001
DIVISION     Mountain                           -0.0224    p < 0.001
DIVISION     New England                        -0.0147    p < 0.001
DIVISION     Pacific                            0.0262     p < 0.001
DIVISION     South Atlantic                     0.0404     p < 0.001
DIVISION     West North Central                 -0.0086    p < 0.001
DIVISION     West South Central                 -0.0161    p < 0.001
FL_SUBSIZ    No                                 -0.0016    p < 0.001
FL_SUBSIZ    Yes                                0.0016     p < 0.001
HUDSAMP      No                                 -0.0016    p < 0.001
HUDSAMP      Yes                                0.0016     p < 0.001
METRO_2013   Metro, Central City                -0.0033    p < 0.001
METRO_2013   Metro, Not Central City            -0.0028    p < 0.001
METRO_2013   Micropol.                          -0.0074    p < 0.001
METRO_2013   Non Micropol.                      0.0135     p < 0.001
REGION       Midwest                            -0.0026    p < 0.001
REGION       Northeast                          -0.0316    p < 0.001
REGION       South                              0.0304     p < 0.001
REGION       West                               0.0039     p < 0.001
RENTSUB      Missing                            0.0087     p < 0.001
RENTSUB      No rental subsidy or reduction     -0.0967    p < 0.001
RENTSUB      Other government subsidy           0.0174     p < 0.001
RENTSUB      Public housing                     0.0247     p < 0.001
RENTSUB      Rent reduction                     0.0056     p < 0.001
RUCC_2013    Completely rural; metro adj.       0.0013     p < 0.001
RUCC_2013    Completely rural; non-metro adj.   -0.0009    p < 0.001
RUCC_2013    Metro. county (<250k)              -0.0046    p < 0.001
RUCC_2013    Metro. county (1+ mil)             -0.0115    p < 0.001
RUCC_2013    Metro. county (250k-1mil)          0.0101     p < 0.001
RUCC_2013    Urban <20k; metro adj.             0.0124     p < 0.001
RUCC_2013    Urban <20k; non-metro adj.         -0.0050    p < 0.001
RUCC_2013    Urban 20k+; metro adj.             -0.0001    p < 0.001
RUCC_2013    Urban 20k+; non-metro adj.         -0.0016    p < 0.001
SPSUTYPE     Not self-rep                       0.0136     p < 0.001
SPSUTYPE     Self-rep                           -0.0136    p < 0.001
WPSUSTRAT    CI record                          0.0000     p < 0.001
WPSUSTRAT    HUD records                        0.0026     p < 0.001
WPSUSTRAT    Mobile home                        -0.0011    p < 0.001
WPSUSTRAT    Other                              -0.0207    p < 0.001
WPSUSTRAT    Other                              -0.0050    p < 0.001
WPSUSTRAT    Owners; 1 unit                     0.0172     p < 0.001
WPSUSTRAT    Owners; 2+ unit                    0.0037     p < 0.001
WPSUSTRAT    Renters; 1 unit                    -0.0013    p < 0.001
WPSUSTRAT    Renters; 2+ unit                   0.0000     p < 0.001
WPSUSTRAT    Vacant; 1 unit                     0.0037     p < 0.001
WPSUSTRAT    Vacant; 2+ unit                    0.0009     p < 0.001
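For reference, a minimal sketch of the kind of unweighted chi-squared comparison summarized in Table 7 appears below; the file name and the responded indicator are assumptions, and, as noted in the body of the memorandum, proper variance estimation for nonresponders would require replicate weights that are not available.

```python
# Sketch only: chi-squared test of responder/nonresponder differences.
import pandas as pd
from scipy.stats import chi2_contingency

frame = pd.read_csv("ahs_2019_sample_frame.csv")  # hypothetical frame file
# "responded" is assumed to be a 0/1 indicator of response in the wave.

for var in ["DIVISION", "REGION", "RENTSUB", "RUCC_2013", "SPSUTYPE", "WPSUSTRAT"]:
    tab = pd.crosstab(frame[var], frame["responded"])
    chi2, p, dof, _ = chi2_contingency(tab)
    # Per-level difference in proportions (responders minus nonresponders),
    # analogous to the diff column in Table 7.
    props = pd.crosstab(frame[var], frame["responded"], normalize="columns")
    diff = props[1] - props[0]
    print(var, f"chi2={chi2:.1f}", f"p={p:.4f}")
```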


Figure 20: Differences between responders and nonresponders: 2017 wave.
[Figure: one panel per sampling-frame variable (DIVISION, FL_SUBSIZ, HUDSAMP, METRO_2013, REGION, RENTSUB, RUCC_2013, SPSUTYPE, WPSUSTRAT), with one row per level of each variable. X-axis: difference in proportions between responders and nonresponders (positive = overrepresented in responders), shown both unweighted and weighted.]
Figure 21: Differences between responders and nonresponders: 2015 wave.
[Figure: same layout as Figure 20, for the 2015 wave: one panel per sampling-frame variable (DIVISION, FL_SUBSIZ, HUDSAMP, METRO_2013, REGION, RENTSUB, RUCC_2013, SPSUTYPE, WPSUSTRAT), with one row per level. X-axis: difference in proportions between responders and nonresponders (positive = overrepresented in responders), shown both unweighted and weighted.]

A.2 Additional results from R indicator analysis
Table 8: Results from R-indicator analysis.

Wave   Estimated R   Permutation p-value   LRT statistic   LRT p-value
2015   0.90          0                     1291            0.00
2017   0.92          0                     3228            0.00
2019   0.90          0                     5964            0.00
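For reference, the R-indicator of Schouten, Cobben, and Bethlehem (2009) can be computed as one minus twice the standard deviation of estimated response propensities. The sketch below illustrates this with a propensity model fit on sampling-frame variables; the file name and predictor set are assumptions.

```python
# Sketch only: R-indicator = 1 - 2 * SD(estimated response propensities).
# Values near 1 indicate a more representative (less variable) response.
import pandas as pd
from sklearn.linear_model import LogisticRegression

frame = pd.read_csv("ahs_2019_sample_frame.csv")  # hypothetical frame file
X = pd.get_dummies(frame[["DIVISION", "METRO_2013", "RUCC_2013", "SPSUTYPE"]].astype(str),
                   drop_first=True)
y = frame["responded"]  # assumed 0/1 response indicator

rho = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1]
r_indicator = 1 - 2 * rho.std()
print(round(r_indicator, 2))
```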

A.3 Additional results from the predicting nonresponse and refusal analysis
Table 9: Tract-level predictors from the American Community Survey. The first set of predictors (ACS 100 to 199, 1500 to 1999, etc.) represents monthly housing costs. Other predictors reflect race/ethnicity, educational attainment, and housing costs as a proportion of income.
feature
acs_100_to_199_prop
acs_1500_to_1999_prop
acs_200_to_299_prop
acs_2000_or_more_prop
acs_300_to_399_prop
acs_400_to_499_prop
acs_500_to_599_prop
acs_600_to_699_prop
acs_700_to_799_prop
acs_800_to_899_prop
acs_900_to_999_prop
acs_asian_alone_prop
acs_at_or_above_150_percent_of_the_poverty_level_prop
acs_bachelors_degree_or_higher_prop
acs_black_or_african_american_alone_prop
acs_estimate_median_age_total
acs_estimate_median_household_income_in_the_past_12_months_in_2014_inflation_adjusted_dollars
acs_foreign_born_noncitizen_prop
acs_hispanic_or_latino_prop
acs_in_the_labor_force_unemployed_prop
acs_less_than_100_prop
acs_less_than_high_school_graduate_prop
acs_living_in_household_with_ssi_or_snap_prop
acs_native_hawaiian_and_other_pacific_islander_alone_prop
acs_owner_occupied_housing_units_zero_or_negative_income_prop
acs_renter_occupied_housing_units_20000_to_34999_20_to_29_percent_prop
acs_renter_occupied_housing_units_20000_to_34999_30_percent_or_more_prop
acs_renter_occupied_housing_units_35000_to_49999_20_to_29_percent_prop
acs_renter_occupied_housing_units_35000_to_49999_30_percent_or_more_prop
acs_renter_occupied_housing_units_50000_to_74999_20_to_29_percent_prop
acs_renter_occupied_housing_units_50000_to_74999_30_percent_or_more_prop
acs_renter_occupied_housing_units_75000_or_more_20_to_29_percent_prop
acs_renter_occupied_housing_units_75000_or_more_30_percent_or_more_prop
acs_renter_occupied_housing_units_less_than_20000_20_to_29_percent_prop
acs_renter_occupied_housing_units_less_than_20000_30_percent_or_more_prop
acs_renter_occupied_housing_units_zero_or_negative_income_prop
acs_some_college_or_associates_degree_prop
acs_some_other_race_alone_prop
acs_two_or_more_races_prop
acs_unweighted_sample_count_of_the_population


Figure 22: Ability to predict nonresponse: 2017 wave. The figure shows F1 scores for two types of feature sets: AHS-only (which includes both sampling-frame variables and lagged response/contact-attempt variables) and those plus the ACS contextual features.

[Figure: F1 scores (closer to 1 = better) for eight classifiers (logitcv, gb_few, gb_many, rf_many, rf_few, dt_shallow, ada, dt_deep), each trained on the AHS feature set and on the AHS plus ACS feature set. Scores range from roughly 0.81 to 0.88, with most models clustered near 0.88 under either feature set.]
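The following sketch illustrates the kind of comparison plotted in Figures 22 and 23: F1 scores for classifiers trained with and without the ACS tract-level features. The file name, the nonresponse outcome column, and the train/test split are assumptions; the memo's actual model set and tuning are not reproduced here.

```python
# Sketch only: F1 scores with and without the ACS contextual features.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("ahs_2017_model_frame.csv")  # hypothetical file with numeric features
acs_cols = [c for c in df.columns if c.startswith("acs_")]
ahs_cols = [c for c in df.columns if c not in acs_cols + ["nonresponse"]]

for label, cols in [("AHS", ahs_cols), ("AHS + ACS", ahs_cols + acs_cols)]:
    X_tr, X_te, y_tr, y_te = train_test_split(
        df[cols], df["nonresponse"], test_size=0.25, random_state=0)
    for name, clf in [("gb", GradientBoostingClassifier(random_state=0)),
                      ("rf", RandomForestClassifier(random_state=0))]:
        pred = clf.fit(X_tr, y_tr).predict(X_te)
        print(label, name, round(f1_score(y_te, pred), 4))
```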

Figure 23: Ability to predict refusal: 2017 and 2019 waves.
[Figure: F1 scores (closer to 1 = better) by model, using the AHS + ACS feature set. The plotted values are:

Model        F1 (2017)   F1 (2019)
gb_few       0.9024      0.8695
rf_many      0.9022      0.8693
gb_many      0.9015      0.8694
rf_few       0.9016      0.8686
logitcv      0.9015      0.8685
dt_shallow   0.8995      0.8661
ada          0.8994      0.8645
dt_deep      0.8341      0.7783]
Figure 24: Remaining feature importances outside of the top 20: random forest; 2019 wave.
[Figure: feature importances from the random forest (2019 wave; higher = more predictive, in either direction), ranging from roughly 0.000 to 0.020. The remaining features are ACS tract-level proportions (monthly housing costs, rent burden by income band, race/ethnicity, educational attainment, and poverty status) and sampling-frame indicators (levels of WPSUSTRAT, UASIZE, DEGREE, HUDADMIN, and METRO 2013, plus the prior interview number, INTNMBR PRIOR).]

Table 10: Items in the order effects analysis based on the trace file. The items are ordered by their average duration across respondents.

Variable       Average duration
MHMOVE         10.17
ENTRYSYS       11.83
HHSEX          12.81
GUTREHB        15.07
HOA            15.29
MHANCHOR       15.40
STORIES        15.89
STORIES_IUF    15.89
UNITFLOORS     15.89
TPARK          15.90
NOSTEP         16.21
HHMOVE         17.05
BEDROOMS       17.37
DINING         17.52
SOLAR          17.61
LIVING         17.70
KITCHENS       17.71
GARAGE         17.74
UNITSIZE       17.91
UNITSIZE_IUF   17.91
KITEXCLU       18.01
PORCH          18.03
WASHER         18.45
OTHFN          18.48
DENS           18.51
HHSPAN         18.57
FIREPLACE      18.75
LAUNDY         18.82
MONOXIDE       19.10
UFINROOMS      19.10
FRIDGE         19.15
COLD           19.20
KITCHSINK      19.28
SEWUSERS       19.34
HEATFUEL       19.69
FAMROOMS       19.74
NOWAT          19.74
WATSOURCE      19.81
COLDEQ         20.37
RECROOMS       20.42
VACMONTHS      20.51
NOWIRE         20.67
WALLCRACK      21.03
FLOORHOLE      21.15
TIMESHARE      21.15
FNDCRUMB       21.26
VACRESDAYS     21.26
ROOFSAG        21.27
WALLSIDE       21.31
ROOFSHIN       21.34
WALLSLOPE      21.37
COLDEQFREQ     21.41
ROOFHOLE       21.45
WINBOARD       21.49
MONLSTOCC      21.50
VACRNTDAYS     21.58
WINBROKE       21.60
WINBARS        21.63
NOWATFREQ      21.66
PLUGS          22.12
RENT           22.39
LOTVAL         22.49
YEARBUY        22.68
SUITYRRND      22.94
DWNPAYSRC      23.35
FIRSTHOME      23.96
FORSALE        24.25
LEADINSP       24.34
OILAMT         27.06
PROTAXAMT      27.73
TRASHAMT       28.28
WATERAMT       28.42
OTHERAMT       29.02
MOVWHY         34.15
RMJOB          34.32
RMOWNHH        34.48
RMCHANGE       34.61
RMCOMMUTE      34.63
RMFAMILY       34.65
RMHOME         34.82
RMCOSTS        34.98
RMHOOD         35.03
RMOTHER        35.08
SEARCHFAM      35.25
SEARCHNET      35.38
HHGRAD         35.42
NRATE          35.63
HHNATVTY       35.68
SEARCHPUB      35.68
HRATE          35.85
SEARCHOTH      35.86
SEARCHREA      35.88
SEARCHLIST     35.93
SEARCHSIGN     35.93
NEARWATER      36.53
HMRACCESS      37.47
NHQPCRIME      37.51
NHQSCHOOL      37.51
HMRENEFF       37.56
NHQSCRIME      37.69
HHINUSYR       37.70
HMRSALE        37.75
NHQPUBTRN      37.97
NHQRISK        38.08
RATINGHS       38.13
WATFRONT       38.16
SUBDIV         38.18
AGERES         38.20
FSWORRY        38.30
CROPSL         38.40
RATINGNH       38.43
NORC           38.53
FSLAST         38.73
FSAFFORD       38.85
FSSKIPMEAL     40.59
FSEATLESS      40.73
FSMEALDAYS     40.76
FSHUNGRY       40.82
FSLOSTWGT      40.82
INTLANG        41.13

A.4 Item order effects: additional analyses
A.5 Predicting panel attrition: additional analyses
Table 11: Examples of variables removed during LASSO preprocessing.

Step                                Columns removed   Example variables removed
Edit flag variables (J variables)   312               JNOTOIL; JNUMADULTS; JHHINUSYR; JVACRNTDAYS; JRMCHANGE
High NA (over 20% missing)          196               SP1REPWGT68; HHYNGKIDS; PLUGS; RATINGNH; SP2REPWGT137
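A minimal sketch of the preprocessing steps summarized in Table 11 appears below; the file name is an assumption, and the simple "starts with J" rule is only an approximation of how the edit-flag variables might be identified.

```python
# Sketch only: drop edit-flag ("J") variables, then drop columns with
# more than 20% missing values before fitting the LASSO.
import pandas as pd

panel = pd.read_csv("ahs_2015_2017_panel.csv")  # hypothetical wide panel file

j_flags = [c for c in panel.columns if c.startswith("J")]  # edit-flag variables
high_na = [c for c in panel.columns
           if c not in j_flags and panel[c].isna().mean() > 0.20]

panel = panel.drop(columns=j_flags + high_na)
print(len(j_flags), "edit-flag columns and", len(high_na), "high-NA columns dropped")
```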


Figure 25: Rate of refusal in 2017 by month and division of interview in 2015.
[Figure: one panel per census division (New England, Middle Atlantic, East North Central, West North Central, South Atlantic, East South Central, West South Central, Mountain, Pacific). X-axis: month interviewed in 2015 (May through September). Y-axis: average rate of refusal in 2017 (0% to 20%). Point size: number interviewed in 2015 (roughly 1,000 to 5,000).]
Table 12: Twenty-five next-strongest predictors of 2017 survey refusal among 2015 respondents.
Term         Estimate   Std. error   Statistic   p-value
FIREPLACE3   0.153      0.044        3.516       0.001
HHCARE2      -0.021     0.007        -3.149      0.002
HHCITSHP1    0.014      0.005        2.929       0.004
ROACH5       0.013      0.004        2.873       0.005
INTMODE2     -0.011     0.004        -2.738      0.007
NHQSCHOOL1   -0.009     0.004        -2.169      0.032
HHNATVTY20   0.154      0.052        2.965       0.004
HHNATVTY92   -0.010     0.008        -1.369      0.173
RATINGNH     -0.002     0.001        -2.070      0.041
NUMHEAR2     -0.012     0.006        -2.041      0.043
NUMELDERS    -0.005     0.003        -1.912      0.058
DISHH1       -0.009     0.005        -1.896      0.060
RATINGHS     -0.002     0.001        -1.834      0.069
HHMEMRY2     -0.011     0.007        -1.627      0.106
MOLDOTHER2   -0.028     0.021        -1.319      0.190


A.6 Attritor heterogeneity: additional analyses
Figure 26: Adequacy across 2017 refusers and non-refusers. The proportions are reweighted using the composite weight.

Int. 2015; int. 2017
Int. 2015; refused 2017

Proportion in group

0.75

0.50

0.25

0.00
Adequate

Moderately Inadequate

h ps://oes.gsa.gov

Severely Inadequate

52

Table 13: Attritor heterogeneity in the relationship between income and adequacy: refusers in 2017; regression. The table shows that, in addition to the main relationship in which those with higher incomes are less likely to have inadequate housing, there is heterogeneity in this income-adequacy relationship between attritors and non-attritors.
Dependent variable: inadequate (Yes)

Term                                                   Estimate (SE)
division_descriptive Middle Atlantic                   0.012 (0.008)
division_descriptive East North Central                -0.021*** (0.007)
division_descriptive West North Central                -0.018** (0.009)
division_descriptive South Atlantic                    -0.028*** (0.007)
division_descriptive East South Central                -0.003 (0.008)
division_descriptive West South Central                -0.001 (0.008)
division_descriptive Mountain                          -0.029*** (0.008)
division_descriptive Pacific                           -0.017** (0.007)
refusal_17                                             -0.032** (0.014)
inc_scaled                                             -0.002*** (0.0001)
division_descriptive Middle Atlantic : refusal_17      -0.0003 (0.016)
division_descriptive East North Central : refusal_17   0.023 (0.015)
division_descriptive West North Central : refusal_17   0.026 (0.018)
division_descriptive South Atlantic : refusal_17       0.033** (0.014)
division_descriptive East South Central : refusal_17   0.001 (0.018)
division_descriptive West South Central : refusal_17   0.004 (0.016)
division_descriptive Mountain : refusal_17             0.023 (0.017)
division_descriptive Pacific : refusal_17              0.014 (0.015)
refusal_17 : inc_scaled                                 0.001* (0.001)
Constant                                                0.083*** (0.007)

Observations: 60,487
Log Likelihood: -5,077.268
Akaike Inf. Crit.: 10,194.540
Note: * p<0.1; ** p<0.05; *** p<0.01


