Source of the Data and Accuracy of the Estimates for the
Census Household Panel Survey – Topical 12
SOURCE OF THE DATA
The Census Household Panel (CHP or the Panel), an experimental data product, is a
probability-based nationwide survey panel to test methods of collecting data on a variety of
topics of interest, and for conducting experiments on alternative question wording and
methodological approaches. The goal of the Census Household Panel is to ensure readily
available sample cases for frequent data collection on a variety of topics for a variety of
population subgroups, producing estimates that meet quality standards of the Federal
Statistical Agencies and the Office of Management and Budget (OMB).
The initial goal for the size of the Panel is 15,000 households selected from the Census
Bureau’s Master Address File (MAF). This ensures the Panel is rooted in this rigorously
developed and maintained frame and available for linkage to administrative records
securely maintained and curated by the Census Bureau. Initial invitations to enroll in the
Panel were sent by mail and post-recruitment panel questionnaires were collected mainly
by internet self-response. The Panel maintains representativeness by allowing respondents
who do not use the internet to respond via in-bound computer-assisted telephone
interviewing (CATI). All panelists received an incentive for completing the questionnaire.
The Census Bureau has reviewed this data product to ensure appropriate access, use, and
disclosure avoidance protection of the confidential source data (Project No. P-7532382,
Disclosure Review Board (DRB) approval number: CBDRB-FY25-0079).
Topical 12
The purpose of collecting information in Topical 12 (October) is to conduct the Household
Pulse Survey (HPS) questionnaire.
The Household Pulse Survey measures how emergent social and economic issues are
impacting households across the country.
The HPS also asks about core demographic household characteristics, as well as the
following topics:
• Access to infant formula
• Children’s mental health treatment
• COVID-19 vaccinations and long COVID symptoms and impact
• Education, specifically K-12 enrollment
• Childcare arrangements
• Employment
• Food sufficiency
• Housing security
• Household spending, including energy expenditures and consumption
• Inflation concerns and changes in behavior due to increasing prices
• Physical and mental health
• Feelings of pressure to move from rental home
• Transportation, including behavioral changes related to the cost of gas
• Health insurance coverage (including Medicaid)
• Social isolation
• Shortage of Critical Items
• Participation with the arts
• Internet Access
• Impact of living through natural disasters
Table 1 provides the start and end dates for the current cycle of data collection.
Table 1. Data Collection Periods for Topical 12 for the
Household Panel Survey

Data Collection Period    Start Date          Finish Date
Topical 12                October 15, 2024    October 29, 2024
Sample Design
The CHP utilizes the Census Bureau’s MAF as the universe for the CHP sampled housing
units (HUs). HUs on the MAF were stratified based on information obtained from the
Demographic Frame 1 and the 2022 Block-Group Level Planning Database (PDB)2.
The Demographic Frame is a comprehensive database of person-level data that contains
demographic characteristics and addresses associated with each person. It is derived from
administrative, third-party, census, and survey data sources. The Demographic
Frame includes unique person-level identifiers used to link individuals across datasets.
Extracts from the Demographic Frame are available only to approved, internal users in a
secured computing environment.
The 2022 Block-Group Level Planning Database (also called the PDB) is a dataset that
contains a range of housing, demographic, socioeconomic, and census operational data. The
estimates in the PDB are derived from 2020 Census counts and 2016-2020 American
Community Survey (ACS) 5-year estimates. Data are summarized for all block groups in the
country and the territory of Puerto Rico.
The MAF HUs were first matched to the Demographic Frame. Matching records were
stratified into one of six strata based on the racial and Hispanic origin characteristics of the
matching records. Records that did not match the Demographic Frame were then matched to the
PDB. PDB information about the block group in which the housing unit is located was used to
stratify these housing units into one of four strata based on the most likely race and
Hispanic origin for the block group. Non-matches to both the Demographic Frame and the
PDB were put into their own stratum. Table 2 provides the resulting strata.
1 For more information on the Demographic Frame see the Frames Program (census.gov) website.
2 For more information on the PDB see the 2022 Planning Database (census.gov) website.
Table 2. Stratum Definitions and Size of Stratum for the Census
Household Panel from the July 2023 MAF

Stratum    Characteristics                                Stratum Size+
DHPBK      Demo Frame Match - Hispanic Black                    934,000
DHPOT      Demo Frame Match - Hispanic Other Race             8,151,000
DHPWH      Demo Frame Match - Hispanic White                  7,175,000
DNHBK      Demo Frame Match - Non-Hispanic Black             14,318,000
DNHOT      Demo Frame Match - Non-Hispanic Other Race        12,465,000
DNHWH      Demo Frame Match - Non-Hispanic White             69,461,000
MHPHP      PDB Match – Hispanic – All Races                   5,037,000
MNHBK      PDB Match – Non-Hispanic Black                     3,401,000
MNHOT      PDB Match – Non-Hispanic Other Race                1,282,000
MNHWH      PDB Match – Non-Hispanic White                    23,083,000
MZZZZ      Non-matches to the Demo Frame and PDB              2,352,000
All        Total Household on MAF                           147,659,000

Source: U.S. Census Bureau, Census Household Panel Baseline Survey.
+Stratum sizes are rounded to the thousands.
The sample for the CHP survey was then selected systematically within strata, with
adjustments applied to the sampling intervals to enable estimates for the four Census
regions3. Sample sizes were determined such that a 2.2 percent coefficient of variation (CV)
for an estimate of 40 percent of the population would be achieved for each Census region.
The sample size calculation assumed a 20 percent response rate, which yielded a national
sample size requirement of 75,000 HUs. The initial sample size actually selected was
75,001 HUs. Oversampling occurred in all strata except the two non-Hispanic White
strata to ensure reliable estimates of minority subgroups.
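As a rough illustration of the sample-size reasoning above, the sketch below computes how many completed interviews a simple random sample would need for a 2.2 percent CV on a 40 percent estimate, and how many sampled HUs that implies at a 20 percent response rate. It assumes simple random sampling with no design effect or finite population correction, which is not the CHP design, so it is not expected to reproduce the 75,000 figure. The function names are illustrative only.

```python
import math

def completes_for_cv(p: float, cv: float) -> int:
    """Completed interviews needed so that a proportion p has the target CV.

    Under simple random sampling, CV(p_hat) = sqrt((1 - p) / (n * p)),
    so n = (1 - p) / (p * cv**2).
    """
    return math.ceil((1.0 - p) / (p * cv ** 2))

def sampled_units(completes: int, response_rate: float) -> int:
    """Sampled units needed to yield the given completes at a response rate."""
    return math.ceil(completes / response_rate)

# Targets stated in the text: 40 percent estimate, 2.2 percent CV per region,
# four Census regions, 20 percent assumed response rate.
per_region = completes_for_cv(p=0.40, cv=0.022)
national_sample = sampled_units(4 * per_region, response_rate=0.20)
```

Under these simplified assumptions the result falls below the 75,000 HUs actually required, which is consistent with a stratified, oversampled design needing more sample than simple random sampling would.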
In March of 2024, a supplementary sample, referred to as a “replenishment sample,” was
introduced to the baseline sample. These households received the baseline questionnaire
March 05, 2024, through April 09, 2024, and started receiving the topical questionnaires in
May 2024 – Topical 06. An additional 30,000 sampled households were introduced,
increasing the total sample size to 105,001 households. Base weights for all sampled
households were adjusted to account for the additional sampled households.
Data Collection
Development of the CHP began with an initial recruitment operation, during which
participants responded to a baseline survey. Following the initial baseline survey,
panelists are enrolled in the CHP and receive invitations to monthly topical surveys for up
to three years. Data for the CHP are collected online via self-response using the Qualtrics
data collection platform.
3 See census.gov for a map of the Census regions.
Initial baseline survey invitations were distributed via postal letter that included a visible
$5 pre-paid incentive to encourage participation. Outbound telephone follow-up and
inbound call operations were employed to encourage participation, answer any respondent
questions, and assist respondents in completing the questionnaire. Recruitment
operations were conducted from September 12, 2023, through October 10, 2023, and the
first replenishment operations were conducted from March 05, 2024, through April 09,
2024. Responding households received a $20 cash incentive for completing the initial
baseline questionnaire.
Once the Baseline data were reviewed and respondents were confirmed as enrolled
panelists, monthly topical survey invitations were distributed via emails and/or texts,
based on the contact information provided by the respondent in the baseline survey. For
cases where no email or cell phone number was provided, an outbound telephone
operation was conducted to inform respondents of the available monthly survey. Topical
survey respondents receive a $10 incentive for each completed survey.
The Census Bureau conducted the CHP online using Qualtrics as the data collection
platform. Qualtrics is currently used at the Census Bureau for research and development
surveys and provides the necessary agility to deploy the CHP quickly and securely. It
operates in the Gov Cloud, is FedRAMP 4 authorized at the moderate level, and has an
Authority to Operate from the Census Bureau to collect personally identifiable and
Title-protected data.
Approximately 18,500 respondents answered the baseline questionnaire and agreed to
participate in the topical follow-on surveys. Table 3 shows the sample sizes and the
number of responses for each topical data collection.
Table 3. Sample Size and Number of Respondents at the
National Level

Data Collection    Sample Size    Number of Respondents
Baseline Sample        105,001                   18,501
Topical 12              18,501                    9,355

Source: U.S. Census Bureau, Census Household Panel Baseline and Topical 12 Survey.
Estimation Procedure
The weighting procedures for both the baseline sample and topical samples apply the same
general methods for adjustments. However, the topical surveys start with the baseline
nonresponse adjusted weight.
4 For more information on FedRAMP see FedRAMP.gov
The final CHP weights are designed to produce national and region-level estimates for the
total population aged 18 and older living within HUs. These weights were created by
adjusting the HU-level sampling base weights by various factors to account for
nonresponse, adults per household, and coverage.
The sampling base weights in each of the four sample regions are calculated as the total
eligible HUs in the sampling frame divided by the number of eligible HUs selected for
interviews. Therefore, the base weights for all sampled HUs sum to the total number of
eligible HUs on the MAF within each region.
The final CHP person weights are created by applying the following adjustments to the
sampling base weights:
1. Nonresponse adjustment – the weights of all sample units that did not respond to
the CHP are evenly allocated to the units that did respond within each stratum
and sample region. After this step, the weights of all respondents sum to the total
HUs on the MAF.
2. Occupied HU ratio adjustment – this adjustment corrects for undercoverage in
the sampling frame by inflating the HU weights after the nonresponse
adjustment to match independent controls for the number of occupied HUs
within each region. For this adjustment, the independent controls are the 2022
American Community Survey (ACS) one-year, region-level estimates available at
www.census.gov 5.
3. Person adjustment – this adjustment converts the HU weights into person
weights by multiplying them by the number of persons aged 18 and older that
were reported to live within the household. The number of adults is based on
subtracting the number of children under 18 in the household from the number
of total persons in the household. This number was capped at 10 adults.
4. Iterative raking ratio to population estimates – this procedure controls the
person weights to independent population controls by various demographics
within each region. The ratio adjustment is done through an iterative raking
procedure to simultaneously control the sample estimates to two sets of
population controls: educational attainment estimates from the 2022 1-year ACS
estimates (Table B15001) 6 by age and sex, and the July 1, 2024 Hispanic
origin/race by age and sex estimates from the Census Bureau’s Population
Estimates Program (PEP). PEP provided July 1, 2024 household population
estimates by single year of age (0-84, 85+), sex, race (31 groups), and Hispanic
origin for regions from the Vintage 2024 estimates series 7. The ACS 2022
estimates were adjusted to match the 2024 population controls within region by sex
and the five age categories in the ACS educational attainment estimates. Tables 4
and 5 show the demographic groups formed. The raking procedure ran until
convergence or a maximum of 10 iterations.
5 The one-year estimates are at this URL: B25002: Occupancy Status - Census Bureau Table
6 The 1-year state-level detailed table B15001 is located at this URL: B15001 - Census Bureau Tables.
7 The Vintage 2023 estimates methodology statement is available at this URL: methods-statement-v2023.pdf
(census.gov). Note: The Vintage 2024 methodology has not yet been released; the Vintage 2023 methodology has been
provided for reference.
The Modified Race Summary File methodology statement is available at this URL: https://www2.census.gov/programs-surveys/popest/technical-documentation/methodology/modified-race-summary-file-method/mrsf2010.pdf
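The first three adjustments in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Census Bureau's production code: the record keys (`base_weight`, `stratum`, `region`, `responded`, `persons`, `children`) and the function name are hypothetical, and the nonresponse step reads "evenly allocated" literally as an equal share per respondent within each stratum-by-region cell.

```python
from collections import defaultdict

def person_weights(units, occupied_hu_controls, max_adults=10):
    """Apply adjustments 1-3 to a list of sampled-HU records (dicts).

    occupied_hu_controls maps region -> independent occupied-HU total.
    Returns one record per responding HU with 'hu_weight', 'adults',
    and 'person_weight' added.
    """
    # Step 1: within each (stratum, region) cell, share the total weight of
    # nonrespondents equally among the respondents in that cell.
    nonresp_weight = defaultdict(float)
    n_resp = defaultdict(int)
    for u in units:
        cell = (u["stratum"], u["region"])
        if u["responded"]:
            n_resp[cell] += 1
        else:
            nonresp_weight[cell] += u["base_weight"]

    respondents = []
    for u in units:
        if u["responded"]:
            cell = (u["stratum"], u["region"])
            w = u["base_weight"] + nonresp_weight[cell] / n_resp[cell]
            respondents.append({**u, "hu_weight": w})

    # Step 2: ratio-adjust HU weights to the occupied-HU control per region.
    region_total = defaultdict(float)
    for r in respondents:
        region_total[r["region"]] += r["hu_weight"]
    for r in respondents:
        r["hu_weight"] *= occupied_hu_controls[r["region"]] / region_total[r["region"]]

    # Step 3: adults = total persons minus children under 18, capped at 10;
    # the person weight is the HU weight times the number of adults.
    for r in respondents:
        r["adults"] = min(r["persons"] - r["children"], max_adults)
        r["person_weight"] = r["hu_weight"] * r["adults"]
    return respondents
```

When all base weights within a cell are equal, the equal-share allocation in step 1 gives the same result as multiplying each respondent's weight by a single cell-level factor.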
Before the raking procedure was applied, cells containing too few responses were
collapsed to ensure all cells met the minimum response count requirement of 30 cases. The
cells after collapsing remained the same throughout the raking. These collapsed cells were
also used in the calculation of replicate weights for variance estimation. Collapsing
occurred only before raking; there was no collapsing during the first three steps of
weighting.
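The raking step can be illustrated with a small iterative proportional fitting routine. This is a sketch only: it assumes cells have already been collapsed to meet the minimum count, that the two sets of controls sum to the same total, and that every cell label has a control. The function name, argument names, and tolerance are illustrative and not taken from the CHP production system.

```python
def rake(weights, cells_a, cells_b, controls_a, controls_b,
         max_iter=10, tol=1e-6):
    """Rake weights to two sets of marginal population controls.

    weights:    starting person weights, one per respondent
    cells_a/b:  each respondent's cell label in the two dimensions
                (for example education-by-age-by-sex and race-by-age-by-sex)
    controls_*: cell label -> independent population total
    Stops when all adjustment factors are within tol of 1, or after
    max_iter passes through both dimensions.
    """
    w = list(weights)
    for _ in range(max_iter):
        max_change = 0.0
        for cells, controls in ((cells_a, controls_a), (cells_b, controls_b)):
            # Current weighted total in each cell of this dimension.
            totals = {}
            for wi, c in zip(w, cells):
                totals[c] = totals.get(c, 0.0) + wi
            # Scale every weight so its cell total matches the control.
            for i, c in enumerate(cells):
                factor = controls[c] / totals[c]
                max_change = max(max_change, abs(factor - 1.0))
                w[i] *= factor
        if max_change < tol:
            break
    return w
```

If the loop ends because `max_iter` is reached rather than because the factors settled, the weights match the second set of controls exactly and the first only approximately, which is one way coverage ratios after raking can differ from 1.00.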
Table 4: Educational Attainment Population Adjustment Cells within Region

Age      No HS diploma      HS diploma         Some college or        Bachelor’s degree
                                               Associate’s degree     or higher
         Male    Female     Male    Female     Male      Female       Male      Female
18-24
25-34
35-44
45-64
65+

Table 5: Race/Ethnicity Population Adjustment Cells within Region

Age      Hispanic           Non-Hispanic       Non-Hispanic           Non-Hispanic
         Any Race           White Alone        Black Alone            Other Races
         Male    Female     Male    Female     Male      Female       Male      Female
18-24
25-29
30-34
35-39
40-44
45-49
50-54
55-64
65+
The final CHP HU weights are created by applying the following adjustments to the final
CHP person weights:
1. HU adjustment – this adjustment converts the person level weight back into a
HU weight by dividing the person level weight by the number of persons aged 18
and older that were reported to live within the household. The number of adults
is the same value used to create the person adjustment in Step 3 above.
2. Occupied HU ratio adjustment – this adjustment ensures that the final CHP HU
weights will sum to the 2022 American Community Survey (ACS) one-year,
region-level estimates available at www.census.gov 5. This ratio adjustment is the
same adjustment applied to the person weights in Step 2 above but is needed
again because region totals may have changed as a result of the iterative raking
adjustment in the final step of the person weight creation.
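The two HU-weight steps above amount to a division followed by a regional ratio adjustment. The sketch below shows that arithmetic under the assumption that the inputs are parallel lists; the function and argument names are hypothetical.

```python
def final_hu_weights(person_weights, adults, regions, occupied_hu_controls):
    """Convert final person weights back to HU weights.

    person_weights: final (raked) person weight for each responding HU
    adults:         capped adult count used in the person adjustment
    regions:        region label for each responding HU
    occupied_hu_controls: region -> independent occupied-HU total
    """
    # Step 1: divide the person weight by the number of adults.
    hu = [pw / a for pw, a in zip(person_weights, adults)]

    # Step 2: ratio-adjust so HU weights sum to the control in each region.
    totals = {}
    for w, r in zip(hu, regions):
        totals[r] = totals.get(r, 0.0) + w
    return [w * occupied_hu_controls[r] / totals[r] for w, r in zip(hu, regions)]
```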
The detailed tables released for this experimental CHP show frequency counts rather than
percentages. Showing the frequency counts allows data users to see the count of cases for
each topic and variable that are in each response category and in the ‘Did Not Report’
category. This ‘Did Not Report’ category is not a commonly used data category in U.S.
Census Bureau tables. Most survey programs review these missing data and statistically
assign them to one of the other response categories based on other characteristics.
In these tables, the Census Bureau recommends choosing the numerators and
denominators for percentages carefully, so that missing data are deliberately included or
excluded in these counts. In the absence of external information, the percentage based on
only the responding cases will most closely match a percentage that would result from
statistical imputation. Including the missing data in the denominator for percentages will
lower the percentages that are calculated.
Users may develop statistical imputations for the missing data but should ensure that they
continue to be deliberate and transparent with their handling of these data.
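The effect of the denominator choice can be seen with a small calculation. The counts below are made up for illustration and do not come from any CHP table; the function names are likewise hypothetical.

```python
def percent_of_reporting(category, counts, missing_label="Did Not Report"):
    """Percentage with missing data excluded from the denominator."""
    reporting = sum(v for k, v in counts.items() if k != missing_label)
    return 100.0 * counts[category] / reporting

def percent_of_all(category, counts):
    """Percentage with missing data included in the denominator."""
    return 100.0 * counts[category] / sum(counts.values())

# Hypothetical weighted frequency counts for one survey item.
counts = {"Yes": 300, "No": 500, "Did Not Report": 200}

pct_reporting = percent_of_reporting("Yes", counts)  # 300 / 800
pct_all = percent_of_all("Yes", counts)              # 300 / 1000
```

With these counts the "Yes" share is 37.5 percent of reporting cases but 30.0 percent of all cases, so stating which denominator was used matters.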
ACCURACY OF THE ESTIMATES
A sample survey estimate has two types of error: sampling and nonsampling. The accuracy
of an estimate depends on both types of error. The nature of the sampling error is known
given the survey design; the full extent of the nonsampling error is unknown.
Sampling Error
Since the CHP estimates come from a sample, they may differ from figures from an
enumeration of the entire population using the same questionnaires, instructions, and
enumeration methods. For a given estimator, the difference between an estimate based on
a sample and the estimate that would result if the sample were to include the entire
population is known as sampling error. Standard errors, as calculated by methods
described below in “Standard Errors and Their Use,” are primarily measures of the
magnitude of sampling error. However, the estimation of standard errors may include
some nonsampling error.
Nonsampling Error
For a given estimator, the difference between the estimate that would result if the sample
were to include the entire population and the true population value being estimated is
known as nonsampling error. There are several sources of nonsampling error that may
occur during the development or execution of the survey. It can occur because of
circumstances created by the respondent, the survey instrument, or the way the data are
collected and processed. Some nonsampling errors, and examples of each, include:
• Measurement error: The respondent provides incorrect information, the
respondent estimates the requested information, or an unclear survey question
is misunderstood by the respondent. The interviewer may also be a source of
measurement error.
• Coverage error: Some individuals who should have been included in the survey
frame were missed.
• Nonresponse error: Responses are not collected from all those in the sample or
the respondent is unwilling to provide information.
• Imputation error: Values are estimated imprecisely for missing data.
To minimize these errors, the Census Bureau applies quality control procedures during all
stages of the production process including the design of the survey, the wording of
questions, and the statistical review of reports.
Two types of nonsampling error that can be examined to a limited extent are nonresponse
and undercoverage.
Nonresponse
The effect of nonresponse bias cannot be measured directly, but one indication of its
potential effect is the nonresponse rate. Tables 6 and 7 show the unit response rates by
collection period. The baseline response rate was lower than anticipated at
16.8 percent unweighted (17.4 percent weighted). For the topical data collections, the
response rates were also lower than anticipated at 57.6 percent unweighted (59.5
percent weighted) for the first topical collection, making the overall topical response rate
9.7 percent unweighted (10.4 percent weighted).
Table 6. Unweighted National Level Response Rates by Collection
Period for the Census Household Panel Survey

Data Collection        Response Rate (Percent)    Overall Response Rate
                       of Data Collection         (Percent)
Baseline Invitation    17.6                       17.6
Topical 12             50.6                       8.9

Source: U.S. Census Bureau, Census Household Panel Baseline and Topical 12 Survey.
Table 7. Weighted National Level Response Rates by Collection
Period for the Census Household Panel Survey

Data Collection        Response Rate (Percent)    Overall Response Rate
                       of Data Collection         (Percent)
Baseline Invitation    18.3                       18.3
Topical 12             52.5                       9.6

Source: U.S. Census Bureau, Census Household Panel Baseline and Topical 12 Survey.
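The unweighted rates in Table 6 can be checked against the counts in Table 3. The sketch below assumes, as Table 3 indicates, that the Topical 12 sample consists of the baseline respondents; the function name is illustrative.

```python
def response_rates(baseline_sample, baseline_respondents, topical_respondents):
    """Return (baseline, topical, overall) unweighted response rates in percent."""
    baseline = 100.0 * baseline_respondents / baseline_sample
    # The topical sample is the set of baseline respondents.
    topical = 100.0 * topical_respondents / baseline_respondents
    # The overall rate is relative to the original baseline sample.
    overall = 100.0 * topical_respondents / baseline_sample
    return baseline, topical, overall

# Counts from Table 3.
baseline, topical, overall = response_rates(105_001, 18_501, 9_355)
```

Rounded to one decimal place, these give 17.6, 50.6, and 8.9 percent. The weighted rates in Table 7 cannot be reproduced this way because they depend on the weights.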
Responses are made up of complete interviews and sufficient partial interviews. A
sufficient partial interview is an interview in which the household or person answered
enough of the questionnaire to be considered a complete interview. Some remaining
questions may have been edited or imputed to fill in missing values. Insufficient partial
interviews are considered nonrespondents.
In accordance with Census Bureau and OMB Quality Standards, the Census Bureau will
conduct a nonresponse bias analysis to assess nonresponse bias in the CHP.
Undercoverage
The concept of coverage with a survey sampling process is defined as the extent to which
the total population that could be selected for sample “covers” the survey’s target
population. Missed housing units and missed people within sample households create
undercoverage in the CHP. A common measure of survey coverage is the coverage ratio,
calculated as the estimated population before poststratification divided by the independent
population control.
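As a minimal illustration of this definition, the coverage ratio is a single division. The population figures below are hypothetical and the function name is illustrative.

```python
def coverage_ratio(estimate_before_poststratification, population_control):
    """Weighted estimate before poststratification over the independent control."""
    return estimate_before_poststratification / population_control

# Hypothetical example: 48.0 million estimated against a control of 50.0 million.
ratio = coverage_ratio(48.0e6, 50.0e6)
```

A ratio below 1 indicates undercoverage of the group relative to the control, and a ratio above 1 indicates overcoverage.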
CHP person coverage varies with age, sex, Hispanic origin/race, and educational
attainment. Generally, coverage is higher for females than for males and higher for
non-Blacks than for Blacks. This differential coverage is a general issue for most
household-based surveys. The CHP weighting procedure tries to mitigate the bias from
undercoverage within the raking procedure. However, due to small sample sizes, some
demographic cells needed collapsing to increase sample counts within the raking cells. In
this case convergence to both sets of the population controls was not attained. Therefore,
the final coverage ratios are not perfect for some demographic groups. Table 8 shows the
coverage ratios for the person demographics of age, sex, Hispanic origin/race, and
educational attainment before and after the raking procedure is run.
Table 8. Person-Level Coverage Ratios at the National Level
for Household Pulse Survey Before and After Raking for
Topical 12

Demographic Characteristic            Before Raking    After Raking
Total Population                               0.96            1.00
Male                                           0.87            1.00
Female                                         1.05            1.00
Age 18-24                                      0.16            0.48
Age 25-29                                      0.53            1.12
Age 30-34                                      0.75            1.23
Age 35-39                                      0.93            1.10
Age 40-44                                      1.09            1.03
Age 45-49                                      1.05            1.01
Age 50-54                                      1.10            1.04
Age 55-64                                      1.20            1.07
Age 65+                                        1.32            1.00
Hispanic                                       0.72            1.00
Non-Hispanic white-only                        1.07            1.00
Non-Hispanic black-only                        0.78            1.00
Non-Hispanic other races                       0.95            1.00
No high-school diploma                         0.30            0.85
High-school diploma                            0.47            1.06
Some college or associate’s degree             0.88            1.00
Bachelor’s degree or higher                    1.64            1.00

Source: U.S. Census Bureau, Census Household Panel Baseline and Topical 12 Survey.
Biases may also be present when people who are missed by the survey differ from those
interviewed in ways other than age, sex, Hispanic origin/race, and educational attainment.
How this weighting procedure affects other variables in the survey is not precisely known.
All of these considerations affect comparisons across different surveys or data sources.
Comparability of Data
Data obtained from the CHP and other sources are not entirely comparable. This is due to
differences in data collection processes, as well as different editing procedures of the data,
within this survey and others. These differences are examples of nonsampling variability
not reflected in the standard errors. Therefore, caution should be used when comparing
results from different sources.
A Nonsampling Error Warning
Since the full extent of the nonsampling error is unknown, one should be particularly
careful when interpreting results based on small differences between estimates. The
Census Bureau recommends that data users incorporate information about nonsampling
errors into their analyses, as nonsampling error could impact the conclusions drawn from
the results. Caution should also be used when interpreting results based on a relatively
small number of cases.
Standard Errors and Their Use
A sample estimate and its standard error enable one to construct a confidence interval. A
confidence interval is a range about a given estimate that has a specified probability of
containing the true value. For example, if all possible samples were surveyed under
essentially the same general conditions and using the same sample design, and if an
estimate and its standard error were calculated from each sample, then approximately
90 percent of the intervals from 1.645 standard errors below the estimate to 1.645
standard errors above the estimate would include the true value.
A particular confidence interval may or may not contain the average estimate derived from
all possible samples, but one can say with the specified confidence that the interval
includes the average estimate calculated from all possible samples.
The context and meaning of the estimate must be kept in mind when creating the
confidence intervals. Users should be aware of any “natural” limits on the bounds of the
confidence interval for a characteristic of the population when the estimate is near zero –
the calculated value of the lower bound of the confidence interval may be negative. For
some estimates, a negative lower bound for the confidence interval does not make sense,
for example, an estimate of the number of people with a certain characteristic. In this case,
the lower confidence bound should be reported as zero. For other estimates such as
income, negative confidence bounds can make sense; in these cases, the lower confidence
bound should not be adjusted. Another example of a natural limit is 100 percent as the
upper bound of a percent estimate.
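A 90-percent confidence interval with optional natural limits can be computed as in the sketch below. The function name and the `lower_limit` and `upper_limit` parameters are illustrative; whether a limit applies depends on the estimate, as described above.

```python
Z_90 = 1.645  # multiplier for a 90-percent confidence interval

def confidence_interval_90(estimate, standard_error,
                           lower_limit=None, upper_limit=None):
    """Return (lower, upper) bounds, clipped to natural limits when given."""
    lower = estimate - Z_90 * standard_error
    upper = estimate + Z_90 * standard_error
    if lower_limit is not None:
        lower = max(lower, lower_limit)
    if upper_limit is not None:
        upper = min(upper, upper_limit)
    return lower, upper

# A percent estimate near zero: the lower bound is reported as zero.
pct_low, pct_high = confidence_interval_90(2.0, 1.5, lower_limit=0.0, upper_limit=100.0)

# An estimate that can legitimately be negative: no limits are applied.
inc_low, inc_high = confidence_interval_90(-100.0, 100.0)
```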
Standard errors may also be used to perform hypothesis testing, a procedure for
distinguishing between population parameters using sample estimates. The most common
type of hypothesis is that the population parameters are different.
Tests may be performed at various levels of significance. A significance level is the
probability of concluding that the characteristics are different when, in fact, they are the
same. For example, to conclude that two characteristics are different at the 0.10 level of
significance, the absolute value of the estimated difference between characteristics must be
greater than or equal to 1.645 times the standard error of the difference.
The Census Bureau uses 90-percent confidence intervals and 0.10 levels of significance to
determine statistical validity. Consult standard statistical textbooks for alternative criteria.
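The decision rule at the 0.10 level can be written as a one-line comparison. This sketch takes the standard error of the difference as an input rather than deriving it, since how it is obtained depends on the estimates being compared; the function name is illustrative.

```python
def differs_at_010(estimate_a, estimate_b, se_of_difference):
    """True if the two estimates differ at the 0.10 level of significance."""
    return abs(estimate_a - estimate_b) >= 1.645 * se_of_difference

# Hypothetical percent estimates with a standard error of the difference of 2.0.
significant = differs_at_010(52.0, 48.0, 2.0)      # |4.0| vs 3.29
not_significant = differs_at_010(50.0, 48.0, 2.0)  # |2.0| vs 3.29
```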
Estimating Standard Errors
The Census Bureau uses successive difference replication to estimate the standard errors
of CHP estimates. These methods primarily measure the magnitude of sampling error.
However, they do measure some effects of nonsampling error as well. They do not
measure systematic biases in the data associated with nonsampling error. Bias is the
average over all possible samples of the differences between the sample estimates and the
true value.
Eighty replicate weights were created for the CHP. Using these replicate weights, the
variance of an estimate (the standard error is the square root of the variance) can be
calculated as follows:
\[ \mathrm{Var}(\hat{\theta}) = \frac{4}{80} \sum_{i=1}^{80} (\theta_i - \hat{\theta})^2 \qquad (1) \]
where 𝜃̂ is the estimate of the statistic of interest, such as a point estimate, ratio of domain
means, regression coefficient, or log-odds ratio, using the weight for the full sample and 𝜃𝑖
are the replicate estimates of the same statistic using the replicate weights. See reference
Judkins (1990).
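Formula (1) translates directly into code. The sketch below is a minimal illustration with hypothetical function names; it takes the number of replicates from the length of the list, which for the CHP is 80.

```python
import math

def replicate_variance(full_estimate, replicate_estimates):
    """Variance per formula (1): (4 / R) * sum((theta_i - theta_hat)**2)."""
    r = len(replicate_estimates)
    return 4.0 / r * sum((t - full_estimate) ** 2 for t in replicate_estimates)

def replicate_standard_error(full_estimate, replicate_estimates):
    """Standard error is the square root of the variance."""
    return math.sqrt(replicate_variance(full_estimate, replicate_estimates))

# Toy input: 80 replicate estimates, each one unit away from the full estimate.
replicates = [11.0] * 40 + [9.0] * 40
variance = replicate_variance(10.0, replicates)
standard_error = replicate_standard_error(10.0, replicates)
```

The same function applies to any statistic, as long as the full-sample estimate and each replicate estimate are computed the same way.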
Creating Replicate Estimates
Replicate estimates are created using each of the 80 weights independently to create 80
replicate estimates. For point estimates, multiply the replicate weights by the item of
interest to create the 80 replicate estimates. You will use these replicate estimates in the
formula (1) to calculate the total variance for the item of interest. For example, say that the
item you are interested in is the difference between the number of people with a loss in
employment income in one time frame and the number of people with a loss in
employment income in another. You would create the difference of the two estimates using
the sample weight, 𝑥̂0, and the 80 replicate differences, 𝑥𝑖, using the 80 replicate weights.
You would then use these estimates in the formula to calculate the total variance for the
difference in the number of people with a loss in employment income from the first time
frame to the second time frame.
\[ \mathrm{Var}(\hat{x}_0) = \frac{4}{80} \sum_{i=1}^{80} (x_i - \hat{x}_0)^2 \]
Where 𝑥𝑖 is the ith replicate estimate of the difference and 𝑥̂0 is the full estimate of the
difference using the sample weight.
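The difference example can be sketched as follows. It assumes each time frame has its own data file with a 0/1 indicator, a full-sample weight, and replicate weights, and that replicate i from one file is paired with replicate i from the other. The function and argument names are hypothetical.

```python
def weighted_total(indicator, weights):
    """Weighted count of records where the 0/1 indicator equals 1."""
    return sum(x * w for x, w in zip(indicator, weights))

def difference_variance(ind1, full_w1, rep_w1, ind2, full_w2, rep_w2):
    """Return (x0, variance) for the difference of two weighted totals.

    ind1, ind2:       0/1 indicators (for example, loss of employment income)
    full_w1, full_w2: full-sample weights for each time frame
    rep_w1, rep_w2:   lists of replicate-weight lists, same length and order
    """
    # Full-sample difference.
    x0 = weighted_total(ind1, full_w1) - weighted_total(ind2, full_w2)
    # One replicate difference per pair of replicate weights.
    diffs = [weighted_total(ind1, r1) - weighted_total(ind2, r2)
             for r1, r2 in zip(rep_w1, rep_w2)]
    variance = 4.0 / len(diffs) * sum((x - x0) ** 2 for x in diffs)
    return x0, variance
```

Computing the difference within each replicate, rather than differencing two separately computed variances, lets the replicates carry any covariance between the two estimates.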
Example for Variance of Regression Coefficients
Variances for regression coefficients 𝛽0 can be calculated using formula (1) as well.
Calculating the 80 replicate regression coefficients 𝛽𝑖, one for each replicate weight, and
plugging the replicate 𝛽𝑖 estimates and the full-sample 𝛽̂0 estimate into formula (1),
\[ \mathrm{Var}(\hat{\beta}_0) = \frac{4}{80} \sum_{i=1}^{80} (\beta_i - \hat{\beta}_0)^2 \]
gives the variance estimate for the regression coefficient 𝛽0 .
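For a regression coefficient, the same pattern applies: fit the model once with the full-sample weight and once with each replicate weight. The sketch below uses the closed-form slope of a weighted simple linear regression to stay dependency-free; the function names are hypothetical, and a real analysis would typically use a statistics package for the fits.

```python
def weighted_slope(x, y, w):
    """Slope of a weighted least-squares line of y on x."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx

def slope_variance(x, y, full_weights, replicate_weights):
    """Return (beta_hat, variance) using formula (1) over replicate fits."""
    beta_hat = weighted_slope(x, y, full_weights)
    betas = [weighted_slope(x, y, rw) for rw in replicate_weights]
    variance = 4.0 / len(betas) * sum((b - beta_hat) ** 2 for b in betas)
    return beta_hat, variance
```

The standard error of the coefficient is the square root of the returned variance.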
TECHNICAL ASSISTANCE
If you require assistance or additional information, please contact the Demographic
Statistical Methods Division via e-mail at [email protected].
REFERENCES
Judkins, D. (1990) “Fay’s Method for Variance Estimation,” Journal of Official
Statistics, Vol. 6, No. 3, pp. 223-239.
All links were verified as correct on April 17, 2024
File Type | application/pdf |
File Title | Source of the Data and Accuracy of the Estimates for the Census Household Panel Survey - Topic 12 |
Subject | Survey |
Author | U.S. Census Bureau |
File Modified | 2025-02-03 |
File Created | 2024-11-18 |