Supporting Statement (1220-0050) CE Part B_8_19

Consumer Expenditure Surveys: Quarterly Interview and Diary

OMB: 1220-0050

Consumer Expenditure Surveys

1220-0050

Supporting Statement

B. Collection of Information Employing Statistical Methods

1. Sampling Method

The Consumer Expenditure (CE) Survey is a nationwide household survey conducted by the U.S. Bureau of Labor Statistics to find out how Americans spend their money. The CE Survey consists of two sub-surveys: a Quarterly Interview survey (CEQ) and a two-week Diary survey (CED). The Interview survey collects detailed information on large expenditures such as property, automobiles, and major appliances, as well as on recurring expenditures such as rent, utilities, and insurance premiums. By contrast, the Diary survey collects detailed information on small, frequently purchased items such as food and apparel. The data from the two surveys are then combined to provide a complete picture of consumer expenditures in the United States.

The data for both surveys are collected from a representative sample of households around the country. Both surveys have the same sample design, which is a two-stage sampling process. In the first stage, a representative sample of counties from around the United States is selected. In the second stage, a representative sample of households is selected from those counties. This two-stage process is designed to generate a sample of households in which every demographic group and every wealth level is well represented. The rest of this section describes the two sampling stages in more detail.

Primary Sampling Units (PSUs)

In the first stage of sampling, all 3,143 counties or county equivalents in the United States are partitioned into small geographic clusters called “primary sampling units” (PSUs), from which a representative sample of 91 is randomly selected for the survey. The clusters are the “core-based statistical areas” defined by the Office of Management and Budget (OMB), and they range in size from 1 to 29 counties with the average size being 5 counties. The same sample of 91 PSUs is used in both the CEQ and CED surveys, and the 91 PSUs fall into three categories:

PSU “size class”	Number of PSUs	Description
S	23	Large Metropolitan Core Based Statistical Areas. These are CBSAs with over 2.5 million people, and they are self-representing PSUs.
N	52	Small Metropolitan Core Based Statistical Areas, and Micropolitan Core Based Statistical Areas. These are CBSAs with under 2.5 million people, and they are non-self-representing PSUs.
R	16	Non-Core Based Statistical Areas. These are small clusters of counties in “rural” areas created by CE staff, and they are non-self-representing PSUs.

BLS selected its sample of 91 PSUs from a stratified sample design in which all 23 self-representing PSUs (the S PSUs) were selected for the survey with certainty, while all the non-self-representing PSUs (the N and R PSUs) were stratified into 68 (=52+16) strata using a 4-variable model whose independent variables were latitude, longitude, median household income, and median household property value. Then one PSU was randomly selected from each stratum with its probability of selection being proportional to its population.
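The selection of one PSU per stratum with probability proportional to population can be illustrated with a minimal sketch. The PSU names and populations below are hypothetical, and the sketch uses Python's standard random number generator rather than whatever procedure BLS actually used.

```python
import random

def select_psu_pps(stratum, rng):
    """Pick one PSU from a stratum with probability proportional to population.

    `stratum` maps a (hypothetical) PSU name to its population.
    """
    names = list(stratum)
    populations = [stratum[name] for name in names]
    # With k=1, choices() makes a single draw in which each PSU's chance
    # of selection is its population divided by the stratum total.
    return rng.choices(names, weights=populations, k=1)[0]

# Hypothetical stratum: names and populations are made up for illustration.
example_stratum = {"PSU_A": 600_000, "PSU_B": 300_000, "PSU_C": 100_000}

rng = random.Random(2010)  # fixed seed so the sketch is repeatable
selected = select_psu_pps(example_stratum, rng)

# Selection probabilities implied by the populations.
total = sum(example_stratum.values())
probabilities = {name: pop / total for name, pop in example_stratum.items()}
```

In this hypothetical stratum the largest PSU would have a 60 percent chance of selection, and the smallest a 10 percent chance.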

All 91 PSUs are used by the CE survey. However, one of CE’s major customers is the Consumer Price Index (CPI), which uses CE’s data for its expenditure weights. The CPI is an urban survey rather than a national survey, so it uses only the 75 (=23+52) urban PSUs.

Sampling Households Within PSUs

After selecting a sample of PSUs, a sample of households was then selected from the civilian non-institutional portion of those PSUs. That includes people living in houses, condominiums, and apartments, as well as people living in group quarters such as college dormitories and boarding houses. However, it excludes the non-civilian and institutional portions of the population, such as military personnel living on base, nursing home residents, and prison inmates.

Addresses for the CEQ and CED surveys are selected from two sampling frames maintained by the Census Bureau: the Unit frame and the Group Quarters (GQ) frame. Both frames are derived from the Master Address File (MAF), which is basically a list of all residential addresses identified in the 2010 census and is updated twice per year with information from the U.S. Postal Service. The Unit frame is the larger of the two frames and it contains both existing housing units and newly constructed housing units. It has approximately 99% of the MAF’s civilian non-institutional addresses and it is updated twice per year. The GQ frame is also derived from the MAF but it is much smaller; it has the remaining 1% of the MAF’s civilian non-institutional addresses and it is updated every three years.

In each PSU, a “systematic sample” of households is selected from the two frames. The first step in the selection process is sorting the households by variables that are correlated with their expenditures. The purpose of this is to ensure that households of every wealth level are well-represented in the sample. In the systematic sampling process the first household in the sample is selected from the sorted list using a random number generator. Then after the initial household is selected every k-th household down the list is selected where “k” is the PSU’s sampling interval. The Unit and GQ frames have different sorting variables, but they have the same sampling interval.
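The systematic sampling procedure described above can be sketched as follows. This is a simplified illustration that assumes a whole-number sampling interval and a frame that has already been sorted; the address labels are hypothetical.

```python
import random

def systematic_sample(sorted_units, interval, rng):
    """Return every `interval`-th unit after a random start in [0, interval)."""
    if interval < 1:
        raise ValueError("interval must be a positive integer")
    start = rng.randrange(interval)  # random start within the first interval
    return sorted_units[start::interval]

# Hypothetical frame: 20 addresses already sorted by the stratification variables.
frame = [f"address_{i:02d}" for i in range(20)]
sample = systematic_sample(frame, interval=5, rng=random.Random(1))
```

Because the list is sorted before sampling, the selected addresses are spread evenly across the sorted order, which is what spreads the sample across wealth levels.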

Table 1 below shows how the households are sorted in the Unit frame. It has codes ranging from 10 to 99 with the lower codes being for low-wealth households, and the higher codes being for high-wealth households. For the Unit frame, the sorting or “stratification” variable is created from the number of occupants in each household, their housing tenure (owner/renter), and the market value of their home (for owners) or the rental value of their apartment or home (for renters). These variables are used because they are correlated with expenditures: households with more people tend to be wealthier than those with fewer people; homeowners tend to be wealthier than renters; and people living in high-price housing units tend to be wealthier than people living in low-price housing units.

All the renters are at one end of the stratification and all the owners are at the other end of the stratification. The renters and owners are further subdivided into quartiles based on monthly rental and property values in order to ensure that households of every wealth level are well represented in the survey. Vacant housing units are put in the middle column for the number of household occupants because although they were vacant at the time of the decennial census, when CE’s field representatives visit them most will be occupied and they could be in any of the four non-zero categories. Thus the middle column is their “expected” location.

Table 1. CE Unit Frame Stratification Code Values

Renter/Owner Quartile	Number of Occupants
	1 person	2 persons	Vacant	3 persons	4+ persons
Renters 1st Quartile	10	11	12	13	14
Renters 2nd Quartile	25	24	23	22	21
Renters 3rd Quartile	30	31	32	33	34
Renters 4th Quartile	45	44	43	42	41
Owners 1st Quartile	50	51	52	53	54
Owners 2nd Quartile	65	64	63	62	61
Owners 3rd Quartile	70	71	72	73	74
Owners 4th Quartile	85	84	83	82	81
Other			99
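The codes in Table 1 follow a regular pattern: the tens digit identifies the tenure and quartile, and the ones digit runs left to right in odd-numbered rows and right to left in even-numbered rows. The sketch below reproduces that pattern. The function and argument names are hypothetical, and code 99 (“Other”) is not modeled because the table does not say which units receive it.

```python
# Column order of Table 1; "vacant" sits in the middle column.
OCCUPANT_COLUMNS = ["1 person", "2 persons", "vacant", "3 persons", "4+ persons"]

def unit_frame_code(tenure, quartile, occupants):
    """Reproduce the Table 1 code for a tenure, quartile (1-4), and occupant column."""
    if tenure not in ("renter", "owner"):
        raise ValueError("tenure must be 'renter' or 'owner'")
    if quartile not in (1, 2, 3, 4):
        raise ValueError("quartile must be 1, 2, 3, or 4")
    if occupants not in OCCUPANT_COLUMNS:
        raise ValueError("unknown occupant column")
    # Renters occupy tens digits 1-4 and owners occupy tens digits 5-8.
    tens = quartile if tenure == "renter" else quartile + 4
    column = OCCUPANT_COLUMNS.index(occupants)
    # Odd rows count up (e.g., 10..14); even rows count down (e.g., 25..21).
    ones = column if tens % 2 == 1 else 5 - column
    return 10 * tens + ones

code_small_renter = unit_frame_code("renter", 1, "1 person")    # 10 in Table 1
code_large_owner = unit_frame_code("owner", 4, "4+ persons")    # 81 in Table 1
```

One plausible reading of the alternating direction is that it keeps adjacent codes similar when the list is sorted, so that a systematic sample does not jump abruptly from large households in one quartile to small households in the next.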

To draw a systematic sample from the Unit frame, the addresses are sorted first by PSU, then by State FIPS code, County FIPS code, the CE stratification variable described above, Census Tract code, Census Block code, Street name, Street number, and MAFID code.
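The multi-key sort for the Unit frame can be sketched as below. The field names and example records are hypothetical; only the order of the sort keys is taken from the description above.

```python
from operator import itemgetter

# Sort keys in the order listed for the Unit frame; the field names are hypothetical.
UNIT_FRAME_SORT_KEYS = (
    "psu", "state_fips", "county_fips", "ce_stratification_code",
    "tract", "block", "street_name", "street_number", "mafid",
)

def sort_unit_frame(addresses):
    """Return addresses sorted by the Unit frame keys, most significant first."""
    return sorted(addresses, key=itemgetter(*UNIT_FRAME_SORT_KEYS))

def make_address(code, mafid):
    """Build a made-up address record that differs only in code and MAFID."""
    return {
        "psu": "PSU_X", "state_fips": "00", "county_fips": "000",
        "ce_stratification_code": code, "tract": "000100", "block": "1000",
        "street_name": "MAIN ST", "street_number": 1, "mafid": mafid,
    }

addresses = [make_address(54, 3), make_address(10, 2), make_address(10, 1)]
ordered = sort_unit_frame(addresses)
```

In this example the two addresses with stratification code 10 sort ahead of the address with code 54, and the tie between them is broken by MAFID.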

To draw a systematic sample from the GQ frame, the addresses are sorted first by PSU, then by State FIPS code, County FIPS code, Census Tract code, CHPCT (the percent of people in the tract living in college housing), and Census Block code. CHPCT is used because people living in college housing units are very different from the rest of the people in the GQ frame, so using it as a stratification variable helps produce a more representative sample.

For more information on the sample design in general, please see the paper by Susan King on “Selecting a Sample of Households for the Consumer Expenditure Survey” (Attachment R); or the paper by Danielle Neiman et al., “Review of the 2010 Sample Redesign of the Consumer Expenditure Survey” (Attachment S). For more information on the geographic portion of CE’s sample design, please see the memorandum from Jay Ryan to Richard Schwartz on “PSUs for the Consumer Expenditure Survey’s 2010 Census-Based Sample Design,” December 18, 2012 (Attachment T).

Consumer Units

A consumer unit (CU) is the unit from which the CE seeks to collect its detailed expenditure information. It is basically the same thing as a “household,” although there are some technical differences between them. A CU is (1) a group of people living together in a housing unit who are related by blood, marriage, adoption, or some other legal arrangement such as foster care; (2) a group of unrelated people living together who pool their incomes to make joint expenditure decisions; or (3) a person living alone, or sharing a housing unit with other people, who is financially independent of them.¹ In most cases, CUs and households are identical, so the terms are often used interchangeably. Approximately 99 percent of all occupied housing units are occupied by one CU, and there are approximately 130 million CUs in the United States. The following table shows the estimated number of CUs in each of the 91 strata from which CE’s sample of 91 PSUs was selected.²

Estimated Number of CUs in CE’s 91 Strata

Stratum Code	Estimated Number of CUs in the Stratum
S11A	1,916,829
S12A	8,239,029
S12B	2,511,760
S23A	3,983,681
S23B	1,808,974
S24A	1,410,066
S24B	1,173,786
S35A	2,373,185
S35B	2,343,038
S35C	2,226,023
S35D	1,171,909
S35E	1,141,275
S37A	2,705,813
S37B	2,492,843
S48A	1,765,452
S48B	1,070,955
S49A	5,401,694
S49B	1,825,454
S49C	1,778,910
S49D	1,448,362
S49E	1,303,309
S49F	572,767
S49G	220,279
N11B	2,107,733
N11C	1,782,731
N12C	1,711,973
N12D	1,466,621
N12E	1,652,789
N12F	1,499,951
N23C	1,429,854
N23D	1,371,790
N23E	1,582,553
N23F	1,371,175
N23G	1,652,369
N23H	1,646,840
N23I	1,576,918
N23J	1,443,122
N24C	1,252,236
N24D	1,196,973
N24E	1,384,575
N24F	1,241,240
N35F	1,277,976
N35G	1,112,833
N35H	1,274,905
N35I	1,073,353
N35J	1,302,974
N35K	1,110,367
N35L	1,301,557
N35M	1,081,592
N35N	1,226,603
N35O	1,152,152
N35P	1,305,536
N35Q	1,079,215
N36A	1,065,120
N36B	1,045,744
N36C	1,103,424
N36D	1,179,553
N36E	1,073,872
N36F	1,009,410
N37C	1,025,739
N37D	1,184,416
N37E	1,071,009
N37F	1,029,420
N37G	1,086,768
N37H	1,160,487
N37I	1,103,594
N37J	1,200,835
N48C	1,359,161
N48D	1,568,137
N48E	1,617,161
N48F	1,350,234
N49H	2,193,028
N49I	2,174,208
N49J	1,946,697
N49K	1,837,364
R11D	274,844
R12G	347,740
R23K	676,088
R23L	569,043
R24G	773,937
R24H	651,715
R35R	649,702
R35S	780,518
R36G	660,108
R36H	592,418
R37K	553,860
R37L	668,619
R48G	202,807
R48H	168,146
R48I	188,377
R49L	300,802
Total	130,000,000

Sample Size and Response Rates

The table below shows the expected annual sample sizes and response rates for the CEQ and CED surveys for 2021-2023. The sample sizes recently increased from their previous levels because the CPI program changed the source of its outlet frame information from the Telephone Point of Purchase Survey (TPOPS) to the CEQ and CED surveys. The CPI program had relied on TPOPS as its source of outlet sampling frame information since 1998, but due to its low response rate (which was around 30 percent), the duty of providing outlet information to the CPI program was transferred to the CE program.

CPI’s target population is the urban portion of the U.S. rather than the whole U.S., so CE’s sample sizes in the urban (“S” and “N”) PSUs were increased by 11 percent in the CEQ survey and by 53 percent in the CED survey, while the sample sizes in the rural (“R”) PSUs remained unchanged. The CED’s sample size was increased more than the CEQ’s sample size because internal CPI research indicated that the CED survey is more effective at collecting outlet information than the CEQ survey.

	Quarterly Interview Survey			Diary Survey
Category	2021	2022	2023	2021	2022	2023
Total Sample Size (addresses)	52,700	52,700	52,700	17,800	17,800	17,800

Type B and C Noninterviews (vacant, demolished, etc.)
Number	8,959	8,959	8,959	3,026	3,026	3,026
Percent of Total Sample	17.0	17.0	17.0	17.0	17.0	17.0

Eligible Units (occupied housing units)
Number	43,741	43,741	43,741	14,774	14,774	14,774
Percent of Total Sample	83.0	83.0	83.0	83.0	83.0	83.0

Type A Noninterviews
Number	19,683	19,683	19,683	7,239	7,239	7,239
Percent of Eligible Units	45.0	45.0	45.0	49.0	49.0	49.0

Completed Interviews
Number	24,058	24,058	24,058	7,535	7,535	7,535
Percent of Eligible Units (Response Rate)	55.0	55.0	55.0	51.0	51.0	51.0

The CEQ’s nationwide sample size used to be 48,000 addresses per year, but it was increased to 52,700 addresses per year in April 2020. Similarly, the CED’s nationwide sample size used to be 12,000 addresses per year, but it was increased to 17,800 addresses per year in January 2020.

As the table above shows, 83% of the sample addresses are expected to have occupied housing units, and the other 17% are expected to be “Type B/C” noninterviews, which are addresses that are not occupied housing units: they are nonexistent, nonresidential, vacant, demolished, etc. Then in the CEQ survey, 55% of the occupied housing units are expected to complete an interview, and the other 45% are expected to be “Type A” noninterviews, which are occupied housing units that do not complete an interview. This is expected to yield 24,058 completed interviews per year in 2021-2023.

Similarly, in the CED survey 83% of the sample addresses are expected to have occupied housing units, and 51% of the occupied housing units are expected to fill out diaries. This is expected to yield 15,070 (= 7,535 × 2) weekly diaries per year in 2021-2023.
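The arithmetic behind the table can be checked with a short sketch. The rounding order used here (round the noninterview counts, then subtract) is an assumption; it happens to reproduce the figures in the table.

```python
def expected_yield(total_addresses, eligible_rate, response_rate):
    """Return (eligible units, Type A noninterviews, completed interviews)."""
    # Type B/C noninterviews: addresses that are not occupied housing units.
    type_bc = round(total_addresses * (1 - eligible_rate))
    eligible = total_addresses - type_bc
    # Type A noninterviews: occupied units that do not complete an interview.
    type_a = round(eligible * (1 - response_rate))
    completed = eligible - type_a
    return eligible, type_a, completed

ceq = expected_yield(52_700, 0.83, 0.55)   # Quarterly Interview survey
ced = expected_yield(17_800, 0.83, 0.51)   # Diary survey
weekly_diaries = 2 * ced[2]                # two weekly diaries per responding unit
```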

The response rates in the table above are the CEQ’s and CED’s actual response rates over the past five years (2015-2019) minus 5 percentage points. Response rates have been decreasing in recent years, so the 5-year historical response rates are reduced by 5 percentage points to account for the downward trend.

Finally, it should be noted that in 2021 the PSU that was randomly chosen to represent stratum “N24F” will change from Wahpeton, ND to Brookings, SD. The request was made by the CPI program to avoid anticipated data collection issues.

Nonresponse Bias

In 2018 CE staff conducted a nonresponse bias study to determine whether the CEQ and CED surveys’ nonrespondents are “missing completely at random” (MCAR), and whether their missingness generates any bias in the published expenditure estimates. The study was undertaken in response to an OMB directive, and it concluded that the nonrespondents are not MCAR, but the amount of bias they generate is small. (See Attachment V - Assessing Nonresponse Bias in the Consumer Expenditure Interview Survey.)

The MCAR part of the study had four sub-studies. They found that different demographic groups have different response rates; that respondents have different demographic characteristics than the American population as a whole; that respondents’ demographic characteristics change over time; and that a mathematical model predicting response rates had statistically significant parameters on many of its demographic variables. The most significant finding was that high-income households have lower response rates than low-income households, which is a concern because CE is an economic survey that focuses on expenditures. Taken together, all four sub-studies indicated that CE’s nonrespondents are not MCAR.

The bias part of the study also had four sub-studies. They examined four different nonresponse weighting adjustment procedures to get an idea of the range of possible values the “correct” nonresponse-adjusted expenditure estimates might have. All four procedures increased the CEQ’s expenditure estimates by about one percent from its base-weighted (i.e., unadjusted) values, and all four procedures decreased the CED’s expenditure estimates by about one percent from its base-weighted values. Thus in both surveys, the nonresponse adjustment procedures indicate a bias of about one percent in the base-weighted expenditure estimates. The consistency of all four weighting procedures within each survey suggests that the results are robust.

Overall, the study showed that CE’s nonresponse weighting adjustment procedure is working well. The nonrespondents are not MCAR, but the amount of bias they generate is small, and the nonresponse weighting adjustment procedure is doing a good job compensating for the bias. The study provided a counterexample to the commonly held belief that if a survey’s data are not missing completely at random then its estimates are subject to nonresponse bias.

For more information on the calculation of response rates, see the memorandum from Sharon Krieger to David Swanson on “Response Rates in the Consumer Expenditure Survey” (2018) (Attachment U). For more information on the nonresponse bias studies, see “Assessing Nonresponse Bias in the Consumer Expenditure Interview Survey” (Attachment V).

2. Collection Methods

Field representatives from the U.S. Census Bureau, under contract with BLS, personally visit the households in the CEQ’s and CED’s samples to collect the data. Prior to the first household visit, respondents are sent an advance letter informing them that they have been selected for the survey and asking them for their cooperation. For subsequent household visits in the CEQ survey, respondents are sent an advance letter reminding them that it has been 3 months since they last participated in the survey and asking them for their cooperation again.

Field representatives visit each household in the CEQ’s sample every 3 months for 4 consecutive quarters to collect information on the expenditures they made during the previous 3 months. Though most interviews are collected by personal visit, approximately 20% of interviews are collected by phone at the respondent’s request or due to extenuating circumstances making a personal visit impractical. The field representatives enter the household’s responses into a laptop computer. After participating in the survey for 4 quarters, the household is dropped from the survey and replaced by another household. The households in the CEQ survey are on a rotating schedule with approximately one-fourth of the households in the sample being new to the survey each quarter. Due to the coronavirus pandemic, field representatives can complete the CEQ by telephone as needed.

For the CED survey, field representatives visit each household in the sample two times to collect information on the expenditures they make during a 2-week period. On the first visit the field representatives introduce themselves, explain the survey, and leave two weekly diaries, one for each week of the survey period. The household members are asked to record all their expenditures over the 2-week period in those diaries. Then on the second visit, the field representatives pick up the two diaries and thank the household for participating in the survey. After participating in the survey for two weeks, the household is dropped from the survey and replaced by another household. Due to the coronavirus pandemic, procedures will be modified to include contacting the respondent by telephone in lieu of personal visits, emailing a link to a Diary form, telephone transcription of expenditures from the Diary, and the availability of an online Diary. (See Attachment D for a detailed description of CED procedural changes resulting from the coronavirus pandemic including an email template for sending the Diary electronically.)

After completing the second week of the CED survey and the fourth quarter of the CEQ survey, the households are sent a Thank You letter and a certificate of appreciation for their participation in the survey.

Estimation

The estimation procedure for both the CEQ and CED follows well-established statistical principles. The final weight for each sample CU is the product of its base weight (which is the inverse of the CU’s probability of selection); an adjustment factor to account for noninterviews; and a calibration adjustment factor that post-stratifies the weights to account for population undercoverage. A typical base weight for a CU in the CEQ is approximately 10,000, which means it represents 10,000 CUs: itself plus 9,999 other CUs that were not selected for the survey. A typical final weight is approximately 18,000, which means it represents 18,000 CUs in the population: itself plus 17,999 other CUs that were not selected for the survey and/or did not participate in the survey.
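The weight calculation can be illustrated with a minimal sketch. The base weight of 10,000 and final weight of 18,000 are the typical values cited above; the two adjustment factors (1.5 and 1.2) are hypothetical values chosen only so that the product matches, and the function names are illustrative.

```python
def base_weight(selection_probability):
    """Base weight is the inverse of the CU's probability of selection."""
    return 1.0 / selection_probability

def final_weight(base, noninterview_factor, calibration_factor):
    """Final weight is the product of the base weight and the two adjustments."""
    return base * noninterview_factor * calibration_factor

# A CU selected with probability 1 in 10,000 has a base weight of about 10,000.
typical_base = base_weight(1 / 10_000)

# Hypothetical adjustment factors, chosen for illustration only.
typical_final = final_weight(typical_base, noninterview_factor=1.5,
                             calibration_factor=1.2)
```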

For additional information on CE’s sample design and estimation methodology, please see “Chapter 16, Consumer Expenditures and Income” in the BLS Handbook of Methods (Attachment W); Jay Ryan’s memorandum to Richard Schwartz on “PSUs for the Consumer Expenditure Survey’s 2010 Census-Based Sample Design,” December 18, 2012 (Attachment T); and Brian Nix’s memo on “Differences in Response Rates in the Consumer Expenditure Survey” (Attachment X).

3. Methods to Maximize Response Rates

Keeping the CEQ’s and CED’s response rates as high as possible requires special efforts, particularly from the Census Bureau’s field staff. The field staff are trained in a variety of techniques designed to persuade people to participate in the survey, and they are also trained in techniques for “refusal conversion” designed to change the minds of people who refuse to participate in the survey. If someone refuses to participate in the survey, the field office sends a letter trying to persuade them to participate in the survey and a senior interviewer or supervisory field representative is assigned to the case for follow-up refusal conversion efforts. Of course refusal conversion efforts take time and cost money, so regional office staff try to decide which cases to work on and how much effort to put into them based on cost-effectiveness considerations.

Special computer processing techniques are also used in the CEQ to reduce respondent burden, which in turn helps keep response rates up. For example, some data collected in one interview are carried forward to subsequent interviews, such as data on household members and their personal characteristics, along with data on their properties, mortgages, vehicles, and insurance policies. Minimizing respondent burden, including interview length, is an important factor in the effort to keep response rates up.

When field staff still cannot convert noninterviews to interviews, the estimation process has a noninterview adjustment to account for them. As mentioned above, every CU in the sample has a base weight equal to the number of CUs in the population it represents. In this process the respondent CUs have their weights increased to account for the nonrespondent CUs. The total sample of CUs (both respondents and nonrespondents) is partitioned into 192 subsets based on their region, CU size, income, and number of contact attempts.³ Then within each subset the base weights of the respondents are increased by multiplying them by a factor equal to the sum of the base weights for all CUs (both respondents and nonrespondents) divided by the sum of the base weights from just the respondent CUs. This makes the final weights of the respondents add up to the total number of CUs in the population.
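The noninterview adjustment described above can be sketched as follows. The cell labels, weights, and response outcomes are hypothetical, and the sketch simply skips any cell with no respondents, since the text does not say how such cells are handled.

```python
from collections import defaultdict

def noninterview_factors(units):
    """Compute one adjustment factor per cell.

    Each unit is a dict with a 'cell' label, a 'base_weight', and a
    'responded' flag. The factor is the sum of base weights over all
    units in the cell divided by the sum over respondents only.
    """
    total = defaultdict(float)
    respondents = defaultdict(float)
    for unit in units:
        total[unit["cell"]] += unit["base_weight"]
        if unit["responded"]:
            respondents[unit["cell"]] += unit["base_weight"]
    # Cells with no respondents are skipped (an assumption of this sketch).
    return {cell: total[cell] / respondents[cell]
            for cell in total if respondents[cell] > 0}

# Hypothetical sample: two cells, each unit with a base weight of 10,000.
units = (
    [{"cell": "A", "base_weight": 10_000.0, "responded": True}] * 3
    + [{"cell": "A", "base_weight": 10_000.0, "responded": False}]
    + [{"cell": "B", "base_weight": 10_000.0, "responded": True}] * 2
    + [{"cell": "B", "base_weight": 10_000.0, "responded": False}] * 2
)

factors = noninterview_factors(units)
adjusted_total = sum(u["base_weight"] * factors[u["cell"]]
                     for u in units if u["responded"])
```

In this example the respondents in cell A have their weights multiplied by 4/3 and those in cell B by 2, so the adjusted respondent weights sum to the same 80,000 as the base weights of the full sample.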

4. Testing Plans

CE does not currently have any additional plans for testing. However, in the event that additional stimulus payments are sent by the government, the CE will add stimulus payment questions to the CEQ CAPI Instrument and submit a nonsubstantive change request. The questions, if added, will capture: 1) the receipt of the stimulus payments, 2) by which members of the Consumer Unit (CU), 3) the month the CU received the stimulus, 4) the amount of the stimulus, 5) how the CU received the stimulus (direct deposit or check), 6) the primary use of the stimulus and 7) whether any additional stimulus checks were received by the CU.

Additionally, CE will research mailing an advance postcard to all CEQ and CED respondents informing them that they will soon receive a letter inviting them to participate in the Consumer Expenditure Survey. (See Attachment Y for additional information.) If CE decides to proceed with this mailing, a nonsubstantive change request will be submitted.

5. Statistical Contacts

The sample design is a joint effort between BLS and the Census Bureau, with the two bureaus focusing on different aspects of the sample design, and the data is collected by the Census Bureau under contract with BLS. For more information on the sample design or the data collection effort, you may contact the following individuals.

Sample Design:

Stephen Ash (Census)	(301) 763-4294
David Swanson (BLS)	(202) 691-6917

Data Collection:

Jennifer Epps (Census)	(301) 763-5342
Janel Brattland (BLS)	(202) 691-5427

1 Unrelated people who share a housing unit are considered to be separate CUs if they are responsible for paying their own expenses in at least two of these three categories: food, shelter, and all other expenses. Likewise college students living away from home are considered to be separate CUs from their parents if they are responsible for paying their own expenses in at least two of these three categories.

2 The number of CUs comes from combining information about the total number of housing units in the Census Bureau’s sampling frames (i.e., the MAF) with observations made by CE’s field representatives about the number of CUs living in those housing units. CE’s observations in the field show the average number of CUs per occupied housing unit is approximately 1.015. For every 1,000 occupied housing units there are approximately 1,015 CUs. The number of CUs per stratum shown in the table above comes from allocating the nationwide total of 130 million CUs by the number of people living in each stratum according to the 2010 census.

3 There are 4 regions of the country, 4 CU size classes, 3 income classes, and 4 contact attempt classes, making 192 = 4 × 4 × 3 × 4 subsets into which the sample is partitioned. For nonrespondents the number of people in the CU is obtained from data collected in previous interviews or from talking to their neighbors. For all CUs (both respondents and nonrespondents) their income is estimated from a publicly available database from the IRS which has the average household income by ZIP code. In the nonresponse adjustment process every CU is assumed to have its ZIP code’s average income value.

Author	FRIEDLANDER_M
File Created	2021-01-13