AMERICAN COMMUNITY SURVEY (ACS)
ACS RESEARCH & EVALUATION ANALYSIS PLAN

2017 Adaptive Strategy Test

Baseline
February 2017
Table of Contents
1. Introduction
2. Literature Review
3. Research Questions
4. Methodology
5. Potential Actions
6. Major Schedule Tasks
7. References
1. Introduction

In 2013, the American Community Survey (ACS) introduced an internet mode for data collection. The addition of this mode helped lower the data collection costs for the ACS and provided a convenient way for respondents to complete the survey. However, some individuals either cannot or prefer not to respond through the internet. Those less likely to respond to a survey by internet include those 65 and older, adults with less than a high school education, and those living in households with a total income of less than $20,000 (Pew Research Center 2015).
The current method of mailing materials follows an internet push strategy. This method encourages households to respond by internet in the first two mailings and then provides a paper questionnaire in the third (sent about two weeks after the first mailing). This frustrates some respondents who do not have internet access or prefer to respond by paper. In fact, the addition of the internet mode resulted in self-response rates decreasing in certain areas (Baumgardner, Griffin, and Raglin 2014). We want to offer sampled housing units in those areas a paper questionnaire earlier in the mailing process (called the ‘Choice’ method). In the Choice method, households would be able to choose between responding online or by paper in the initial mailing.
Since the internet mode is the cheapest data collection mode and the Choice method is likely to increase mail returns (Ramos 2012), we only want to offer the Choice method to areas unlikely to respond by internet. We developed an algorithm to determine which areas (census tracts) should be offered the Choice method and will conduct a field test to evaluate the impact of the Choice mailing strategy on self-response return rates and cost. The algorithm is explained in detail in section 4.1.
Keywords: Data Collection Methods, Data Quality
2. Literature Review

In 2013, ACS evaluated the effects of adding an internet response option (Baumgardner, Griffin, and Raglin 2014). That study found that adding internet increased the overall self-response rate from 2012 to 2013.¹ However, using an internet push mail strategy may have discouraged some households without internet access (or that prefer to respond by paper) from responding at all. In fact, self-response rates decreased in certain states and within certain demographic groups (e.g., older households and low-income households).
Similar research to the Adaptive Strategy Test was conducted in support of the 2020 Census in the 2015 National Content Test (Bentley and Mathews 2016). Tracts were classified as low, medium, or high based on the low response score from the Planning Database and the number of internet connections per 1,000 households. The results showed that offering a choice in mode was effective in the areas where a choice was offered (only the low category). Continuing research is under way to further develop the methodology for the 2020 Census. The results of this test may also offer insight for the 2020 Census.
Additional research conducted outside the U.S. Census Bureau has identified demographic characteristics associated with internet response (Pew Research Center 2015). These characteristics include age, education, race, income, and geographic location.
3. Research Questions

The 2017 Adaptive Strategy Test will be conducted during the October 2017 panel.
The research questions we want to answer are:
1. What is the impact of offering a choice in mode on the self-response return rates (both overall and by mode), final response rates, response reliability, and cost?
2. What are the characteristics of the households who respond under the Choice method versus the Push method, both overall and by mode?
3. What is the impact of the Choice method versus the Push method on form completeness, both overall and by mode?
We will conduct two-sided t-tests at the α = 0.1 significance level. We will compare Choice and Push within the Mail Preference and Mixed Preference categories, and we will also make an overall comparison of Choice versus Push (Mail Preference and Mixed Preference combined). Table shells for these analyses are provided in section 4. If we have enough sample, we would also like to make comparisons within each rule of our chosen algorithm.
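As a minimal sketch of how each comparison in the table shells could be carried out (illustrative only: the function and variable names are assumptions, and a normal reference is used in place of the t distribution given the large number of replicates; the replicate-based standard error comes from section 4.5):

```python
from scipy import stats

def compare_rates(rate_choice, rate_push, se_diff, alpha=0.10):
    """Two-sided test of Choice minus Push at the alpha = 0.1 level.

    se_diff is the replicate-based standard error of the difference
    (section 4.5); a normal reference approximates the t distribution.
    """
    diff = rate_choice - rate_push
    z = diff / se_diff
    p_value = 2 * stats.norm.sf(abs(z))
    return diff, p_value, p_value < alpha

# Hypothetical numbers: 52.0 vs. 50.8 percent with a 0.45-point SE
print(compare_rates(52.0, 50.8, 0.45))
```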
4. Methodology

We will use two mailing strategies for this project. The mail-out strategy for the Choice method will be similar to the one we use for the Puerto Rico Community Survey, in which a paper questionnaire is sent in the initial mailing. The Internet Push method will follow the current production mailing strategy, in which we push for response online first. With the exception of small language changes in the Choice method, we will attempt to keep the mail materials for both methods the same. Table 1 outlines the mailing strategies for each method.
Table 1. Mailing Strategy for Choice and Push Methods
Date | Choice Method | Push Method (Current Production)
09/21/17 | Pre-Notice Letter | Initial Mailing*
09/25/17 | Initial Mailing* | -
09/28/17 | Reminder Postcard | Reminder Letter
10/13/17 | - | Paper Questionnaire Package
10/17/17 | - | Reminder Postcard
10/19/17 | Replacement Questionnaire Package | -
11/02/17 | Final Reminder Postcard | Final Reminder Postcard
*The letter for the Choice method will offer two mode options, while the letter for the Internet Push method will only offer internet.
To decide which households will receive the Choice method, we will use a classification algorithm to categorize census tracts² into one of three groups: Mail Preference, Mixed Preference, and Internet Preference.
Mail Preference: tracts where we believe there is a preference to respond by mail (or an inability to respond online) and overall self-response is low.
Mixed Preference: tracts that we believe may prefer mail and may benefit from being offered a mode choice in the initial mailing.
Internet Preference: tracts we believe are likely to respond online.
Once tracts are classified into one of these three groups, we will randomly select half of the methods panel (MP) groups and assign mailing strategies based on whether the sample address is in a selected MP group as shown in Table 2.
Table 2. Distribution of Mailing Strategies within Tract Categories
Tracts | If address in selected MP group… | If address not in selected MP group…
Mail Preference | Choice method | Push method
Mixed Preference | Choice method | Push method
Internet Preference | Push method | Push method
Assigning addresses this way will result in roughly half the sample addresses in the Mail Preference and Mixed Preference categories receiving the Choice method and the other half receiving the Push method. All addresses in the Internet Preference category will receive the Push method materials.
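As an illustration of this assignment, here is a minimal sketch (the number of MP groups, the seed, and the field names are assumptions for illustration, not production values):

```python
import random

def assign_strategies(addresses, mp_group_count=24, seed=2017):
    """Randomly select half of the MP groups, then assign per Table 2:
    addresses in a selected group within a Mail or Mixed Preference
    tract get the Choice method; all others get the Push method."""
    rng = random.Random(seed)
    selected = set(rng.sample(range(mp_group_count), mp_group_count // 2))
    for addr in addresses:
        in_selected = addr["mp_group"] in selected
        prefers_mail = addr["tract_category"] in ("Mail Preference", "Mixed Preference")
        addr["strategy"] = "Choice" if (in_selected and prefers_mail) else "Push"
    return addresses
```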
4.1 Algorithm for Census Tract Classification
The Census Bureau conducted similar research in preparation for the 2020 Census and is working on an updated algorithm (Bentley and Mathews 2016) to determine which tracts to offer a choice (the 2020 Census version of the Choice method is slightly different from the one being used for this test). Table 3 outlines the metrics used by the decennial classification algorithm.
Table 3. Metrics Used in Decennial Classification Algorithm at Tract Level
Metric | Source
Ratio of mail returns to internet returns from 2013 to 2016 | ACS 1-yr files
Self-response check-in rate from 2013 to 2016 | ACS 1-yr files
Number of high speed internet connections per 1,000 households | Federal Communications Commission (FCC)
Percent of the population that is 65 and older | Planning Database (PDB)
The algorithm first calculates the ratio of mail check-in rates to internet check-in rates. If the ratio is greater than one, the tract is temporarily coded as mail preference; otherwise, it is coded as internet preference. Next, the algorithm considers the propensity to respond: if the overall self-response check-in rate is less than 41.283 percent, the tract is considered low response; otherwise, it is considered high response. A tract that is both mail preference and low response is placed in the choice category.
In addition, a tract that is mail preference and high response is also placed in the choice category if either:
- the percent of the population that is 65 and older is greater than 22 percent, OR
- the number of internet connections per 1,000 households is less than or equal to 400.
The cut-offs for the check-in rate and age (41.283 percent and 22 percent, respectively) were chosen so that about 20 percent of tracts would fall into the choice group, which balanced sample size against cost for testing purposes. The internet connection cut-off was determined by the categorical nature of the FCC data and the number of tracts in each category.
We evaluated this approach alongside other classification techniques (e.g., cluster analysis, discriminant analysis) and ultimately chose to adopt a similar algorithm using mostly the same metrics, with some minor changes to the cut-off values for classifying tracts as Mail Preference. (For example, our percent 65 and older variable comes from the 2015 ACS 5-year estimates on American FactFinder rather than the PDB.) To classify tracts as Mixed Preference, we adjusted the cut-offs to select more tracts and added an extra metric: the difference in self-response check-in rates after internet implementation minus before internet implementation. We chose this algorithm because it worked as well as or better than the other classification techniques and is more flexible for future use.
Table 4 outlines the algorithm being used in this test to classify tracts into Preference categories.
Table 4. Classifying Tracts into Preference Categories
Classification rule | Ratio of mail to internet return rates from 2013 to 2016 | Self-response check-in rate for 2013-2016 | High speed internet connections per 1,000 households | Percent 65 and older | Difference in self-response check-in before and after internet implementation (pct. points)
If any of the following are true, tract is Mail Preference: | >=1.20 | <41.283% | - | - | -
 | >=1.20 | >=41.283% | <=400 | - | -
 | >=1.20 | >=41.283% | - | >22% | -
 | >=1.75 | - | - | - | -
Else, if any of the following are true, tract is Mixed Preference: | >1 | <50% | - | - | -
 | >1 | >=50% | <=400 | - | -
 | >1 | >=50% | - | >22% | -
 | - | - | - | - | < -10
Else, tract is Internet Preference | - | - | - | - | -
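To make the rules in Table 4 concrete, here is a sketch of the classification logic in Python (argument names are illustrative; preparation of the source metrics is not shown):

```python
def classify_tract(ratio, checkin_rate, internet_per_1000, pct_65_plus, checkin_diff):
    """Classify one tract per Table 4.

    ratio             -- mail-to-internet return rate ratio, 2013 to 2016
    checkin_rate      -- self-response check-in rate (percent), 2013-2016
    internet_per_1000 -- high speed internet connections per 1,000 households
    pct_65_plus       -- percent of the population 65 and older
    checkin_diff      -- change in self-response check-in rate after
                         internet implementation (percentage points)
    """
    mail_rules = (
        ratio >= 1.20 and checkin_rate < 41.283,
        ratio >= 1.20 and checkin_rate >= 41.283 and internet_per_1000 <= 400,
        ratio >= 1.20 and checkin_rate >= 41.283 and pct_65_plus > 22,
        ratio >= 1.75,
    )
    if any(mail_rules):
        return "Mail Preference"
    mixed_rules = (
        ratio > 1 and checkin_rate < 50,
        ratio > 1 and checkin_rate >= 50 and internet_per_1000 <= 400,
        ratio > 1 and checkin_rate >= 50 and pct_65_plus > 22,
        checkin_diff < -10,
    )
    if any(mixed_rules):
        return "Mixed Preference"
    return "Internet Preference"
```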
4.2 Sample Design of Test
The current proposed algorithm to identify census tracts that would benefit from receiving a paper questionnaire earlier in the mailing process has identified approximately 25,000 census tracts (as either Mail Preference or Mixed Preference) out of the more than 72,000 census tracts. We estimate that 100,000 housing units in those tracts will be in the ACS sample in a given month, out of a total monthly sample of 295,000. We will randomly assign approximately half of these housing units to the Choice mail methodology and half to the Internet Push methodology.
This sample design will allow us to detect differences of approximately 1.0 percentage point between the self-response return rates, assuming 80 percent power, α = 0.1, and a 50-percent self-response rate.
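As a rough back-of-the-envelope check of that detectable difference (a sketch that assumes simple random sampling and ignores design effects and weighting, which is why it comes in slightly below the 1.0 percentage point figure above):

```python
from scipy.stats import norm

def min_detectable_diff(n_per_arm, p=0.50, alpha=0.10, power=0.80):
    """Minimum detectable difference between two independent proportions
    under simple random sampling (no design effect)."""
    z_alpha = norm.ppf(1 - alpha / 2)  # two-sided test
    z_beta = norm.ppf(power)
    se = (2 * p * (1 - p) / n_per_arm) ** 0.5
    return (z_alpha + z_beta) * se

# Roughly 50,000 housing units per treatment from the 100,000 identified
print(round(100 * min_detectable_diff(50_000), 2), "percentage points")  # ~0.79
```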
4.3 Self-Response Return Rates
We want to determine the impact of offering a choice in mode. To compare the Choice method and the Push method, we will calculate self-response return rates at two points in time: at the beginning of the Computer Assisted Telephone Interview (CATI) operation, which starts November 1, 2017, before the final reminder is mailed, and at the end of CATI. The second point in time will allow us to capture those who respond after receiving the final reminder (sent on November 2, 2017). We will also calculate overall response rates at closeout.
The formulae for the self-response return rates are provided below. The denominator is the sum of the base weights of the units in the self-response universe.³ The numerator is the sum of the base weights of the units in the self-response universe that respond to the survey.
Total Self-Response Return Rate = (Number of mailable/deliverable sample addresses that provided a nonblank⁴ return by mail or a complete or sufficient partial response by Telephone Questionnaire Assistance (TQA) or internet⁵ / Total number of mailable/deliverable sample addresses⁶) × 100

Internet Self-Response Return Rate = (Number of mailable/deliverable sample addresses that provided a complete or sufficient partial response by internet⁵ / Total number of mailable/deliverable sample addresses⁶) × 100

Mail Self-Response Return Rate = (Number of mailable/deliverable sample addresses that provided a nonblank⁴ return by mail or a complete or sufficient partial response by TQA / Total number of mailable/deliverable sample addresses⁶) × 100
If we receive more than one self-response from a single address, we will classify the response mode based on the first response received. Table 5 details the criteria for self-response.
Table 5. Response Criteria for Self-Response Return Rates
Mode | Considered a valid self-response if…
Internet |
Mail |
TQA |
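A sketch of the weighted return rate computation described above (data structures and field names are assumptions; the universe restrictions in the footnotes and the Table 5 criteria are abstracted as inputs):

```python
def self_response_return_rate(units, is_valid_response):
    """Weighted return rate: responding base weight over universe base weight.

    units             -- self-response universe, dicts with a 'base_weight'
    is_valid_response -- predicate implementing the Table 5 criteria
    """
    total = sum(u["base_weight"] for u in units)
    responding = sum(u["base_weight"] for u in units if is_valid_response(u))
    return 100.0 * responding / total

# Example: internet rate, where the first response received sets the mode
def responded_by_internet(unit):
    return unit.get("first_response_mode") == "internet"
```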
Table 6 shows the comparisons of self-response return rates.
Table 6. Impact of Choice Method on Self-Response Return Rates
Category | Choice Method | Push Method | Choice - Push | p-value
Mail Preference: Overall | | | |
Mail Preference: Mail | | | |
Mail Preference: Internet | | | |
Mixed Preference: Overall | | | |
Mixed Preference: Mail | | | |
Mixed Preference: Internet | | | |
Mail and Mixed Preference combined: Overall | | | |
Mail and Mixed Preference combined: Mail | | | |
Mail and Mixed Preference combined: Internet | | | |
4.4 Final Response Rates
At the end of all data collection operations, we will also calculate a final response rate for each treatment, by combining the self-responses, CATI responses, and Computer-Assisted Personal Interview (CAPI) responses.
The CATI universe comprises addresses that did not respond in the self-response phase of data collection (i.e., internet and mail data collection) and a small subset of unmailable addresses (e.g., addresses with an undeliverable zip code) for which we have telephone numbers. We will count a case as a CATI response if the address is in the CATI universe and we obtain sufficient information from a CATI interview for the response to be classified as a complete or sufficient partial response.
If we receive a self-response for an address after a CATI response, the case will be classified by the self-response mode.
To control costs, the CAPI universe comprises a subsample of all addresses remaining nonrespondent after the CATI operation, as well as all unmailable and undeliverable addresses. We will apply a subsampling factor to the base weights of the subsampled CAPI cases to account for the CAPI cases not sampled. We will count a case as a CAPI response if the address is in the CAPI universe and we obtain sufficient information from a personal interview for the response to be classified as a complete or sufficient partial response.
If we receive responses from multiple modes, we will assign the mode of response in the following order of preference: self-response, CATI, CAPI. If we receive more than one self-response from an address (internet or mail), we will select the first response received.
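A minimal sketch of this mode-of-record rule (the field names and timestamp handling are assumptions):

```python
MODE_PRIORITY = {"self-response": 0, "CATI": 1, "CAPI": 2}

def response_of_record(responses):
    """Self-response beats CATI, which beats CAPI; ties among
    self-responses go to the earliest 'received' timestamp."""
    return min(responses, key=lambda r: (MODE_PRIORITY[r["mode"]], r["received"]))
```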
The final response universe, which is determined after the CAPI operation, is the same universe as the self-response universe for the initial mailing, with the following exceptions:
- the inclusion of unmailable addresses (except unmailable addresses that are not selected for the CAPI subsample);
- the exclusion of out-of-scope addresses whose classification is determined during the CAPI operation;⁷
- the exclusion of addresses confirmed to be businesses during telephone follow-up, telephone interviews, personal interviews, or TQA.
The formulae for the final response rate and the internet, mail, CATI, and CAPI portions of this rate are provided below:

Final Response Rate = (Number of addresses that provided a nonblank⁴ complete or sufficient partial response in any mode⁸ / Total number of addresses in the final response universe) × 100

Internet Portion of Final Response Rate = (Number of addresses that provided a complete or sufficient partial response by internet⁵ / Total number of addresses in the final response universe) × 100

Mail Portion of Final Response Rate = (Number of addresses that provided a nonblank⁴ return by mail or a complete or sufficient partial response by TQA / Total number of addresses in the final response universe) × 100

CATI Portion of Final Response Rate = (Number of addresses that provided a complete or sufficient partial response⁸ in CATI / Total number of addresses in the final response universe) × 100

CAPI Portion of Final Response Rate = (Number of addresses that provided a complete or sufficient partial response⁸ in CAPI / Total number of addresses in the final response universe) × 100
Table 7 shows the comparisons of final response rates.
Table 7. Impact of Choice Method on Final Response Rates
Category | Choice Method | Push Method | Choice - Push | p-value
Mail Preference: Overall | | | |
Mail Preference: Mail | | | |
Mail Preference: Internet | | | |
Mail Preference: CATI | | | |
Mail Preference: CAPI | | | |
Mixed Preference: Overall | | | |
Mixed Preference: Mail | | | |
Mixed Preference: Internet | | | |
Mixed Preference: CATI | | | |
Mixed Preference: CAPI | | | |
Mail and Mixed Preference combined: Overall | | | |
Mail and Mixed Preference combined: Mail | | | |
Mail and Mixed Preference combined: Internet | | | |
Mail and Mixed Preference combined: CATI | | | |
Mail and Mixed Preference combined: CAPI | | | |
If time permits, we would also like to test the following:
- Is there a difference in the rate of duplicate responses between the Choice and Push methods?
- If there is an increase in self-response for the Choice method, when do we receive the return?
The Choice method will not be cost effective if it increases duplicate responses or late self-response returns (i.e., returns received after we send out a second questionnaire).
4.5 Standard Error of the Estimates
We will estimate the variances of the point estimates using the Successive Differences Replication (SDR) method with 80 replicates, the standard method used in the ACS (see U.S. Census Bureau, 2014, Chapter 12). In calculating the self-response return rates and final response rates, we will use replicate base weights, which only account for sampling probabilities. We will calculate the variance for each rate and difference using the formula below; the standard error of the estimate is the square root of the variance.

Var(X₀) = (4/80) × Σ (Xᵣ − X₀)², where the sum is over replicates r = 1, …, 80

Where:
X₀ = the return rate, response rate, or difference estimate calculated using the full sample base weights
Xᵣ = the return rate, response rate, or difference estimate calculated for the rth replicate
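A sketch of the variance computation (the replicate estimates are assumed to have been computed already using the replicate base weights):

```python
def sdr_variance(full_estimate, replicate_estimates):
    """Successive Differences Replication variance with 80 replicates:
    Var = (4/80) * sum of squared deviations of each replicate estimate
    from the full-sample estimate."""
    assert len(replicate_estimates) == 80
    return (4.0 / 80.0) * sum((x_r - full_estimate) ** 2
                              for x_r in replicate_estimates)

def standard_error(full_estimate, replicate_estimates):
    """Standard error is the square root of the variance."""
    return sdr_variance(full_estimate, replicate_estimates) ** 0.5
```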
4.6 Reliability
To assess the impact on the ACS estimates of providing the ACS questionnaire in the initial mailing package, we will evaluate the impact on the reliability of the estimates. Our measure of reliability is the variance of our estimates (response rates). We want to see whether adopting the Choice method (holding sample size and cost constant) will increase or decrease our variances relative to the Push method. The metric, a ratio of the sum of the squared weights for the interviews under the Choice method to that under the Push method, will estimate the overall impact on the reliability of the estimates rather than the impact on specific characteristics. The weights will be adjusted to take into account the effect of the increased response overall, as well as the shift in mode distribution due to higher or lower self-response.
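A sketch of that reliability metric (the interview weights are assumed to already carry the response and mode-mix adjustments described above):

```python
def reliability_ratio(choice_weights, push_weights):
    """Ratio of sums of squared interview weights, Choice over Push.
    A ratio above 1 suggests larger variances (lower reliability)
    under the Choice method, holding sample size and cost constant."""
    return (sum(w * w for w in choice_weights) /
            sum(w * w for w in push_weights))
```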
4.7 Cost Analysis
Adding a Choice method will increase costs due to the extra mail materials, specifically the mailing of two paper questionnaires (and may increase data capture costs as well). If the self-response return rates in the Choice areas increase significantly, however, the savings from lower CATI and CAPI workloads may offset the additional costs. We will conduct cost analyses to determine how many census tracts, if any, we can send the Choice materials to without increasing the overall costs for the survey.
Table 8 shows the impact of the Choice Method on reliability and cost.
Table 8. Impact of Choice Method on Reliability and Cost
 | Maintain Current Sample | Maintain Current Sample | Maintain Current Cost | Maintain Current Cost | Maintain Current Reliability | Maintain Current Reliability
 | Change in Cost† | % Change in MOE | % Change in Sample | % Change in MOE | % Change in Sample | Change in Cost†
Choice Method | | | | | |
4.8 Characteristics of Respondents using the Choice method versus the Push Method
We will compare the demographic characteristics of those who respond within the different tract categories. Table 9 lists the demographic characteristics, and the subcategories within each, where we expect there may be a difference between who responds by internet versus by mail. The subcategories for computer, internet access, age, race, Hispanic origin, educational attainment, income, marital status, and census region are based on definitions used by the Pew Research Center in their research (Pew Research Center 2015). The subcategories for English ability and urban/rural are based on definitions in Raglin et al. (2016). We use these definitions because the subcategories from the Pew Research Center do not completely match up to the data we have available.
Table 9. Demographic Characteristics for Comparing Mailing Strategies
Demographic Characteristic | Subcategories*
Computer | Laptop; Smartphone; Tablet; Other computer; No computer
Internet access | Yes; No
Age | 18-29; 30-49; 50-64; 65+
Race | White only; Black only
Hispanic origin | Hispanic; Non-Hispanic
Educational attainment | High school or less; Some college or Associate's degree; Bachelor's degree or higher
Income | Less than $20,000; $20,000 to $99,999; $100,000 or higher
Marital status | Married; Divorced/separated; Widowed; Never married
English ability | English only; Very well; Well; Not well; Not at all
Nativity | Born in the United States, Puerto Rico, or U.S. Island Areas; Not born in the United States, Puerto Rico, or U.S. Island Areas
Urban/rural | Urban; Rural
Census region | Northeast; Midwest; South; West
*Subcategories may require collapsing depending on the sample sizes.
Table 10 is an example of comparing the Choice and Push methods by demographic characteristics; it shows age only. We will have a separate table for each characteristic we compare.
Table 10. Characteristics of Those Who Respond: Choice Method versus Push Method
AGE | Choice Method | Push Method | Choice minus Push | p-value
Overall | | | |
18-29 | | | |
30-49 | | | |
50-64 | | | |
65 and older | | | |
Mail | | | |
18-29 | | | |
30-49 | | | |
50-64 | | | |
65 and older | | | |
Internet | | | |
18-29 | | | |
30-49 | | | |
50-64 | | | |
65 and older | | | |
In addition to examining the distributions under the two treatments, we will examine the percentage difference in the weighted counts for the demographic subcategories. A nonstatistical comparison to the difference in weighted counts for the variable as a whole will allow us to better examine the causes of differences in distributions. The tables for these metrics will look similar to Table 10, with the addition of the weighted counts.
4.9 Item Nonresponse
We will compute overall item form completeness rates for each section of the questionnaire (basic demographics, housing, and detailed person) for those respondents who provided an internet or mail response.
For each response, the denominator is the number of questions that should have been completed (after adjusting for skip patterns based on responses or removing cases that did not provide a response to an earlier dependent question). The numerator is the number of these items that were actually completed.
Once the item form completeness rate is calculated for each response, we will average it across all responses for each self-response mode. We will compare Choice versus Push for each section of the questionnaire.
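A sketch of the completeness calculation (the response representation is an assumption; skip-pattern logic is reduced to a per-item flag for illustration):

```python
def completeness_rate(response, section_items):
    """Share of required items answered for one response, where items
    removed by skip patterns are excluded from the denominator."""
    required = [q for q in section_items if q not in response["skipped"]]
    answered = sum(1 for q in required if response["answers"].get(q) is not None)
    return 100.0 * answered / len(required)

def mean_completeness(responses, section_items):
    """Average item completeness across all responses in a treatment/mode."""
    rates = [completeness_rate(r, section_items) for r in responses]
    return sum(rates) / len(rates)
```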
Table 11 is an example of determining the impact of the Choice method on item completeness. There will be a separate table for each section of the questionnaire.
Table 11. Item Completeness: Choice versus Push Method
Mode | Choice Method | Push Method | Choice - Push | p-value
Overall | | | |
Mail | | | |
Internet | | | |
5. Potential Actions

The findings from the Adaptive Strategy Test could result in a change in the mailing strategy and/or materials for a subset of census tracts. We may decide not to use our algorithm or find that it needs revision. The classification rules may change, or we may decide to only offer a choice to the Mail Preference category.
6. Major Schedule Tasks

Tasks (minimum required) | Planned Start (mm/dd/yy) | Planned Completion (mm/dd/yy)
Author drafts REAP, obtains CR feedback, updates and distributes Final REAP | 02/22/17 | 05/23/17
PM/Author conducts research activities | 09/26/17 | 01/30/18
Author drafts initial report, obtains CR feedback, updates and obtains final report sign off by the CRs and Division Chief | 01/30/18 | 06/14/18
Author develops presentation and conducts briefing to R&E WG | 03/30/18 | 04/13/18
Author develops and obtains approval of the R&E Project Record (REPR) | 07/05/18 | 07/18/18
7. References

Baumgardner, S., Griffin, D., & Raglin, D. 2014. “The Effects of Adding an Internet Response Option to the American Community Survey.” 2014 American Community Survey Research and Evaluation Report Memorandum Series, ACS14-RER-21. Retrieved March 6, 2017 from https://www.census.gov/library/working-papers/2014/acs/2014_Baumgardner_04.html.

Bentley, M., & Mathews, K. 2016. “2015 National Content Test Study Plan for Optimizing Self-Response.” DSSD 2020 Decennial Census R&T Memorandum Series #E-07, U.S. Census Bureau, May 19, 2016.

Matthews, B., Davis, M., Tancreto, J., Zelenak, M.F., & Ruiter, M. 2012. “2011 American Community Survey Internet Tests: Results from Second Test in November 2011 – Revision.” DSSD 2012 American Community Survey Memorandum Series #ACS12-MP-03-R1, U.S. Census Bureau, June 19, 2012.

Pew Research Center. 2015. “Coverage Error in Internet Surveys.” Retrieved XXXX ##, 2017 from http://www.pewresearch.org/files/2015/09/2015-09-22_coverage-error-in-internet-surveys.pdf.

Raglin, D., Baumgardner, S., Poehler, E., Walejko, G., Hagedorn, S., & Otmany, J. 2016. “Online Communications: Improving Survey Response Campaign.” ACS Research and Evaluation Analysis Plan, Draft.

Tersine, A. 2016. Work Request RS16-6-0194, “Adaptive Strategy for Targeting Internet vs Mail in First Mailing.” Washington, DC: U.S. Census Bureau. Retrieved March 7, 2017 from the “2017 Adaptive Strategy Test” folder on the ACSO SharePoint site.

U.S. Census Bureau. 2014. “American Community Survey Design and Methodology.” Retrieved May 4, 2017.
1 The national self-response rate was calculated using data from January 2013 to June 2013. This was done in part because the government shutdown in October 2013 disrupted the ACS collection process.
2 We chose to categorize at the tract level because this is the level of geography that the 2020 Census research is using.
3 The base weight is the inverse of the probability of selection for a sample unit.
4 A blank form is a form in which there are no data defined people and the telephone number listed on the form by respondents is blank.
5 See Table 5 for the definition of sufficient partial response by internet.
6 We will remove from the universe of eligible households any addresses where the initial mailing was returned by the postal service as undeliverable-as-addressed and a response has not been received by the time of the replacement mailing.
7 Out-of-scope addresses include demolished homes, homes under construction, relocated houses or trailers, and addresses that are a permanent business or storage facility.
8 A sufficient partial response is one in which the respondent reaches the first question in the detailed person questions section for the first person in the household.