2018 End-to-End Census Test Paper Data Capture Operational Assessment Study Plan
Paper Data Capture Integrated Project Team
Draft Pending Final Census Bureau Executive Review and Clearance.
Oct 31, 2018 Version 2.1
The U.S. Census Bureau conducts a census of population and housing every ten years to collect and tabulate statistics about the population and economy of the nation, as required by the U.S. Constitution. Census materials are either mailed to household addresses through the United States Postal Service (USPS) or left at household addresses by Census Bureau employees. Responses to the 2020 Census are expected to be collected by internet, paper, and telephone, and with enumerator assistance.
The purpose of the 2018 End-to-End Census Test (2018 E2E CT) Paper Data Capture (PDC) Operational Assessment Study Plan is to collect and analyze metrics for each operational system and process used in this test. The assessment will identify strengths and weaknesses and use them to develop recommendations for improving the quality and accuracy of paper processing for the 2020 Census.
The goals of the 2018 E2E CT include documenting final workloads, costs, and lessons learned. Another aim of this test is to capture reporting metrics that will enable analysis and recommendations for enhancing procedures, processes, and operations of PDC in the 2020 Census. The 2018 E2E CT will include the following PDC-related activities:
Conduct Optical Character Recognition (OCR), Optical Mark Recognition (OMR), and Key from Image (KFI) activities for paper questionnaires using the Integrated Computer Assisted Data Entry (iCADE) system
Conduct Build-a-Bin
Conduct Translation
Conduct Box Check-in (Group Quarters [GQ])
Evaluate the adequacy, completeness, and overall effectiveness of all operational PDC procedures
Evaluate all of the systems that support the PDC operation
Evaluate the deployed design configurations and the process workflow integration.
The scope of this 2018 E2E CT PDC Operational Assessment Study Plan covers activities related to the PDC operation, which is responsible for the capture and conversion of data from paper questionnaires. This operation consists of the following components:
PDC Universe Management
Paper Questionnaire Preparation
Paper Questionnaire Data Capture
PDC Quality Assurance Check
Data Distribution (which includes sending Response and Case Status Data to Response Processing and Paradata to Program Management)
Final Data Disposition (which includes Check-out, Receipt Confirmation, and Disposition).
The 2018 E2E CT PDC operation will be conducted from March 16 through August 31, 2018, at the U.S. Census Bureau’s National Processing Center (NPC) in Jeffersonville, Indiana. Portions of the 2018 E2E CT will be held in three locations covering more than 700,000 housing units: Pierce County, Washington; Providence County, Rhode Island; and the Bluefield-Beckley-Oak Hill, West Virginia area. The 2018 E2E CT peak operations, which include paper data capture, will take place only at the Providence County, Rhode Island site, which has a population of over 600,000 and more than a quarter-million housing units. For this test, the USPS will deliver paper questionnaires to the NPC.
The following section describes the 2018 E2E CT Paper Data Capture operational processes that have been designed in preparation for the 2020 Census.
Paper Data Capture Overview
For Census 2000 and the 2010 Census, PDC systems development and operations were outsourced, with the exception of operations at the NPC, which were executed under government leadership. After the 2010 Census, iCADE was introduced as a decennial census paper processing alternative through the Improving Operational Efficiency (IOE) program and was assessed to be a viable system for paper data capture during the 2020 Census. The iCADE system is a large-scale, efficient, and accurate data capture system that incorporates both automated and manual data capture functionalities. It is currently in operation for several ongoing censuses and surveys at the U.S. Census Bureau.
In addition to the iCADE system, three other important systems are involved in the PDC 2018 E2E CT operation: the Intelligent Mail Barcode Postal Tracking System version 2 (IPTSv2), the Automated Tracking and Control (ATAC) system, and the Census Image Retrieval Application (CIRA).
The IPTSv2 ingests data from the USPS Intelligent Mail Barcode (IMB) Tracing Service. The system tracks U.S. Census Bureau outbound mail deliveries and the inbound business reply mail returning to the Paper Data Capture Center (PDCC). The PDCC can use inbound mail data to estimate expected workloads. The 2018 E2E CT will be the first production use of IPTSv2.
The ATAC system provides automated workflow case management that the NPC uses to integrate various NPC applications in support of PDC operations. ATAC will perform the initial check-in of mailback questionnaires and of GQ box shipments from the field.
CIRA was developed after the 2010 Census to give Census Bureau analysts secure access to questionnaire images and the corresponding response data needed for research and analysis. It features a centralized query system for retrieving and accessing questionnaire images and associated captured data. CIRA is integrated with iCADE and provides an archive of images of paper responses. For the 2018 E2E CT, CIRA will provide paper data capture images only. For the 2020 Census, however, CIRA will also display response data from all modes, along with the paper images, in support of the Age Search Operation.
The PDC operation begins with the receipt of envelopes containing respondent-returned paper questionnaires delivered by the USPS. Group Quarters forms are also in scope for the operation and will be received in FedEx boxes shipped from the field. Once received, USPS respondent envelopes and FedEx GQ boxes are checked in. Envelopes are opened, and the returned questionnaires are removed and prepared for scanning and data capture. GQ boxes are not opened at this point; they are checked in within the check-in area. The forms will be scanned at batching. For USPS mail returns, automated processes are used to check in questionnaires. However, mail pieces may be rejected for the following reasons:
Failure to read Census ID barcodes
Nonstandard respondent-supplied envelopes and/or out-of-scope materials
Mail returns too damaged to run through the sort check-in equipment (sorter).
Rejected mail pieces are inspected by clerks to determine whether they should be re-run through the sorter or checked in manually. After questionnaires are successfully checked in, they are prepped, guillotined, and scanned to produce electronic images through iCADE.
The iCADE application uses a combination of OMR, OCR, and KFI functionalities to capture response data from the electronic images. OMR automatically captures check-box data, OCR automatically reads respondent handwriting to capture write-in response data, and KFI is used to manually key any response data that were not successfully captured by OMR or OCR. For each OCR result, iCADE provides a confidence measure indicating how reliably the system read the handwritten response; OCR data captured with low confidence are sent to KFI for manual capture. The rules for checking data validity have been pre-specified and are integrated into system processing. The iCADE Service Level Agreements (SLAs) for overall output accuracy of the three iCADE functions are as follows:
OMR – 99.8 percent accuracy
OCR – 80.0 percent of the write-in fields are successfully captured with 99.0 percent accuracy (write-in fields not successfully captured by OCR are sent to KFI).
KFI – 97.0 percent accuracy (The iCADE OCR acceptance rate will determine the manual key entry workload. For the 2020 Census, DCMD currently expects 20 percent of the workload to require key entry.)
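The confidence-based routing described above, and its effect on the KFI workload, can be sketched as follows. This is a minimal illustration under stated assumptions, not iCADE's implementation: the `OcrField` record, the 0.0 to 1.0 confidence scale, and the 0.90 cutoff are all hypothetical, because the plan does not say how iCADE scores or thresholds confidence. Only the 80 percent acceptance target comes from the SLAs.

```python
from dataclasses import dataclass

# Hypothetical cutoff: the plan does not state iCADE's actual confidence threshold.
CONFIDENCE_THRESHOLD = 0.90


@dataclass
class OcrField:
    """One write-in field as read by OCR (hypothetical record layout)."""
    field_id: str
    text: str
    confidence: float  # assumed to range from 0.0 to 1.0


def route_fields(fields, threshold=CONFIDENCE_THRESHOLD):
    """Accept high-confidence OCR reads; send the rest to KFI for manual keying."""
    accepted = [f for f in fields if f.confidence >= threshold]
    to_kfi = [f for f in fields if f.confidence < threshold]
    return accepted, to_kfi


def acceptance_rate(fields, threshold=CONFIDENCE_THRESHOLD):
    """Share of write-in fields accepted by OCR (the SLA target is 0.80)."""
    if not fields:
        return 0.0
    accepted, _ = route_fields(fields, threshold)
    return len(accepted) / len(fields)


def expected_kfi_fields(total_write_in_fields, ocr_acceptance=0.80):
    """Write-in fields expected to need key entry at a given OCR acceptance rate."""
    return round(total_write_in_fields * (1 - ocr_acceptance))
```

At the 80 percent SLA acceptance rate, `expected_kfi_fields(1000)` returns 200, which matches the 20 percent key-entry expectation stated above.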
The PDC operation includes processes that ensure the quality of the data captured from the paper questionnaires. For OMR, automated processes identify ambiguities when there are unexpected check-box entries (e.g., multiple marks in a single-mark response field, scratched-out check-box marks, or marks that are difficult to interpret). These ambiguities are sent for manual resolution to determine the correct response data. Fields captured through KFI and fields captured by OCR with high confidence are sampled, and five percent are re-captured through KFI. The data from the original capture are compared and adjudicated against the re-captured data to determine the quality level for the batch. If the error level within the sample exceeds established quality thresholds, the entire batch is reworked to correct all errors. These quality assurance processes ensure that the PDC operation is able to meet the accuracy requirements above.
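The batch-level QA decision described above can be sketched in a few lines. This is a minimal illustration, not the iCADE QA module. For simplicity it draws a simple random sample, whereas the plan describes the operational QA sample as stratified. The dictionaries of field values and the threshold argument are assumptions. The QA requirements listed in this plan (1 percent for OCR and 3 percent for KFI error rates) are plausible values for that threshold.

```python
import random


def qa_sample(field_ids, rate=0.05, seed=None):
    """Simple random sample of fields for KFI re-capture (5 percent by default)."""
    ids = list(field_ids)
    if not ids:
        return []
    k = max(1, round(len(ids) * rate))
    return random.Random(seed).sample(ids, k)


def batch_error_rate(original, recapture, sampled_ids):
    """Share of sampled fields where the original capture disagrees with the re-capture.

    Simplification: every disagreement counts as an original-capture error. In the
    process described above, disagreements are adjudicated before errors are charged.
    """
    if not sampled_ids:
        return 0.0
    errors = sum(1 for fid in sampled_ids if original[fid] != recapture[fid])
    return errors / len(sampled_ids)


def batch_needs_rework(error_rate, threshold):
    """The whole batch is reworked when the sample error rate exceeds the threshold."""
    return error_rate > threshold
```

For example, a batch of 200 fields yields a 10-field sample. One disagreement gives an error rate of 0.10, which exceeds a 3 percent threshold and triggers rework of the whole batch.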
Upon completion of data capture activities, the data and images captured by iCADE are made accessible through CIRA.
DCMD, iCADE, and NPC success criteria for the 2018 End-to-End Census Test PDC operations are as follows:
PDC activities are conducted and completed within budget
The PDC operation is able to process the workload on schedule
An acceptable digital image of all paper questionnaires is captured, stored, and backed up
Rejected questionnaires are electronically re-processed or manually processed
PDC workflow requirements are met
All iCADE system interfaces are validated
All system reporting operations are validated
The QA process is validated and results support processing assumptions
All issues are logged and resolved, and solutions are implemented in a timely manner.
Previous research and literature review
The 2014 Census Test was a site test of 192,066 housing units in Montgomery County, Maryland, and in the District of Columbia. Census Day was July 1, 2014. The objective of the 2014 Census Test was to test different contact strategies and nonresponse follow-up procedures. Sampled units were primarily encouraged to respond via the internet, with a paper questionnaire sent only on the fourth or fifth contact, depending on the particular contact strategy. The overall response rate was 65.9 percent, with 10.2 percent of the sampled housing units responding using the paper questionnaire (Bentley and Rothhaas, 2016). The 2014 Census Test was the first decennial test that used iCADE for PDC, and it was also the first time iCADE performed OCR on non-numeric data for any survey.
For the 2015 Census Test, the PDC operation was conducted largely the same way as in the 2014 Census Test and used the iCADE system. The 2015 Census Test included housing units in and around Savannah, Georgia, and in Maricopa County, Arizona, where new contact strategies and nonresponse follow-up methods were tested. Testing included a new path that allowed anyone, with or without a U.S. Census Bureau-provided user ID, to respond using the internet instrument. The 2015 Census Test also used a bilingual booklet form of the questionnaire, which presented a new processing challenge for PDC. Some respondents tore the unused language portion from the booklet before returning it, which caused processing difficulties when the barcode used to track the form was on the discarded side. (This phenomenon continued to be observed in subsequent census tests.)
The 2015 National Content Test consisted of a sample of 1.2 million housing units from across the 50 States, the District of Columbia, and Puerto Rico. Census Day was September 1, 2015. This test focused on the content of the questionnaire and on optimizing self-response through different contact strategies. Ten different paper questionnaires were designed and sent to sampled housing units for self-response. The overall response rate was 51.9 percent (Mathews et al., 2017), and the paper response rate ranged across contact strategies from less than 10 percent to approximately 30 percent (Phelan, 2016). Processes to capture the data for the 2015 National Content Test were similar to those in the previous two tests in that they used iCADE, though interfaces were updated and the quality assurance sampling strategy was improved to incorporate stratification.
The objective of the 2016 Census Test was to test a number of technical and operational improvements. All 453,425 housing units in the self-response mailing universe of Los Angeles County, California, and Harris County, Texas, were eligible for the test, which had a Census Day of April 1, 2016. The response rate was 45.9 percent, with 13.4 percent of housing units responding with the paper questionnaire (Coombs et al., 2017). For this test, several changes were made to the PDC operation, including the processing of responses in non-English languages. In previous tests, responses containing non-English characters were rejected during OCR and sent to KFI. The 2016 Census Test introduced the use of a Spanish application provided with the OCR Commercial Off the Shelf (COTS) product. The Spanish application created problems because it rejected many fields that, upon further review, appeared to be legible and in English. The 2016 Census Test also included paper questionnaires in Chinese and Korean, so the PDC operation had to account for responses in these languages as well. Questionnaires were presorted by the sorter; clerks then reviewed each booklet during batching to separate returns that contained responses in languages other than English or Spanish, a very manually intensive process.
The 2017 Census Test, which had a Census Day of April 1, 2017, included a national sample of 80,000 housing units. It offered another opportunity to implement and test questionnaire changes. Based on the experience in the 2016 Census Test, the OCR Spanish application was removed, and Spanish characters were rejected by OCR and sent to KFI. For questionnaires completed in languages other than English and Spanish, keyers would “flag” the questionnaires, and those questionnaires were sent for translation.
The paper questionnaire data capture assessment will address the following questions:
1. Did the PDC operation begin and end on time?
Success: The PDC operation began and ended on time, so the needs of all dependent stakeholders were met.
2. How did the actual PDC workload compare to the expected PDC workload?
Success: PDC operations and systems were fully prepared for workloads that are driven by response rates.
3. How many questionnaires did keyers flag as foreign language (i.e., non-English/Spanish)? How many of those questionnaire responses were actually provided in a foreign language?
Success: The PDC system worked as designed without errors, and PDC operations followed the established processes and procedures. Only the fields that should have gone to translation were sent there, and they were translated correctly.
4. What was the productivity of the iCADE components (OMR, OCR, and KFI)?
Success: All of these applications performed as planned and designed. Error rates, acceptance rates, and other performance measures met expected results.
5. What were the results of the quality assurance checks? Did the iCADE components meet the established accuracy requirements?
Success: QA results met the SLAs. The stratified sample was configured to generate representative QA results. The Paper Data Quality (PDQ) results validated iCADE’s internal QA process.
6. What were the costs of the PDC operation?
Success: All costs were within the projected expense profile.
7. What events during the 2018 E2E CT impacted the PDC operation and how?
Success: PDC successfully managed all 2018 E2E CT events.
8. Did the check-out procedure perform as anticipated?
Success: All check-out activities were accomplished effectively.
This section describes how each assessment question will be answered.
Did the PDC operation begin and end on time? This question will be answered by a snapshot of planned and actual dates from the Integrated Master Schedule (IMS). We will also discuss any operational issues that affected our ability to meet scheduled deadlines.
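Comparing planned and actual dates from an IMS snapshot reduces to a day count for each milestone. The sketch below assumes the snapshot has already been exported as (name, planned date, actual date) tuples. That layout is hypothetical and is not an IMS export format.

```python
from datetime import date


def schedule_variance_days(planned: date, actual: date) -> int:
    """Days between planned and actual dates; positive means late, negative means early."""
    return (actual - planned).days


def milestone_variances(milestones):
    """Map each milestone name to its variance, given (name, planned, actual) tuples."""
    return {name: schedule_variance_days(planned, actual)
            for name, planned, actual in milestones}
```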
How did the actual PDC workload compare to the expected PDC workload? This question will be addressed with the following table:
PDC Workload | Expected | Actual |
Total Questionnaires/Boxes/ICQs | | |
Questionnaires Mailed | | |
Questionnaires Returned | | |
GQ Box Check-in | | |
GQ ICQ Forms | | |
Source: IPTSv2 reports, iCADE reports, ATAC reports
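Once both columns of the workload table are filled in, each row can be summarized as a difference and a percent difference. This is a minimal sketch that assumes the counts are held in plain dictionaries keyed by the row labels in the table above. The function name is hypothetical.

```python
def workload_comparison(expected, actual):
    """Compare actual to expected counts for each workload row.

    Returns {row: (difference, percent_difference)}; percent_difference is None when
    the expected count is zero. Rows missing from `actual` are skipped.
    """
    result = {}
    for row, exp in expected.items():
        if row not in actual:
            continue
        diff = actual[row] - exp
        pct = diff * 100 / exp if exp else None
        result[row] = (diff, pct)
    return result
```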
How many questionnaires did keyers flag as foreign language (i.e., non-English/Spanish) responses? How many were actually provided in a foreign language?
The number and percent of questionnaires flagged by keyers as foreign language (i.e., non-English/Spanish) will be reported, as defined by the following formula:
If expected or estimated rates are available, they will be compared to actual rates.
The data source is iCADE reports.
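Question 3 involves two rates: the share of questionnaires that keyers flagged as foreign language, and the share of those flags that proved correct. The sketch below computes both. It assumes the flag rate is taken over all questionnaires processed and the confirmation rate over flagged questionnaires. Both denominators, and the argument names, are assumptions made for illustration.

```python
def foreign_language_rates(total_questionnaires, flagged, confirmed):
    """Return (percent flagged, percent of flags confirmed).

    total_questionnaires: questionnaires processed (assumed denominator)
    flagged: questionnaires keyers flagged as non-English/Spanish
    confirmed: flagged questionnaires found to be in another language (e.g., at translation)
    """
    if not 0 <= confirmed <= flagged <= total_questionnaires:
        raise ValueError("expected 0 <= confirmed <= flagged <= total_questionnaires")
    flag_pct = 100 * flagged / total_questionnaires if total_questionnaires else 0.0
    confirm_pct = 100 * confirmed / flagged if flagged else 0.0
    return flag_pct, confirm_pct
```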
What was the productivity of the iCADE components (check-in, OMR, OCR, and KFI)? This question will be addressed by the following table:
Function | Forms Processed per Hour |
Batching | |
Scanning | |
KFI | |
Check-out/Exception Processing | |
Source: iCADE reports
What were results of the quality assurance checks? Did the iCADE components meet the established accuracy requirements? The quality assurance results will be addressed in the following table:
Segment | Requirement | Error Ratio |
OCR sample | 5% | |
KFI sample | 5% | |
PDQ sample | 10% | |
OCR error rate | 1% | |
KFI error rate | 3% | |
OMR error rate | 0.2% | |
OCR rejected batches | N/A | |
KFI rejected batches | N/A | |
Source: iCADE reports
To address the accuracy requirements, we will include an analysis comparing actual rates to required rates, as shown in the following table:
Table 4: Comparison of Actual to Required Rates
Task Measured | Requirement | Actual % |
OMR accuracy rate | 99.8% | |
KFI accuracy rate | 97% | |
OCR accuracy rate | 99% | |
OCR acceptance rate | 80% | |
KFI sample | 5% | |
OCR sample | 5% | |
Imaging capacity | 14,000 forms/day | |
Source: iCADE reports
Because the 2018 E2E CT workload is small, it is highly unlikely that the imaging capacity metric (14,000 forms per day) will be met.
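The actual-versus-required comparison in Table 4 can be expressed as a lookup against the listed requirements. This sketch assumes every requirement is a floor, so a task counts as met when the actual value is greater than or equal to the requirement. That assumption is reasonable for accuracy, acceptance, and capacity, but is only a reading of the table for the sample rates. The dictionary and function names are hypothetical.

```python
# Minimum requirements from Table 4 (percentages, except imaging capacity in forms/day).
REQUIREMENTS = {
    "OMR accuracy rate": 99.8,
    "KFI accuracy rate": 97.0,
    "OCR accuracy rate": 99.0,
    "OCR acceptance rate": 80.0,
    "KFI sample": 5.0,
    "OCR sample": 5.0,
    "Imaging capacity": 14_000,
}


def compare_to_requirements(actuals):
    """Return {task: (requirement, actual, met)} for each task with a reported actual.

    Assumes every requirement is a floor, so a task is met when actual >= requirement.
    """
    report = {}
    for task, required in REQUIREMENTS.items():
        if task in actuals:
            actual = actuals[task]
            report[task] = (required, actual, actual >= required)
    return report
```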
To measure the form repair rate, the Exception Review Report (which may also be called the Form Repair Report) will be used to list counts of each type of exception encountered during 2018 processing. The counts will be broken down by resolution code in Table 5.
Item | Total Documents | Sent for Re-Batching | No Re-Batching | Needed Further Inspection |
Missing Page | | | | |
Loose Page | | | | |
Duplicate Page | | | | |
Flagged Questionnaire | | | | |
Batched Not Scanned | | | | |
Scanned Not Batched | | | | |
Count Issue | | | | |
Blank Form | | | | |
Unattached Page | | | | |
Correspondence | | | | |
Possible Train Wreck* | | | | |
Source: iCADE reports
* Possible Train Wreck – forms within a batch that appear in reverse order or flipped
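The Table 5 counts amount to a cross-tabulation of exception type by resolution code. The sketch below builds those rows from (exception type, resolution) pairs. That input layout is hypothetical and is not the Exception Review Report's format. The code also assumes each document carries exactly one resolution code, so "Total Documents" is the sum of the three resolution columns.

```python
from collections import Counter

# Row and column labels from Table 5.
EXCEPTION_TYPES = [
    "Missing Page", "Loose Page", "Duplicate Page", "Flagged Questionnaire",
    "Batched Not Scanned", "Scanned Not Batched", "Count Issue", "Blank Form",
    "Unattached Page", "Correspondence", "Possible Train Wreck",
]
RESOLUTIONS = ["Sent for Re-Batching", "No Re-Batching", "Needed Further Inspection"]


def tally_exceptions(records):
    """Build Table 5 rows from (exception_type, resolution) pairs.

    Assumes each document carries exactly one resolution code, so the
    "Total Documents" column is the sum of the three resolution columns.
    """
    counts = Counter(records)
    table = {}
    for etype in EXCEPTION_TYPES:
        row = {res: counts[(etype, res)] for res in RESOLUTIONS}
        row["Total Documents"] = sum(row.values())
        table[etype] = row
    return table
```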
What events during the 2018 E2E Census Test impacted the PDC operation and how?
Were operational and/or system changes made to the PDC operation prior to or during the 2018 E2E CT?
If changes or modifications were made to operations and/or systems, including iCADE, ATAC, IPTSv2, and CIRA, how did the timing of the changes impact PDC operations? How were the changes managed? (If there was a Change Request number, it will be provided.)
Were anomalies observed during the 2018 End-to-End Census Test that are expected to occur during the 2020 Census?
Were the PDC processing procedures complete, accurate, and effective? This question will be addressed in the following table:
Procedure | Major rewrite necessary | Minor corrections necessary | No updates needed |
Check-in | | | |
Batching | | | |
Manual registration | | | |
Scanning | | | |
Build-a-Bin | | | |
Translation | | | |
Quality Assurance | | | |
Check-out | | | |
Paper Data Quality (PDQ) | | | |
KFI | | | |
ORT Observation | | | |
Source: NPC, ATAC, iCADE, ORT Findings
Did the check-out procedure perform as anticipated?
Was the NPC operational staff able to execute the check-out process successfully?
Did the check-out applications perform as expected?
Were there any forms that received a successful check-out status that did not meet the check-out criteria for 2018?
Were all of the forms flagged to be pulled at check-out actually pulled and reprocessed?
The answers to these questions will include details about the check-out process and any problems encountered, for example: the number of questionnaires moved to Library Storage (form destruction); the number of forms that needed resolution or were pulled for reprocessing; the number of digital images that were accepted (UTS [Unified Tracking System] sufficiency check); and/or the number of digital images that were captured (CIRA), stored, and backed up.
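Several of the check-out questions reduce to set differences over form identifiers. For example, the forms flagged to be pulled but never pulled are one such difference. The sketch below assumes the sets of form IDs have already been extracted from check-out status data such as UTS. The function name, argument names, and result keys are hypothetical.

```python
def reconcile_checkout(flagged_to_pull, pulled, reprocessed, checked_out, met_criteria):
    """Set comparisons behind the check-out questions; each argument is a set of form IDs."""
    return {
        # Flagged at check-out but never pulled.
        "flagged_not_pulled": flagged_to_pull - pulled,
        # Flagged and pulled, but not yet reprocessed.
        "pulled_not_reprocessed": (flagged_to_pull & pulled) - reprocessed,
        # Given a successful check-out status without meeting the 2018 criteria.
        "checked_out_without_criteria": checked_out - met_criteria,
    }
```

Empty result sets would indicate that the corresponding check-out question can be answered "yes" for the forms examined.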
The following table identifies the source of each data file or report used to answer the assessment questions.
Data File/Report | Source | Purpose | Expected Delivery Date |
iCADE Event data | iCADE / Census Data Lake | Questions 2, 3, 4, 5, 7 | 03/13/2018 – 08/31/2018 |
iCADE Response Data | iCADE / Census Data Lake | Questions 3, 4, 5 | 03/13/2018 – 08/31/2018 |
IPTSv2 Event data | IPTSv2 / Census Data Lake | Questions 1, 2, 7 | 03/13/2018 – 08/31/2018 |
ATAC Check In data | ATAC | Questions 2, 7 | 03/13/2018 – 08/31/2018 |
UTS | UTS | Question 8 | 03/13/2018 – 08/31/2018 |
CIRA | CIRA | Questions 7, 8 | 03/13/2018 – 08/31/2018 |
Source: CDL Reports, iCADE, IPTSv2, ATAC, UTS, CIRA
The PDC operation will likely be modified between its execution in the 2018 E2E CT and its execution in the 2020 Census.
Forms will not be destroyed for the 2018 E2E CT.
This assessment relies on daily metrics from the iCADE system. If iCADE does not deliver the needed data, then the relevance of this assessment will be reduced.
This assessment relies on lessons learned from the 2018 E2E CT. If the documentation of lessons learned is delayed, then the assessment may be delayed or have a reduced scope.
The PDC operation has the following limitations. The PDC operation for the 2018 E2E CT is being conducted at a location on the NPC campus in Jeffersonville, IN, which is not where the 2020 Census PDC operations will be conducted. The General Services Administration (GSA) is acquiring the sites to be used for the 2020 Census and making them ready for use. The PDC operation has limited ability to influence the GSA acquisition and build-out processes; thus, an assessment of these processes and how they affect 2020 PDC production is beyond the scope of this operational assessment. Another limitation is the size of the anticipated workload. A small workload results in reduced throughput, which makes it difficult to assess the ability of the PDC solution to process a decennial-sized workload. Overall, the results of the assessment will not be representative of what would actually occur in a decennial census environment.
Division or Office | Responsibilities |
DCMD | |
iCADE | |
DSSD | |
NPC | |
ATAC | |
CIRA | |
IPTSv2 | |
Operational Milestone | Date |
PDC Operation Starts | 03/13/2018 (A) |
PDC Operation Finishes | 08/31/2018 (A) |
Assessment Milestone | Date |
Receive, Verify, and Validate PDC Assessment Data | 10/01/2018 |
Distribute Initial Draft PDC Assessment Report to the Decennial Research Objectives and Methods (DROM) Working Group for Pre-Briefing Review | 12/18/2018 |
Decennial Census Communications Office (DCCO) Staff Formally Release the FINAL PDC Report in the 2020 Memorandum Series | 06/06/2019 |
Source: 2020 Census IMS
All known issues have been resolved.
Role | Review/Approval Date |
Decennial Census Management Division (DCMD) ADC for PDC | 06/20/2018 |
Decennial Research Objectives and Methods (DROM) Working Group | 08/22/2018 |
Decennial Census Communications Office (DCCO) | mm/dd/yyyy |
Version/Editor | Date | Revision Description |
Version 1/A. Schwoegl | 01/24/2018 | Substantial changes per DROM’s direction |
Version 2/D. Forsht | 06/15/2018 | Incorporated comments for DROM submission |
Version 2.1/D. Forsht | 11/2/2018 | Incorporated all comments post-DROM |
Acronym | Definition |
2018 E2E CT | 2018 End-to-End Census Test |
ATAC | Automated Tracking and Control |
CDL | Census Data Lake |
CIRA | Census Image Retrieval Application |
COTS | Commercial Off the Shelf |
DCCO | Decennial Census Communications Office |
DCMD | Decennial Census Management Division |
DROM | Decennial Research Objectives and Methods Working Group |
DSSD | Decennial Statistical Studies Division |
ECON-ADEP | Economic Directorate – Associate Director for Economic Programs |
GQ | Group Quarters |
GSA | General Services Administration |
iCADE | Integrated Computer Assisted Data Entry System |
IMB | USPS Intelligent Mail Barcode |
IMS | Integrated Master Schedule |
IOE | Improving Operational Efficiency |
IPTSv2 | Intelligent Mail Barcode Postal Tracking System version 2 |
KFI | Key From Image |
MCI | Manual Check-In |
NPC | National Processing Center |
NRFU | Nonresponse Followup |
OCR | Optical Character Recognition |
OMR | Optical Mark Recognition |
ORR | Operational Readiness Review |
PDC | Paper Data Capture |
PDCC | Paper Data Capture Center |
PDQ | Paper Data Quality |
QA | Quality Assurance |
SLA | Service Level Agreement |
SME | Subject Matter Expert |
UAA | Undeliverable As Addressed |
USPS | United States Postal Service |
UTS | Unified Tracking System |
Bentley, M., and Rothhaas, C. (2016), “2020 Census Research and Testing: 2014 Census Test Results for Optimizing Self-Response.” 2020 Census Program Memorandum Series, #2016.04, June 14, 2016.
Coombs, J., Lestina, F., and Phelan, J. (2017), “2020 Research and Testing: 2016 Census Test Optimizing Self-Response, Language Services, Race and Ethnicity, and Relationship Experiment Analysis Report.” 2020 Census Program Internal Memorandum Series, #2017.22.i, September 8, 2017.
Coon, D., Osborne, N., and Muenzer, R. (2012), “2010 Census Decennial Response Integration System Paper Questionnaire Data Capture Assessment Report.” 2010 Census Planning Memoranda Series, #195, <https://www.census.gov/2010census/pdf/2010_Census_DRIS_Paper_Questionnaire_Data_Capture_Assessment.pdf>, May 21, 2012.
Mathews, K., Phelan, J., Jones, N.A., Konya, S., Marks, J., Pratt, B.M., Coombs, J., and Bentley, M. (2017), “2015 National Content Test Race and Ethnicity Analysis Report.” 2020 Census Memorandum Series, #2017.08, <https://www.census.gov/programs-surveys/decennial-census/2020-census/planning-management/memo-series/2020-memo-2017_08.html>, February 28, 2017.
Phelan, J. (2016), “2020 Research and Testing: 2015 National Content Test Optimizing Self-Response Report.” 2020 Census Program Internal Memorandum Series, #2016.57.i, November 22, 2016.
U.S. Census Bureau (2017a), “2018 End-to-End Census Test: Goals, Objectives, Success Criteria (GOSC) and Research Questions,” Version 2.4, April 24, 2017.
U.S. Census Bureau (2017b), “2018 End-to-End Census Test: One-Pager,” Version 1.5, April 24, 2017.
U.S. Census Bureau (2017c), “2018 End-to-End Projected Paper Responses, DSSD-draft,” June 6, 2017.
Author | Heidi Kauffman Brady (CENSUS/DCMD FED) |
File Created | 2021-01-20 |