Melissa A. Cidade
Diane K. Willimack
Kristin Stettler
Demetria V. Hanna
Expanding Record-Keeping Study Methodology to Assess Structure and Availability of Data in Business Records
Sixth International Conference on Establishment Surveys (ICES VI)
Thursday, June 17, 2021
Thank you very much to my colleague, Diane Willimack, for organizing this exciting session. I am eager to share with you some of the important work we have been doing at the Census Bureau regarding measurement error and economic surveys, and what a forum to do it in!
I am Melissa Cidade, a survey methodologist in the Census Bureau’s Data Collection and Methodology Research Branch in the Economic Statistical Methods Division. I have had the pleasure of working on an extensive redesign of our economic surveys in collaboration with my co-authors: Diane Willimack, Kristin Stettler, and Demi Hanna. During this presentation, I am going to walk you through some of the methodological innovations we have developed over the past two years or so in support of this redesign.
In-Scope Economic Surveys
A few years back, the Census Bureau enlisted the National Academies of Sciences (referred to here as NAS) to systematically review our annual economic surveys. This panel was charged with providing recommendations to improve the “relevance and accuracy of the data, reduce respondent burden, incorporate alternative sources of data where appropriate, and streamline and standardize Census Bureau processes and methods across surveys” (NAS 2018: 6).
On your screen now are the surveys that were in-scope for that review. You can see that for the most part, the Census Bureau has used a sector-driven approach to survey development – note that manufacturing and services have their own set of surveys, trade has a set of surveys, and so on. One of the recommendations from the NAS panel is the implementation of an Annual Business Survey System – which has evolved into the Integrated Annual Survey, a streamlined, cross-sector, harmonized survey instrument designed to lower respondent burden while still achieving high quality, timely data in the service of the American economy.
Driving the development of the Integrated Annual Survey has been a portfolio of research projects to bring together disparate sources of data to one survey instrument. This presentation will provide an overview of the innovative methods we have used to understand the record-keeping practices of businesses, in order to develop this streamlined instrument.
Citation: National Academies of Sciences, Engineering, and Medicine. (2018). Reengineering the Census Bureau’s Annual Economic Surveys. Washington, DC: The National Academies Press. doi: https://doi.org/10.17226/25098.
Reviewers: We recognize that the Integrated Annual Survey may have a name change before the date of this presentation and will update accordingly.
Research Questions:
1. Definitions: how do businesses define themselves relative to the Census Bureau definitions?
2. Accessibility: how accessible are key data points at varying business units?
3. Burden: how resource intensive is gathering data at these varying business units?
Throughout the research period, we were guided by a few key research concepts and questions. First, we were interested in how businesses defined themselves, both internally and relative to Census Bureau definitions. This included the business’ units of operation, industry, and other key identifiers. We were also driven to understand how accessible data were at differing levels within a company – that is, could respondents get the data to the level of granularity we were asking with minimal effort and maximum accuracy? Finally, as with all of our data collections, we asked about the burden – or resource intensiveness – of pulling these data at various levels within the company.
This research builds on the emergent body of literature on the “unit problem”: the mismatch between the administrative unit (how the business sees itself) and the statistical unit (the standardized unit created by statistical agencies for data collection purposes). To minimize what van Delden et al. (2018) call “unit errors,” we must first understand how businesses keep their records before we can ask them to map these records to our data requests.
Citation: van Delden, A., Lorenc, B., Struijs, P., & Zhang, L.-C. (2018). Letter to the Editor: On Statistical Unit Errors in Business Statistics. Journal of Official Statistics, 34(2), 573–580.
In-Scope Businesses and Respondents
Eligibility Criteria:
•Sampled in at least two in-scope surveys
•In at least two industrial sectors
•More than one establishment
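The three eligibility criteria above can be sketched as a simple screening check. This is only an illustration, not the Census Bureau's actual selection code: the `Company` record, its field names, and the example survey labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Company:
    # Hypothetical record layout, invented for this sketch.
    name: str
    surveys_sampled: frozenset  # in-scope surveys the company is sampled in
    sectors: frozenset          # industrial sectors the company operates in
    establishment_count: int    # number of establishments


def is_eligible(company: Company) -> bool:
    """Return True only if all three criteria on the slide are met."""
    return (
        len(company.surveys_sampled) >= 2   # sampled in at least two in-scope surveys
        and len(company.sectors) >= 2       # in at least two industrial sectors
        and company.establishment_count > 1  # more than one establishment
    )


# Placeholder survey labels; the real in-scope survey names are on the earlier slide.
example = Company(
    name="Example Co.",
    surveys_sampled=frozenset({"survey_a", "survey_b"}),
    sectors=frozenset({"manufacturing", "trade"}),
    establishment_count=12,
)
print(is_eligible(example))
```

A company that fails any one of the three checks is screened out, which is why the interviewed companies are all multi-unit, multi-sector businesses.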
Company characteristic | Phase 1 | Phase 2 |
Number of industries* | | |
Three or fewer | 16 | 25 |
Four or more | 5 | 5 |
Number of establishments* | | |
30 or fewer | 9 | 19 |
31 or more | 12 | 11 |
*Numbers may not sum to total interviews because of missing data.
Phase 1 Interviewing
In this next section, I’ll talk about the phase 1 interviewing we conducted.
Phase 1: The Chart of Accounts
For Phase 1 interviewing, we built our interviewer protocol around a generic company chart of accounts. A chart of accounts (COA) is an index of all the financial accounts in the general ledger of a company. It is an organizational tool that breaks down, into subcategories, all the financial transactions that a company conducted during a specific accounting period.
First, we showed respondents the mock chart of accounts on your screen now. We asked respondents to compare and contrast how their business is structured and maintains its records. We probed respondents on their chart of accounts relative to their company’s structure, industries in which the company operates, and locations, as well as the types of software used to maintain their chart of accounts.
Once we had a better understanding of the company chart of accounts and record keeping practices, we could then ask follow-up questions about specifics within their chart of accounts. Here, we were really interested in mismatches between our understanding of how records are kept and retrieved and the questions respondents encountered on Census Bureau surveys. We centered these questions around five areas as applicable to the company:
- Business segments by industry (kind of business)
- Sales/receipts/revenues
- Inventory
- Expenses, including payroll and employment
- Capital expenditures.
Phase 1 Findings
All companies followed a general chart of accounts with varying levels of detail. It was within these details that we made some interesting findings.
(click) Before I get too far, let me take a moment to talk about the North American Industry Classification System, or NAICS. NAICS is a hierarchical taxonomy with nested values. Our example today is a fictitious company, the Census Cat Company. At the two-digit NAICS level, this company is in the Manufacturing sector. At the three-digit level, it is identified as a food manufacturing company, a more specific type of manufacturing. At the four-digit level (the industry group), the Census Cat Company is designated as an animal food manufacturing company. And at the six-digit level (the national industry code), it is a dog and cat food manufacturing company. Note that NAICS is transnational: Canada, Mexico, and the United States share the classification down to the fifth digit. The sixth digit is nation-specific so that each country can produce country-specific detail. A complete and valid NAICS code contains six digits.
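The nesting described above means that each broader level is simply a prefix of the six-digit code. A minimal sketch follows, with two assumptions: the code 311111 is taken from the NAICS manual's entry for dog and cat food manufacturing (the talk does not state the digits), and the "subsector" and "NAICS industry" level names come from the manual rather than from this presentation. The function name `naics_hierarchy` is hypothetical.

```python
# Level names keyed by the number of leading digits.
# "sector", "industry group", and "national industry" are named in the talk;
# "subsector" and "NAICS industry" are the NAICS manual's terms (assumption here).
NAICS_LEVELS = {
    2: "sector",
    3: "subsector",
    4: "industry group",
    5: "NAICS industry",
    6: "national industry",
}


def naics_hierarchy(code: str) -> dict:
    """Split a complete six-digit NAICS code into its nested prefixes."""
    if len(code) != 6 or not (code.isascii() and code.isdigit()):
        raise ValueError("a complete NAICS code contains exactly six digits")
    return {level: code[:digits] for digits, level in NAICS_LEVELS.items()}


# Assumed code for the fictitious Census Cat Company (dog and cat food manufacturing).
# Note: the Manufacturing sector spans two-digit prefixes 31 through 33.
census_cat = naics_hierarchy("311111")
for level, prefix in census_cat.items():
    print(f"{level:<18} {prefix}")
```

In the terms used later in the talk, the four-digit prefix corresponds to the "general" industry and the full six-digit code to the "specific" industry.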
(click) One of the major findings to come from this interviewing was the mismatch in unit definitions. We noted that at least seven companies may have been misclassified or may not have understood Census Bureau distinctions among classifications, for example, a four-digit versus a six-digit NAICS classification. We also noted that the NAICS taxonomy is unnatural for respondents. Because NAICS is a standardized classification system, and businesses often need more or different details in their chart of accounts, mapping records to the corresponding NAICS code is challenging for some and impossible for others.
(click) The second takeaway from the first round of interviewing is that businesses are using disparate terminology to describe their various operating units. When asked about “establishments,” for example, respondents indicated that their company used a different term – such as region, office, department, line of business, and business segment – or did not track data by individual locations at all.
(click) The third finding from round one interviewing was insight into companies’ response processes. Almost all respondents indicated that completing Census Bureau surveys required more than one person in the company to respond. They also indicated that Census surveys do not match their internal reporting, and that they are uncomfortable making decisions on how to manipulate their data to match our requests.
All three of these findings directly influenced the phase 2 interviewing.
(To access the NAICS manual, click: https://www.census.gov/naics/)
Phase 2 Interviewing
Taking the information we learned in phase 1, we then introduced a novel methodology to assess data accessibility.
Definitions and Equivalencies
General and Specific Industry
Thinking about the problem of misclassification identified in phase 1, we then asked respondents pointedly about their NAICS classification. First, we asked about their six-digit NAICS classifications, calling each one their ‘specific’ industry. Remember that six digits is the most specific principal business activity code we have. We then asked about the less detailed four-digit NAICS classification and called it the ‘general’ industry. Note that the interviewers walked respondents through each of the six-digit NAICS codes we could find for their company, asked for feedback or impressions, and then did the same for the four-digit NAICS codes. This part of the interview was time consuming and difficult; we noticed that respondents had trouble understanding their NAICS classification, and then struggled to think of how their business units might relate to their NAICS classification. Classifying a business is a critical component of collecting data on that firm, both in terms of directing respondents to the appropriate survey forms based on their classification and in terms of sampling, weighting, imputation, reporting, and other important data handling techniques.
It seems that the industry classification either worked or didn’t, with few cases falling in between. We were surprised at how inconsistent the NAICS data in our records were from one company to the next.
(click) Here you can see a few examples of respondents positively reacting to their general and specific industry codes. In these cases, the NAICS we had on file made sense to respondents and fit how respondents saw their company relative to the NAICS categorization scheme.
(click) And, here are a couple of quotes where respondents struggled with the NAICS codes we have assigned them. Remember that this was after discussing the NAICS with the respondent, and having them focus in on their classifications: they still did not agree with or understand their assignments.
Card Sort
Defining Accessibility
And, here is the big reveal from our card sort exercise.
On this slide – each row is a completed interview. Each column is a business unit – company, establishment, line of business, state, specific industry, and general industry. And, each color corresponds to the accessibility of the data at that unit within that topic – grouped across the top: revenue, capital assets, inventories, operating expenses and payroll. Note that blank spaces denote where respondents were either unable to speak to this topic (that is, they were not involved in reporting on this topic), this concept didn’t apply to the specific business (that is, this business didn’t have inventories, for example), or the interviewer suspected that the respondent wasn’t clear on the task at hand (that is, the respondent wasn’t understanding the units being used or some other communication challenge).
Card Sort: Revenue
Question:
What were the TOTAL sales, revenue, and other operating receipts for this [business unit] in 2019?
REVENUE | GREEN | YELLOW | ORANGE | RED | X |
COMPANY | 29 | 0 | 0 | 0 | 1 |
ESTABLISHMENT | 15 | 9 | 3 | 1 | 2 |
LINE OF BUSINESS | 13 | 10 | 2 | 0 | 5 |
STATE | 13 | 8 | 7 | 0 | 2 |
SPECIFIC INDUSTRY | 6 | 7 | 9 | 2 | 6 |
GENERAL INDUSTRY | 9 | 9 | 4 | 1 | 7 |
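One way to read the revenue counts above is as the share of rated responses that landed on green for each business unit. The sketch below rests on two assumptions not stated on the slide: that green marks the most accessible end of the four-point scale, and that the X column counts interviews with no usable rating (so it is left out of the denominator). The names `REVENUE_COUNTS` and `green_share` are hypothetical.

```python
# Counts copied from the revenue card sort table, in column order:
# GREEN, YELLOW, ORANGE, RED, X
REVENUE_COUNTS = {
    "COMPANY": (29, 0, 0, 0, 1),
    "ESTABLISHMENT": (15, 9, 3, 1, 2),
    "LINE OF BUSINESS": (13, 10, 2, 0, 5),
    "STATE": (13, 8, 7, 0, 2),
    "SPECIFIC INDUSTRY": (6, 7, 9, 2, 6),
    "GENERAL INDUSTRY": (9, 9, 4, 1, 7),
}


def green_share(row: tuple) -> float:
    """Share of rated responses (green through red) that were green.

    Assumption: the X column is excluded because it is treated as 'no rating'.
    """
    rated = sum(row[:4])
    return row[0] / rated if rated else float("nan")


for unit, row in REVENUE_COUNTS.items():
    print(f"{unit:<18} interviews={sum(row):>2}  green share={green_share(row):.2f}")
```

Each row of the table accounts for the same number of interviews, so the shares can be compared across business units without reweighting.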
Key Takeaways:
•Using a generic Chart of Accounts during interviewing helps to center respondents to the task at hand.
•Use cognitive methodology to give context to the resultant data.
•Card sorts can be a useful tool in establishment surveys.
•Visualization of qualitative data can have a powerful impact with stakeholders.
We are just now churning through the rich data that these almost 60 interviews produced. Since this session is focused on methodology and not findings, though, I want to clue you in on four major methodological findings of the work so far.
First, when asking about record keeping practices, we found that providing respondents with a generic Chart of Accounts helps them to understand the task at hand, and to identify and explain differences in the ways that they maintain their records.
Next, we found that leaning on tried and true cognitive testing methods provides a way of assessing content validity – that is, by asking respondents pointedly about their definitions of ‘accessibility’ and of ‘unit,’ we could then provide context for the results of these interviews.
Third, this research is the application of a method not usually used in testing establishment surveys. We found that by using the card sort, respondents were engaged in the interview. The card sort acted as a way of operationalizing the four-point scale measuring accessibility, a complex construct.
Finally, we have found with our stakeholders that the visualizations from the card sort are a captivating way to present complex interview data. We have found that even our most quantitatively-minded colleagues like the display of the qualitative data in a way that is more “rows and columns” than we usually have.
Thank you!
Diane K. Willimack
+1 (301) 763-3538
Melissa A. Cidade
+1 (301) 763-8325
We will be chewing through these interviews in the next weeks and months and are excited at the insights to be gained. To that end, if you want to follow up with any of us, our contact information is listed on the screen now. Thank you very much!