1205-0421 Supporting Statement Part B_FINAL 6.29.18

O*NET Data Collection Program

OMB: 1205-0421


U.S. Department of Labor

Employment and Training Administration





O*NET® Data Collection Program



Office of Management and Budget

Clearance Package Supporting Statement


Part B: Statistical Methods










June 2018



Table of Contents



List of Exhibits in Part B



Appendices

Appendix A Questionnaires

Appendix B Mailing Materials

Appendix C Publications Referencing the O*NET Data Collection Program

Appendix D Nonresponse Analysis

  1. Collection of Information Employing Statistical Methods

    1. Overview

This document describes the statistical methods used by the Occupational Information Network (O*NET®) Data Collection Program. It is Part B of a Supporting Statement being submitted to the Office of Management and Budget (OMB) to request a continuation of the program, with minor revisions, for an additional 3 years. Other sections of the Supporting Statement, contained in separate documents, include Part A (justification for the program) and Appendices A–D (questionnaires, mailing materials, a list of related publications, and nonresponse analyses, respectively).

The O*NET Data Collection Program continually operates to populate and maintain a current database on the detailed characteristics of workers, occupations, and skills. The program received initial OMB clearance in 1999 for a pretest; six subsequent clearances have allowed main study data collection to continue without interruption since June 2001. Our current clearance expires September 30, 2018. This request is to continue to update occupations that reflect older data as well as to collect data on new and changing occupations for 3 more years (October 1, 2018–September 30, 2021), subject to annual budget levels.

Part B describes the sampling universe, sampling methods, and expected response rates; procedures used to collect the data; methods to maximize response rates; and various tests of procedures that have been conducted. A list of statistical consultants is provided, along with the references cited in the text. In addition, as noted above, Appendix D provides in a separate document the nonresponse analyses conducted by project staff.

    2. Sampling Universe, Sampling Methods, and Expected Response Rates

A multiple-method data collection approach for creating and updating the Occupational Information Network (O*NET) database has been developed to maximize the information for each occupation while minimizing data collection costs. The primary source of information for the database is a survey of establishments and sampled workers from within selected establishments, which is referred to as the Establishment Method of data collection. The Establishment Method uses a two-stage sample design: establishments are selected in the first stage, and employees within the selected establishments are selected in the second stage.

Although the Establishment Method provides the best approach for most occupations, the survey contractor, RTI International, sometimes uses a special frame (e.g., a professional association membership list) to supplement the Establishment Method in a dual-frame approach when additional observations are required. When this supplementation to the Establishment Method is used, a dual-frame adjustment is made to the sampling weights to account for the coverage overlap between the two sources of collected data.

A second method entails recruitment of appropriate occupation experts who can supply the information required for an occupation (Occupation Expert Method, or OE Method). An occupation expert is someone who has worked in the occupation for at least 1 year and has 5 years of experience as an incumbent, trainer, or supervisor. Additionally, an occupation expert must have had experience with the occupation within the most recent 6 months. The OE Method is used for occupations as necessary to improve sampling efficiency and avoid excessive burden, as when it is difficult to locate industries or establishments with occupation incumbents; employment is low; or employment data are not available, as is the case for many new and emerging occupations.

      1. Establishment Method

Establishment Method Sampling Universe

The central goal of the O*NET Data Collection Program is to provide data for each of the O*NET Standard Occupational Classification (SOC) occupations, which are prevalent to varying degrees in different industries in the United States. Estimates from this program are designed to assist users in distinguishing among occupations and are not necessarily designed to capture all of the subtle differences between jobs in different industries. Nonetheless, the O*NET sampling universe for each occupation is generally a subset of all employees in the occupation who are working in the United States. This subset, or target population for the occupation, is defined by two criteria: (1) its workers represent a majority of job incumbents in the occupation, and (2) data among this set of establishments can be gathered with reasonable efficiency.

Previous O*NET experience has shown that trying to build a sampling frame that covers 100% of an occupation is inefficient and poses undue burden for some establishments. For example, the occupation-by-industry matrix data suggested that a very small number of bricklayers could be found in establishments in the hospital industry; however, asking a point of contact (POC) in a hospital about bricklayers led to some difficulties. Besides being unduly burdensome, such questioning often cost the Business Liaison (BL) credibility, because the POC was being asked about occupations not likely to be associated with his or her establishment. Moreover, a POC may give false negative responses simply because he or she does not know whether some rare occupations exist in the establishment; this situation is particularly likely for larger establishments. To address these concerns, the target population is defined so that it includes establishments in the industries and size categories in which the occupation is most prevalent.

When less-than-complete population coverage is allowed, it is possible that some bias may be introduced into the study estimates if the covered and noncovered population members would give substantially different responses to the survey questions. To evaluate this potential bias in the O*NET estimates, an internal assessment was conducted that considered 18 randomly selected occupations for which at least 80% population coverage had been achieved. The linkages of these 18 occupations to industries were then reconsidered, and reduced sets of industries were determined that covered only 50% of workers in each occupation. Estimates for a selected set of outcomes were then computed from the reduced data set, which simulated estimates at the 50% level of population coverage. When the original data with at least 80% coverage were compared with the reduced data with 50% coverage, no systematic differences in the estimates were observed. Almost all of the differences between the two sets of estimates were very small and symmetrically distributed around zero. The observed pattern could be explained by random sampling error alone and provided no evidence of bias due to reduced frame coverage. The investigation concluded that no systematic bias is introduced with the use of a population coverage minimum of 50% for each occupation compared with a minimum coverage of 80%. On the basis of these results, O*NET Establishment Method data collection now maintains a population coverage of at least 50% for each occupation.

Sampling Waves

To help identify industries in which particular occupations are employed, the O*NET sampling method uses employment statistics published by the U.S. Bureau of Labor Statistics (BLS) and supplemented by empirical information identified during O*NET data collection. Groups of approximately 50 occupations each, called primary waves, are formed so that the occupations in a primary wave are employed in a similar set of industries. For example, carpenters and plumbers are both employed by establishments in construction-related industries. Thus, when establishments are selected from the industries associated with a primary wave of occupations, a selected establishment is much more likely to employ one or more of the occupations in the wave than it would have been to employ one or more occupations not grouped this way. This method minimizes the number of establishments that must be contacted for selection of the required number of employees for an occupation. For example, when construction trades, such as those of carpenters and plumbers, are grouped together in a primary wave of occupations, it is much more likely that an establishment selected from construction-related industries will employ more than one of the 50 related occupations in the wave than would be the case if sampling had been from a broader set of industries associated with a group of unrelated occupations.

Each primary wave of occupations is scheduled to be fielded in three subwaves of establishment samples. The subwaves are identified as X.1, X.2, and X.3, where X represents the set of primary occupations and where the accompanying number represents the order in which the subwaves of establishment samples occur. For example, Subwave 3.1 denotes the first sample of establishments for the occupation set known as Wave 3, and 3.3 denotes the third sample of establishments for the occupation set. Any occupation that requires additional respondents is included in the next subwave. The first subwave of establishments uses the Occupational Employment Statistics (OES) data to indicate those industries most likely to employ the occupations. It is designed to include a wide range of industries and to cover at least 50% of the target population. As each subwave establishment sample is selected, the experience gained from the previous subwaves is used to more effectively target the sample to industries in which the occupations have been demonstrated to be found.

If, after being fielded in its X.3 subwave, an occupation lacks a sufficient number of completed respondents, then it is fielded in a completion wave. Completion waves combine the difficult-to-complete occupations from several waves and are designed to target industries with a high probability of employing the occupations. The goal of a completion wave is to ensure that the number of establishments selected for each occupation is sufficient to complete all occupations in the wave. Statistically, a completion wave is no different from the X.1, X.2, and X.3 subwave sampling process, with the same sampling, weighting, and estimation methods being used to conduct the completion wave. Essentially, a completion wave adds a fourth subwave of sampling for some difficult-to-complete occupations. Packaging together some of these occupations in a combined wave maintains operational efficiency.

Sampling steps are carried out for the primary occupations associated with a wave. The primary occupations are those selected for a wave as a result of the clustering of occupations likely to be employed in the same industries. Once the sets of industries to be targeted in a wave are identified, additional secondary occupations likely to be found in these industries may be added to the wave and allocated to the selected establishments. To improve efficiency, if a selected establishment employs fewer than the maximum number of allowed primary occupations, secondary occupations are included for that establishment.

The method described above for sampling waves yields two major benefits. First, prescribing sampling from industries that have been determined by empirical data to employ the occupation of interest maximizes the efficiency of the sample. Second, it minimizes the oversampling of any one occupation. Because the establishment sample size for a particular set of occupations is spread over three subwaves, if an occupation is found more easily than expected, the sample for future subwaves can be used to find other occupations in the wave rather than to continue searching for this particular occupation.

To minimize both the cost of conducting the O*NET Data Collection Program and the burden placed on the establishments, the number of employees selected into the sample and the number of returned questionnaires are monitored carefully on a daily basis. Once it becomes clear that at least 15 of the goal of 20 completed respondents will be available for each of the three domain questionnaires for an occupation, consideration is given to terminating further sampling of employees for that occupation. This step is taken because of the difficulty of estimating the rate at which employees will be encountered during employee sampling. In some cases, employees from an occupation are much easier to locate than anticipated, and the desired number of responding employees is quickly exceeded; continuing to sample employees for such an occupation would use resources inefficiently and burden the establishments unnecessarily.

The method used to control employee sample selection is called Model-Aided Sampling (MAS). With this method, target numbers of sampled employees are defined, before data collection begins for each occupation, by census region, business size, and industry division. MAS ensures that the resulting sample of employees is distributed across the target cells approximately in proportion to the population distribution across the target cells. MAS sample size targets are based on information from the OES survey conducted by BLS and from establishment information provided by Dun & Bradstreet (D&B). Once data collection begins, daily progress toward the targets is monitored closely for each occupation. Once a cell’s MAS target is achieved, the selection of employees in that cell is stopped. This cessation allows future data collection subwaves to focus on the cells that have not yet reached the MAS targets.
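For illustration only, the cell-level stopping logic of MAS can be sketched as follows. The cell keys (census region, establishment size, industry division) and the target counts below are invented; the actual MAS targets are derived from OES and D&B data before data collection begins.

```python
# Hypothetical MAS cell targets for one occupation: completed-respondent
# goals by (census region, establishment size, industry division).
targets = {
    ("South", "small", "construction"): 3,
    ("South", "large", "construction"): 6,
    ("West", "large", "construction"): 4,
}
completed = {cell: 0 for cell in targets}

def cell_open(cell):
    """A cell stays open for employee selection until its MAS target is met."""
    return completed.get(cell, 0) < targets.get(cell, 0)

def record_completion(cell):
    """Count a completed questionnaire; return True if the cell just closed."""
    completed[cell] = completed.get(cell, 0) + 1
    return not cell_open(cell)
```

Once a cell closes, subsequent subwaves concentrate on the cells whose targets remain unmet.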

The use of MAS to enhance the efficiency of the O*NET sample is considered one of the most important sample design innovations in the history of the program. This approach has dramatically reduced data collection costs, with minimal effects on the accuracy of the estimators. Development of the MAS methodology began in early 2004 and continued through the end of 2007. As part of this research effort, various sample cutoff or stopping rules were investigated by means of Monte Carlo simulation. A cutoff rule determines the point at which efforts to interview certain types of establishments and employees are discontinued because the prespecified cell sample size criteria have been satisfied. These studies showed that, under a wide range of cutoff rules, MAS does not bias the O*NET estimates or their standard errors. At the same time, MAS dramatically reduces the number of establishment contacts required to satisfy the random sample allocation for an occupation. This finding has resulted in substantial reductions in respondent burden, data collection costs, and time required to complete an occupation (Berzofsky, Welch, Williams, & Biemer, 2006). On the basis of these results, MAS was fully implemented in 2008. Additionally, an empirical evaluation of MAS was undertaken in 2012, and the findings of that study support the earlier findings from the simulation studies (Berzofsky, McRitchie, & Brendel, 2012).

Sampling Steps for the Primary Occupations Associated with a Wave

As mentioned, the Establishment Method involves multiple sample selection steps. Establishments are selected during the first steps of selection, and employees are selected during the later steps. This sample selection process is diagrammed in Exhibit B-1 and is further detailed in text.

Step 1: Create establishment sampling frame. Two major sources of information are used to create the establishment sampling frame. First, a list covering nearly 17 million establishments in the United States is constructed from the D&B list of U.S. establishment locations. Use of the D&B frame is based on an evaluation that showed it to be the least costly data source that had the essential establishment-level information, including industry type and location-specific employment. The frame is updated monthly to ensure that the most current and accurate establishment information possible is used for selecting the sample for each subwave. Additional information from the OES survey, conducted by BLS, is merged with the D&B list of establishments. From the combined file a matrix is created for each occupation; the matrix lists the industries in which the occupation is found and, for each industry, the number of employees and associated establishments by census region and by each of four size categories.1
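As a small illustration of the Step 1 matrix, the sketch below tallies employee and establishment counts by industry, region, and size category from a merged frame. The records are invented stand-ins for D&B/OES rows, not actual frame data.

```python
from collections import defaultdict

# Hypothetical merged-frame rows: (occupation, industry, census region,
# establishment size category, employees in the occupation).
frame_rows = [
    ("carpenters", "residential construction", "South", "1-9", 12),
    ("carpenters", "residential construction", "South", "1-9", 7),
    ("carpenters", "nonresidential construction", "West", "10-49", 30),
]

def occupation_matrix(rows):
    """Tally employees and establishments for each
    occupation x industry x region x size cell."""
    cells = defaultdict(lambda: [0, 0])  # [employees, establishments]
    for occ, industry, region, size, employees in rows:
        cell = cells[(occ, industry, region, size)]
        cell[0] += employees
        cell[1] += 1
    return dict(cells)
```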

Exhibit B-1. Summary of Sample Selection Process

Step 2: Determine industries to target for each occupation in subwave. Using the matrix developed in Step 1, the industry–region–employee-size categories for each occupation are classified into one of four concentration groups: high, medium, low, or none. This classification helps to target industries in which an occupation is likely to be found. For example, if it is believed that establishments in a particular industry or industries have a good chance of containing a particular occupation, then the industry is classified as high. Similarly, industries with a low chance of having an occupation are classified as low. To increase efficiency in the sampling process, the sample is designed to oversample the high industries and undersample the low industries for each occupation. None denotes those industries that were not expected to contain any employees in the occupation of interest or that had a negligible proportion of the employees.

In subsequent waves (i.e., X.2, X.3, and completion waves), adjustments to the concentration levels for an occupation may be made on the basis of many factors, including empirical data collected previously for the occupations, progress made in meeting targets for industry, establishment size, region, the other occupations included in the subwave, the number of available establishments, and the identification of industries with a high probability of employing the occupation.

Step 3: Select initial sample of establishments from frame. First, for each industry, the number of establishments to be selected is determined. However, because only limited access to some D&B list data fields is possible before sample selection of establishments, a stratified simple random sample of establishments is first selected from the list, with a sample size larger than the number that will ultimately be contacted. For this large simple random sample of establishments, the O*NET team has access to the full D&B establishment information that will be used in Step 4 of selection. Within each stratum, the selection probability for the i-th selected establishment is

π1i = n/N, (1)

where n and N are the number of establishments selected and the number of establishments in the population, respectively, within the stratum. The associated sampling weight for this step is

w1i = 1/π1i = N/n. (2)
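A minimal numeric illustration of Equations (1) and (2), using invented counts: a simple random sample of n of the N establishments in a stratum gives each establishment selection probability n/N and sampling weight N/n.

```python
def srs_probability(n_selected, n_population):
    """Equation (1): within-stratum selection probability under
    simple random sampling."""
    return n_selected / n_population

def srs_weight(n_selected, n_population):
    """Equation (2): the sampling weight is the inverse of the
    selection probability."""
    return n_population / n_selected

# Invented example: 50 establishments drawn from a stratum of 2,000.
prob = srs_probability(50, 2000)  # 0.025
weight = srs_weight(50, 2000)     # 40.0
```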

Step 4: Select final set of establishments for subwave. A subsample of establishments is selected with probability proportionate to a composite size measure (CSM; Folsom, Potter & Williams, 1987) from the simple random sample of establishments selected in Step 3. If a single occupation were being studied, then common sampling practice would be to select a sample of establishments with probabilities proportional to the number of employees working in the occupation at each establishment. Because several occupations are to be sampled from the selected establishments, a cross-occupation composite size measure is used to lessen the variation in selection probabilities and in the final analysis weights. The composite size measure accounts for the estimated number of employees in the occupations of interest within an industry group, as well as for the overall sampling rates. The composite size measure for i-th establishment selected in Step 3 is

Si = Σj fj Mij, (3)

where the summation is over the occupations (j); fj = mj/M̂j is the overall sampling fraction for the j-th occupation; mj is the desired national sample size of employees from the j-th occupation; M̂j is the estimated total number of employees in the j-th occupation on the frame; and Mij is the estimated number of employees in the j-th occupation at the i-th establishment.
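The composite size measure in Equation (3) can be sketched as follows; the desired sample sizes, frame totals, and establishment counts are hypothetical.

```python
def sampling_fraction(desired_sample, frame_total):
    """f_j = m_j / (estimated frame total for occupation j)."""
    return desired_sample / frame_total

def composite_size(establishment_counts, fractions):
    """Equation (3): S_i = sum over occupations j of f_j * M_ij.
    establishment_counts maps occupation -> M_ij;
    fractions maps occupation -> f_j."""
    return sum(fractions[j] * m_ij for j, m_ij in establishment_counts.items())

# Invented example: two occupations present at one establishment.
fractions = {"carpenters": sampling_fraction(40, 100000),  # 0.0004
             "plumbers": sampling_fraction(20, 100000)}    # 0.0002
size = composite_size({"carpenters": 10, "plumbers": 5}, fractions)
```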

For each occupation, the sampling rate is generally greatest for those establishments classified in the high industry group for the occupation and successively lower for each of the remaining medium, low, and none industry groups. Once the composite size measure is calculated, all nongovernmental industries are sorted in descending order by industry so that similar industries are proximate to each other before they are split into four equal groups based on the composite size measure. These four groups form the nongovernmental industry strata. Governmental establishments are not classified into separate industries on the D&B frame. Consequently, all governmental establishments constitute a fifth industry stratum in the design.

In addition, some occupations are highly concentrated in only one or two industries. These industries are split from the previously defined industry strata groupings to form separate “special” industry strata. These special strata often prove valuable for particular occupations because otherwise a small number of establishments would be selected from these industries. Forming the special strata for occupation-valuable industries with comparatively small size measures ensures that a minimum number of establishments from the “special” industry strata are selected into the final sample. Establishments are further stratified by number of employees in the establishment and by census region. Establishments with a large number of employees are oversampled, whereas establishments with few employees are undersampled. The degree of over- or undersampling varies for each subwave.

Chromy’s (1979) probability minimum replacement (PMR) selection procedure is then used to select a probability-proportional-to-size sample of establishments within each stratum, with probabilities of selection

π2i = nh Si / Sh, (4)

where nh is the number of establishments selected from the stratum and Sh = Σi Si is the sum of the composite size measures for establishments from the stratum.


The sampling weight associated with this step is

w2i = 1/π2i = Sh / (nh Si). (5)
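Chromy's (1979) PMR procedure is a sequential selection scheme; the simpler systematic PPS sketch below achieves the same target probabilities as Equation (4) within one stratum and is offered only as an illustration, with invented size measures.

```python
import random

def pps_systematic(sizes, n_select, seed=0):
    """Systematic probability-proportional-to-size selection within a
    stratum: unit i is selected with probability n_select * S_i / S_h
    (assumes no unit's size exceeds S_h / n_select)."""
    rng = random.Random(seed)
    total = sum(sizes)
    step = total / n_select
    start = rng.uniform(0, step)
    points = [start + k * step for k in range(n_select)]
    selected, cumulative, idx = [], 0.0, 0
    for i, size in enumerate(sizes):
        cumulative += size
        while idx < n_select and points[idx] <= cumulative:
            selected.append(i)
            idx += 1
    return selected

# Invented stratum of six establishments; the first has size measure
# equal to S_h / n_select, so it is selected with certainty.
chosen = pps_systematic([5, 1, 1, 1, 1, 1], n_select=2)
```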

Step 5: Assign occupations to selected establishments. To limit the burden for a particular establishment, each establishment selected in Step 4 is assigned a maximum of 10 occupations, randomly selected with probability proportional to size. Here the size measure is the product of the sampling rate for the occupation (fj) and the establishment’s estimated number of employees within the occupation, Mij. Before selection of a sample of the occupations, occupations certain to be selected because of their large sizes are included in the sample and removed from the frame, and the number of times they would have been “hit” (which, by the PMR method, can exceed 1) is recorded. Then the remaining (noncertainty) occupations are sampled and placed in a random order. The certainty occupations are listed first, followed by the randomly ordered noncertainty units. For each establishment, both the set of up to 10 occupations and the number of times each occupation was selected (which could be greater than 1 for certainty occupations) are entered into the Case Management System (CMS).

As before, Chromy’s (1979) PMR selection method is used to select occupations with probability proportional to size. To understand how this method is applied here, suppose the i-th establishment has Ji occupations associated with it. The size measure for the j-th occupation is defined as

Sij = fj Mij,

so that

Si = Σj Sij. (6)

A sample of up to 10 occupations for each establishment will be selected, with the expected number of times that the j-th occupation is selected being Eij = 10 Sij / Si, which may be greater than 1 for some occupations. For an occupation j where Eij is greater than 1, the occupation is selected with certainty and assigned an O*NET point value Pij equal to Eij randomly rounded to one of its two adjacent integers. That is,

Pij = Int(Eij) + 1 with probability Frac(Eij)

and

Pij = Int(Eij) with probability 1 − Frac(Eij), (7)

where Int and Frac denote the integer and fractional parts of a decimal number. This rounding provides an integer number of selections associated with each selected establishment while retaining the correct noninteger expected value for the number of selections. The certainty occupations appear at the top of the list used by BLs to inquire about occupations at the establishment. From among the remaining occupations, a probability-proportional-to-size sample is selected. If Ci is the number of certainty-occupation sampling hits from the i-th establishment, Ci = Σj Pij with summation over the certainty occupations, then the remaining occupations are selected with probabilities proportional to (10 − Ci) Sij / Σj Sij, with summation over the noncertainty occupations. The selected noncertainty occupations are assigned an O*NET point value of 1. As noted previously, the selected noncertainty occupations are then placed in a random order and follow the certainty occupations on the list of occupations used by the BLs.
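The randomized rounding in Equation (7) can be sketched as follows; the expected-hit value in the example is invented.

```python
import math
import random

def random_round(expected, rng):
    """Round `expected` up with probability Frac(expected) and down
    otherwise, so the integer result's expectation equals `expected`."""
    whole = math.floor(expected)
    frac = expected - whole
    return int(whole) + (1 if rng.random() < frac else 0)

# Repeated rounding of an invented expected hit count of 2.3 should
# average close to 2.3.
rng = random.Random(1)
draws = [random_round(2.3, rng) for _ in range(10000)]
```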

When a POC is identified within each establishment, the RTI BL reviews the list of occupations with the POC, asking the POC to estimate the number of employees at that establishment in each occupation. Each time the BL receives a response that is greater than zero, a counter within the CMS is increased by the associated O*NET point value, the randomly rounded number of times the occupation was selected. If the counter reaches 5 before the BL has completed the list of occupations, the BL stops. Once the occupations are identified (a maximum of 5), the POC is asked to roster all individuals in the selected occupations.

To determine the final occupation selection probabilities, one must adjust for the occupations remaining on the establishment’s sampling list at the point where the BL stopped as a result of having found the maximum number of occupations to be included in data collection. It is assumed that the resulting sample of up to 5 occupations is a random sample of the originally selected occupations for an establishment. This assumption is supported by the random ordering of the noncertainty occupations. Let Hi be the total number of sampling hits among all of the occupations about which the BL inquired before stopping; then Hi = Σj Pij, with summation over the occupations inquired about by the BL. The final selection probability for the j-th occupation from the i-th establishment is

π3ij = (Hi/10) Eij = Hi fj Mij / Si. (8)

The associated sampling weight is

w3ij = 1/π3ij = Si / (Hi fj Mij). (9)

This method accomplishes two important goals:

  • It results in an approximate random sample of occupations with known probabilities of selection.

  • It limits POC and establishment burden. This goal is achieved because the number of positive POC responses is limited to a maximum of 5. If the company is large and happens to have employees in all 10 occupations, then stopping after 5 occupations minimizes the perceived burden on the POC, as opposed to the alternative of asking for employment estimates for all 10 occupations and then subselecting 5.
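The BL’s stopping rule described above can be sketched as follows; the occupation names, point values, and POC responses are hypothetical.

```python
def inquire(occupations, poc_counts, stop_at=5):
    """Walk the establishment's occupation list (certainty occupations
    first, then the randomly ordered noncertainty occupations), adding
    each positive response's O*NET point value to a counter, and stop
    once the counter reaches stop_at.

    occupations: ordered list of (name, point_value);
    poc_counts: {name: employees reported by the POC}.
    Returns the occupations retained for data collection."""
    retained, counter = [], 0
    for name, points in occupations:
        if counter >= stop_at:
            break
        if poc_counts.get(name, 0) > 0:
            retained.append(name)
            counter += points
    return retained
```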

Step 6: Specify employee sample selection algorithm for each occupation/establishment. In this step of selection, the algorithm is specified for randomly selecting employees from an employee roster provided by the POC. The resulting number of employees selected from each occupation is proportional to the number of times the occupation was selected in Step 5, its O*NET point value. However, to further minimize burden on an establishment, the total number of employees selected within any single establishment never exceeds 20, and the total number of employees selected in each occupation (within each establishment) never exceeds 8. If fewer than 20 employees are rostered and fewer than 8 are rostered for each occupation, then all rostered employees are selected. Otherwise, a random sample of employees is selected, subject to the constraints just described. If nij and Nij are the number of employees selected and the number of employees listed, respectively, from the j-th occupation at the i-th establishment, then the selection probability for an employee k from the j-th occupation is

π4ijk = nij / Nij (10)

and the associated sampling weight is

w4ijk = 1/π4ijk = Nij / nij. (11)
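The Step 6 caps can be sketched as follows, with invented rosters. The allocation proportional to each occupation’s O*NET point value is simplified away here; only the per-occupation cap of 8, the per-establishment cap of 20, and the Equation (11) weight Nij/nij are illustrated.

```python
import random

def sample_employees(rosters, occ_cap=8, est_cap=20, seed=0):
    """Select employees by occupation, taking the full roster when it
    fits within the caps and a simple random subsample otherwise.
    rosters: {occupation: [employee ids]}.
    Returns {occupation: (selected employees, weight N_ij / n_ij)}."""
    rng = random.Random(seed)
    remaining = est_cap
    out = {}
    for occupation, roster in rosters.items():
        n = min(len(roster), occ_cap, remaining)
        sample = list(roster) if n == len(roster) else rng.sample(roster, n)
        out[occupation] = (sample, len(roster) / n if n else 0.0)
        remaining -= n
    return out

# Invented rosters: 10 carpenters and 3 plumbers at one establishment.
result = sample_employees({"carpenters": list(range(10)),
                           "plumbers": list(range(3))})
```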

Step 7: Specify occupation-specific algorithm for random assignment of questionnaire types to sampled employees. In this step, the algorithm is specified for assigning each selected employee to a domain questionnaire type. The survey is designed to collect data for each occupation from at least 20 respondents to each of three different domain questionnaires (Generalized Work Activities, Work Context, and Knowledge). At this step of selection, all employees selected in Step 6 are randomly assigned to one of the three questionnaire types. The questionnaire assignments are made in proportion to the number of employee respondents required for each questionnaire type in a subwave. To implement this algorithm, a queue is established that lists the order in which questionnaire types are assigned to employees from a specific occupation. The questionnaire types are listed in random order, with each type occurring in the queue at a rate proportional to the number of completed questionnaires required from that type. When an employee is selected, he or she is assigned the questionnaire type at the head of the queue. The next listed questionnaire type is then moved to the top of the queue for use with the next selected employee.
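The Step 7 queue can be sketched as follows; the remaining quotas for the three domain questionnaires are invented.

```python
import random

def build_queue(needed, seed=0):
    """List each questionnaire type in proportion to the respondents
    still required, then randomize the order."""
    rng = random.Random(seed)
    queue = [qtype for qtype, count in needed.items() for _ in range(count)]
    rng.shuffle(queue)
    return queue

def assign_next(queue):
    """Hand the type at the head of the queue to the next sampled
    employee; the next type then moves to the top of the queue."""
    return queue.pop(0)

# Invented remaining quotas for the three domain questionnaires.
queue = build_queue({"GWA": 2, "Work Context": 2, "Knowledge": 2})
```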

Although employees are randomly assigned to a questionnaire, analysis weights are computed for each occupation, not separately for each of the three O*NET questionnaire types used within each occupation. Because the task questions are the same across the three questionnaire types for each occupation, the estimates for these questions are produced from all the respondents for the occupation. Selected incumbents are assigned randomly, with equal probabilities, to one of the three questionnaire types, with approximately the same number of incumbents responding to each of the questionnaires. Consequently, bias is not introduced for estimates of means and proportions specific to a single questionnaire type. Producing the analysis weights for the entire occupation yields greater stability because the occupation as a whole offers a larger available sample size than any single questionnaire type does.

Step 8: Randomly assign selected establishments to Business Liaisons for data collection. The final step of sample selection randomly assigns selected establishments to a BL. To make this process more efficient and ensure that the BL workloads are equal, establishments are stratified by industry grouping, and assignments are made within strata. To do so, the Operations Center manager assigns up to three BLs to each stratum and indicates the number of establishments to be assigned to each BL. The only exceptions to the randomization process occur when multiple establishments are sampled from the same organization, in which case all establishments are assigned to the same BL, or when observed trends inform decisions about matching specific BLs to industries for future case assignments. Although establishments are randomly assigned to BLs, this step does not enter into the overall selection probabilities or analysis weights: it is an administrative step to reduce any potential bias associated with the BLs.

The various weighting components associated with many of the Establishment Method sampling steps are combined to produce the final analysis weights, as described in the subsection on weighting.

Supplemental Frames

If the sample yield for an occupation proves to be far less than expected from the Establishment Method, the use of a special frame is considered for completing an occupation when additional Establishment Method subwaves would likely be nonproductive or inefficient. In this situation, if a suitable supplemental frame can be obtained, then the occupation is removed from the Establishment Method wave and sampled separately. Supplemental frames were used for 3.5% of the O*NET-SOC occupations published to date. A supplemental frame may be either a listing of establishments highly likely to employ a particular occupation, or a listing of incumbents working in an occupation. For example, a trade association of business establishments highly likely to employ an occupation would be appropriate when the occupation is highly concentrated in a particular type of establishment. In addition, workers in some occupations tend to be members of associations or to be licensed. In such situations, it is often possible to obtain a frame listing either establishments or incumbents working in the occupation.

When a listing of establishments is obtained, a random sample of the establishments is selected from the frame. The sample usually will be stratified by geographic location and any other available establishment information that may be related to type of incumbents working in the occupation. Simple random samples are usually selected, because often little information is known about the number of incumbents the establishments employ. Consequently, if n and N are the number of establishments selected and the number on the frame, respectively, within a stratum, then the selection probability for the i-th selected establishment is

$\pi_{5i} = n / N$ , (12)

and the associated weight is

$w_{5i} = N / n$ . (13)

The selected establishments are then included in Step 6 of the sampling process, with the single target occupation associated with each establishment.
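The stratum-level selection probabilities and weights of Equations (12) and (13) can be illustrated with a short sketch. The function name and the figures are hypothetical, and this is a minimal illustration rather than the production sampling code:

```python
# Illustrative sketch of Equations (12) and (13): within a stratum, a simple
# random sample of n units from a frame of N units has selection probability
# n/N, and each selected unit carries the base weight N/n.

def srs_weights(frame_size, sample_size):
    """Return (selection probability, base weight) for one stratum."""
    prob = sample_size / frame_size     # Equation (12): n / N
    weight = frame_size / sample_size   # Equation (13): N / n
    return prob, weight

# Example: a stratum listing 400 establishments from which 20 are sampled.
prob, weight = srs_weights(frame_size=400, sample_size=20)
assert prob == 0.05 and weight == 20.0
# The weights sum back to the frame count, as expected for base weights.
assert 20 * weight == 400
```

The same arithmetic applies to Equations (14) and (15) when the frame lists incumbents rather than establishments.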

On the other hand, when the supplemental frame lists incumbents working in an occupation, then a simple random sample of incumbents is selected, usually stratified by geographic location and, if available, subspecialty of the occupation. If n and N are the number of incumbents selected and the number on the frame, respectively, within a stratum, then the selection probability for the k-th selected incumbent working in the j-th occupation from the i-th establishment is2

$\pi_{5ijk} = n / N$ , (14)

and the associated weight is

$w_{5ijk} = N / n$ . (15)

The selected incumbents are then directly contacted and their participation is solicited. Those who agree to participate are entered into sampling process Step 7, by which a questionnaire type is assigned to each incumbent.

The supplemental frame weights are adjusted for nonresponse and combined with the Establishment Method weights, as described in the subsection on weighting.

Employee Sample Size

A key issue in sample design is the level of precision required in the resulting data and the cost of producing a particular level of precision, in terms of both dollars and respondent burden. The O*NET sample design has been developed to provide results with a level of precision that should be adequate to meet the needs of general-purpose users (those seeking information at the occupational level). Consistent with the procedures used by the O*NET Program since 2001, an occupation is considered complete and ready for inclusion in the final O*NET database when at least 15 valid completed questionnaires (after data cleaning) have been obtained for each of the three questionnaire domains.3

The current sample size goal is based on the final technical report of Peterson, Mumford, Borman, and colleagues (1997), which presents means and standard deviations for both 5- and 7‑point Likert scales, with consecutive integer scores, for the descriptors within the Skills, Knowledge, Generalized Work Activities, Abilities, and Work Styles domains. Statistics were computed separately with the reported data for each of six occupations. The data in these tables indicate that when 15 responses per descriptor are obtained, the mean values for virtually all of the 5‑point and the 7-point descriptors have 95% confidence intervals (CIs) that are no wider than plus or minus 1 to 1.5 scale points for all occupations.

Exhibit B-2 displays the half-width of 95% CIs for means of 5- and 7-point scales asked of all respondents, by sample size, from Analysis Cycles 16 through 18 for all incumbent occupations. The items are summarized in Exhibit A-2 as those with a data source of job incumbents and are presented as part of the questionnaires in Appendix A. The scales were given consecutive integer scores, and estimates were produced as described in the “Estimation” subsection of ‎B.2.1. Across all sample sizes, nearly all of the scale means have 95% CIs that are no wider than plus or minus 1.6 scale points. For those scale means based on sample sizes of between 15 and 25 respondents, more than 95% of the 5‑point scales and more than 90% of the 7-point scales have 95% CIs no wider than plus or minus 1.5 scale points.

Exhibit B-2. Half-Width of 95% Confidence Intervals

                        5-Point Scales                  7-Point Scales
Percentile      Sample Sizes    All Sample      Sample Sizes    All Sample
                of 15 to 25     Sizes           of 15 to 25     Sizes
95th            ± 1.1           ± 1.0           ± 1.7           ± 1.6
90th            ± 1.0           ± 0.9           ± 1.5           ± 1.4
75th            ± 0.8           ± 0.7           ± 1.2           ± 1.1
50th            ± 0.6           ± 0.5           ± 0.9           ± 0.8

Note: Data are taken from 145 five-point scales and 74 seven-point scales measured on each of 248 occupations. The Education and Training Scales (N = 5) within the Knowledge Questionnaire and 2 of the 57 Work Context scales (3-point scales) are excluded.
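For intuition on the magnitudes in Exhibit B-2, the half-width of a normal-approximation 95% CI for a mean is z multiplied by the standard deviation over the square root of the sample size. This sketch uses hypothetical values; the published CIs are design-based estimates that also reflect clustering and unequal weighting:

```python
import math

def ci_half_width(std_dev, n, z=1.96):
    """Approximate 95% CI half-width for a mean: z * sd / sqrt(n)."""
    return z * std_dev / math.sqrt(n)

# Example: a 7-point scale with a standard deviation of 1.5 scale points
# and the minimum sample size of 15 respondents.
hw = ci_half_width(std_dev=1.5, n=15)
# The half-width falls inside the plus-or-minus 1.5 scale-point target.
assert hw < 1.5
```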

Mumford, Peterson, and Childs (1997) have cited Fleishman and Mumford (1991) as support that variation of 1 to 1.5 scale points on a 7-point scale “is typical of that found for well-developed level scales.” Setting a minimum employee sample size of 15 (with many occupations achieving a larger sample size) therefore will generally satisfy this requirement. Additionally, Peterson and colleagues (2001) state that 15 to 30 incumbents typically provide sufficient interrater reliability for describing occupations, given the types of measures the O*NET Program uses to describe occupations.

Weighting

After the raw data are edited and cleaned, weights are constructed for each establishment and employee respondent to reduce estimate bias and variance due to factors such as nonresponse, undercoverage, and the complex sample design. The weighting process for the basic Establishment Method is described first. Subsequently, weighting for the supplemental-frame samples is described, together with weighting methods for combining the Establishment Method and supplemental-frame samples.

Estimates generated from O*NET survey data are computed with analysis weights to reflect the combined effects of the following:

  • probabilities of establishment selection;

  • probabilities of occupation selection;

  • early termination of employee sampling activities because of higher-than-expected yields for some occupations;

  • probabilities of employee selection;

  • multiple-sample adjustments;

  • nonresponse at both the establishment and the employee levels; and

  • under- and overcoverage of the population, caused by frame omissions and undetected duplications.

The starting point for each of these stages is the inverse of the probability of selection at each stage (establishment, occupation, and employee)—called the base sampling weight for the stage. The base sampling weights account for the unequal probabilities with which establishments, occupations, and employees are selected at each stage and are presented in

  • Equation (2), w1i associated with the initial simple random sample of establishments from the D&B frame;

  • Equation (5), w2i associated with the probability-proportional-to-size sample of establishments from the initial simple random sample;

  • Equation (9), w3ij associated with the selection of an occupation at an establishment; and

  • Equation (11), w4ijk associated with the selection of an employee from an occupation at an establishment.4

The product of these four weights would be the appropriate analysis weight if effects due to such issues as nonresponse and under- or overcoverage were negligible; however, weight adjustments likely will improve the accuracy of the estimates. The weight adjustments are implemented in three weighting steps corresponding to the three main steps of Establishment Method sampling:

  • Weighting Step 1, by which establishment weights are computed to account for the probabilities of selecting establishments and to account for adjustments for establishment nonresponse;

  • Weighting Step 2, by which occupation weights are computed to account for the probabilities of selecting specific occupations from each establishment and to account for adjustments for the early termination of sampling for some occupations under MAS; and

  • Weighting Step 3, by which employee analysis weights are computed to account for the probabilities of selecting employees within each occupation and to account for adjustments for employee nonresponse and for overlapping frames across the subwaves.

The weights are calculated separately for each subwave in Weighting Steps 1 and 2, and then they are combined into an overall analysis weight in Weighting Step 3. The specific methods used in each of these weighting steps are described here after the unit-nonresponse adjustment method is described.

Nonresponse adjustment. The sampling weights are adjusted for nonresponse with use of a generalized exponential model (GEM). RTI has used the GEM method to create sampling weight adjustments for the 1999 through 2016 annual National Survey on Drug Use and Health conducted for the Substance Abuse and Mental Health Services Administration and for several other surveys conducted by RTI.

The GEM calibration is a generalization of the well-known weighting class approach, the iterative proportional fitting algorithm that is generally used for poststratification adjustments, Deville and Särndal’s (1992) logit method, and Folsom and Witt’s (1994) constrained logistic and exponential modeling approach. The GEM calibration process causes the weighted distribution of the respondents to match specified distributions simultaneously for all of the variables included in the model. One advantage of the GEM method over simpler weighting class or poststratification adjustments is that the adjustment model can use a larger and more diverse set of control variables because main effects and lower-order interactions can be used in the model, rather than complete cross-classifications. Folsom and Singh (2000) described the GEM method in a paper presented to the American Statistical Association.

To summarize, a set of predictor, or adjustment, variables is specified, together with the control total for each variable that the weighted sample is expected to match. The GEM method is designed to determine a weight adjustment factor for each respondent, such that

$\sum_{k} a_k \, w_k \, x_k = T_x$ ,

where the summation is over the respondents, xk is an adjustment variable in the model, wk is the base sampling (or unadjusted) weight, ak is the adjustment factor, and Tx is the control total for the variable x. Tx may be either a nonresponse adjustment control total estimated by the sum of base sampling weights for both respondents and nonrespondents or an external control total to adjust for under- or overcoverage of the frame. The adjustment factors, ak, are determined to match the control totals for all of the variables in the model simultaneously. Furthermore, upper and lower bounds on the weight adjustment factors can be set to reduce the influence of observations that otherwise might have received a very large weight adjustment. The upper and lower bounds also reduce the effect of unequal weighting that may result from uncontrolled weight adjustments.
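The general idea of calibrating weights to control totals can be illustrated with the simpler raking (iterative proportional fitting) special case, which GEM generalizes. The function, data, and control totals below are hypothetical, and this is not RTI's GEM implementation (which also supports bounded adjustment factors and model-based margins):

```python
# Sketch of calibration to control totals via raking (iterative proportional
# fitting): find adjustment factors a_k so that the weighted respondent totals
# match T_x on every specified margin simultaneously.

def rake(weights, categories, control_totals, iterations=50):
    """categories: one dict per respondent mapping variable -> level.
    control_totals: {variable: {level: T_x}}. Returns factors a_k."""
    adj = [1.0] * len(weights)
    for _ in range(iterations):
        for var, totals in control_totals.items():
            # Current weighted total for each level of this variable.
            current = {level: 0.0 for level in totals}
            for k, cat in enumerate(categories):
                current[cat[var]] += weights[k] * adj[k]
            # Scale each respondent's factor so this margin matches.
            for k, cat in enumerate(categories):
                adj[k] *= totals[cat[var]] / current[cat[var]]
    return adj

weights = [10.0, 10.0, 10.0, 10.0]
cats = [{"size": "small", "region": "east"},
        {"size": "small", "region": "west"},
        {"size": "large", "region": "east"},
        {"size": "large", "region": "west"}]
controls = {"size": {"small": 30.0, "large": 20.0},
            "region": {"east": 25.0, "west": 25.0}}
adj = rake(weights, cats, controls)
# The adjusted weights now reproduce the control totals on both margins.
assert abs(adj[0] * 10 + adj[1] * 10 - 30.0) < 1e-6
assert abs(adj[0] * 10 + adj[2] * 10 - 25.0) < 1e-6
```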

Weighting Step 1: Establishment weights. The base sampling weight, $w_{Ei}$, for the selected establishments in a subwave is the product of the weights in Equations (2) and (5):

$w_{Ei} = w_{1i} \, w_{2i}$ . (16)

The establishment sampling weights are adjusted for nonresponse, by subwave, with use of the GEM method with a model that contains different combinations of the following variables:5

  • industry division used for sampling/MAS;

  • U.S. Census division;

  • establishment size;

  • headquarters/branch type;

  • number of occupations assigned to an establishment;

  • code information using decennial census data for quartile distribution of owner-occupied housing;

  • urban or rural status;

  • time zone; and

  • two-way interactions between industry division used for sampling/MAS and establishment size.

Variable selection proceeds by first fitting a model containing only main effects and tightening the upper and lower bounds so that all upper bounds are less than 8 and a minimal increase in the unequal weighting effect (UWE) is achieved.6 Two-way interactions among the variables are then added to the model. Cells that do not contain any respondents or that are collinear with other cells are removed from the model. If a convergent model cannot be obtained, some covariate levels are collapsed together; for example, U.S. Census divisions are collapsed to regions. Other variables or interactions may be removed from the model until a convergent model is obtained (i.e., a solution is found given all constraints) that maintains as many of the covariates and their two-way interactions as possible.

Variable selection and testing are conducted for each sampling subwave to determine the final model for a subwave. Extremely large weights are trimmed back to smaller values. Even though the GEM method provides control over the size of the adjustment factors, it is still possible for large weights to result, though at a rate lower than that from an uncontrolled adjustment process. The total amount trimmed within a subwave is proportionally reallocated across the responding establishments to maintain the same estimated total number of establishments for each subwave. The adjusted establishment weights are denoted as $\tilde{w}_{Ei}$.

Weighting Step 2: Occupation weights. The base occupation weight, $w_{Oij}$, for the j-th occupation selected from the i-th establishment is

$w_{Oij} = w_{3ij} \, \tilde{w}_{Ei}$ , (17)

which is the product of w3ij, defined by Equation (9), and the adjusted establishment weight for the subwave, defined in Weighting Step 1. For most occupations, no further nonresponse adjustments are necessary, because once an establishment agrees to participate, all of its selected occupations are available. However, MAS is used to terminate incumbent sampling early for some occupations with higher-than-expected numbers of sampled incumbents. For such occupations, the rate at which an occupation is encountered is estimated from the establishments contacted before the early termination; the estimated rate is then used to predict the number of additional establishments that would have reported employing the occupation. The occupation weights for the establishments that complete employee sampling are then adjusted to account for the predicted additional establishments. To understand the adjustment for early termination of sampling for some occupations, consider the classification of establishments shown in Exhibit B‑3.

Exhibit B-3. Classification of Establishments by Occupation Model-Aided Sampling Status

Group   Description
A       Inquired about occupation, and it is present at establishment
B       Inquired about occupation, but it is not present at establishment
C       Did not inquire about occupation because of early termination of incumbent sampling for occupation



Groups A and B are those establishments where the presence or absence of an occupation is known and can be used to estimate the rate at which the j-th occupation is present, or the presence rate, by

$\hat{r}_j = \sum_{i \in A} \tilde{w}_{Ei} \bigg/ \left( \sum_{i \in A} \tilde{w}_{Ei} + \sum_{i \in B} \tilde{w}_{Ei} \right)$ , (18)

where the summations are over the establishments in Group A or Group B. Next, the additional number of establishments where the j-th occupation would have been found if sampling had not been terminated early is estimated by applying the presence rate to the number of establishments in Group C. Thus, the estimated total number of establishments where the j-th occupation would have been found is given by

$T_j = \sum_{i \in A} \tilde{w}_{Ei} + \hat{r}_j \sum_{i \in C} \tilde{w}_{Ei}$ , (19)

where the summations are over the establishments in Group A or Group C. It is tacitly assumed in Equation (18) that the establishments where occupations are inquired about approximate a random sample from all three groups listed in Exhibit B-3 (i.e., sampled, eligible, and participating establishments). This assumption is consistent with the random assignment of establishments to the BLs and the random order in which establishments are initially contacted. The base occupation weights for the establishments in Group A are then adjusted to sum to Tj for the j-th occupation.

To make this adjustment more sensitive, the process for estimating the number of establishments where the j-th occupation would have been found is completed separately by census regions, by the business size groups, and by industry divisions—as with the process for defining the MAS target cells. This process yields three sets of estimated marginal totals corresponding with these three variables. The GEM method is then used with a model containing the marginal, or main, effects of census regions, business size groups, and industry divisions to adjust the base occupation weights from Equation (17) for those establishments in Group A. The adjusted weight for the j-th occupation from the i-th establishment is denoted by $\tilde{w}_{Oij}$.
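The early-termination adjustment of Equations (18) and (19) reduces to simple weighted arithmetic. The sketch below uses illustrative weight totals (not O*NET data) and a hypothetical function name:

```python
# Sketch of Equations (18) and (19): estimate the presence rate from Groups A
# and B of Exhibit B-3, then project how many weighted establishments in
# Group C would also have reported employing the occupation.

def estimated_total(group_a, group_b, group_c):
    """Each argument is a list of adjusted establishment weights for
    Group A (occupation present), B (absent), or C (not asked)."""
    presence_rate = sum(group_a) / (sum(group_a) + sum(group_b))  # Eq. (18)
    return sum(group_a) + presence_rate * sum(group_c)            # Eq. (19)

# Example: 60 weighted establishments with the occupation, 40 without,
# and 50 never asked; 60% of Group C is projected to have had it.
t_j = estimated_total([60.0], [40.0], [50.0])
assert abs(t_j - 90.0) < 1e-9  # 60 + 0.6 * 50
```

The Group A occupation weights are then calibrated to sum to this total, which GEM does simultaneously for the region, size, and industry margins.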

Weighting Step 3: Employee analysis weights. The base weights for the responding employees in a subwave are

$b_{ijk} = w_{4ijk} \, \tilde{w}_{Oij}$ , (20)

which is the product of w4ijk, defined by Equation (11), and the adjusted occupation weight for the subwave, defined in Weighting Step 2. At this point the responding employees from all subwaves are combined. The overlap of target populations among the subwaves for each occupation is determined, and a multiple-frame adjustment is made, as described by Korn and Graubard (1999), using the sample sizes in the overlaps. For example, if two subwaves overlap for an occupation within a set of industries, then the adjustment factors for the subwaves are

$a_l = t_l \, / \, (t_1 + t_2)$ , (21)

where $t_l$ is the sample size from the l-th subwave in the overlap between the subwaves. Then, the multiple-frame adjusted employee weights are

$\tilde{b}_{ijk} = a_l \, b_{ijk}$ , (22)

where the adjustment factor, al, is selected to correspond with the industry overlap portion associated with the ijk-th employee. This adjustment process is completed separately for each combination of overlapping subwaves.
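The multiple-frame adjustment of Equations (21) and (22) can be sketched in a few lines. The function name and sample sizes are hypothetical:

```python
# Sketch of Equations (21) and (22): when two subwaves overlap for an
# occupation, each subwave's employee weights in the overlap are scaled by
# that subwave's share of the combined overlap sample size.

def multiple_frame_factors(t1, t2):
    """Adjustment factors a_l = t_l / (t_1 + t_2) for two overlapping subwaves."""
    total = t1 + t2
    return t1 / total, t2 / total

a1, a2 = multiple_frame_factors(t1=30, t2=10)
assert (a1, a2) == (0.75, 0.25)
assert a1 + a2 == 1.0  # the factors partition the overlap

# Applying Equation (22): an employee weight of 12.0 from the first subwave
# becomes 0.75 * 12.0 = 9.0 in the overlap.
assert a1 * 12.0 == 9.0
```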

Next, the employee weights in Equation (22) are further adjusted for nonresponse, using the GEM method with a model that contains different combinations of the following variables:

  • indicator for each occupation;

  • industry division used for sampling/MAS;

  • U.S. Census division;

  • establishment size;

  • sampling wave;

  • questionnaire type;

  • headquarters/branch type;

  • number of occupations asked about in an establishment;

  • number of occupations assigned to an establishment;

  • code information using decennial census data for quartile distribution of owner-occupied housing;

  • total number of selected employees in an establishment;

  • primary or secondary occupation;

  • whether POC has ever heard of O*NET;

  • expected sampling yield (high, medium, or low);

  • quintile distribution of percentage of occupation with industry;

  • quintile distribution of percentage of industry within occupation;

  • urban or rural status;

  • time zone; and

  • two-way interactions between industry division used for sampling/MAS and establishment size.

As before, variable selection and testing are conducted to determine the final model. Indicator variables for the occupations are included in the final model so that the adjustment maintains the correct sum of weights for each occupation; at the same time, to improve the adjustment, the data across occupations are used for the other variables.

At this point, an examination for extreme weights is conducted for each occupation by domain questionnaire. To prevent a few respondents from being too influential, weight trimming is performed according to two rules. Weights are deemed too extreme for a particular occupation by domain group if

  • any weight exceeds the mean weight of the group plus 1.5 standard deviations of the weights, or

  • a single weight accounts for more than 50% of the weight sum of the group.

Extreme weights are trimmed to the smaller of the two values bulleted above; the total amount trimmed for an occupation by domain group is proportionally allocated to all respondents in the domain group. The 50% check on the contribution of a single respondent is repeated until no single respondent exceeds the proportion limit.
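The two trimming rules and the proportional reallocation can be sketched as follows. The function and weights are illustrative only, and the population standard deviation is an assumption (the text does not specify which SD formula is used):

```python
import statistics

# Sketch of the trimming rules: cap weights at the smaller of (mean + 1.5 SD)
# and 50% of the group total, reallocate the trimmed amount proportionally,
# and repeat the 50% check until no single weight exceeds the limit.

def trim_weights(weights, tol=1e-9):
    total = sum(weights)
    cap = min(statistics.mean(weights) + 1.5 * statistics.pstdev(weights),
              0.5 * total)
    new = [min(w, cap) for w in weights]
    new = [w * total / sum(new) for w in new]   # proportional reallocation
    for _ in range(200):  # the 50% check is iterated to convergence
        if max(new) <= 0.5 * total + tol:
            break
        new = [min(w, 0.5 * total) for w in new]
        new = [w * total / sum(new) for w in new]
    return new

weights = [1.0, 1.0, 1.0, 1.0, 16.0]
new = trim_weights(weights)
assert abs(sum(new) - 20.0) < 1e-6    # the total weight is preserved
assert max(new) <= 0.5 * 20.0 + 1e-6  # no respondent exceeds 50% of the sum
```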

The employee weights are then ratio-adjusted to match the distribution of workers in an occupation by industry sector as estimated from BLS’s OES data. If the OES data indicate employment in an industry division in which the O*NET sample has no respondents for an occupation, the OES total employment for that industry division is proportionally distributed across the industries for which O*NET respondents exist.

Occasionally, this final ratio adjustment will induce a large UWE,7 inflating the variances of O*NET estimates or producing undesirably large employee weights. Accordingly, final checks are conducted to determine whether (1) an occupation has a UWE greater than 10.0, or (2) the weight of a single respondent accounts for more than 30% of the total weight of all respondents within a domain questionnaire. If either of these criteria is met, then this final ratio adjustment is repeated; however, the distribution of workers for the subject occupation is allowed to deviate from the OES estimated distribution in order to satisfy these criteria.
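The UWE threshold check can be illustrated with the common Kish formulation of the design effect due to unequal weighting, 1 + CV², computed as n·Σw²/(Σw)². The formula is an assumption (the text does not define UWE explicitly), and the weights are made up:

```python
# Sketch of the final UWE check: the unequal weighting effect computed as
# n * sum(w^2) / (sum(w))^2, which equals 1.0 for equal weights and grows
# as the weights become more dispersed.

def unequal_weighting_effect(weights):
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2

assert unequal_weighting_effect([5.0, 5.0, 5.0]) == 1.0  # equal weights

# A highly skewed set of weights exceeds the 10.0 threshold and would
# trigger a repeat of the ratio adjustment.
uwe = unequal_weighting_effect([1.0] * 19 + [100.0])
assert uwe > 10.0
```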

The resulting weights are denoted by $w^{F}_{ijk}$. If a supplemental frame was not used to complete the sample of an occupation, then these weights are the final employee analysis weights used to produce the estimates for the O*NET Data Collection Program. If a supplemental frame was used for an occupation, then an additional, supplemental-frame weighting step completes the calculation of the analysis weights.

Supplemental-frame weighting. As noted in the discussion of sampling methods, locating a sufficient number of respondents under the Establishment Method occasionally requires sampling from a supplemental frame. This situation usually occurs for rare occupations or occupations that are difficult to locate through Establishment Method sampling. When it does occur, additional weighting steps are necessary to account for the dual frames used to select the samples. Two situations must be considered: one in which the supplemental frame consists of establishments likely to employ the occupation of interest, and another in which the supplemental frame consists of employees likely to be working in the occupation. Described here are the steps taken in each situation.

First consider the situation in which a supplemental frame of establishments is used, as illustrated in Exhibit B-4. In this figure the dashed rectangle represents the full frame; the oval, the supplemental frame. The full frame, denoted by F, is the D&B establishment listing used for the usual Establishment Method sampling, which includes N establishments. The supplemental frame, denoted by FS, includes NS establishments.

Exhibit B-4. Overlap of Full and Supplemental Frames



For a supplemental-frame sample of establishments, Equation (16) of Weighting Step 1 is modified to start with a base sampling weight, $w^{S}_{Ei}$, given by

$w^{S}_{Ei} = w_{5i}$ , (23)

where $w_{5i}$ is defined in Equation (13). The supplemental-frame establishment sampling weights are then adjusted for nonresponse with use of the GEM method under a model containing the stratification variables used to select the supplemental-frame sample. To determine the final model with the GEM software, variable selection and testing are conducted for each supplemental-frame sample. The nonresponse-adjusted supplemental-frame establishment weights are denoted as $\tilde{w}^{S}_{Ei}$.

A supplemental frame is developed to target a small number of related occupations—and often only one occupation—because of the difficulty of locating them. For a supplemental-frame sample, occupations are not randomly assigned to establishments from a large set of occupations, as is done for the subwaves in Establishment Method sampling. Consequently, the base sampling weight for the j-th occupation selected from the i-th establishment for Weighting Step 2, in Equation (17), is modified to be

$w^{S}_{Oij} = \tilde{w}^{S}_{Ei}$ , (24)

which excludes the weighting factor related to the random assignment of an occupation to an establishment. The subscript j is added, however, to recognize that a specific occupation is associated with each selected establishment in the supplemental-frame sample. The supplemental-frame sample then proceeds through the rest of Weighting Step 2. The adjusted weight for the j-th occupation from the i-th establishment is denoted by $\tilde{w}^{S}_{Oij}$.

As part of Weighting Step 3, the base sampling weights for the responding employees in a supplemental-frame sample are defined as in Equation (20),

$b^{S}_{ijk} = w_{4ijk} \, \tilde{w}^{S}_{Oij}$ , (25)

which is the product of w4ijk, defined in Equation (11), and the adjusted occupation weight for the supplemental-frame sample, defined in Weighting Step 2 as $\tilde{w}^{S}_{Oij}$.

Next, the occupation’s employee weights from the supplemental-frame sample must be combined with the same occupation’s weights from the Establishment Method sample. The employee weights from each of the Establishment Method subwaves are first combined as shown in Equation (22) into a single set of weights for all employees from an occupation selected by the Establishment Method. An extra step is added to Weighting Step 3 to combine the Establishment Method employee weights from Equation (22) with the supplemental-frame employee weights from Equation (25).

At this point it must be assumed that the establishments listed on the supplemental frame, FS in Exhibit B-4, are equivalent to a random sample of the establishments on the full frame, F. This assumption is made because of the inability to determine which establishment on the full D&B-derived frame, F, links to which establishment on the supplemental frame. Under this assumption, a multiple-frame situation again emerges because data come from two samples: the Establishment Method sample and the supplemental-frame sample. A multiple-frame adjustment is made, as described by Korn and Graubard (1999, sec. 8.2), using the sample sizes from the two samples. Similarly, as was described for Equation (21), let tF be the sample size for an occupation from the Establishment Method full sample, and let tS be the sample size for the occupation from the supplemental sample. The multiple-frame adjustment factors are then given by

$a_F = t_F \, / \, (t_F + t_S)$ and

$a_S = t_S \, / \, (t_F + t_S)$ , (26)

which correspond with the full frame and the supplemental frames, respectively. Finally, the weight for the k-th employee from the j-th occupation in the i-th establishment is given by

$c_{ijk} = a_F \, \tilde{b}_{ijk}$ for the full sample and

$c_{ijk} = a_S \, b^{S}_{ijk}$ (27)

for the supplemental sample, where $\tilde{b}_{ijk}$ and $b^{S}_{ijk}$ are defined in Equations (22) and (25), respectively, and $a_F$ and $a_S$ correspond with the j-th occupation. The weights $c_{ijk}$ are then used in the adjustment steps after Equation (22) to complete Weighting Step 3, which yields the final analysis weights $w^{F}_{ijk}$.

Next, consider the situation in which a supplemental frame consists not of establishments but of employees working in the occupation. In this situation, no actions equivalent to those in Weighting Steps 1 and 2 exist for the supplemental-frame sample. For Weighting Step 3, the supplemental-frame sample weighting starts with the base sampling weight

$b^{S}_{ijk} = w_{5ijk}$ , (28)

where $w_{5ijk}$ was defined in Equation (15). These weights are then combined with the Establishment Method employee weights for an occupation, as was described in connection with Equations (26) and (27), to yield $c_{ijk}$. Again, the key assumption is made that the employees on the supplemental frame are equivalent to a random sample of all employees working in the occupation who might have been selected through the Establishment Method. This assumption is necessary because it is not possible to link the employees on the supplemental frame back to the D&B frame of establishments used for the Establishment Method. The weights are then used in the adjustment steps after Equation (22) to complete Weighting Step 3, which yields the final analysis weights $w^{F}_{ijk}$.

Estimation

The estimates produced for each occupation consist of scale means and percentage estimates. The number and type of scales are those listed in Exhibit A-2 with a data source of job incumbents. Each of these scales is a 3-, 5-, or 7-point Likert scale. The final estimates are the means of the scales for each occupation. No subgroup or domain estimates are produced or released, both to protect against disclosure and because small sample sizes are not conducive to reliable estimation. The standard deviation will be available for each item mean as a measure of response variation among an occupation’s respondents. Finally, there are several percentage estimates produced for items concerning work context, education and training, background items, and occupation tasks. Again, the final estimates are the percentages for each occupation, and no subgroup or domain estimates are produced or released.

For each item, if respondents do not provide an answer to a particular question, they are excluded from both the numerator and the denominator of the estimated mean. Because item nonresponse tends to be very low for this study (see Appendix D), no item imputation is conducted, and no value for missing items is assumed for estimation.

Variances are estimated with the first-order Taylor series approximation of deviations of estimates from their expected values. These design-based variance estimates are computed with SUDAAN® software (RTI International, 2013). These estimates properly account for the combined effects of clustering, stratification, and unequal weighting—all of which are present in the O*NET data. The variance estimation clusters are the establishments; the stratification is by industry grouping and establishment size as used in selection of the establishment samples. These estimated variances are used to estimate both the standard errors associated with the mean or percentage and the CIs. Standard error estimates and 95% CIs are included with all estimates of means and proportions.

The estimate of a mean or a proportion is given by the formula

$\hat{\theta} = \sum_{h} \sum_{i} \sum_{k} w_{hik} \, y_{hik} \bigg/ \sum_{h} \sum_{i} \sum_{k} w_{hik}$ , (29)

where $w_{hik}$ is the final analysis weight for the k-th respondent from the i-th establishment in the h-th stratum, and $y_{hik}$ is the response variable. For a scale mean, the response variable, $y_{hik}$, is the scale value reported by the respondent; for a proportion, the response is 1 for a positive response and 0 for a negative response. The Taylor series linearized values for the estimated mean or proportion are given by

$z_{hik} = w_{hik} \left( y_{hik} - \hat{\theta} \right) \bigg/ \sum_{h} \sum_{i} \sum_{k} w_{hik}$ . (30)

The variance of $\hat{\theta}$ is estimated by

$\widehat{\mathrm{Var}}(\hat{\theta}) = \sum_{h} \frac{n_h}{n_h - 1} \sum_{i=1}^{n_h} \left( z_{hi} - \bar{z}_h \right)^2$ , (31)

where $n_h$ is the number of variance estimation clusters from the h-th stratum, $z_{hi} = \sum_{k} z_{hik}$ is the total of the linearized values within the i-th cluster, and $\bar{z}_h = \sum_{i=1}^{n_h} z_{hi} / n_h$.
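The estimation steps above can be sketched compactly. In production these quantities are computed with SUDAAN; the data structure, function name, and values here are illustrative only:

```python
# Sketch of Equations (29)-(31): a weighted mean, its Taylor series
# linearized values, and a stratified, clustered variance estimate.

def mean_and_variance(strata):
    """strata maps stratum -> {cluster: [(weight, y), ...]}."""
    all_obs = [(w, y) for clusters in strata.values()
               for obs in clusters.values() for w, y in obs]
    w_sum = sum(w for w, _ in all_obs)
    theta = sum(w * y for w, y in all_obs) / w_sum          # Equation (29)
    var = 0.0
    for clusters in strata.values():
        n_h = len(clusters)
        if n_h < 2:
            continue  # a single-cluster stratum contributes no variance here
        # Cluster totals of the linearized values (Equation (30)).
        z = [sum(w * (y - theta) / w_sum for w, y in obs)
             for obs in clusters.values()]
        z_bar = sum(z) / n_h
        var += n_h / (n_h - 1) * sum((zi - z_bar) ** 2 for zi in z)  # Eq. (31)
    return theta, var

# Two strata, each with two establishment clusters of weighted responses.
strata = {"h1": {"c1": [(2.0, 4.0), (2.0, 5.0)], "c2": [(1.0, 3.0)]},
          "h2": {"c3": [(1.0, 4.0)], "c4": [(1.0, 2.0)]}}
theta, var = mean_and_variance(strata)
assert abs(theta - 27.0 / 7.0) < 1e-12  # weighted mean of the y values
assert var > 0.0
```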

Expected Data Collection Results

Data collection had been completed for 154 subwaves as of December 31, 2017. These subwaves consisted of 308,642 sampled establishments and 333,566 selected employees. The overall response rate was 75% for establishments and 64% for employees. Although these response rates compare favorably with those of similar studies (see Section ‎A.1.5, Part A), methods to further enhance response rates are continually being evaluated and implemented (see Section B.4).

The data collection results expected for the Establishment Method during the period October 2018–September 2021 are shown in Exhibit B-5. The numbers for the sampled establishments, eligible establishments, and participating employees come from our burden estimates in Exhibit A-19 in Part A. The establishment response rate and employee response rate come from the summary of our data collection experience in Exhibit A-3 in Part A. The number of participating establishments and number of sampled employees may be derived from our assumed response rates.

Exhibit B-5. Establishment Method Expected Data Collection Results

Sampled establishments: 49,000
Eligible establishments: 40,670
Participating establishments: 30,503
Establishment response rate (participating establishments/eligible establishments): 75%
Sampled employees: 66,856
Participating employees: 42,788
Employee response rate (participating employees/eligible employees): 64%
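The derived counts in Exhibit B-5 follow directly from the assumed rates. A quick arithmetic check (using a hypothetical helper `expect` that rounds half up to a whole count):

```python
def expect(n, rate):
    """Apply an assumed rate to a count, rounding half up."""
    return int(n * rate + 0.5)

eligible_establishments = 40_670
participating_establishments = expect(eligible_establishments, 0.75)   # 30,503

sampled_employees = 66_856
participating_employees = expect(sampled_employees, 0.64)              # 42,788
```

Both results match the exhibit, confirming that the participating counts are simply the eligible/sampled counts scaled by the assumed response rates.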



B.2.2 Occupation Expert Method

The OE Method is used for occupations as necessary to improve sampling efficiency and avoid excessive burden, as when it is difficult to locate industries or establishments with occupation incumbents; employment is low; or employment data are not available, as is the case for many new and emerging occupations. To determine which sampling method should be used for an occupation, a comparison is made of the advantages and disadvantages of the Establishment and OE Methods. For each occupation, information on the predicted establishment eligibility rate and the predicted establishment and employee response rates is used to quantify the efficiency of sampling the occupation by means of the Establishment Method. The OE Method is used for an occupation when the Establishment Method of data collection is not feasible and an appropriate source of occupation experts is available, as when a membership list of a professional or trade association exists and provides good coverage of the occupation. A random sample is selected from provided lists to prevent investigator bias in the final selection of occupation experts. Sample sizes are designed to ensure that at least 20 completed questionnaires are available for analysis after data cleaning. A goal of 20 questionnaires was set as a reasonable number to enable adequate coverage of experts, occupation subspecialties, and regional distribution.
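The goal of at least 20 completed questionnaires implies a minimum number of experts to sample once eligibility and response rates are assumed. A sketch (the rates below are illustrative, drawn from the program's overall experience, and would vary by occupation):

```python
import math

def experts_to_sample(target_completes, eligibility_rate, response_rate):
    """Minimum sample size so that the expected number of completed
    questionnaires meets the target."""
    return math.ceil(target_completes / (eligibility_rate * response_rate))

# With roughly 75.7% of sampled experts eligible and 76% of those responding,
# about 35 experts must be sampled to expect 20 completes.
needed = experts_to_sample(20, 0.757, 0.76)
```

Actual OE samples are larger than this floor to allow for data cleaning losses and to cover subspecialties and regions.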

Through December 31, 2017, the OE Method was used to collect data for 429 occupations. Of these, 388 were completed and 41 were still in process as of that date. In the completed occupations, a total of 20,247 occupation experts were sampled, of whom 15,134 were found to be eligible. Of these, 11,184 occupation experts participated, for an overall OE response rate of 74%.

The data collection results expected for the OE Method during the period October 2018–September 2021 are shown in Exhibit B-6. The number of participating occupation experts comes from our burden estimates in Exhibit A-19 in Part A. The occupation expert response rate comes from our summary of data collection experience in Exhibit A-6 in Part A. The number of eligible occupation experts may be derived from our assumed response rate, and the number of sampled occupations is implied by our experience-based eligibility rate.

Exhibit B-6. Expected Occupation Expert Method Data Collection Results

Sampled occupation experts: 3,519
Assumed eligibility rate: 75.7%
Eligible occupation experts: 2,664
Participating occupation experts: 2,025
Occupation expert response rate (participating occupation experts/eligible occupation experts): 76%



For the OE Method, the same means and percentages reported for the Establishment Method are estimated without weights and reported together with the estimated standard deviation of the mean estimates. Because occupation experts are not selected as a random sample from all incumbents working in an occupation, the weights and weighting steps used for Establishment Method occupations are not appropriate, and weights are not calculated for OE Method occupations.

B.3 Procedures for the Collection of Information

Data collection operations are conducted at RTI’s Operations Center and Survey Support Department, both of which are located in Raleigh, North Carolina. For the Establishment Method, the Operations Center’s BLs contact sampled business establishments, secure the participation of a POC, and work with the POC to carry out data collection in the target occupations. The data are provided by randomly selected employees within the occupations of interest. All within-establishment data collection is coordinated by the POC; the BLs do not contact employees directly.8 After the POC agrees to participate, informational materials and questionnaires are mailed to the POC, who distributes the questionnaires to the sampled employees. Completed questionnaires are returned directly to RTI for processing. Respondents also have the option of completing the survey online.

Survey Support Department staff mail materials to POCs, job incumbents, and occupation experts, and they receive and process completed questionnaires that are returned by respondents. Both the telephone operations of the BLs and the mailing and questionnaire-receipt operations of the support staff are supported by the CMS. Data-entry software supports the keying and verification of incoming survey data.

B.3.1 Establishment Method

As described in Section B.2.1, the Establishment Method uses a two-stage design: a statistical sample of establishments expected to employ workers in the target occupations, followed by a sample of those workers within the sampled establishments. The sampled workers are asked to complete the survey questionnaires.

The Establishment Method works well for most occupations. Occasionally, however, the standard protocol is supplemented with a special frame, such as a professional association membership list, when additional observations are required to complete data collection for an occupation. The primary difference with this approach is that the supplemental respondents are sampled directly from the special frame and contacted directly by the BLs, without involvement of a sampled establishment or a POC.

O*NET Operations Center

Data collection activities are housed in RTI’s O*NET Operations Center. The Operations Center staff includes BLs, their supervisors (i.e., Team Leaders), a Monitoring Coordinator, and the Operations Center Manager, who reports to the Data Collection Task Leader. Usual operating hours for the Operations Center are Monday through Friday, 8:30 a.m. to 5:30 p.m., Eastern Time. Operating hours are extended during periods of unusually high workloads or when necessary to contact a high concentration of Pacific time zone businesses.

The BLs form the nucleus of the Operations Center staff. The number of BLs fluctuates depending on attrition and workload, but it is typically 20–25. New BLs are recruited and hired periodically to maintain adequate staffing levels. BL job candidates are carefully screened and evaluated by Operations Center management.

Case Management System

The O*NET CMS is a Web-based control system that supports and monitors the data collection activities of the BLs, the mailing of informational materials and questionnaires, and the receipt of completed paper and Web questionnaires.

Questionnaires and Informational Materials

The Establishment Method data collection protocol calls for each sampled worker to receive one of three randomly assigned domain questionnaires—Knowledge (which includes Education and Training as well as Work Styles), Generalized Work Activities, and Work Context. Each sampled worker also receives a Background Questionnaire and a Task Questionnaire.

The Background Questionnaire contains a standard set of 13 demographic questions about the background of the respondent. It also includes two occupation-specific questions about professional or job-related associations. The respondent is provided with a list of associations related to the worker’s occupation and asked to indicate whether he or she belongs to any of them. The respondent is also asked to write in any other associations to which he or she belongs. This information is collected in case it becomes necessary to complete the occupation with use of the dual-frame approach.

Task Questionnaires are developed initially through the extraction of task information from multiple sources located on the Internet. This questionnaire includes a definition of the occupation, a list of tasks, and space for the respondent to write in additional tasks. The respondent is instructed to indicate whether each task is relevant to his or her occupation and to rate each relevant task’s Frequency and Importance. In subsequent updating efforts, task inventories are revised to reflect the new and most current information from respondents, including write-in tasks.

Each sampled employee receives an integrated questionnaire booklet consisting of the randomly assigned domain questionnaire and the Background Questionnaire and Task Questionnaire applicable to the employee’s occupation. In addition, workers are given the option of completing the questionnaire online at the project’s Web site instead of completing and returning the paper questionnaire.

Spanish versions of the questionnaires are available for occupations with high proportions of Hispanic workers. The Spanish questionnaires are sent to workers who are identified as Spanish speaking by their POC. In addition, an employee who has been sent an English questionnaire can request a Spanish version by calling RTI at a toll-free number.

Examples of the English questionnaires are included in Appendix A.9 The Spanish versions are available on request.

In addition to the questionnaires, the Establishment Method data collection protocol includes a variety of letters, brochures, and other informational materials mailed to POCs and sampled workers. Spanish versions of the materials addressed to workers are available for occupations with high proportions of Hispanic workers. Appendix B contains examples of the English versions of these materials.10 The Spanish versions are available on request.

Data Collection Procedures: Establishment Method

Described here are the steps of the Establishment Method standard data collection protocol. A summary of this protocol is shown in Exhibit B-7.

Exhibit B-7. Establishment Method Data Collection Flowchart



Step 1: Verification call to the receptionist. The BLs call each sampled business to determine whether the business is eligible (i.e., whether it is still in operation at the sampled address). The other component of the verification call is to identify the anticipated POC, who must be knowledgeable about the types of jobs present and who is the recipient of the screening call.

Step 2: Screening call to the point of contact. The BLs next call (or are transferred to) the anticipated POC to ascertain whether the business has at least one employee in at least one of the occupations targeted for that establishment. If so, the following POC information is obtained:

  • name and title of the POC,

  • U.S. Postal Service delivery address,

  • telephone number,

  • e-mail address (if available), and

  • fax number.

None of the BLs’ conversations with the POC are scripted in advance. Instead, “talking points” are provided to guide the BLs’ interactions with POCs. BLs are trained to listen and interact effectively and in a comfortable style, rather than to read from a prepared script; therefore, reading off a computer screen is discouraged. The BLs enter all information gathered during each conversation with a POC into the CMS.

Step 3: Send information package. The information package, which is sent to the POC after the completion of the screening call, contains more detailed information about the O*NET Program. The following information is included in the information package:

  • lead letter from the U.S. Department of Labor (DOL);

  • O*NET brochure;

  • “Who, What, and How” brochure;

  • Selected Occupations List, providing title and descriptions of target occupations;

  • list of endorsing professional associations; and

  • brochure describing the business-, POC-, and employee-level incentives.

Step 4: Recruiting call to the point of contact. To give the POC adequate time to receive, read, and process the information, the next call to the POC is made approximately 7 days after the information package is shipped. During the recruiting call, the BL

  • verifies that the information package was received;

  • confirms that the POC is qualified to serve in the POC role;

  • reviews with the POC the titles and descriptions from the Selected Occupations List for the target occupations, to determine whether the establishment has any employees in those occupations;

  • (if one or more target occupations are present) explains the O*NET Program in greater detail, answers questions, and attempts to secure the POC’s commitment to participate;

  • (for participating establishments) explains the need for the POC to prepare a numbered list of employees’ names for each selected occupation, for use in selecting a sample of employees; and

  • sets an appointment for the sampling call, allowing sufficient time for the POC to compile the occupation rosters (in smaller businesses, the sampling call is sometimes combined with the recruiting call).

Step 5: Sampling call to the point of contact. During this call, the BL obtains from the POC the number of names on each roster and enters the counts into the CMS, which selects the sample according to preprogrammed random sampling algorithms. The BL then informs the POC of the line numbers of the employees selected for each occupation. The POC is asked to note the line numbers of the selected employees on each roster for reference when the questionnaires are subsequently distributed. For designated O*NET-SOC occupations with a high percentage of Hispanic employees, the BL also asks the POC if any of the selected employees should receive a Spanish version of the questionnaire instead of the English version. The language preference of each employee is then indicated in the CMS.
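The CMS's within-establishment selection described in Step 5 amounts to a simple random sample of roster line numbers. A minimal sketch (the actual preprogrammed algorithms and sampling fractions are not specified here, so this is illustrative only):

```python
import random

def select_employees(roster_count, sample_size, seed=None):
    """Select distinct line numbers at random from a POC's numbered
    roster of 1..roster_count employees, returned sorted for readback."""
    rng = random.Random(seed)
    k = min(sample_size, roster_count)   # small rosters: take everyone
    return sorted(rng.sample(range(1, roster_count + 1), k))

# e.g., select 4 employees from a roster of 15 names
lines = select_employees(15, 4, seed=1)
```

The BL would then read the selected line numbers back to the POC, who marks them on the roster for questionnaire distribution.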

Step 6: Send questionnaire package. After completion of the sampling call, the employee packets are shipped to the POC for subsequent distribution to the sampled employees. As part of the same mailing, the POC receives a thank-you letter and a framed Certificate of Appreciation from DOL, personalized with the name of the POC and signed by a high-ranking DOL official. Each questionnaire packet contains a letter from RTI’s project director, the assigned questionnaire (including the domain questionnaire, Background Questionnaire, and Task Questionnaire integrated into a single booklet), a return envelope, an information sheet for completing the Web questionnaire (including the respondent’s user name and password), and a $10 cash incentive. In addition, a label is affixed to the cover of the questionnaire to remind the respondent of the option to complete the questionnaire online. A Spanish questionnaire is sent to any Hispanic employees who the POC indicated during the sampling call should receive this version. In addition, all employees in these O*NET-SOC occupations are informed through a bilingual notice included in the mailing that they have a choice between English and Spanish versions, and they are provided with a toll-free number to call if they would like to receive the alternate version.

Step 7: Send toolkit. Approximately 10 business days after mailing the questionnaire package, RTI also mails the POC the O*NET Toolkit for Business—a packet of information about the O*NET Program, which managers can use for human resource planning and preparation of job descriptions.

Step 8: 7-day follow-up call to the point of contact. Approximately 7 days after the shipment of the original questionnaire package to the POC, the BL calls to verify receipt of the mailing and to review the process for distributing the questionnaires to the selected employees. The BL also informs the POC of a forthcoming shipment of thank you/reminder postcards and asks him or her to distribute them to all sampled employees.

Step 9: Send thank you/reminder postcards. After the 7-day follow-up call, the BL places an order for thank you/reminder postcards to be sent to the POC for distribution to all sampled employees.

Step 10: 21-day follow-up call to the point of contact. Approximately 21 days after the shipment of the original questionnaire package, the BL calls to thank the POC for his or her ongoing participation and to provide an update on any employee questionnaires received to date. The BL asks the POC to follow up with nonrespondents.

Step 11: 31-day follow-up call to the point of contact. Approximately 31 days after the shipment of the original questionnaire package to the POC, the BL calls to again thank the POC for his or her ongoing participation and to provide an update on any employee questionnaires received to date. At this time, the BL informs the POC of a forthcoming shipment of replacement questionnaires, which are to be distributed to any employees who have not yet returned the original questionnaire.

Step 12: Send replacement questionnaires. After the 31-day follow-up call, the BL places an order for the shipment of replacement questionnaires. These packages are ordered for any employees who have not yet responded. The replacement questionnaire package is like the original one, except for a slightly different cover letter and the absence of the $10 cash incentive. Using roster line information or employee initials provided by the BL during the 31-day follow-up call, the POC then distributes the appropriate replacement questionnaire package to each nonresponding employee and encourages the employee to complete and return the questionnaire.

Step 13: 45-day follow-up call to the point of contact. Approximately 45 days after the shipment of the original questionnaire package to the POC, the BL places one final follow-up call to the POC to thank the POC for his or her assistance and to provide one final status report regarding employee questionnaires. If all questionnaires have been received at this point, the BL thanks the POC for his or her organization’s participation. If questionnaires are still outstanding, the BL confirms receipt and distribution of the replacement questionnaire packets and asks the POC to follow up with nonrespondents. This step is usually the final one in the data collection protocol.11

Mailout Operations, Questionnaire Receipt, and Processing

Mailout operations and questionnaire receipt and processing are housed in RTI’s Survey Support Department. Orders for mailings of questionnaires and informational materials to support data collection are placed by the BLs in the Operations Center and processed by survey support staff in the Survey Support Department. The CMS supports and monitors the entire process, including placing the order, printing on-demand questionnaires and other order-specific materials, shipping the order, and interacting with the U.S. Postal Service to track delivery of the order. Staff members follow written procedures in fulfilling orders, including prescribed quality control checks. They are also responsible for maintaining an adequate inventory of mailing materials and for inventory control.

Completed questionnaires returned by mail are delivered to RTI, where they are opened and batched and the barcodes scanned to update the CMS for receipt. The batches are then delivered to data-entry staff, where the survey data are keyed and 100% key verified. The questionnaire batches are then stored in a secure storage area. Data from the paper questionnaires are merged with the Web questionnaire data and readied for data cleaning routines.

Establishment Method Data Collection Results

Establishment Method data collection has continued uninterrupted since the start of the study in June 2001. As of December 31, 2017, 154 waves of data collection had been completed, and 180,153 establishments and 213,603 employees had participated, resulting in an establishment response rate of 75% and an employee response rate of 64%.12

B.3.2 Occupation Expert Method

The OE Method is an alternate method of collecting information on occupational characteristics and worker attributes that is used to improve sampling efficiency and avoid excessive burden for problematic occupations. This situation occurs when occupations have low employment scattered among many industries or when employment data do not yet exist (e.g., for new and emerging occupations). With this method, persons considered experts in the target occupation, rather than job incumbents, are surveyed. Occupation experts are sampled from lists provided by source organizations that can include professional associations, certification organizations, industry associations, and other organizations that can identify qualified experts in a given occupation. The sampled occupation experts are contacted directly by the BLs, without involvement of a sampled establishment or a POC. Unlike the standard Establishment Method, under which workers complete only one questionnaire, the OE Method requires that occupation experts complete all three domain questionnaires, as well as a Background Questionnaire and a Task Questionnaire. Because of the increased burden, occupation experts receive a $40 cash incentive instead of the $10 incentive offered to Establishment Method respondents.

The same staff and facilities used for Establishment Method data collection—the BLs in the Operations Center and the survey support staff in the Survey Support Department—are also used for the OE Method work. Like the Establishment Method, the OE Method uses a Web-based case management system, the OE CMS, to support and monitor the data collection activities of the BLs, the mailing of informational materials and questionnaires, and the receipt and processing of completed paper questionnaires.

Questionnaires and Informational Materials

With the exception of a few additional items in the Background Questionnaire, the OE Method questionnaires are the same as those used for Establishment Method data collection. Occupation experts are asked to complete all three domain questionnaires (as well as a Background Questionnaire and Task Questionnaire), whereas Establishment Method respondents complete only one domain questionnaire (as well as a Background Questionnaire and Task Questionnaire bound together with the domain questionnaire). Paper questionnaires are bundled before shipping, with the order of the domain questionnaires randomized at the respondent level. As with the Establishment Method, occupation experts are given the option of completing their questionnaires online at the project Web site.
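Randomizing the order of the three domain questionnaires at the respondent level can be sketched as follows. The keying of the shuffle to a respondent identifier is an assumption for reproducibility, not a documented detail of the bundling system:

```python
import random

DOMAINS = ["Knowledge", "Generalized Work Activities", "Work Context"]

def bundle_order(respondent_id, seed_base="oe-bundle"):
    """Return a per-respondent random ordering of the domain
    questionnaires, reproducible for the same respondent ID."""
    rng = random.Random(f"{seed_base}-{respondent_id}")
    order = DOMAINS[:]          # copy so the master list is untouched
    rng.shuffle(order)
    return order
```

Randomizing the order guards against order effects contaminating any one domain's estimates.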

OE Method informational materials resemble the Establishment Method materials but are modified to reflect how the OE Method differs from the Establishment Method (direct contact with the respondent, identification through a named source organization, reference to only one occupation, multiple questionnaires, and a higher incentive).

Examples of OE Method questionnaires are presented in Appendix A; informational materials are presented in Appendix B.

Data Collection Procedures

The steps in the OE Method data collection protocol closely follow those for establishments. The primary differences are the absence of the verification and sampling calls. Verification calls are inapplicable because a specific individual is contacted instead of an establishment. Sampling calls are inapplicable because the individual is not sampled from a larger group of employees. All other steps follow the Establishment Method protocol. The OE Method data collection protocol is shown in Exhibit B-8.

Mailout Operations, Questionnaire Receipt, and Processing

OE Method mailing operations and questionnaire receipt and processing follow the same procedures as those described for the Establishment Method in Section B.3.1.

Occupation Expert Method Data Collection Results

Data collection was completed for 388 occupations as of December 31, 2017. As described in Section A.1.5, of the 15,134 eligible occupation experts identified, 11,184 participated, for an occupation expert response rate of 74%.

Exhibit B-8. Occupation Expert Method Data Collection Flowchart



B.4 Methods to Maximize Response Rates

The O*NET Data Collection Program is committed to achieving the highest possible response rates. This section summarizes some of the key features of the data collection protocol that are designed to maximize response rates and reduce nonresponse:13

  • Multiple Contacts—As described in Section B.3, the Establishment Method protocol consists of up to 13 separate telephone and mail contacts with POCs (see Exhibit B-7), and the OE Method consists of up to 10 contacts with occupation experts (see Exhibit B-8). In addition, supplemental contacts are made via e-mail and fax as appropriate. The multiple contacts are designed to establish rapport with the POCs/occupation experts, facilitate their participation, and help ensure the best possible questionnaire completion rate.

  • Multi-Mode Study—O*NET offers employees the choice of completing the survey online or filling out a paper version and returning it via mail. In their questionnaire packet, sampled employees receive both the hardcopy survey and a customized flyer providing their unique user ID and password as well as general instructions on how to access the questionnaire. To encourage participation via the Web, related study materials sent to the POC and the employees highlight the Web survey option; talking points have been added in the CMS for BLs to remind POCs and occupation experts of the Web option; and reminders are communicated via e-mail and are posted on the project Web site.

  • Incentives—As described in Section A.9 in Part A, the O*NET Data Collection Program offers incentives to employers, POCs, job incumbents, and occupation experts. Participating employers receive the Toolkit for Business; POCs receive a framed Certificate of Appreciation from DOL if they agree to participate; job incumbents receive a prepaid cash incentive of $10; and occupation experts receive a prepaid cash incentive of $40 and a Certificate of Appreciation if they agree to participate. See Section A.9 in Part A for a full discussion of the incentives and their rationale.

  • Refusal Conversions—At least one conversion attempt is made on every refusal encountered by a BL. When a POC or occupation expert refuses to participate, the case is transferred to a specially trained Converter for a refusal conversion contact. Refusal rates for BLs and conversion rates for the Converters are tracked and monitored for quality control.

  • 100% Dedicated Facility—RTI’s O*NET Operations Center is an attractive, professional office space with state-of-the-art furniture, technology, and amenities that help attract and retain a quality staff. The facility is 100% dedicated to O*NET, and all staff work solely on the O*NET project.

  • Quality of Staff—Because of the unscripted nature of the calls conducted on this study and the challenges of securing participation from POCs and occupation experts, BL job candidates are carefully screened and evaluated by Operations Center management. Candidates are selected on the basis of a track record of successful work experience, educational attainment, computer proficiency, and research skills. BLs receive competitive salaries, and attrition is low.

  • Staff Training—Newly hired BLs must successfully complete a 2-week intensive training program that includes presentations by key management staff on the data collection steps, situational role play, hands-on practice in the CMS, and coaching on overcoming objections. Once hired, BLs routinely participate in ongoing training in such topics as skills enhancement, refusal conversion, protocol refreshment, and occupation briefing.

  • Supervision—The BLs are closely supervised (the Operations Center supervisor-to-BL ratio is 1:6). The Team Leaders hold frequent one-on-one meetings with the BLs on their team to provide coaching, discuss problem areas, and suggest techniques for improvement.

  • Call Monitoring—Supervisors use silent monitoring equipment to monitor a random sample of each BL’s calls, including recorded calls and live calls with real-time visual screen monitoring. Immediate feedback is provided, along with ongoing training and coaching, as needed.

  • Detailed Performance Measurement—O*NET data collection managers use a variety of measurements to help monitor and control the implementation of the data collection protocol. BL metrics such as call volume, phone time, caseload management, response rates, and quality scores help the O*NET survey managers monitor BL performance, detect potential problem areas, and observe trends that inform decisions about case assignments.

  • Conversational Style/Talking Points—The BLs do not read canned scripts. Instead they are provided with Talking Points in the CMS and use their interpersonal communication skills and active listening to cover the key points in a conversational manner. This approach promotes a much more positive experience for the POC and helps the BL build the necessary rapport.

  • Avoider Protocol—To maximize contact rates, BLs are provided a detailed protocol to follow when they are unable to reach POCs or OEs. The protocol provides direction on leaving voicemail messages, attempting calls on alternate days and times, sending approved e-mails, and mailing letters aimed at generating callbacks.

  • Alternate POCs—The Establishment Method protocol allows for anyone within a company who has good knowledge of the occupations present at the establishment to serve as a POC. When a potential POC refuses to participate, BLs are trained to use a variety of methods to seek out and recruit an alternate POC within the establishment in order to avoid the refusal.

These and other enhancements have had a positive effect on the O*NET Program’s ability to secure the participation of both establishments and employees.14 The O*NET Program will continue to explore ways to enhance response rates through its continuous improvement program.

B.5 Tests of Procedures

Continuous improvement of survey quality and efficiency has been a constant focus of the O*NET Data Collection Program. The survey design described in this document has evolved over years of testing and evaluating alternative procedures. The O*NET Program team believes that this design reflects the best combination of design parameters and features for maximizing data quality while reducing data collection costs. Summarized here are some of the tests of procedures that have been conducted for the O*NET Data Collection Program.

1999 Pretest Experiments

The initial design of the O*NET Program was based on rigorous testing and evaluation of alternative design protocols. Tests of seven different cash incentive plans were carried out between June 1999 and January 2000 on a sample of about 2,500 eligible businesses and 3,800 employees. In addition, various options for contents of the Toolkit for Business incentive were tested. These tests found that the best design appeared to be the combination of the $10 prepaid incentive to employees and various material incentives to the POC. A report documenting the pretest activity and results was included in the 2002 O*NET Office of Management and Budget (OMB) submission and can be found at https://www.onetcenter.org/reports/omb2002.html.

Point-of-Contact Incentive Experiment

The POC incentive experiment considered the effects on establishment and employee response rates of offering the POC a $20 incentive in addition to the other incentives that the POC receives for O*NET participation. About 80% of the approximately 10,500 establishments and 30,000 employees involved in the experiment were assigned to the treatment group, and the remaining 20% were assigned to the control group. With this large sample size, statistical power of the experiment was very high.

The results provided no evidence that the incentive had any effect on establishment cooperation rates: the POC appeared equally likely to agree initially to participate with the $20 incentive as without it. Nor was there evidence of any benefit for employee response rates. Given these results and the considerable cost of providing monetary incentives to the POC, the $20 POC incentive was discontinued for all newly recruited POCs in December 2004. Detailed results can be found in Biemer, Ellis, Pitts, and Aspinwall (2005).
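The claim that the experiment's statistical power was very high can be illustrated with a standard two-proportion power calculation. The 80/20 split of roughly 10,500 establishments and the 75% baseline rate come from the text; the function name and the exact sizes below are illustrative:

```python
import math

def mde_two_proportions(p, n1, n2, z_alpha=1.96, z_power=0.84):
    """Approximate minimum detectable difference between two
    proportions at 5% significance and 80% power (normal
    approximation, common baseline rate p)."""
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (z_alpha + z_power) * se

# ~80/20 split of about 10,500 establishments, baseline rate near 75%:
# a difference of roughly 3 percentage points would be detectable.
mde = mde_two_proportions(0.75, 8_400, 2_100)
```

With a detectable difference this small, a null finding is strong evidence that any true effect of the $20 incentive was negligible.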

Experiments in Weight Trimming

One potential source of sampling error for O*NET estimates is variation in the survey weights. The unequal weighting effect (UWE; see Section B.2.1) can be somewhat large because of the disproportionate sampling methods that must be employed to achieve data collection efficiency. The UWE can be reduced through weight trimming, but only at the risk of increasing selection bias. Alternative methods for weight trimming were investigated from 2005 to 2007. This investigation assessed the effect of successively more aggressive weight-trimming plans on a wide range of estimates and population domains. The weight-trimming analysis was comprehensive, including

  • comparison of UWEs,

  • graphical and tabular displays of current weight estimates compared with aggressively trimmed weight estimates,

  • evaluation of weights on suppression of estimates,

  • evaluation of statistical differences between current weight estimates and aggressively trimmed weight estimates, and

  • evaluation of substantive differences between current weight estimates and aggressively trimmed weight estimates.

The method and results of the evaluation are described in an internal report (Penne & Williams, 2007). The evaluation resulted in the implementation of a more aggressive weight-trimming plan that provides an optimal balance of sampling variance and bias.
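The mechanics involved can be sketched in a few lines. The following Python fragment computes the UWE and applies a simple cap-and-redistribute trimming step; the weights, the cap value, and the single-pass trimming rule are invented for illustration and are not the O*NET trimming plan.

```python
# Illustrative sketch of the unequal weighting effect (UWE) and a simple
# cap-and-redistribute weight-trimming step (not the O*NET algorithm).

def uwe(weights):
    """UWE = n * sum(w^2) / (sum(w))^2; equals 1.0 when all weights are equal."""
    n = len(weights)
    s1 = sum(weights)
    s2 = sum(w * w for w in weights)
    return n * s2 / (s1 * s1)

def trim(weights, cap):
    """Cap weights at `cap`, then rescale so the total weight is preserved.

    A single pass can push rescaled weights back above the cap, so real
    trimming procedures typically iterate until the cap holds.
    """
    capped = [min(w, cap) for w in weights]
    factor = sum(weights) / sum(capped)
    return [w * factor for w in capped]

weights = [1.0, 1.0, 2.0, 2.0, 14.0]   # one extreme weight inflates the UWE
trimmed = trim(weights, cap=6.0)

print(round(uwe(weights), 3))   # UWE before trimming: 2.575
print(round(uwe(trimmed), 3))   # smaller UWE after trimming
print(round(sum(trimmed), 3))   # total weight preserved: 20.0
```

The trade-off described above is visible here: trimming lowers the UWE (and hence the sampling variance) but distorts the original inclusion-probability weights, which is the source of the potential selection bias.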

Experiments in Model-Aided Sampling

The use of model-aided sampling (MAS) to enhance the efficiency of the O*NET sample is considered one of the most important sample design innovations in the history of the program. This approach dramatically reduced data collection costs with minimal effects on the accuracy of the estimators. Work on the development of the MAS methodology began in early 2004 and continued through the end of 2007. As part of this research effort, various sample cutoff or stopping rules were investigated by means of Monte Carlo simulation. A cutoff rule determines the point at which efforts to interview certain types of establishments and employees are discontinued because the prespecified quota cell criteria have been satisfied. These studies showed that, under a wide range of cutoff rules, MAS does not bias the O*NET estimates or their standard errors. Furthermore, MAS substantially reduced the number of establishment contacts required to satisfy the random sample allocation for an occupation. This innovation resulted in substantial reductions in respondent burden, data collection costs, and time required to complete an occupation (Berzofsky et al., 2006).
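The quota-cell cutoff idea described above can be sketched as follows. This is a hypothetical illustration in the spirit of a MAS stopping rule; the cell names, targets, and incoming case stream are invented, and the actual MAS design involves model-based projections not shown here.

```python
# Hypothetical sketch of a quota-cell cutoff rule: once a cell's completed
# cases reach its target, further cases in that cell are retired.

targets = {"small": 3, "medium": 2, "large": 2}       # invented quota targets
completes = {cell: 0 for cell in targets}

def cell_open(cell):
    """A cell stays open until its quota is satisfied."""
    return completes[cell] < targets[cell]

def record_complete(cell):
    """Accept a completed case only if its cell is still open."""
    if cell_open(cell):
        completes[cell] += 1
        return True
    return False          # cell already satisfied; case retired

# Simulated stream of completed cases arriving during data collection
stream = ["small", "small", "large", "small", "small", "medium",
          "large", "large", "medium"]
accepted = [c for c in stream if record_complete(c)]

print(accepted)    # the fourth "small" and third "large" are retired
print(completes)   # every cell ends exactly at its target
```

Retiring cases once a cell is satisfied is what cuts establishment contacts: no further effort is spent on establishment types whose allocation is already met.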

Alternative Measures of Uncertainty

The standard error of an estimate is a measure of statistical precision that is inversely proportional to the square root of the sample size. However, for many O*NET data users, the interrater agreement for an average scale rating is very important for their applications. Therefore, beginning in 2005, alternative measures of uncertainty of scale estimates were investigated to supplement the current use of standard errors. Three alternatives—the standard deviation, kappa, and weighted kappa—were analyzed and contrasted, using actual O*NET data as well as simulation.15 This work led to the decision to make available the standard deviation as a second measure of uncertainty.
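The two kappa measures considered in that investigation (defined in footnote 15) can be computed as follows. This is an illustrative sketch for two raters on a 5-point scale using linear (Cicchetti–Allison) disagreement weights; the ratings below are made-up example data, not O*NET values.

```python
# Illustrative computation of kappa and linearly weighted kappa for two
# raters on a 5-point scale (Cicchetti-Allison weights, as in SAS PROC FREQ).

def weighted_kappa(a, b, k=5, weighted=True):
    """kappa = (p_a - p_e) / (1 - p_e), with optional linear weights."""
    n = len(a)
    # agreement weight: 1 on the diagonal, decreasing linearly off it
    def w(i, j):
        return 1.0 - abs(i - j) / (k - 1) if weighted else float(i == j)
    # observed agreement rate p_a
    pa = sum(w(x, y) for x, y in zip(a, b)) / n
    # expected agreement rate p_e from the raters' marginal distributions
    pi = [a.count(c) / n for c in range(1, k + 1)]
    pj = [b.count(c) / n for c in range(1, k + 1)]
    pe = sum(w(i + 1, j + 1) * pi[i] * pj[j]
             for i in range(k) for j in range(k))
    return (pa - pe) / (1 - pe)

a = [1, 2, 3, 4, 5, 3, 2, 4]   # invented ratings from rater A
b = [1, 2, 3, 4, 4, 3, 3, 4]   # invented ratings from rater B
print(round(weighted_kappa(a, b, weighted=False), 3))  # 0.673 (unweighted)
print(round(weighted_kappa(a, b, weighted=True), 3))   # 0.8 (weighted)
```

Because the weighted version gives partial credit to near-diagonal disagreements, it is larger here than the unweighted kappa: both disagreements in the example are only one scale point apart.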

Suppression of Estimates with Poor Precision

Before the O*NET data are published, a set of suppression rules is applied to identify estimates that may have extremely poor precision due to very large variances or inadequate sample sizes. Estimates that fail these rules are flagged to caution data users that the estimates are unreliable and should be interpreted as such. Ideally, estimates that are sufficiently reliable are not flagged (i.e., suppressed). An optimal set of suppression rules balances the need to provide as much data as possible to the data users with the need to duly warn users about estimates that are extremely unreliable. In 2004 and 2005, alternative methods for suppression were investigated. The study also evaluated the then-current methodology for estimating the standard errors of the estimates—the generalized variance function (GVF) method. Using Monte Carlo simulation techniques, as well as actual O*NET data, the study found that the GVF method produced standard error estimates that were severely positively biased. As a result, many estimates that were sufficiently reliable were erroneously suppressed because their standard errors were overstated. Use of the GVF method was discontinued in favor of a direct method of variance estimation. Consequently, estimates published before this change were republished under the direct method of variance estimation. No other changes in the method of estimate suppression were required.
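The general shape of such a rule can be sketched as follows. The thresholds, the relative-standard-error criterion, and the data are illustrative assumptions only; they are not the O*NET suppression rules.

```python
# Hypothetical suppression rule: flag an estimate when its relative
# standard error (RSE) exceeds a threshold or its sample size is too small.

def suppress(estimate, se, n, max_rse=0.30, min_n=10):
    """Return True if the estimate should be flagged as unreliable."""
    if n < min_n:
        return True                          # too few respondents
    if estimate != 0 and se / abs(estimate) > max_rse:
        return True                          # too imprecise
    return False

rows = [
    {"est": 3.8, "se": 0.2, "n": 24},   # precise: publish
    {"est": 2.1, "se": 0.9, "n": 24},   # RSE ~ 0.43: flag
    {"est": 4.0, "se": 0.1, "n": 6},    # too few respondents: flag
]
flags = [suppress(r["est"], r["se"], r["n"]) for r in rows]
print(flags)   # [False, True, True]
```

The GVF finding described above matters directly for a rule like this: if the standard errors fed into the RSE criterion are positively biased, reliable estimates are flagged unnecessarily, which is why the direct variance estimator was adopted.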

Dual-Frame Sampling for Hard-to-Find Occupations

Some occupations are rare and dispersed across many Standard Industrial Classification (SIC) codes, the system used by D&B for drawing samples of establishments from its database. This can make certain occupations difficult to find. Identifying an employee in one of these occupations can require calling scores of establishments over many weeks of data collection. To address this inefficiency, a dual-frame approach for sampling employees in hard-to-find occupations was tested. As the term implies, the dual-frame sampling approach involves two sampling frames: (1) the usual SIC frame, which has very good coverage of all occupations, regardless of their size, and (2) a smaller, more targeted frame (such as a professional association list), which may have much lower coverage of all employees in the occupation but contains a very high concentration of them, making them more efficient to sample and contact than a multitude of establishments in many industries. Using these two frames in combination provides the benefits of good population coverage with reduced data collection costs and reduced establishment burden. The testing and evaluation of the dual-frame sampling option was introduced in 2004 for selecting college teachers. This test showed the dual-frame approach to be an effective method of efficiently completing difficult-to-find occupations. Details regarding weighting and estimation for the approach were further developed and refined and, eventually, it was expanded for use in other hard-to-find occupations when suitable alternative frames of employees in the occupations are available.
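One standard way to combine two frames is a composite estimator, sketched below. Units that appear on both frames (the overlap domain) are blended with a mixing constant theta; the fixed theta and all numbers are illustrative assumptions, not the O*NET weighting details.

```python
# Sketch of a simple composite dual-frame estimator: cases found only on
# one frame contribute fully, and overlap-domain cases are blended.

def composite_total(frame_a_only, overlap_a, overlap_b, frame_b_only,
                    theta=0.5):
    """Estimated total = A-only + B-only + a theta-blend of the overlap.

    Each argument is a list of (value, weight) pairs of respondents
    classified by frame membership.
    """
    def wtot(pairs):
        return sum(v * w for v, w in pairs)
    return (wtot(frame_a_only) + wtot(frame_b_only)
            + theta * wtot(overlap_a) + (1 - theta) * wtot(overlap_b))

# Frame A = SIC establishment frame; frame B = association list (invented data)
est = composite_total(
    frame_a_only=[(1, 120.0), (1, 80.0)],   # reachable only via SIC frame
    overlap_a=[(1, 60.0)],                  # overlap case sampled from frame A
    overlap_b=[(1, 40.0)],                  # overlap case sampled from frame B
    frame_b_only=[(1, 30.0)],               # reachable only via the list
)
print(est)   # 280.0 with theta = 0.5
```

Counting the overlap domain from both frames, each at partial weight, is what prevents double counting while keeping the full population coverage of the combined frames.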

Alternative Levels of Population Coverage

In constructing a sampling frame for an occupation, the sampling statistician has a choice: include all possible establishments where the occupation could be found, or include only those where the probability of finding the target occupation is high. The former strategy will generate a sampling frame having 100% coverage of the population but with many nonpopulation units. The latter approach will reduce the number of nonpopulation units (and the sampling and data collection costs) but at a lower rate of population coverage and, thus, with greater potential coverage bias. The optimal level of frame coverage is one that maximizes data collection efficiency while maintaining an acceptable level of coverage bias. Initial O*NET experience demonstrated that trying to build sampling frames with 100% coverage was inefficient and overly burdensome for many establishments, resulting in the decision to reduce the minimum coverage level from 100% to 80%. In 2004, experiments were conducted to evaluate the impact of further reducing coverage from 80% to 50% to more specifically target industries where occupations are primarily found and improve sampling efficiency. The two coverage levels were compared for many occupations with respect to their estimates and standard errors and the associated costs of completing the occupations. This evaluation clearly showed that frame coverage could be safely reduced to as low as 50% with essentially no risk of coverage bias and a substantial reduction in data collection costs. These results led to the adoption of a minimum coverage rate of 50% for all occupation sampling frames.
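The coverage trade-off described above can be made concrete with a small sketch: rank industries by the occupation's employment and keep industries until a target coverage level is reached. The SIC codes and employment counts are invented for illustration.

```python
# Sketch of trimming a sampling frame to a minimum coverage level: keep
# the industries with the most employment in the target occupation until
# cumulative coverage reaches the target (e.g., 50% or 80%).

def frame_industries(emp_by_industry, min_coverage=0.50):
    """Return (kept industry codes, achieved coverage fraction)."""
    total = sum(emp_by_industry.values())
    kept, covered = [], 0
    # take industries in decreasing order of occupation employment
    for sic, emp in sorted(emp_by_industry.items(),
                           key=lambda kv: kv[1], reverse=True):
        if covered / total >= min_coverage:
            break
        kept.append(sic)
        covered += emp
    return kept, covered / total

industries = {"0181": 5000, "5261": 3000, "0781": 1200, "8699": 500,
              "5999": 300}                      # invented employment counts
kept50, cov50 = frame_industries(industries, min_coverage=0.50)
kept80, cov80 = frame_industries(industries, min_coverage=0.80)
print(len(kept50), len(kept80))   # fewer industries needed at 50% coverage
```

In this toy example the 50% target is met with one industry while the 80% target needs two, which mirrors the cost savings observed when the minimum coverage rate was reduced.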

Analysis of Unit and Item Nonresponse

Nonresponse in the O*NET Data Collection Program can occur at the establishment level when the POC declines at the screening/verification, recruiting, or sampling stage of selection. Within-unit nonresponse occurs at the employee level when a selected employee fails to complete and return a questionnaire. In addition, employees who return their questionnaires may inadvertently or intentionally skip one or more question items on the questionnaire. This type of missing data is known as item nonresponse. The effects of all three types of nonresponse on the estimates have been continually investigated since 2003 and reported to OMB; the nonresponse analysis for Analysis Cycles 16–18 appears as Appendix D.16 Such analyses have shown that nonresponse errors, whether unit, within unit, or item, do not appear to be a significant source of error in the O*NET program. In addition, the sampling weights are adjusted for nonresponse, which further minimizes any potential bias (Section B.2.1).
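A common form of such an adjustment is the weighting-class method, sketched below: within each class, respondent weights are inflated by the inverse of the weighted response rate so respondents also represent the class's nonrespondents. The classes and weights are illustrative assumptions, not the O*NET adjustment cells.

```python
# Sketch of a weighting-class nonresponse adjustment (illustrative only).

def adjust(cases):
    """cases: list of dicts with 'cls' (class), 'weight', and 'responded'."""
    # accumulate total and respondent weight per class
    by_cls = {}
    for c in cases:
        tot, resp = by_cls.get(c["cls"], (0.0, 0.0))
        by_cls[c["cls"]] = (tot + c["weight"],
                            resp + (c["weight"] if c["responded"] else 0.0))
    # inflate each respondent's weight by the class adjustment factor
    out = []
    for c in cases:
        if c["responded"]:
            tot, resp = by_cls[c["cls"]]
            out.append(c["weight"] * tot / resp)
    return out

cases = [
    {"cls": "small", "weight": 10.0, "responded": True},
    {"cls": "small", "weight": 10.0, "responded": False},
    {"cls": "large", "weight": 4.0, "responded": True},
    {"cls": "large", "weight": 4.0, "responded": True},
    {"cls": "large", "weight": 4.0, "responded": False},
]
adjusted = adjust(cases)
print(adjusted)        # [20.0, 6.0, 6.0]
print(sum(adjusted))   # 32.0, the full-sample total weight
```

The key property is that the respondents' adjusted weights sum to the total weight of the full sample, so population totals are not understated by the missing cases; the adjustment removes bias to the extent that respondents and nonrespondents are similar within each class.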

New Occupation Identification System

The O*NET-SOC taxonomy was initially based on the Occupational Information Statistics survey SOC structure that was in place in 1998. Since that time, the taxonomy has been updated to reflect changes made by the SOC Policy Committee to the federal SOC taxonomy and to accommodate the identification of new and emerging occupations. Although these changes allowed O*NET to provide information on occupations that are relevant in the changing economy, they also resulted in some data management challenges. Since the beginning of the O*NET program, the O*NET-SOC code had been used as the unique occupation identifier; however, as the taxonomy changed, O*NET-SOC codes were revised, and occupations were surveyed more than once, it became necessary to change the way occupations were uniquely identified. Accordingly, an internal system was developed to assign (1) a unique occupation identifier, which remains constant for each O*NET-SOC occupation as the taxonomy changes over time, and (2) a unique data collection identifier for each instance of data collection. Therefore, regardless of how many times the occupation is surveyed, the occupation identifier remains constant; only the data collection identifier is incremented. Exhibit B-9 presents an example of the internal coding system.


Exhibit B-9. Example of Internal Coding System

Data Collection Instance | O*NET-SOC Code | O*NET-SOC Title                 | Unique Occupation Identifier (OID) | Unique Data Collection Identifier (IDI)
------------------------ | -------------- | ------------------------------- | ---------------------------------- | ---------------------------------------
1                        | 11-9011.01     | Nursery and Greenhouse Managers | 00036                              | 00036.01.1
2                        | 11-9013.01     | Nursery and Greenhouse Managers | 00036                              | 00036.02.1
3                        | 11-9013.01     | Nursery and Greenhouse Managers | 00036                              | 00036.03.1



This system also allows for easier integration of data over time, and it improves our ability to monitor changes in individual occupations as the economy changes. The system is for internal use only; occupational data are still published under the official O*NET-SOC taxonomy coding.
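The identifier scheme above can be sketched in a few lines: the occupation identifier (OID) stays fixed while the data collection identifier increments with each survey instance. The class name, the starting OID value, and the registry design below are invented for illustration; only the identifier formats mirror Exhibit B-9.

```python
# Minimal sketch of the occupation/data-collection identifier scheme:
# a fixed, zero-padded OID per occupation plus an incrementing instance part.

class OccupationRegistry:
    def __init__(self):
        self.oids = {}        # occupation name -> fixed OID
        self.instances = {}   # OID -> number of data collections so far
        self.next_oid = 1

    def data_collection_id(self, occupation):
        """Return (OID, data collection identifier) for a new survey instance."""
        if occupation not in self.oids:
            self.oids[occupation] = f"{self.next_oid:05d}"
            self.next_oid += 1
        oid = self.oids[occupation]
        self.instances[oid] = self.instances.get(oid, 0) + 1
        return oid, f"{oid}.{self.instances[oid]:02d}.1"

reg = OccupationRegistry()
reg.next_oid = 36    # start at 36 only to mirror the exhibit's example
print(reg.data_collection_id("Nursery and Greenhouse Managers"))
print(reg.data_collection_id("Nursery and Greenhouse Managers"))
print(reg.data_collection_id("Nursery and Greenhouse Managers"))
# ('00036', '00036.01.1'), then '00036.02.1', then '00036.03.1'
```

Keying longitudinal data on the stable OID rather than the O*NET-SOC code is what allows records for the same occupation to be linked even after the taxonomy (and hence the O*NET-SOC code) changes.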

Additional Tests of Procedures

This list of the O*NET tests of procedures is far from complete; many other, smaller tests and evaluations have been conducted for the O*NET Data Collection Program for purposes of continuous quality improvement. Still, the list indicates the breadth and depth of such investigations and highlights some of the major results and design innovations. Work continues on quality improvement to this day because optimum quality for a continuous survey of a dynamic, ever-changing population is, of necessity, an ongoing process.

    1. Statistical Consultants

The DOL/Employment and Training Administration (ETA) official responsible for the O*NET Data Collection Program is Pam Frugoli (202-693-3643). Through a DOL grant, the National Center for O*NET Development in Raleigh, North Carolina, is responsible for managing O*NET-related projects and contracts and for providing technical support and customer service to users of the O*NET data and related products. The O*NET operations director is Jerry Pickett (919-814-0289).

Under contract to the Center, RTI International is responsible for providing sampling, data collection, data processing, and data analysis services. The RTI project director is Michael Weeks (919‑541‑6026). Additional analyses are provided by HumRRO, Inc., in Alexandria, Virginia, and by North Carolina State University in Raleigh, North Carolina. The statistical consultants listed in Exhibit B-10 reviewed this OMB Supporting Statement.

Exhibit B-10. Statistical Consultants

Name | Organization | Telephone Number
---- | ------------ | ----------------
Nonfederal Statisticians and Researchers
James Rounds | University of Illinois at Urbana-Champaign | 217-244-7563
Federal Government
Avar Consulting | On behalf of U.S. Department of Labor | 301-977-6553
Data Collection/Analysis Contractor (RTI International)
Paul Biemer | RTI International | 919-541-6056
Michael Penne | RTI International | 919-541-5988



The primary authors of Section B.2 of the Supporting Statement are Michael Penne and Paul Biemer. Mr. Penne is a senior research statistician and Dr. Biemer is a distinguished statistical fellow at RTI.

    1. References

Berzofsky, M. E., McRitchie, S., & Brendel, M. (2012). Model-aided sampling: An empirical review. In Proceedings of Fourth International Conference on Establishment Surveys. Montreal, Quebec, Canada. Available from http://www.amstat.org/meetings/ices/2012/papers/301868.pdf

Berzofsky, M. E., Welch, B., Williams, R. L., & Biemer, P. P. (2006). Using a model-assisted sampling paradigm instead of a traditional sampling paradigm in a nationally representative establishment survey. In Proceedings of the American Statistical Association, Section on Survey Research Methods (pp. 2763–2770). Washington, DC: American Statistical Association. Retrieved from https://www.amstat.org/Sections/Srms/Proceedings/y2006/Files/JSM2006-000811.pdf

Biemer, P. P., Ellis, C. S., Pitts, A. D., & Aspinwall, K. R. (2005). A test of monetary incentives for a large-scale establishment survey. In Proceedings of the American Statistical Association, Joint Statistical Meetings. Alexandria, VA: American Statistical Association. Available from https://www.amstat.org/publications/booksandcds.cfm

Chromy, J. R. (1979). Sequential sample selection methods. In Proceedings of the American Statistical Association, Section on Survey Research Methods (pp. 401–406). Washington, DC: American Statistical Association.

Cicchetti, D. V., & Allison, T. (1971). A new procedure for assessing reliability of scoring EEG sleep recordings. American Journal of EEG Technology, 11, 101–109.

Deville, J. C., & Särndal, C. E. (1992). Calibration estimation in survey sampling. Journal of the American Statistical Association, 87(418), 376–382. doi:10.2307/2290268

Fleishman, E. A., & Mumford, M. D. (1991). Evaluating classifications of job behavior: A construct validation of the ability requirements scales. Personnel Psychology, 44(3), 523–575. doi:10.1111/j.1744-6570.1991.tb02403.x

Folsom, R. E., Potter, F. J., & Williams, S. R. (1987). Notes on a composite size measure for self-weighting samples in multiple domains. In Proceedings of the American Statistical Association, Section on Survey Research Methods (pp. 792–796). Washington, DC: American Statistical Association.

Folsom, R. E., & Singh, A. C. (2000). A generalized exponential model of sampling weight calibration for extreme values, nonresponse and poststratification. In Proceedings of the American Statistical Association, Section on Survey Research Methods (pp. 598–603). Washington, DC: American Statistical Association. Available from https://www.amstat.org/Sections/Srms/Proceedings/

Folsom, R. E., & Witt, M. B. (1994). Testing a new attrition nonresponse adjustment method for SIPP. In Proceedings of the American Statistical Association, Social Statistics Section. Washington, DC: American Statistical Association.

Johnson, L., Jones, A., Butler, M., & Main, D. (1981). Assessing interrater agreement in job analysis ratings. San Diego, CA: Naval Health Research Center.

Korn, E. L., & Graubard, B. I. (1999). Analysis of health surveys. New York, NY: Wiley.

Mumford, M. D., Peterson, N. G., & Childs, R. A. (1997). Basic and cross-functional skills: Evidence for the reliability and validity of the measures. In N. G. Peterson, M. D. Mumford, W. C. Borman, P. R. Jeanneret, E. A. Fleishman & K. Y. Levin (Eds.), O*NET final technical report. Salt Lake City, UT: Utah Department of Workforce Services through a contract with American Institutes of Research.

Penne, M. A., & Williams, R. (2007, July 30). Aggressive weight trimming evaluation. Research Triangle Park, NC: RTI International.

Peterson, N. G., Mumford, M. D., Borman, W. C., Jeanneret, P. R., Fleishman, E. A., & Levin, K. Y. (Eds.). (1997). O*NET final technical report. Salt Lake City, UT: Utah Department of Workforce Services.

Peterson, N. G., Mumford, M. D., Borman, W. C., Jeanneret, P. R., Fleishman, E. A., Levin, K. Y., Campion, M. A., Mayfield, M. S., Morgeson, F. P., Pearlman, K., Gowing, M. K., Lancaster, A. R., Silver, M. B., & Dye, D. M. (2001). Understanding work using the Occupational Information Network (O*NET): Implications for practice and research. Personnel Psychology, 54(2), 451–492. doi:10.1111/j.1744-6570.2001.tb00100.x

RTI International. (2013). SUDAAN® Version 11.0.1 for Windows. Research Triangle Park, NC: Author.



1 The establishment size category has four levels based on the number of employees working in an establishment: Unknown or 1 to 9 employees, 10 to 49 employees, 50 to 249 employees, and 250 or more employees.

2 Subscripts corresponding to occupation and establishment are added here for ease of notation when the supplemental samples are combined with the Establishment Method samples in the subsection on weighting.

3 As noted above, the goal is to collect data from at least 20 respondents. This is designed to ensure that at least 15 usable questionnaires will remain after data cleaning.

4 As noted in conjunction with Establishment Method Sampling Step 7, analysis weights are computed for each occupation, not separately for each of the three O*NET questionnaires used within each occupation.

5 Early empirical evidence showed that these characteristics had disproportionate response rates within them.

6 An upper bound of “8” equates to a response rate of 12.5% and is based on early empirical evidence within potential nonresponse characteristics. Note that upper bounds are adjusted to be as small as possible to help minimize changes in weights.

7 The UWE measures the increase in the variance of an estimate due to unequal weighting, relative to the variance that a sample of the same size would yield if the weights were equal. The UWE is estimated by n Σ w_i^2 / (Σ w_i)^2, where w_i denotes the analysis weight of sample member i and n is the sample size.

8 As noted elsewhere in this section, the BLs do contact occupation experts (the OE Method) directly, as well as job incumbents when sampling from a professional membership list in a dual-frame approach; no POC is involved.

9 No changes have been made to the O*NET questionnaires since the last OMB clearance package was submitted to OMB in 2015.

10 A few minor changes have been made to the letters and other materials mailed to survey participants since the last OMB clearance package was submitted to OMB in 2015. These are listed in a table at the beginning of Appendix B.

11 If no employee questionnaires have been received at the time of the last scheduled follow-up call, the case is referred to a Team Leader, who reviews the history notes for the case to try to determine if the POC actually distributed the questionnaires; if necessary and appropriate, the Team Leader will make an additional follow-up call to the POC.

12 See Section ‎A.1.5 for details on the Establishment Method response rate experience and a comparison of these response rates with those of other surveys.

13 See Section B.3 for a full description of the data collection protocol.

14 See Section A.1.5 for a discussion of O*NET’s response rate experience.

15 To define kappa and the weighted kappa, consider the cross-classification of ratings from two raters, A and B. For a standard 5-point Likert scale, the expected proportion of entries in cell (j, k) of the AB table is n_j n_k / n^2 for j, k = 1, . . ., 5, where n_k is the number of raters in the sample that select category k and n is the total number of ratings. The kappa statistic is defined as

kappa = (p_a - p_e) / (1 - p_e),

where p_a is the agreement rate (the sum of the diagonal elements of the AB table) and p_e is the expected agreement rate, assuming the two sets of ratings are independent (i.e., agreement occurs only by chance). The weighted kappa is similar, except the agreement rates, p_a and p_e, include some fraction, f_d, of the disagreements, where d is the distance of a disagreement from the diagonal (see Johnson, Jones, Butler, & Main, 1981). In the O*NET application, the f_d weights proposed by Cicchetti and Allison (1971) were used, which have also been implemented in SAS PROC FREQ.

16 Nonresponse analyses for earlier analysis cycles were submitted in previous OMB Supporting Statements.

