SDR 2023 OMB Supporting Statement Part B_FINAL

2023 Survey of Doctorate Recipients (NCSES)

OMB: 3145-0020



SF-83-1 SUPPORTING STATEMENT (Part B)


for the


2023


Survey of Doctorate Recipients




TABLE OF CONTENTS



B. COLLECTION OF INFORMATION EMPLOYING STATISTICAL METHODS

1. RESPONDENT UNIVERSE AND SAMPLING METHODS

2. STATISTICAL PROCEDURES

3. METHODS TO MAXIMIZE RESPONSE

4. TESTING OF PROCEDURES

5. CONTACTS FOR STATISTICAL ASPECTS OF DATA COLLECTION


LIST OF APPENDICES

Appendix A – NSF Act of 1950; America COMPETES Reauthorization Act of 2010

Appendix B – First Federal Register Announcement

Appendix C – Examples of SDR Data in Use by the Research Community

Appendix D.1 – SDR 2023 Questionnaire Changes

Appendix D.2 – 2021 SDR Questionnaire

Appendix D.3 – NSCG/SDR Retirement Module Development Literature Review

Appendix D.4 – Retirement Module Phase 2 Cognitive Testing Final Report

Appendix D.5 – Survey of Doctorate Recipients Sexual Orientation and Gender Identity Question Experiment Plan

Appendix D.6 – Survey of Doctorate Recipients Race and Ethnicity Question Experiment Plan

Appendix E.1 – Draft 2023 SDR Survey Mailing Materials

Appendix E.2 – Prefield Contact Materials for 2025 SDR

Appendix F – 2023 SDR Sample Allocation and Selection Table

B. COLLECTION OF INFORMATION EMPLOYING STATISTICAL METHODS

1. RESPONDENT UNIVERSE AND SAMPLING METHODS

The 2023 SDR sample size is set at 125,246 cases including 115,246 continuing cases sampled from the last survey cycle, and 10,000 new cohort sampled cases who are recent doctorate recipients from academic years 2020 and 2021.


Approximately 40,000 members of the 2023 SDR sample who participated in the 2015 SDR make up the SDR longitudinal sample. This sample represents those in the 2015 SDR target population who were less than 66 years of age, followed forward into the 2023 cycle of data collection. The panel will be weighted and maintained through the 2025 cycle of the biennial SDR to provide longitudinal data for the 10-year period 2015-2025. (See “Consultation Outside the Agency” within Section A.8 for further background information on its development.)


1.1 Frame

The source of the primary sampling for the SDR is the Doctorate Records File (DRF). The DRF is a cumulative file listing research doctorates awarded from U.S. institutions since 1920. It is updated annually with new research doctorate recipients through NCSES’s Survey of Earned Doctorates (SED). The 2021 SDR sample selected from the 2019 DRF represented a surviving population of nearly 1.16 million Science, Engineering, and Health (SEH) doctorate holders less than 76 years of age. The 2023 SDR is expected to represent about 1.23 million SEH doctorate holders from the 2021 DRF, including over 83,000 from the two most recent academic years, 2020 and 2021.


The target population for the 2023 SDR includes individuals who must:

  • Have earned a research doctoral degree in a SEH field from a U.S. institution, awarded no later than academic year 2021, and

  • Be less than 76 years of age on 1 February 2023 based on their month and year of birth, and

  • Be living in a noninstitutionalized setting on 1 February 2023, and not terminally ill.


The final 2023 SDR sampling frame can be classified into five groups as shown in Table 1 and described here.

  1. Frame Group 1 contains individuals that were identified as eligible for the 2015 SDR survey cycle. These cases were from the 1960 through 2013 SED academic years.

  2. Frame Group 2 contains individuals that became newly eligible for inclusion in the 2017 SDR survey cycle. These cases were from the 2014 and 2015 SED academic years.

  3. Frame Group 3 contains individuals that became newly eligible for inclusion in the 2019 SDR survey cycle. These cases were from the 2016 and 2017 SED academic years.

  4. Frame Group 4 contains individuals that became newly eligible for inclusion in the 2021 SDR survey cycle. These cases were from the 2018 and 2019 SED academic years.

  5. Frame Group 5 contains individuals that became newly eligible for inclusion in the 2023 SDR survey cycle. These cases are from the 2020 and 2021 SED academic years.


Table 1: The 2023 SDR Frame Groups by Sample Component

| Frame Group | Description | SED Academic Years (AY) | Population Size | Sample Component | Sample Size |
|---|---|---|---|---|---|
| 1 | 2015 SDR target population that remain eligible for 2023 | 1960-2013 | 906,687 | 2015 SDR sample | 78,619 |
| | | | | 2015 Supplemental sample (selected in 2019 from 2015 eligible frame) | 7,376 |
| 2 | 2017 SDR newly sampled cases that remain eligible for 2023 | 2014-2015 | 81,963 | 2017 new cohort | 9,326 |
| 3 | 2019 SDR newly sampled cases that remain eligible for 2023 | 2016-2017 | 82,083 | 2019 new cohort | 9,947 |
| 4 | 2021 SDR newly sampled cases that remain eligible for 2023 | 2018-2019 | 85,083 | 2021 new cohort | 9,978 |
| 5 | New cohort cases from SED AY 2020 and 2021 | 2020-2021 | 83,367 | 2023 new cohort | 10,000 |
| Total | | | 1,239,897 | All | 125,246 |



1.2 2023 Sample Design

In the 2015 survey cycle, the SDR sample size increased from 45,000 to 120,000 individuals. The goal of the large sample size increase was to improve the precision of estimates for key analytic domains of interest, especially the fine field of degree (FFOD) categories reported in the SED. Over 200 FFODs served as the explicit sampling strata in the sample design for both the 2015 and 2017 cycles. For the 2019 survey cycle, adjustments to the SDR sample design were made based on feedback from SDR stakeholders in combination with evaluations of the reliability and utility of the 2015 and 2017 estimates at the 200+ FFOD stratification levels. As with the 2019 SDR, the 2021 SDR stratified the sample frame by 77 detailed fields, and sex and minority status, rather than roughly 220 FFODs used in the 2015 and 2017 SDR cycles. The 2023 survey cycle will use the same general approach that was used in the last two cycles. The 2023 SDR sampling design has the following features:

  • The stratification is made of a set of 308 sampling strata defined by crossing 77 detailed fields of degree (DFOD) with gender (2 categories: male and female) and underrepresented minority (URM) status1 (2 categories: URM and non-URM). This stratification is designed to produce sustainable and reliable estimates for population subgroups that can be supported by the sample size and are aligned with the NCSES taxonomy of disciplines (TOD).

  • Estimation precision requirements are set for three levels of aggregation over the 308 sampling strata, as shown in Table 2.


Table 2: The 2023 SDR Overall Precision Requirements

| Domain | Margin of Error | Minimum Number of Completes |
|---|---|---|
| DFOD | 5% | 270 |
| DFOD x GENDER | 6% | 190 |
| DFOD x URM | 7% | 135 |


In Table 2, the Margin of Error column gives two times the standard error associated with estimating a population proportion of 50% at the 90% confidence level. The Minimum Number of Completes column shows the number of completed surveys required to achieve the precision requirement in each domain. Based on these constraints, the minimum sample size for each sampling stratum was determined. Finally, the sample was allocated as close as possible to the proportional allocation, subject to the minimum sample size constraints.

  • Chronic nonrespondents are dropped from the sample. Two groups were dropped for the 2023 SDR cycle: cases from the 2015 supplemental sample that were nonrespondents in both the 2019 and 2021 cycles (5,321 cases), and cases from the 2017 new cohort that were nonrespondents in the 2017, 2019, and 2021 cycles (1,363 cases).
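The precision requirements in Table 2 and the constrained allocation can be illustrated with a short sketch. It rests on simplifying assumptions that are not stated in this document: simple random sampling within a domain, no design effect, no finite population correction, and a normal critical value of about 1.645 for a two-sided 90% interval. Under those assumptions, the minimum numbers of completes reproduce the tabled margins of error after rounding. The allocation function is one common heuristic for staying near proportional allocation while honoring per-stratum minimums; the stratum labels and sizes in the usage note are hypothetical, and the actual allocation is documented in Appendix F.

```python
import math

Z_90 = 1.645  # assumed two-sided 90% normal critical value


def margin_of_error(n_completes, p=0.5, z=Z_90):
    """Half-width of the interval for a proportion under simple random sampling."""
    standard_error = math.sqrt(p * (1.0 - p) / n_completes)
    return z * standard_error


def constrained_proportional_allocation(pop_sizes, minimums, total_n):
    """Proportional allocation with per-stratum floors (real-valued sizes).

    Strata whose proportional share falls below their minimum are fixed at
    the minimum; the remaining sample is re-spread proportionally over the
    other strata, and the check repeats until no stratum is below its floor.
    """
    if sum(minimums.values()) > total_n:
        raise ValueError("minimum sample sizes exceed the total sample size")
    allocation = {}
    free = set(pop_sizes)
    remaining = float(total_n)
    while True:
        free_pop = sum(pop_sizes[s] for s in free)
        below = [s for s in free
                 if remaining * pop_sizes[s] / free_pop < minimums[s]]
        if not below:
            break
        for s in below:
            allocation[s] = float(minimums[s])
            remaining -= minimums[s]
            free.remove(s)
    for s in free:
        allocation[s] = remaining * pop_sizes[s] / free_pop
    return allocation


# Table 2 check: minimum completes versus the stated margins of error.
for domain, n in {"DFOD": 270, "DFOD x GENDER": 190, "DFOD x URM": 135}.items():
    print(f"{domain}: n={n}, margin of error = {margin_of_error(n):.3f}")
```

As a usage example with hypothetical strata, `constrained_proportional_allocation({"a": 8000, "b": 1500, "c": 500}, {"a": 300, "b": 300, "c": 300}, 2000)` lifts the two small strata to their floor of 300 and gives the remaining 1,400 cases to the large stratum.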


In addition, the 2023 sample includes 10,000 individuals sampled from among those who obtained their SEH doctorate degree since the 2021 SDR sample selection. This sample, referred to as the 2023 “new cohort” sample, follows the same stratification design as the continuing sample cohort. Similar to the change made to the URM stratification variable in the continuing sample cohort, the new cohort sample also will use a changed URM stratification variable definition to better align the SDR oversampling of URM with the URM definition used in NCSES publications such as the Diversity and STEM: Women, Minorities, and Persons with Disabilities report.2 In the 2023 SDR stratification, non-Hispanic multi-race is removed from the URM group and assigned to the non-URM group. It is projected that this change in the URM definition will increase the number of Hispanic and non-Hispanic single race URM from 2,097 to 2,265. When drawing the new cohort sample, variables such as race/ethnicity categories, citizenship at birth, predicted resident location, disability status, age group, and doctorate award year are used as sorting variables within each stratum to improve their representation in the sample. The new cohort sample is then drawn systematically with equal probability within each stratum (see Appendix F). After combining the continuing sample and new cohort, the 2023 SDR sample will consist of 125,246 individuals.
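The sorted systematic selection described above can be sketched for a single stratum as follows. This is a minimal illustration, not the production procedure documented in Appendix F: the function name, the record fields, and the use of a fractional sampling interval with a random start are assumptions made for the example.

```python
import random


def systematic_sample(frame, n, sort_keys, seed=None):
    """Equal-probability systematic sample of n records from one stratum.

    The stratum frame is sorted by the given variables first, so the
    sample is spread across the sort order (implicit stratification).
    """
    if not 0 < n <= len(frame):
        raise ValueError("n must be between 1 and the stratum size")
    rng = random.Random(seed)
    ordered = sorted(frame, key=lambda rec: tuple(rec[k] for k in sort_keys))
    interval = len(ordered) / n              # fractional sampling interval
    start = rng.random() * interval          # random start in [0, interval)
    last = len(ordered) - 1
    positions = [min(int(start + i * interval), last) for i in range(n)]
    return [ordered[pos] for pos in positions]


# Hypothetical stratum of 20 frame records with two sorting variables.
stratum = [{"id": i, "award_year": 2020 + i % 2, "age_group": i % 3}
           for i in range(20)]
selected = systematic_sample(stratum, 5, ["award_year", "age_group"], seed=1)
print([rec["id"] for rec in selected])
```

Because the interval is at least 1 whenever n does not exceed the stratum size, each record can be selected at most once, and every record has the same selection probability n/N.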


Furthermore, the 2023 SDR contains a longitudinal sample of 40,000 individuals that were selected in 2019 from among sample respondents to the 2015 SDR who were less than 66 years old on February 1, 2015, the survey reference date.3 This longitudinal sample will continue to be followed in 2023 and in the 2025 survey cycle. Because these cases are part of the cross-sectional sample, data collection, editing, and other processing steps will not be treated differently. However, the longitudinal sample requires statistical procedures that differ from those used for the cross-sectional sample, as described in Section 2 below.

2. STATISTICAL PROCEDURES

The SDR statistical data processing procedures have several components including sampling weight adjustments to compensate for the stratified sampling design features and differential response rates, imputation procedures to address item nonresponse, and variance estimation procedures for calculating sampling error.


2.1 Weighting

A final weight will be computed for each completed interview in the cross-sectional sample including its longitudinal sample cases. These weights are intended to be used to conduct cross-sectional statistical analysis of the data from all 2023 SDR respondents so that the results represent the eligible population of doctorate recipients (i.e., individuals who earned a research doctoral degree in a science, engineering, or health field from a U.S. institution awarded no later than academic year 2021 and are less than 76 years of age). The weighting procedures consist of a series of statistical adjustments to the original sampling weights and will follow methods similar to those applied in the development of the 2021 SDR weights. These methods are briefly described below.


For a sample member $i$, the original sampling weight will be computed as

$$w_i = \frac{1}{\pi_i}$$

where $\pi_i$ is the inclusion probability under the sample design.


The sampling weight will be adjusted in sequence for unknown eligibility, unit nonresponse, and frame coverage based on similar methodologies used for the 2021 SDR. First, for cases whose eligibility status is not determined by the end of the survey, their assigned base weights are transferred to cases whose eligibility is known. Next, among known eligible cases, the weights of nonrespondents are transferred to the respondents so that the respondents represent all eligible cases in the sample. Finally, a raking adjustment aligns the sample to the frame population so that the sample estimates agree with the frame counts with respect to factors not explicitly controlled for in the sample design.


Logistic regression models will be used to derive unknown eligibility and nonresponse weighting adjustment factors for different segments of the sample. Resulting propensity scores will be used to define weighting classes, and extreme weights will be trimmed to reduce the variation of the weights prior to and after raking. With a final weight, the Horvitz-Thompson estimator will be used to derive point estimates for SDR variables.
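The sequence of adjustments described above (unknown eligibility, unit nonresponse, raking, then a weighted total) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the SDR production system: weighting classes are taken as given rather than derived from logistic regression propensity scores, weight trimming is omitted, and the case records, field names, and control totals are invented for the example.

```python
from collections import defaultdict


def transfer_weights(cases, in_key, out_key, donors, receivers):
    """Within each weighting class, move donor-status weight onto receivers."""
    pooled = defaultdict(float)  # donor plus receiver weight per class
    kept = defaultdict(float)    # receiver weight per class
    for c in cases:
        if c["status"] in donors | receivers:
            pooled[c["cls"]] += c[in_key]
        if c["status"] in receivers:
            kept[c["cls"]] += c[in_key]
    for c in cases:
        if c["status"] in receivers:
            c[out_key] = c[in_key] * pooled[c["cls"]] / kept[c["cls"]]
        elif c["status"] in donors:
            c[out_key] = 0.0
        else:
            c[out_key] = c[in_key]  # not involved in this step


def rake(cases, key, controls, max_iter=100, tol=1e-10):
    """Iterative proportional fitting of respondent weights to control totals."""
    resp = [c for c in cases if c["status"] == "respondent"]
    for _ in range(max_iter):
        worst = 0.0
        for dim, targets in controls.items():
            sums = defaultdict(float)
            for c in resp:
                sums[c[dim]] += c[key]
            for c in resp:
                factor = targets[c[dim]] / sums[c[dim]]
                c[key] *= factor
                worst = max(worst, abs(factor - 1.0))
        if worst < tol:
            break


def make_case(status, cls, sex, urm, employed):
    """Hypothetical sample case with a base weight of 10."""
    return {"status": status, "cls": cls, "sex": sex, "urm": urm,
            "employed": employed, "w_base": 10.0}


cases = [
    make_case("respondent", "A", "F", "Y", 1),
    make_case("respondent", "A", "M", "N", 1),
    make_case("nonrespondent", "A", "F", "N", None),
    make_case("unknown", "A", "M", "Y", None),
    make_case("respondent", "B", "F", "N", 0),
    make_case("respondent", "B", "M", "Y", 1),
    make_case("ineligible", "B", "F", "Y", None),
    make_case("nonrespondent", "B", "M", "N", None),
]

# Step 1: unknown-eligibility weight moves to cases with known status.
known = {"respondent", "nonrespondent", "ineligible"}
transfer_weights(cases, "w_base", "w_elig", {"unknown"}, known)
# Step 2: eligible nonrespondent weight moves to respondents.
transfer_weights(cases, "w_elig", "w_final", {"nonrespondent"}, {"respondent"})
# Step 3: rake respondent weights to (hypothetical) frame control totals.
controls = {"sex": {"F": 36.0, "M": 34.0}, "urm": {"Y": 33.0, "N": 37.0}}
rake(cases, "w_final", controls)

# Weighted total of an indicator variable among respondents.
respondents = [c for c in cases if c["status"] == "respondent"]
employed_total = sum(c["w_final"] * c["employed"] for c in respondents)
print(round(employed_total, 2))
```

Each transfer step preserves the total weight within a weighting class, which is why respondents end up representing the nonrespondents and unresolved cases in their class before raking aligns the margins to the control totals.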


In addition to the weights for the cross-sectional sample, weights will also be created for the longitudinal sample. Methods similar to those used for the 2021 longitudinal weighting will be followed. Because this sample is drawn from the 2015 SDR respondents, the target population for these weights is the estimated 860,264 individuals who had received a research doctoral degree in a science, engineering, or health field from a U.S. institution by June 2013 and are less than 66 years of age on 1 February 2015. As with the cross-sectional weights, the longitudinal weighting procedures consist of a series of statistical adjustments to their sample weights as well. To account for the two-phase nature of the sample, the sampling weights are the final 2015 cross-sectional weights divided by the selection probabilities associated with inclusion into the longitudinal sample. These sampling weights are then adjusted for unknown eligibility and nonresponse, similar to the procedures for the cross-sectional sample. Finally, they are adjusted by raking to the 2015 frame totals noted above and to the cross-sectional population estimates of the 2017, 2019, 2021 and subsequent 2023 SDR.
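The two-phase sampling weight for the longitudinal sample can be written compactly. The notation here is illustrative and does not appear in the source: $w_i^{(2015)}$ denotes the final 2015 cross-sectional weight of respondent $i$, and $p_i$ denotes the probability that respondent $i$ was selected into the longitudinal sample, conditional on having responded in 2015.

$$w_i^{L} = \frac{w_i^{(2015)}}{p_i}$$

After the unknown eligibility, nonresponse, and raking adjustments, the longitudinal weights are expected to sum to approximately the 860,264 individuals in the longitudinal target population.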


2.2 Item Nonresponse Adjustment

Historically, the SDR has conducted comprehensive imputation to fill in item-level missing data in the cross-sectional sample. Two general methods of imputation, logical imputation and hot deck imputation, have been used. The logical imputation method is employed during the data editing process when the answer to a missing item can be deduced from past data, or from other responses from the same respondent. For those items still missing after logical imputation, a hot deck imputation method is employed.


In hot-deck imputation, a missing survey item for any respondent is replaced with reported data from another respondent, referred to as a donor. The respondent with the imputed survey data is referred to as a recipient. For each imputed item, potential donors are selected within the same class to match recipients based on key class variables. The 2023 SDR will use similar imputation techniques, although the actual imputation models may differ since we will have additional data from the 2021 cycle to identify donors, instead of only considering same-cycle (2023) data.
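A minimal sketch of class-based hot-deck imputation follows. It assumes a random donor drawn from respondents who share the recipient's values on the class variables; the function name, record fields, and class variables are hypothetical, and the actual SDR imputation models and class definitions are not specified here.

```python
import random
from collections import defaultdict


def hot_deck_impute(records, item, class_vars, seed=None):
    """Fill missing values of `item` from a random donor in the same class.

    Returns new records with an added flag `<item>_imputed`. Recipients
    whose class has no donor are left missing.
    """
    rng = random.Random(seed)
    donors = defaultdict(list)
    for rec in records:
        if rec.get(item) is not None:
            donors[tuple(rec[v] for v in class_vars)].append(rec)

    completed = []
    for rec in records:
        out = dict(rec)
        out[item + "_imputed"] = False
        if rec.get(item) is None:
            pool = donors.get(tuple(rec[v] for v in class_vars))
            if pool:
                out[item] = rng.choice(pool)[item]
                out[item + "_imputed"] = True
        completed.append(out)
    return completed


# Hypothetical respondent records; None marks item nonresponse.
respondents = [
    {"id": 1, "field": "bio", "sector": "academia", "salary": 90000},
    {"id": 2, "field": "bio", "sector": "academia", "salary": None},
    {"id": 3, "field": "eng", "sector": "industry", "salary": 130000},
    {"id": 4, "field": "eng", "sector": "industry", "salary": None},
    {"id": 5, "field": "math", "sector": "government", "salary": None},
]
imputed = hot_deck_impute(respondents, "salary", ["field", "sector"], seed=7)
for rec in imputed:
    print(rec["id"], rec["salary"], rec["salary_imputed"])
```

Keeping an imputation flag alongside each filled item lets analysts distinguish reported values from donated ones.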


For the longitudinal sample, item-level imputations from the cross-sectional sample will be retained for respondents to each cycle year. If a full cycle of data is missing for longitudinal sample members, a combination of logical and hot-deck imputation will be used to fill in complete information for the missing cycle year. In the longitudinal hot-deck imputation method, donors are selected based on similarity in key variables in the observed cycle years (i.e., responses in any of the 2015, 2017, 2019 and 2021 cycles).


2.3 Variance Estimation

The SDR has used the Successive Difference Replication Method (SDRM) for variance estimation since 2015. The SDRM method was designed to be used with systematic samples when the sort order of the sample is informative. This is the case for the 2023 SDR, which employs systematic sampling after sorting cases within each stratum by selected demographic variables. As in prior cycles, a total of 104 replicates will be used for both the cross-sectional and the longitudinal samples for the 2023 SDR. Within each replicate, the final weight is developed using the same weighting adjustment procedures applied to the full sample (i.e., the cross-sectional and longitudinal sample combined). In the case of the longitudinal sample, the two-phase nature of the sampling weights will be incorporated into the variance estimation by applying the raking step for each replicate to control totals that are derived from the cross-sectional replicates instead of the fixed control totals used for the cross-sectional sample. The SDRM replicate weights can be used to estimate the variance of point estimates by using survey variance estimation software packages such as SAS or R.
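For users computing variances directly from replicate weights, the calculation can be sketched as below. It assumes the standard successive difference replication formula, in which the variance is 4/R times the sum of squared deviations of the R replicate estimates from the full-sample estimate (R = 104 here). Whether that multiplier applies unchanged to a given SDR data file should be confirmed against the file's documentation; the function names and sample numbers are illustrative.

```python
import math


def weighted_mean(values, weights):
    """Weighted estimate of a mean or proportion."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def sdrm_variance(full_estimate, replicate_estimates):
    """Successive difference replication variance: (4 / R) * sum of squared deviations."""
    r = len(replicate_estimates)
    squared = sum((est - full_estimate) ** 2 for est in replicate_estimates)
    return 4.0 / r * squared


def sdrm_standard_error(values, full_weights, replicate_weights):
    """Standard error of a weighted mean from full-sample and replicate weights."""
    full = weighted_mean(values, full_weights)
    reps = [weighted_mean(values, w) for w in replicate_weights]
    return math.sqrt(sdrm_variance(full, reps))


# Tiny illustration with four hypothetical replicate estimates.
print(sdrm_variance(10.0, [11.0, 9.0, 10.0, 10.0]))
```

In practice `replicate_weights` would hold one weight vector per replicate (104 vectors for the 2023 SDR), each aligned with the respondent records.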

3. METHODS TO MAXIMIZE RESPONSE

3.1 Maximizing Response Rates

The weighted response rate for the 2021 SDR was 65% (unweighted, 65%). To attain a targeted response rate of 70% for 2023, extensive locating efforts, nonresponse follow-up survey procedures, and targeted data collection protocols will be used during data collection. In addition, both an early-stage and late-stage monetary incentive will be offered as outlined in Section A.9 and Section B.4.


3.2 Locating

Continuing sample members who were categorized as being difficult to locate or were not found in 2021 and new sample members with incomplete contacting data will first need to be located before a request for the 2023 survey participation can be made. The 2023 SDR will follow a locating protocol similar to the approach implemented in prior cycles. The contacting information obtained from the 2021 SDR and prior cycles will be used to locate and contact the continuing sample members; the information from the SED will be the starting information used to locate and contact the new sample members in 2023.


2023 SDR Locating Protocol Overview. As in prior SDR cycles, there will be two phases of locating for the 2023 SDR: prefield locating and main locating. Prefield locating activities include batch processing of sampled cases through LexisNexis® (formerly, Accurint®)4 and online searches, address review, and individual case locating (also called manual locating). Prefield locating occurs approximately three months before the start of data collection and is used to ensure that initial invitational outreach by mail, telephone, and email requesting survey participation reaches as many sample members as possible. Prefield individual case locating includes online searches, telephone calls, and emails to sample members, and telephone calls and emails to contact persons (previously provided by the sample member in the SED or prior SDR) who may know how to reach the sample members. Sample members who are eligible for individual case locating may also receive individual LexisNexis® AML Insight match (AIM) searches conducted by the manual locators. Main locating includes manual locating and additional LexisNexis® searching as needed. Main locating activities will begin at the start of data collection and will include contact (by mail, telephone, or email) with sample members and other contact persons. Both the prefield and main locating activities will be supported by an interactive (i.e., real-time) online case management system (CMS). The case management system will include background information about each case, all the locating leads, a history of all searches conducted, and all outreach attempts made which led to the newly found contacting information (including mailing addresses, telephone numbers, and email addresses). CMS information will be integrated with survey paradata and monitoring metrics that support an adaptive design approach (see Section B.4.4 for more information on the adaptive design plans for 2023).


Prefield Locating Activities. The prefield locating activities consist of four major components as follows:


  1. Prior to any mailing, the addresses for both the continuing sample component and the new cohort sample component will be run through Quadient Bulk Mailer SMB software. This is an address cleaning software that identifies potentially undeliverable addresses and incorporates the U.S. Postal Service’s (USPS) automated National Change of Address (NCOA) database to check for forwarding addresses. The NCOA incorporates all change of name/address orders submitted to the USPS nationwide for residential addresses. The NCOA database maintains up to 48 months of historical records of previous address changes. However, the NCOA updates will be less effective for the new sample because the starting contact information from SED could be up to three years out of date.

  2. The sample will be assessed to determine which cases require individual prefield locating. This assessment is different for the continuing cases than for the new cohort sample component.

    1. Prefield locating will be conducted on cases which could not be found in the prior round of data collection or ended the round with unknown eligibility (meaning we could not confirm if the sample member received the SDR contacts). A LexisNexis® batch search will be run on the continuing cohort using the available prior survey cycle information. In addition, all continuing sample members over 49 years of age will receive a more limited LexisNexis® batch search for potential date of death.

    2. For the new cohort, a LexisNexis® batch search will be run using the available information provided in the SED 2020 and 2021. An assessment of SED data and returns from LexisNexis® will determine which new cohort cases will be identified for individual prefield locating.

  3. The returned results from LexisNexis® will be assessed to determine which cases are ready for data collection and which require further prefield locating. There are four potential data return outcomes from the LexisNexis® batch search for both the continuing sample and the new cohorts:


    1. Returned with a date of death. For those cases that return a date of death, the mortality status will be confirmed with an independent online source and finalized as deceased. When the deceased status cannot be confirmed, the cases will be queued for manual prefield locating and the possible deceased outcome will be noted in the case record so further searching for a deceased confirmation and possible date of death may be conducted.

    2. Returned with existing address confirmed. For cases where LexisNexis® confirms the prior cycle’s address or the SED address as current (i.e., less than two years old), the case will be considered ready for data collection and will not receive further prefield locating.

    3. Returned with no new information. For cases where LexisNexis® provides no new information or the date associated with new contacting information is more than two years out of date, the cases will be queued for manual prefield locating.

    4. Returned with new information. When LexisNexis® provides new and current contacting information, the new information will be used for the start of data collection and the case will be considered ready for data collection with no further prefield locating.

  4. The manual locating effort throughout prefield locating involves a specially trained locating team that will conduct online searches and make limited calls to sample members and outreach to contact persons for those individuals not found via the automated searches. Only publicly available data will be accessed during the online searches. The locating staff will use search strategies that effectively combine and triangulate the sample member’s earned degree and academic institution information, demographic information, prior address information, any return information from LexisNexis®, and information about any nominated contact persons. Locators will search employer directories, educational institution sites, alumni and professional association lists, white pages listings, real estate databases, online publication databases (including those with dissertations), social media platforms, online voting records, and other administrative sources. Bilingual locators will be hired whenever possible to facilitate locating sample members who may live outside the US. Locating staff will be carefully trained to verify they have found the correct sample member by using personal identifying information such as name and date of birth, academic history, and past address information from the SED and the SDR (where it exists).

Additionally, the 2023 SDR will use LexisNexis® AML Insight to conduct individual matched searches, also known as AIM searches. AIM allows locators to search on partial combinations of identifying information to obtain an individual’s full address history and discover critical name changes. This method has been shown to be a cost-effective strategy when locating respondents with out-of-date contact information in prior SDR cycles as well as other studies. The AIM searching method will be implemented by experienced locating staff and will be conducted on the subset of cases not found with regular online searches.
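The routing of batch search returns into the four outcomes described above can be sketched as a small decision function. This is an illustration only: the class, field names, and outcome labels are hypothetical and do not reflect the actual case management system, and treating information that is exactly two years old as not current is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BatchReturn:
    """Hypothetical summary of one case's batch search return."""
    date_of_death: Optional[str] = None      # returned date of death, if any
    death_confirmed: bool = False            # confirmed by an independent source
    address_confirmed: bool = False          # existing address matched
    new_info: bool = False                   # new contacting information returned
    info_age_years: Optional[float] = None   # age of the returned information


def route_case(ret: BatchReturn) -> str:
    """Assign a case to its next prefield step based on the batch return."""
    if ret.date_of_death is not None:
        # Outcome 1: finalize only when the death is independently confirmed.
        return "finalize_deceased" if ret.death_confirmed else "manual_prefield_locating"
    is_current = ret.info_age_years is not None and ret.info_age_years < 2
    if ret.address_confirmed and is_current:
        return "ready_for_data_collection"   # Outcome 2
    if ret.new_info and is_current:
        return "ready_for_data_collection"   # Outcome 4
    return "manual_prefield_locating"        # Outcome 3


print(route_case(BatchReturn(new_info=True, info_age_years=0.5)))
```

Returns that carry no usable or current information fall through to manual prefield locating, mirroring the default path in the protocol.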

Main Locating Activities. Cases worked in main locating will include those not found during the prefield locating period as well as cases determined to have outdated or incorrect contacting information from failed 2023 data collection outreach activities. Prior to beginning the main locating work, locating staff who worked during the prefield period will receive refresher training that focuses on maintaining sample members’ confidentiality particularly when making phone calls, or supplementing online searches with direct outreach to the sample members and other individuals, and gaining the cooperation of those sample members and other individuals successfully reached. The locating staff will continue to use and expand upon the online searching methods from the prefield period and, ideally, gain survey cooperation from the found individuals. In addition to outreach to sample members, main locating activities during data collection will include calls and emails to dissertation advisors, employers, alumni associations, and other individuals who may know how to reach the sample member.



3.3 Data Collection Strategies

As with prior cycles, the 2023 SDR will continue using a multi-mode data collection protocol including self-administered web forms, mailed paper self-administered questionnaires (SAQ), and computer-assisted telephone interviews (CATI) to facilitate survey participation, data capture, and sample member convenience. The 2023 SDR data collection protocols and contacting methods build upon the methodology used in prior cycles. The 2023 SDR data collection will include a variety of mail, email, and telephone outreach efforts organized in four phases over a 6-month period (i.e., starting, interim, late-stage, and last chance). The general contacting strategy for sample members residing in the U.S. and internationally will be the same (as opposed to the 2021 round, where international sample members received fewer mailings). The format and content will differ from mailing to mailing. Each mailed letter will be paired with an email. Messages will be tailored based on past response history and whether the case is from the new cohort or reported being retired in the SDR. The 2023 SDR main data collection protocol is illustrated in Figure 2 of Section B.4.4.


In the starting stage, all sample members will initially be encouraged to respond to the 2023 SDR via the online questionnaire, as was done in 2021. Most sample members who were in the 2021 sample will initially receive a postal letter and email encouraging them to participate in the 2023 SDR by completing the online questionnaire. New cohort sample members, as well as nonrespondent new cohort sample members from 2019 and 2021, will receive an infocard with their initial invitation. The invitation postal letter will include the 2023 SDR URL and the sample member’s Personal Identification Number (PIN). The invitation email will include a live link that will take the sample member directly to the starting page of the SDR 2023 web instrument. In 2021, 99% of respondents participated via the online questionnaire. Due to the changes in data collection protocols, NCSES expects this percentage to be slightly lower in 2023 but still over 90%. Sample members will receive a follow-up mailing in weeks 3 and 4. Telephone prompting will begin in week 4, except for sample members who refused the survey in the past. Callers will remind respondents about the survey, resend survey access by email (if needed), and offer to provide a CATI interview if preferred. Sample members who responded by paper in previous rounds will be in an accelerated paper mode and will be mailed a self-administered paper form in week 6.


In the interim mode phase, sample members will receive a reminder letter via email and USPS letter mailing at week 7. Hardcopy paper questionnaires will be offered at week 9 to sample members who have not yet responded (except those who received their hardcopy in week 6), followed by a reminder postcard a week later. In addition to these targeted mailings, the contractor will send paper questionnaires upon request throughout the data collection cycle.


The late-stage phase starts with an infocard mailing to all sample members who have not yet responded, followed by telephone prompting in week 14 and a reminder email only in week 15. A second hard-copy paper survey will be sent in week 16 with a reminder postcard a week later. Respondents who have refused in the past will receive only the infocard mailing and the reminder email.


The last chance stage of the contacting protocol begins with a reminder letter in week 19 that will include a non-monetary token of appreciation (like a credit-card sized magnifying lens) followed by telephone prompting in weeks 20 and 21. This stage also offers an abbreviated version of the 2023 SDR, the Critical Item Only (CIO) survey, as a method to motivate participation among these most reluctant sample members who may not have time available to complete the full survey. The CIO instrument has been used in the prior SDR cycles as a method to motivate response as a last call for participation. Sample members will be offered the CIO in a letter, then email, then over the phone in weeks 22 through 24. A final request letter and email will go out during week 25 with a final telephone prompt in week 26.


As with the 2021 SDR and prior cycles, sample members who were a hostile refusal in a prior cycle will receive limited contact requesting their participation in the 2023 SDR. Prior round hostile refusals will only be contacted once, with a mailing of the hard-copy questionnaire with a cover letter that acknowledges their refusal and offers the web CIO. In 2021, the contractor fielded initial invitations to a maximum of 86 sample members who had closed out in the 2019 cycle as hostile refusals and four completed the survey. Any continuing sample members classified as “congressional” refusals will not be contacted in 2023. Table 3 shows the maximum number of contacts to be made by mode and cohort for all other sample members.


Table 3: Maximum number of contacts by cohort and mode of contact


| 2023 SDR Cohort and Refusal History | Mail (Domestic and International) | Email (Domestic and International) |
|---|---|---|
| New, all | 11 | 7 |
| Continuing sample, non-refusal | 11 | 7 |
| Continuing sample, past non-hostile refusal | 8 | 5 |
| Continuing sample, past hostile refusal | 1 | 0 |



3.4 Incentive Plan for 2023

The use of incentives as part of the SDR data collection strategy began during the 2003 cycle and has continued for all subsequent cycles. As was briefly summarized in Section 9 of Supporting Statement A, the 2023 SDR incentive plan is modeled after the approach used in SDR survey cycles since 2013, using an early- and late-stage incentive protocol to gain cooperation from respondents. In the initial phase of data collection, incentives will be offered at two different time points to two types of sample members: 1) reluctant past participants and reluctant new cohort sample members, and 2) nonresponding continuing sample members and nonresponding new cohort sample members after the starting phase of data collection. Later in the field period, incentives will be offered to a subset of nonresponding sample members not selected for early-stage incentives. The selection of nonresponding sample members for the incentive offer will be incorporated into the adaptive design strategy to mitigate nonresponse bias of underrepresented subgroups in the final data set.


To ensure that incentives are received only by the targeted sample member, they will be sent primarily in the form of a prepaid $30 check made out to the sample member. This is a shift from the 2021 round, which primarily used prepaid debit cards. The 2023 SDR will also incorporate and test an alternate mode of incentive delivery that likewise ensures only the targeted sample member receives the incentive: a $30 electronic VISA gift card delivered immediately upon survey completion.


Early Incentive Plan

In the early phase of data collection, an incentive will be offered at two different time points. The first incentive offer will occur in week 3 of the field period and will be made to two types of sample members: 1) reluctant past participants, and 2) reluctant new cohort sample members.


Reluctant past respondents are defined as returning sample members who, in each of the latest two survey cycles, participated in the SDR only after an incentive was offered and who cashed that incentive. New cohort sample members will be defined as reluctant if they refused or failed to complete the SED. Reluctant past respondents and reluctant new cohort sample members eligible for the incentive will be offered a $30 personalized check in the second postal contact attempt in week 3.


Other continuing sample and new cohort sample members who are slow to participate and remain nonrespondents after the starting phase of data collection will be offered an incentive in week 7. This group will be eligible for selection into an incentive offer experiment that will test the efficacy of receiving an incentive offer in the form of a prepaid check (treatment 1) or a post-paid incentive gift code (treatment 2) versus no incentive offer (control). The adaptive design strategy will be applied to select the experimental groups and allocate the incentive offers to a subset of nonresponding sample members to mitigate nonresponse bias of underrepresented subgroups.


A long time has passed since a post-paid incentive was tested in the SDR, and delivering a post-paid incentive as an electronic gift card by email immediately following survey completion is more efficient than mailing a personalized check via USPS. The 2023 SDR incentive experiment will therefore compare the outcomes and costs associated with the two treatments: 1) a survey request sent via USPS with a personalized check; and 2) a survey request, also sent via USPS, that offers an electronic VISA gift card immediately after the survey is completed (see Appendix E.1 for the survey request letters offering the incentive upfront as a check or after survey completion as an electronic gift card).


Late-Stage Incentive Plan

The overall strategy for the late-stage incentive is to ensure that all U.S. residing sample members who have been subject to the standard survey data collection protocols and remain survey nonrespondents will have a probability of receiving a monetary incentive at the start of the late-stage data collection phase.


To allocate the limited resources available for the monetary incentive to late-stage survey nonrespondents most effectively, the characteristics of the remaining nonrespondents will be analyzed using a statistical model to determine which sample members should receive additional inducement to mitigate potential nonresponse bias. The cases that are least similar to the existing survey participants and have the highest propensity to respond will be selected for the incentive, provided they reside in the U.S. The volume of late-stage nonresponse cases to be incentivized will be constrained by the requirement that no more than 25 percent of the sample, across both the early- and late-stage incentive offers, will be offered the incentive. The final number of cases to be offered an incentive in the late-stage phase will be determined through discussions between NCSES and the data collection contractor.
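The 25 percent cap and the selection rule described above can be illustrated with a short Python sketch. It assumes the total sample size of 125,246 given in Section 1. The early-stage offer count, the field names, and the way similarity and response propensity are combined into one score are hypothetical placeholders for illustration only; they are not the statistical model NCSES or the contractor will use.

```python
import math

def late_stage_incentive_budget(total_sample, early_offers, cap=0.25):
    """Late-stage offers still allowed when early + late offers may not exceed cap * sample."""
    max_offers = math.floor(total_sample * cap)
    return max(0, max_offers - early_offers)

def select_late_stage_cases(cases, budget):
    """Return IDs of U.S.-residing nonrespondents with the highest illustrative scores.

    Each case is a dict with hypothetical keys:
      'id', 'in_us', 'dissimilarity' (0-1, distance from current respondents),
      'propensity' (0-1, predicted probability of responding).
    The product of the two measures is an assumed scoring rule, not the SDR model.
    """
    eligible = [c for c in cases if c["in_us"]]
    ranked = sorted(
        eligible,
        key=lambda c: c["dissimilarity"] * c["propensity"],
        reverse=True,
    )
    return [c["id"] for c in ranked[:budget]]

# Hypothetical example: 20,000 early-stage offers already made.
remaining = late_stage_incentive_budget(125_246, early_offers=20_000)
```

With the stated sample size the cap is 31,311 offers in total, so the hypothetical 20,000 early offers would leave 11,311 for the late stage.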


Also during this late-stage phase, any nonresponding sample members who were selected for an early incentive in the form of a prepaid check but were not previously sent their incentive, because of locating problems or the lack of a mailing address, will be issued or reoffered the incentive. Those nonrespondents who were successfully sent the incentive in the form of a prepaid check during the early phase will receive the non-incentivized late-stage treatment. Any nonresponding sample member selected for an early incentive in the form of the post-paid gift code will be reminded that the incentive offer is still available.




Incentive Experiment Design

SDR sample members are highly educated, which is associated with increased levels of geographic mobility.5 Increased geographic mobility leads to a greater need to locate SDR sample members from one cycle to the next and an increased chance that survey outreach contacts could be sent to the wrong USPS or email address. As a result, to reduce the chance that an incentive offer is inadvertently sent to the wrong address and redeemed by a non-sample member, the incentive offer method needs to restrict who can redeem the offer or how it can be redeemed. This requirement excludes cash and non-personalized debit cards as incentive offer methods.


From the 2006 through 2019 cycles, the SDR effectively ensured the incentive offer was only redeemed by the target sample member by sending prepaid personalized checks. An alternative method to ensure the incentive offer is only redeemed by the target sample member is to offer a post-paid incentive after survey completion.


The SDR has offered a post-paid incentive only in the 2003 cycle, as part of a late-stage incentive experiment. The 2003 incentive experiment showed that a $30 prepaid, personalized check was more effective than a $50 post-paid offer. A similar experiment was conducted in 2018 in another study, in which oncologists were offered either a $50 prepaid check or a $50 post-paid check; it also found the prepaid check was more effective.6 However, in a survey of high-achieving, low-income college-bound high school seniors, a test of a $15 prepaid incentive versus a $15 post-paid incentive in the form of an immediately redeemable electronic gift card showed the post-paid incentive was more effective in gaining survey response.7


To test the efficacy of the two incentive offer methods, the two modes of incentive delivery will be randomly assigned to selected experimental groups in week 7 of the data collection period. Treatment group 1 will be offered the incentive in the form of a prepaid $30 check personalized for the sample member, and treatment group 2 will be offered the incentive in the form of a $30 electronic VISA gift card delivered immediately upon survey completion; control cases will be sent a survey request without an incentive offer. For this experiment, incentives will be offered to a subset of sample members and an adaptive design strategy will be applied for selecting incentive samples to mitigate nonresponse bias of underrepresented subgroups in the final data set.
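The random assignment to the two treatment groups and the control group might be sketched as follows in Python. This is an illustration only: the arm labels, the fixed seed, and the equal allocation across the three arms are assumptions, since the allocation ratios and the adaptive design selection step are not specified here.

```python
import random

# Hypothetical arm labels for the week 7 incentive experiment.
ARMS = ("prepaid_check", "postpaid_gift_card", "control")

def assign_arms(case_ids, seed=2023):
    """Randomly assign eligible nonrespondents to the three experimental arms.

    Cases are shuffled with a seeded generator (so the assignment can be
    reproduced) and then dealt to the arms in turn, which keeps the arm sizes
    within one case of each other. Equal allocation is an assumption.
    """
    rng = random.Random(seed)
    shuffled = list(case_ids)
    rng.shuffle(shuffled)
    return {cid: ARMS[i % len(ARMS)] for i, cid in enumerate(shuffled)}

# Hypothetical case identifiers.
assignment = assign_arms([f"case{n:03d}" for n in range(9)])
```

Seeding the generator lets the assignment be regenerated later for auditing; in practice the eligible pool would first be narrowed by the adaptive design selection described above.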


The benefits of the post-paid electronic gift card incentive method are that it provides extremely high assurance that only those who complete the survey receive the incentive, that the incentive is provided only to the target sample member who passed the “sample person verification” process, and that the incentive offer method is secure and extremely efficient. New research is needed because delivery of a post-paid incentive in the form of an electronic gift card is now improved and immediate. Electronic gift card incentives have a significant advantage in survey management because they are more efficient to offer, administer, and track than personalized checks. Further, there is no concern that incentive funds will be provided to a non-participant, because the incentive is fulfilled only after survey completion.


Ideally, this research will show that the two incentive offer methods have a similar positive effect on survey response. This finding would allow the government to implement the more efficient incentive strategy in future rounds of the SDR and other surveys with confidence.


4. TESTING OF PROCEDURES

The SDR and NSCG are complementary workforce surveys. Therefore, the two surveys must be closely coordinated to provide comparable data. Many of the questionnaire items in the two surveys are the same, including the reference date of 1 February 2023.


The complementary survey questionnaire items are divided into two types of questions: core and module. Core questions are defined as those considered to be generic to both the SDR and NSCG. These items are essential for sampling, respondent verification, basic labor force information, and NCSES analyses of the science and engineering workforce. The SDR and NSCG surveys ask core questions of all respondents each time they are surveyed to establish baseline data and to update the respondents’ labor force status, changes in employment, and other characteristics. Module items are special topics that are asked less frequently on a rotational basis. Module items provide the data needed to satisfy specific policy or research needs.


As in the 2021 SDR, sample members living inside and outside of the U.S. will receive the same questionnaire content. However, the 2023 questionnaire will reflect five types of modifications relative to the prior cycle.


  • First, in alignment with the NSCG, the 2023 SDR will discontinue modifications that were made in the 2021 SDR to measure the effect of the coronavirus pandemic on sample members’ employment situation. NCSES has discontinued modifications to the response categories for five of the current SDR employment questions and has removed follow-up questions to the salary and earned income items. As a result of these changes, the 2023 SDR items on employment situation will return to their 2019 versions. The SDR will also follow the NSCG by:

    • modifying the telework measure to collect information about remote work in general, removing reference to the coronavirus pandemic, and

    • updating the list of occupation codes.

The SDR will also add a request to consent to receiving text messages for the 2025 SDR data collection prompting. See Appendix D.1 for the set of items that have been added or changed from the 2021 SDR questionnaire found in Appendix D.2.


  • Second, the 2023 SDR will largely continue the implementation of dependent interviewing that was introduced in the 2021 SDR electronic web and CATI questionnaires. The prefilled form version of those questionnaires will be slightly modified based on an assessment of the analytic report from the 2020 SDR Dependent Interviewing Survey Pilot Study and of prior implementation of prefilled items on the 2017 CATI questionnaire.


  • Third, the 2023 SDR will feature a new retirement module to collect objective data on sample members’ pathways to full retirement, as well as circumstances and reasons for retirement, and post-employment activities. The SDR has always included questions about the timing of and reasons for retirement, but these questions have asked about a very limited set of measures and fail to capture key aspects of what can often be a complex and multifaceted process. The retirement module collects data that creates a nuanced picture of the pathways to retirement, recognizing the partial and phased approach people may take to decreasing and eventually breaking off from workforce participation.


The retirement questions were developed from items in the following surveys: Health & Retirement Study (HRS); National Health & Aging Trends Study (NHATS); National Social Life, Health, and Aging Project (NSHAP); Survey of Health, Ageing, and Retirement in Europe (SHARE); and English Longitudinal Study of Ageing (ELSA). However, these studies differ from the SDR in that they have different target populations, thus limiting their utility for making estimates about the science and engineering workforce, and that they are cross-sectional rather than longitudinal. Surveying the same individuals over time lends richness to the data and will yield information on the prevalent pathways to retirement for the nation’s scientists and engineers.


Development and testing of the retirement module items were informed by a literature review conducted for the NSCG and SDR retirement module, which is included in Appendix D.3. Findings from the subsequent three iterative rounds of cognitive testing and instrument refinement on the new items are documented in Appendix D.4. The retirement module items selected for inclusion in the 2023 SDR following development and cognitive testing may be found in Appendix D.1, which shows the set of items that have been added or changed from the 2021 SDR questionnaire.


  • Fourth, the 2023 SDR will include an experiment to test three versions of questions on sex at birth, gender identity, and sexual orientation (SOGI) using the web instrument. This experiment is part of NCSES’s continuing research on optimal methods for collecting information on these three distinct and important demographic constructs. The experiment includes questions developed following OMB’s best practices for general population surveys and through the methodological study tailored for the SED’s new recipients of research doctorates. The differences between the questions asked in the SDR are in response to the differences in privacy and confidentiality requirements for the two surveys. The SOGI items will be included as a separate experimental section of the web survey after the demographic section. Respondents will be informed that the SDR is asking the questions in this experimental section to be more inclusive and to study the differences in employment outcomes among minority groups, and that their responses will be kept confidential, used for experimental research only, and reported in a format that does not lead to individual identification.


After asking the SOGI items, respondents will be asked a brief set of follow-up questions to gauge their privacy concerns related to this new information collection. The 2023 SDR SOGI experiment plan is in Appendix D.5.


  • Fifth, the 2023 SDR will test questionnaire modifications for the race and ethnicity questionnaire items. This research will support ongoing efforts for updating the OMB’s race and ethnicity statistical standards. A set of revised wording options8 for collecting race and ethnicity information will be included in the web survey.


4.1 Survey Contact Materials

To effectively gain survey cooperation in the 2023 cycle, survey contact materials will be tailored to sample members’ sample type, current predicted location, past participation, and reported retirement status (see Appendix E.1 for tailoring implemented in the mailing materials). All contact materials will request sample member participation via the web survey and will include access to the online survey. As has been done since the 2003 SDR, the 2023 SDR letterhead stationery will include project and NSF/NCSES website information, the NSF mailing address, and the data collection contractor’s project toll-free telephone line and email address. The stationery will contain the survey’s established logo as part of an effort to brand the communication to sample members for ease of recognition. The back of the stationery will display the basic elements of informed consent. See Appendix E.1 for draft copies of the contacting materials including the 2023 letterhead.


Between Cycle Contact Update Form

Before the 2025 SDR administration, we will email sample members who need locating but have an email address on file to ask them to update their contact information in advance of survey administration. The between-cycle period will occur several months before the start of pre-field locating. At this time, NCSES will attempt to contact approximately 30% of the production sample to update or confirm their mail, email, and phone information. In the prefield period before the 2025 SDR, sample members will receive an email asking them to complete a short Contact Update Form, which should take on average 3 minutes to fill out. This contact reduces the effort and resources we will need to devote to locating and may increase participation in the 2025 SDR. If typical batch tracing services provide valid contacting information for sample members, those members will be dropped from this contact. Drafts of the emails and the Contact Update Form to be used in the 2025 prefield period are in Appendix E.2.


4.2 Questionnaire Layout

There were some changes to the 2021 SDR questionnaire layout for the 2023 survey to accommodate the addition of the new retirement module. These new questions will be asked after questions about the current job (Part A) and past work experiences (Part B), and prior to the module on other work-related experiences (Part C). Through cognitive research, testing, and other policy relevant interests, NCSES continues to review and revise the content of its survey instruments. After the 2021 data collection, NCSES made minor modifications to question lead-ins and response categories common to both the SDR and NSCG to increase consistency between the two surveys. NCSES will review the information after the 2023 round and will propose and test changes and content improvements for the 2025 survey cycle.


4.3 Web-Based Survey Instrument

The SDR first introduced an online mode in 2003. Figure 1 shows the rate of the SDR web survey participation from the 2003 through 2021 survey cycles.


Figure 1: Web Mode Participation Rate: 2003-2021 SDR

*Other response modes are self-administered mail-in form or telephone interview.


As in 2021, the 2023 online survey will be a mobile-aware survey that renders in a user-friendly format on mobile devices (e.g., smartphones and tablets) so that the respondent experience with the online survey will be similar regardless of the screen size or web browser used to access the survey. Based on online participation in previous survey cycles, over 90% of SDR respondents are expected to participate via the web. Of web respondents, 16% participated via a mobile device in 2021.


4.4 Adaptive Design Goals and Monitoring Metrics

Adaptive survey designs provide a framework for data-driven tailoring of data collection procedures to different sample members, often for cost and bias reduction.9 The 2023 data collection will include an adaptive design strategy to help achieve a balanced sample to minimize the potential for nonresponse bias, achieve targeted numbers of completes for key analytic domains, and maintain the efficiency of data collection. This is the fifth cycle of the SDR to apply an adaptive design approach. The 2023 emphasis will continue building on the procedures implemented during prior cycles, including the development of improved monitoring metrics that assess the effects of the adaptive design treatments and interventions used in prioritizing nonrespondents for data collection. In addition, there will be continued investigation of how best to use interim estimates produced from flow processing in the adaptive design framework. Flow processing will be conducted weekly after the first month of data collection. This will include running all coding, editing, weighting, imputation, and variance estimation to produce key estimates during data collection.


The 2023 SDR data collection will be implemented in four data collection phases over the course of 26 weeks. The four data collection phases are (1) starting, (2) interim, (3) late-stage, and (4) last chance. Adaptive design interventions will vary across the prefield period and four phases of data collection. As shown in Figure 2, these interventions will impact the following data collection operations: (1) locating, (2) incentive offers, (3) telephone outreach, and (4) mailings.
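The phase schedule can be expressed as a simple lookup from field week to phase. The Python sketch below takes its week boundaries from the Week column of Figure 2, which also labels weeks 25 and 26 as "Final Request" after the four numbered phases; the function and constant names are illustrative and not part of any SDR system.

```python
# (phase label, first week, last week), read from the Week column of Figure 2.
PHASES = (
    ("Starting", 1, 6),
    ("Interim", 7, 12),
    ("Late-Stage", 13, 18),
    ("Last Chance", 19, 24),
    ("Final Request", 25, 26),
)

def phase_for_week(week):
    """Return the data collection phase label for a field week (1 through 26)."""
    for label, first, last in PHASES:
        if first <= week <= last:
            return label
    raise ValueError(f"week {week} is outside the 26-week field period")

# Week 7 is when the incentive mode experiment begins.
week_7_phase = phase_for_week(7)
```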


The 2023 SDR will feature an adaptive design strategy using priority scores (high or low) and case ranking (numeric ordering of cases) to direct differential treatment of cases during data collection and to prioritize the workload in locating and telephone prompting tasks. The priority and case ranking variables will be created at the start of the prefield period and subsequently updated four times, once before the start of each data collection phase, to direct the appropriate treatment of the cases during that phase based on the most up-to-date information available about the sample. The prioritization is based on a mix of prior round outcomes, predicted locating and response probabilities, and current round paradata. In addition, a metric that measures the importance of the case in achieving sample size and precision goals in a set of target analytic domains is used in the prioritization, particularly later in data collection.
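To make the relationship between case ranking and priority scores concrete, the following Python sketch ranks cases by a single illustrative score and flags the top share as high priority. The scoring formula, the field names, and the 50 percent high-priority share are all hypothetical assumptions for illustration; the actual SDR prioritization combines prior round outcomes, predicted probabilities, paradata, and a domain importance metric in ways not specified here.

```python
import math
from dataclasses import dataclass

@dataclass
class Case:
    case_id: str
    locate_prob: float        # predicted probability the case can be located
    response_prob: float      # predicted probability the case responds
    domain_importance: float  # contribution to analytic domain targets

def rank_cases(cases, high_share=0.5):
    """Return (rank, case_id, priority) tuples, rank 1 being worked first.

    Assumed score: cases that matter more to domain targets and are less likely
    to be located and respond without extra effort are ranked higher.
    """
    def score(c):
        return c.domain_importance * (1.0 - c.locate_prob * c.response_prob)

    ordered = sorted(cases, key=score, reverse=True)
    n_high = math.ceil(len(ordered) * high_share)
    return [
        (rank, c.case_id, "high" if rank <= n_high else "low")
        for rank, c in enumerate(ordered, start=1)
    ]
```

Re-running a function of this kind before each phase with refreshed inputs would reproduce the pattern described above, in which a case may change priority and rank as data collection proceeds.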



Figure 2: 2023 SDR Main Data Collection Protocol and Differential Adaptive Design Locating and Data Collection Treatments

Prefield Phase (3 months prior to Week 1)

Main Data Collection Protocol:
  • Prefield locating

Differential Locating Treatment: Locating cases searched in priority rank order.
High Priority Treatment:
  • AIM locating
  • 25 minutes per case
Low Priority Treatment:
  • 15 minutes per case

Differential Data Collection Treatment: Not applicable.

Starting Phase (Weeks 1-6)

Main Data Collection Protocol:
  • Week 1: Initial invite (M&E)
  • Week 3: Follow-up contact (M&E)
  • Week 4: Telephone prompting

Differential Locating Treatment: Locating cases searched in priority rank order.
High Priority Treatment:
  • AIM locating
  • 45 minutes per case
Low Priority Treatment:
  • 25 minutes per case

Differential Data Collection Treatment: Telephone prompting calls made in priority rank order.

Interim Phase (Weeks 7-12)

Main Data Collection Protocol:
  • Week 7: Reminder contact (M&E)
  • Week 9: Questionnaire #1
  • Week 10: Reminder PC

Differential Locating Treatment: Locating cases searched in priority rank order. Same protocol as Starting Phase plus…
High Priority Treatment: Adds Expert locating.
Low Priority Treatment: Remains Basic search protocol.

Differential Data Collection Treatment: While not implemented as part of the overarching adaptive design strategy, week 7 includes an incentive mode experiment.
High Priority Treatment: Incentive mode experiment treatment groups will be offered $30 pre- or post-paid incentive in week 7.
Low Priority Treatment: Control groups will receive the reminder mailing only.

Late-Stage Phase (Weeks 13-18)

Main Data Collection Protocol:
  • Week 13: Infocard mailing
  • Week 14: Telephone prompting
  • Week 15: Reminder email
  • Week 16: Questionnaire #2
  • Week 17: Reminder PC

Differential Locating Treatment: Same High and Low treatment as Interim Phase, but cases may change priority rank order.

Differential Data Collection Treatment: CATI calls made in priority rank order.
High Priority Treatment:
  • $30 incentive (in U.S.) with infocard
  • Infographic 4-color letter with Quex #2 mailing
Low Priority Treatment:
  • No incentive with infocard
  • Plain letter with Quex #2 mailing (in U.S.)
  • Infographic 4-color letter only w/o Quex (outside U.S.)

Last Chance Phase (Weeks 19-24)

Main Data Collection Protocol:
  • Week 19: Reminder w/token (M&E)
  • Week 20: Telephone prompting
  • Week 22: CIO offer (letter)
  • Week 23: CIO offer (email)
  • Week 24: CIO offer (telephone)

Differential Locating Treatment: Same High and Low treatment as Late-Stage Phase, but cases may change priority rank order.

Differential Data Collection Treatment: CATI calls made in priority rank order.
High Priority Treatment:
  • More CATI calls
  • Priority Mailing envelope
Low Priority Treatment:
  • Fewer CATI calls
  • Final request letter or email (in U.S.)
  • Final request email only (outside U.S.)

Final Request (Weeks 25-26)

Main Data Collection Protocol:
  • Week 25: Final request contact (M&E)
  • Week 26: Final telephone prompt

M&E = parallel contact of email and USPS letter mailing.


5. CONTACTS FOR STATISTICAL ASPECTS OF DATA COLLECTION

The NCSES contacts for statistical aspects of the SDR data collection are John Finamore, NCSES Chief Statistician (703-292-2258), Lynn Milan, SDR Project Officer (703-292-2275) and Wan-Ying Chang, NCSES Mathematical Statistician and the lead SDR sampling statistician (703-292-2310).

1 For 2023 SDR sampling, URM individuals are those who report they are Hispanic of any race; non-Hispanic Black, alone; non-Hispanic American Indian/Alaskan Native, alone; or non-Hispanic Native Hawaiian/Other Pacific Islander, alone. In a change from the 2021 SDR design, non-Hispanic multi-race individuals will no longer be considered URM.

2 https://ncses.nsf.gov/pubs/nsf23315/report/science-and-engineering-degrees-earned#overall-s-e-degrees-earned-by-underrepresented-minorities.

3 More information about the SDR longitudinal panel sample may be found here: https://ncses.nsf.gov/pubs/nsf22326.

4 LexisNexis® is a widely accepted locate-and-research tool available to government, law enforcement, and commercial customers. Searches can be run in batch or individually, and the query does not leave a trace in the credit record of the sample person being located and the input data used for the search are not retained. In addition to updated address and telephone number information, LexisNexis® returns deceased status updates and potential email addresses.

5 See Geographic Mobility: 2020 to 2021 reported by the Census, Table 1-1. General Mobility, by Race and Hispanic or Latino Origin, Region, Sex, Age, Relationship to Householder, Educational Attainment, Marital Status, Nativity, Tenure, and Poverty Status: 2020 to 2021. https://www.census.gov/data/tables/2021/demo/geographic-mobility/cps-2021.html.

8 See the proposed example for self-response data collections in figure 2 of the Federal Register notice at https://www.federalregister.gov/documents/2023/01/27/2023-01635/initial-proposals-for-updating-ombs-race-and-ethnicity-statistical-standards. For more information on OMB’s efforts to update the race and ethnicity standards, please see https://spd15revision.gov.

9 See https://doi.org/10.1201/9781315153964.

  



Author: webber-kristy
File Created: 2023-07-29
