NPRA Supporting Statement - Section B

NSF Survey of Nonprofit Research Activities (NPRA)

OMB: 3145-0240

2017-12-04 - Revision of a currently approved collection

NSF Survey of Nonprofit Research Activities (NPRA), Supporting Statement Section B

Collection of Information Employing Statistical Methods

The purpose of the NPRA Pilot Survey is to evaluate and refine the NPRA questionnaire, sample
design, and data collection operations in preparation for the main survey implementation. The
questionnaire (see Attachment A) has been tested through cognitive interviews and will be further
evaluated in the pilot. With regard to sampling, the pilot study will examine the estimated
eligibility rates and response rates to develop a more efficient stratification and allocation. The
variability of key measures will be evaluated, including total expenditures for performing R&D
and the amount of funding provided for research purposes. Moreover, the pilot study serves as a
trial run for the data collection procedures for the main survey, allowing a chance to verify that the
procedures work effectively and to modify them as needed.
B.1. Respondent Universe

Target Population and Sampling Frame
The target population for the NPRA Pilot Survey includes all nonprofit organizations categorized
by the Internal Revenue Service (IRS) as 501(c)(3) public charities, 501(c)(3) private foundations,
and other exempt organizations [e.g., 501(c)(4), 501(c)(5), 501(c)(6)]. As recorded on the IRS
Exempt Organizations Business Master File (November 2015), there are 1.5 million tax-exempt
organizations. Nearly 1.2 million of these organizations filed an information return with the IRS
in the past 24 months. Certain organizations are not required to file an information return (e.g.,
churches), but those that are required to file an information return must file Form 990, 990-N,
990-EZ, or 990-PF, according to their organization type and financial size (see Table B.1.1).
Table B.1.1: Criteria for Filing IRS Exempt Organization Forms
Type of Exempt Organization                                 Required to file
Public Charity and Other Exempt Organizations:
  Gross receipts normally ≤ $50,000                         990-N
  Gross receipts < $200,000, and Total assets < $500,000    990-EZ
  Gross receipts ≥ $200,000, or Total assets ≥ $500,000     990
Private Foundations                                         990-PF

Small organizations, those with gross receipts under $50,000, are allowed to file Form 990-N
(“e-postcard”), which does not require the organization to report financial data. Nearly half of the
filing organizations filed Form 990-N. The remaining organizations filed Forms 990, 990-EZ, or
990-PF, which require financial reporting.
The financial information on Forms 990, 990-EZ, and 990-PF is captured on the National Center
for Charitable Statistics (NCCS) Core Financial Files. We will use the circa 2013 versions of the
core files to construct the sampling frame for the NPRA Pilot Survey. The files are labeled “circa”
2013 because the data may be from different fiscal years. The due date for Form 990 returns is rolling,
based on the 15th day of the fifth month after the end of the organization’s taxable year. At the time of
the core file construction, most organizations had filed for their 2013 fiscal year; however, a few
organizations had filed only for FY 2012 or had already filed for FY 2014. Further, if an
organization did not file a return, but is presumed to be still active, NCCS uses the form from the
latest filing. As shown in Table B.1.2, three separate core files are available.
Table B.1.2: NCCS Core Financial Files and Number of Records
Core File                                                  Number of Records
501(c)(3) public charities that file Form 990 or 990-EZ              382,401
Other exempt organizations that file Form 990 or 990-EZ              140,219
501(c)(3) private foundations (Form 990-PF)                           95,992
Total                                                                618,612

The core files include a total of 618,612 organizations with unique employer identification
numbers (EINs) that we will use as our sampling frame. Of those, 95,992 represent private
foundations of all sizes that complete Form 990-PF. The other 522,620 represent public charities
and other exempt organizations that filed Form 990 or Form 990-EZ. Typically, these
organizations have gross receipts over $50,000, because those with gross receipts under $50,000
are allowed to file Form 990-N and are not recorded on the core files. The distribution of
organizations in the core files is presented by form type and fiscal year in Table B.1.3.
Table B.1.3: Circa 2013 Financial Core Files: Fiscal Year and Form Type
Fiscal Year      Total        990     990-EZ     990-PF
2010
2011            11,169      3,788      5,554      1,827
2012            41,638     18,446     19,137      4,055
2013           473,161    232,705    160,730     79,726
2014            92,643     44,522     37,738     10,383
Total          618,612    299,461    223,159     95,992

Frame Coverage
Using the core financial files as a sampling frame excludes most public charities and other exempt
organizations with gross receipts under $50,000 (note some file Form 990 or Form 990-EZ).[2]
While organizations with gross receipts under $50,000 represent 70% of the public charities and
other exempt organizations, they represent approximately 0.1% of the total gross receipts
generated by public charities and other exempt organizations.[3] Exclusion of these small
organizations is a limitation because they will not be covered by the sampling frame (i.e.,
undercoverage). Alternatively, the IRS Exempt Organization Business Master File (EO/BMF)
includes all exempt organizations regardless of financial size. However, the financial variables on
the EO/BMF are limited to gross receipts, revenue, and total assets. The additional financial
information available on the core files will be used in developing a propensity score model to
improve the sample stratification (see Attachment F. NCCS Core File Description for a list of the
variables on the core files). Although exclusion of these small organizations is a limitation in
coverage, we believe the likelihood that these organizations are performing and/or funding R&D
is small and the impact on the estimate of the total research expenditures will be negligible.

[2] Based on the filing requirements code, 126,377 organizations on the core file are not required to file a Form 990, 990-EZ, or 990-PF.
[3] Based on gross receipts on the 2013 Exempt Organization Business Master File. For zero-filers, most of whom are allowed to file Form 990-N
and for whom we have no record of income, we assumed the maximum threshold ($50,000) to calculate the gross receipts percent. The percentage
was 0.8% with this maximum threshold assumption.

Another source of undercoverage is new organizations that have not yet filed a Form 990, 990-EZ,
or 990-PF. Similarly, a source of overcoverage is organizations that are no longer in existence (e.g.,
out of business) at the time of the survey. However, we anticipate that these coverage errors will be
small because the number of organizations increased by only 1.1% between the 2014 and 2015 core files.
Frame Exclusions
The core files include some organizations that are not in the scope of this study, such as
organizations outside the United States, churches, and government organizations. These
organizations are excluded from the sampling frame based on the criteria below. Table B.1.4
presents a summary of the exclusion criteria.
Table B.1.4: Summary of Organizations Excluded From the Frame
                                                                        All NPOs
Total Organizations on NCCS Core Files                                   618,612
Exclusions
1) IRS subsection code: Exclude organizations with subsection code
   not equal to 3, 4, 5, or 6 [4]                                         57,282
2) Foundation code: Exclude organizations with foundation code
   10=Church or 14=Government                                              6,613
3) State: Exclude organizations outside the U.S.                           1,321
4) North American Industry Classification System (NAICS): Exclude
   organizations with NAICS code 92=Public Administration
5) “Out of Scope” based on NCCS Coding [5]

[4] Most exclusions were based on the following 501(c) subsection codes: 02 - Title holding corporations for a tax-exempt organization; 07 - Social
and recreational clubs; 08 - Fraternal beneficiary societies and associations; 09 - Voluntary employees' beneficiary associations; 10 - Domestic
fraternal societies; 12 - Benevolent life insurance associations, mutual ditch or irrigation companies, etc.; 13 - Cemetery companies, providing
burial and incidental activities for members; 14 - State-chartered credit unions, etc.; and 19 - Post or organization of war veterans. The full list
of the 501(c) subsection codes is available on the data dictionary for the NCCS Core Financial files, available at
http://nccsweb.urban.org/PubApps/showDD.php#Core%20Data.
[5] NCCS assigns a “G” identifying organizations that are considered government entities and “N” identifying organizations that have a physical
IRS contact address, but nearly all program operations are conducted/focused outside the U.S. A total of 33 cases had a code of N or G. One
organization coded as G was retained as a hospital: Christiana Care Health Services. The remaining 32 were eliminated.

Response Rate
One of the main objectives of this testing is to determine how to maximize response rates. Response
rates will be calculated using the American Association for Public Opinion Research standard
definitions.[6] This survey was last conducted in 1996-97 and yielded a 41% response rate. However,
the 1996-97 methodology was based on a mail survey, while the current iteration is a web survey.
See section B.3. for a discussion of the methods we plan to use to maximize response rates for the
pilot survey.
B.2. Statistical Methodology

Identifying Likely Performers and Likely Funders of R&D
One of the biggest challenges with this research is the expectation that organizations performing
and/or funding R&D will be rare and difficult to target. We use two strategies to increase the
sampling efficiency of locating performers and funders: (1) frame truncation, and (2) stratification.
Both the frame truncation and stratification use the core financial information and classification
codes to identify organizations that are more likely to perform or fund research.
To identify the core financial variables related to performing and/or funding research and evaluate
the impact of frame truncation, we identified a set of “likely performers” and a set of “likely
funders” on the frame. The likely performers and likely funders are a subset of organizations
identified from auxiliary sources that strongly indicate that they are performing or funding
research. These sources identified 1,655 likely performers and 1,116 likely funders. The auxiliary
sources and the process for matching to the frame are described in Attachment G. Likely Performer
and Funder Sources.
Organizations not identified as a likely performer or a likely funder will be referred to as
“unknown.” The unknown organizations will be a mix of those performing or funding research
(but not flagged via the auxiliary sources), as well as those not conducting any R&D. In contrast, we
expect nearly all of the organizations flagged as a “likely” performer and/or funder to be
conducting R&D. By comparing the organizations identified as “likely” performers or funders
with organizations of unknown status, we will identify characteristics associated with R&D and
use this information to stratify the organizations.
Table B.2.1 presents the likely performer and likely funder organizations by form type. There were
102 organizations that were classified as both a likely performer and a likely funder. Most of these
organizations (96) filed Form 990 while two filed 990-EZ, and four filed 990-PF. A large majority
of the likely performers filed Form 990. However, the likely funders are split between Form 990
(60%) and Form 990-PF (39%).

[6] American Association for Public Opinion Research (AAPOR). 2015. Standard Definitions: Final Dispositions of Case Codes and
Outcome Rates for Surveys. 8th edition. Oak Terrace, IL: Author.

Table B.2.1: Likely Performers and Likely Funders by Form Type
               All NPOs   Likely Performer   Both   Likely Funder   Likely Performer or Funder
Form 990        257,969              1,434     96             336                        1,866
Form 990-EZ     199,546                         2
Form 990-PF      95,835                         4             668                          702
Total           553,350              1,539    102           1,011                        2,652
Likely Performers (including those classified as both): 1,641
Likely Funders (including those classified as both): 1,113


Financial Truncation
To increase the efficiency of reaching organizations that perform or fund research, we will impose
a financial threshold. Table B.2.2 presents the mean revenue, assets, and expenses for the likely
performers and the likely funders, as well as those not identified as either.
Table B.2.2: Mean Revenue, Assets, and Expenses
                                                        Mean
Likely Performer (n=1,641)
  Total Revenue                                 $112,281,164
  Total Assets                                  $187,083,005
  Total Expenses                                $104,143,339
Likely Funder (n=1,113)
  Total Revenue                                  $53,710,685
  Total Assets                                  $310,315,452
  Total Expenses                                 $33,886,742
Not likely performer or funder (n=550,698)
  Total Revenue                                   $3,170,711
  Total Assets                                    $6,417,830
  Total Expenses                                  $2,863,612

We examined cut-offs based on revenue, assets, and expenses. For each variable, we evaluated the
percentage of organizations remaining in the frame and the percentage of likely performers or
funders remaining in the frame. The goal is to retain organizations that are likely to be performing
or funding research, while eliminating organizations that are unlikely to be performing or funding
research. Therefore, we would like the percentage of likely performers in the frame to remain high,
while reducing the number of total organizations on the frame, thus increasing the density of likely
performers. We ran the financial truncation separately for organizations filing Form 990-PF (private
foundations) and those filing Forms 990 or 990-EZ (public charities and other exempt organizations).
Forms 990 or 990-EZ
Table B.2.3 compares the three financial measures at the thresholds where 90% and 85% of the
likely performers or funders remain in the frame. Expenses and revenues perform better than assets
in reducing the overall number of organizations while maintaining a high percentage of likely
performers. An expenses threshold of $260,000 or more removes 65% of the overall organizations,
yet 90% of the likely performers and funders remain in the frame. An expenses threshold of
$460,000 or more removes 74% of the total organizations, while maintaining 85% of the likely
performers and funders. Similarly, a revenues cut-off of $250,000 reduces the number of overall
organizations by 63% while retaining 90% of the likely performers and funders; a revenues cut-off
of $490,000 reduces the number of overall organizations by 74% while retaining 85% of the likely
performers and funders.
Table B.2.3: Financial Thresholds to Achieve 90% and 85% of Likely Performers in the Frame:
Forms 990 or 990-EZ
                               Likely Performers/Funders    990, 990-EZ Organizations
Financial Threshold                    N          %                 N          %
No Threshold                       1,950       100%           457,515       100%
90% Threshold
  Total Expenses > $260,000        1,747        90%           159,423        35%
  Total Revenue > $250,000         1,747        90%           167,423        37%
  Total Assets > $180,000          1,751        90%           208,919        46%
85% Threshold
  Total Expenses > $460,000        1,650        85%           118,568        26%
  Total Revenue > $490,000         1,651        85%           118,423        26%
  Total Assets > $380,000          1,650        85%           157,183        34%

To find a balance between the density of R&D performers/funders and the coverage of R&D
performers/funders, we measured the percentage decrease in organizations when increasing to
threshold t from the previous threshold t-1, \( \mathrm{diff}_t = p_{t-1} - p_t \), where \( p_t \) is the
percentage of organizations remaining at threshold t and t runs from $10,000 to $1 million in
increments of $10,000. Then we modeled the change as a function of the increase in t
for likely performers or funders and for all organizations. If the slopes between all organizations
and the likely performers and funders were significantly different at 0.05 testing level, we
eliminated organizations that did not meet threshold t. We repeated this process until the slopes
were no longer significantly different at 0.05 testing level. The threshold where this occurred
became the final threshold.
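
The per-step quantities in this procedure can be sketched as follows. This is a minimal illustration, not the code used for the pilot: the function names and the expense values are hypothetical, an organization is assumed to meet a threshold when its value is at or above it, and the comparison of slopes at the 0.05 testing level is not reproduced.

```python
def share_remaining(values, threshold):
    """Fraction of organizations whose value is at or above the threshold."""
    return sum(v >= threshold for v in values) / len(values)


def retention_steps(values, thresholds):
    """Return (t, p_t, diff_t) tuples, where diff_t = p_{t-1} - p_t."""
    steps = []
    previous = 1.0  # with no threshold, every organization remains
    for t in thresholds:
        current = share_remaining(values, t)
        steps.append((t, current, previous - current))
        previous = current
    return steps


# Hypothetical total expenses in dollars, for illustration only.
all_orgs = [20_000, 45_000, 80_000, 150_000, 300_000, 700_000, 2_000_000, 9_000_000]
likely = [300_000, 700_000, 2_000_000, 9_000_000]

# Thresholds from $10,000 to $1 million in increments of $10,000.
thresholds = range(10_000, 1_000_001, 10_000)
all_steps = retention_steps(all_orgs, thresholds)
likely_steps = retention_steps(likely, thresholds)
```

Comparing the diff_t series for all organizations with the series for likely performers or funders shows where raising the threshold stops removing the two groups at different rates.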
At a threshold of $520,000 in expenses, 84% of likely performers remain in the frame, while only
24% of all organizations remain in the frame. For thresholds beyond this point, the percentage
decrease in likely performer or funder organizations is no different than the percentage decrease
for all organizations. Similarly, a threshold of $470,000 in revenues results in 85% of likely
performers or funders remaining and 26% of all organizations remaining. After these points, there
is no additional gain in efficiency.
Revenues and expenses perform similarly in increasing the frame efficiency. Table B.2.4 presents
a cross-tab of the revenues and expenses cut-offs (both rounded to $500,000). Only 18 likely
performers/funders met the expenses threshold, but did not meet the revenues threshold. Similarly,
only 27 likely performers/funders met the revenues threshold, but did not meet the expenses
threshold. Given the high concordance of these two measures and the higher density of likely
performers/funders for expenses (calculated from Table B.2.4), we will use $500,000 in expenses
as the financial threshold, which includes 84% of the likely performers/funders and 24% of all
organizations. The 113,297 organizations that meet these criteria represent 97% of total revenue,
98% of total expenses, and 96% of total assets.
Table B.2.4: Comparison of Financial Thresholds: Expenses>=$500,000 and Revenue>=$500,000
                                                  Revenue<$500,000   Revenue>=$500,000
Expenses<$500,000     All organizations                    335,930               8,288
                      Likely performers/funders                285                  27
Expenses>=$500,000    All organizations                      4,522             108,775
                      Likely performers/funders                 18               1,620
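
The density comparison behind the choice of the expenses threshold can be reproduced from the cells of Table B.2.4, with the 18 and 27 likely performers/funders taken from the text above. The sketch below is illustrative arithmetic only; the dictionary keys and variable names are hypothetical labels.

```python
# Cell counts keyed by (expenses group, revenue group); "ge" means >= $500,000.
all_orgs = {
    ("lt", "lt"): 335_930, ("lt", "ge"): 8_288,
    ("ge", "lt"): 4_522, ("ge", "ge"): 108_775,
}
likely = {
    ("lt", "lt"): 285, ("lt", "ge"): 27,
    ("ge", "lt"): 18, ("ge", "ge"): 1_620,
}


def margin(cells, axis):
    """Sum the cells that meet the threshold on the given axis (0=expenses, 1=revenue)."""
    return sum(count for key, count in cells.items() if key[axis] == "ge")


expenses_frame = margin(all_orgs, 0)    # organizations kept by the expenses threshold
expenses_likely = margin(likely, 0)     # likely performers/funders kept
revenue_frame = margin(all_orgs, 1)
revenue_likely = margin(likely, 1)

expenses_density = expenses_likely / expenses_frame
revenue_density = revenue_likely / revenue_frame
coverage = expenses_likely / sum(likely.values())  # share of likely performers/funders kept
```

Under these counts the expenses threshold keeps 113,297 organizations, and its density of likely performers/funders is slightly higher than the density under the revenue threshold.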

Form 990-PF
Table B.2.5 compares the three financial measures at the thresholds where 90% and 85% of the
likely funders remain in the frame. Assets perform better than revenue and expenses in reducing
the overall number of organizations while maintaining a high percentage of likely performers or
funders. An assets threshold of $1,650,000 or more removes 74% of the total organizations, while
maintaining 90% of the likely funders. Similarly, a threshold of $3,700,000 eliminates 84% of the
overall organizations, while maintaining 85% of the likely performers/funders.
To determine the point where no further efficiency is gained, we measured the percentage decrease
in organizations when increasing to threshold t from the previous threshold t-1,
\( \mathrm{diff}_t = p_{t-1} - p_t \), where t runs from $50,000 to $10,000,000 in increments of $50,000.
Then we modeled the change as a function of the increase in t for likely performers/funders and
for all organizations. If the slopes between all organizations and the likely performers/funders
were significantly different at 0.05 testing level, we eliminated organizations that did not meet
threshold t. We repeated this process until the slopes were no longer significantly different at 0.05
testing level. The threshold where
this occurred became the final threshold. The point where the slopes are no longer significantly
different occurs at a threshold of $2,750,000 in total assets. After this point, the percentage of
likely funder organizations removed is not significantly different from the overall organizations
removed when increasing the assets threshold. This financial threshold includes 88% of the likely
funders and 19% of all organizations. The 18,107 organizations that meet this threshold represent
90% of total revenue, 93% of total expenses, and 94% of total assets.

Table B.2.5: Financial Thresholds to Achieve 90% and 85% of Likely Performers or Funders in the Frame:
Form 990-PF
                                    Likely Funders          All organizations
Financial Threshold                   N          %              N          %
No Threshold                        702       100%         95,835       100%
90% Threshold
  Total Expenses > $20,000          633        90%         29,662        31%
  Total Revenue > $270,000          633        90%         25,366        26%
  Total Assets > $1,650,000         632        90%         25,054        26%
85% Threshold
  Total Expenses > $50,000          594        85%         17,588        18%
  Total Revenue > $310,000          596        85%         18,997        20%
  Total Assets > $3,700,000         597        85%         14,854        16%

Stratification
After financial truncation, the frame includes 113,297 Form 990 and Form 990-EZ organizations
and 18,107 Form 990-PF organizations. Table B.2.6 includes the number of organizations meeting
the financial threshold by form type.
Table B.2.6: Number of Organizations Meeting the Financial Truncation Threshold
               All Organizations   Likely Performers/Funders
Form 990                 113,216                       1,637
Form 990-EZ                   81                           1
Form 990-PF               18,107                         615

To increase the sampling efficiency, we will stratify the organizations based on frame variables
associated with R&D. The stratifying variables will be codes available on the frame, such as
National Taxonomy of Exempt Entities (NTEE) codes (e.g., hospitals, research institutes), as well
as a propensity score measuring the likelihood that the organization is performing or funding
research.
The propensity score is developed from logistic regression models, with likely performer or funder
as the outcome, and codes and financial characteristics from the core files as the predictor
variables. The propensity scores will be used to classify organizations into density strata, where
organizations with high propensity scores will be classified into the high-density stratum, and the
organizations with low propensity scores will be classified into the low-density stratum. Because
we expect the percentage of performers or funders in the high-density stratum to be higher than
the low-density stratum, we will oversample the high-density stratum to increase the likelihood of
sampling a performer or funder.
The financial fields available on the core files depend on the IRS form filed by the organization.
There are 73 financial fields available for organizations filing Form 990, with a subset of 42
available for those filing Form 990-EZ. Further, the 106 financial variables for the organizations
filing Form 990-PF are largely different from those filing Forms 990 or 990-EZ, although there
are 12 variable equivalents across the form types. The variables available on the core files are
presented in Attachment F. NCCS Core File Description. Due to the difference in available
information across the three forms, we will build two separate models, one for Form 990
organizations (public charities and other tax-exempt organizations) and one for Form 990-PF
organizations (private foundations). We will exclude the Form 990-EZ organizations from the
frame. After the financial truncation, there were only 81 organizations that filed Form 990-EZ,
including a single likely performer. Because not all of the financial variables are available from
Form 990-EZ, excluding these organizations from the frame simplifies the stratification. The
propensity score models are presented in Attachment G. Propensity Models for Performer and
Funder Stratification.
As described in the following sections, the propensity score strata will be combined with other
stratifiers to form the final stratification (see Table B.2.7).
Table B.2.7: Final Stratification for the Pilot
Strata                              Substrata
1. Likely performers or funders     a. Both a likely performer and funder
                                    b. Likely performer, not likely funder
                                    c. Likely funder, not likely performer
2. Hospitals (NTEE = E2)
3. Research organizations [7]
4. Form 990 organizations           a. High density (top decile of propensity scores)
                                    b. Medium density (deciles 8, 9)
                                    c. Low density (deciles 1–7)
5. Form 990-PF organizations        a. High density (top decile of propensity scores)
                                    b. Medium density (deciles 7–9)
                                    c. Low density (deciles 1–6)

Density Stratification
Using the propensity scores resulting from the final models, we classify organizations into three
strata for Forms 990 and 990-PF: 1) high, 2) medium, and 3) low density. The stratification is based
on the deciles, with high density representing the organizations with the highest propensity scores.
Our recommended density stratification is below. As seen in Table B.2.8, the highest stratum for
Form 990 includes 67% of the likely performers and 75% of the likely funders; the medium density
stratum represents nearly 20% of likely performers and 15% of likely funders. The Form 990-PF
model results in 46% of likely funders in the high stratum and 34% in the medium stratum.
Although there are few likely performer organizations, most fall in the high stratum.
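
The mapping from propensity scores to density strata can be sketched as below, using the decile cut-points listed in Table B.2.7. The propensity models themselves are not reproduced; the scores are hypothetical, and the rank-based decile assignment (including how ties are ordered) is an assumption made for illustration.

```python
def assign_deciles(scores):
    """Map each score to a decile from 1 (lowest scores) to 10 (highest scores)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    deciles = [0] * len(scores)
    for rank, i in enumerate(order):
        deciles[i] = rank * 10 // len(scores) + 1
    return deciles


def density_stratum(decile, medium_deciles):
    """Top decile is high density; the listed deciles are medium; the rest are low."""
    if decile == 10:
        return "high"
    if decile in medium_deciles:
        return "medium"
    return "low"


# Cut-points from Table B.2.7.
FORM_990_MEDIUM = {8, 9}
FORM_990_PF_MEDIUM = {7, 8, 9}

# Hypothetical propensity scores, for illustration only.
scores = [i / 100 for i in range(100)]
deciles = assign_deciles(scores)
strata_990 = [density_stratum(d, FORM_990_MEDIUM) for d in deciles]
strata_990_pf = [density_stratum(d, FORM_990_PF_MEDIUM) for d in deciles]
```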

[7] Research organizations are defined as 1) NTEE major group = H-Health - Disease Specific (research) or U-Science and Technology; 2)
Common Code = 05-Research Institutes and/or Public Policy Analysis; and 3) Non-Private Foundation Reason Code = 05-Medical Research.

Table B.2.8: Deciles for the Likely Performer and Funder Models
Decile

Total
Orgs

High

10,922

Medium

10,923

175

1.6%

141

10,923

108

1.0%

10,923

Form
990

Likely
Performers
n
%
818
7.5%

Likely Funders
n
289

%
2.6%

1.3%

0.4%

0.8%

0.2%

0.6%

0.5%

0.1%

0.5%

0.4%

0.1%

10,923

0.4%

0.3%

0.0%

10,923

0.2%

0.0%

10,923

0.1%

0.0%

10,921

0.1%

0.0%

11,494

0.1%

0.0%

High

1,809

271

13.6%

0.6%

263

13.3%

Medium

1,810

108

4.8%

0.1%

106

4.7%

1,810

2.9%

0.1%

2.8%

1,809

2.4%

0.0%

2.4%

1,810

1.6%

0.1%

1.5%

1,810

1.7%

0.0%

1.6%

1,809

0.9%

0.0%

0.9%

1,810

0.9%

0.0%

0.8%

1,810

0.7%

0.0%

0.7%

1,810

0.2%

0.0%

0.2%

Low

990-PF

Likely Performer
or Funder
n
%
1028
9.4%

Low

Final Stratification
The final stratification will be based on classification codes available on the core files, the
propensity score, and the likely performer and likely funder flags. The likely performer and likely
funder flags have been used to evaluate classification codes and financial variables highly
associated with research performance and funding.
As shown in Table B.2.9, a number of classification codes on the core files are strong predictors
of likely performer or funder. Because of the high percentage of likely performers and funders
found in these four categories of organizations, we will stratify these organizations into a
“Research organization” stratum. We will then stratify all other organizations on the basis of the
density stratum described below in Table B.2.10.

Table B.2.9: Classification Codes Identifying Research Organizations
Code
NTEE major group

Common Code

Value
H-Health-Disease Specific
(research)
U-Science and Technology
05-Research Institutes and/or
Public Policy Analysis

Non-Private Foundation
Reason Code

05-Medical Research

Total
Orgs

Likely
Performers
or Funder

Likely
Perform
%

Likely
Fund
%

846

194

16.5%

7.9%

826

196

22.0%

4.4%

897

103

10.5%

2.0%

183

33.9%

2.2%

Table B.2.10: Stratification for Pilot Survey
                              Density    Total     Likely Performers   Likely       Likely
Organization Stratum          Stratum    Orgs      or Funder           Perform %    Fund %
Hospitals                     All         3,428                  111        3.1%      0.2%
Research organizations        All         2,565                  491       16.1%      4.7%
Other 990 Organizations       High        8,765                  558        4.7%      2.1%
                              Medium     21,635                  275        1.0%      0.3%
                              Low        76,947                  215        0.2%      0.1%
Other 990-PF Organizations    High        1,758                  262        0.5%     14.6%
                              Medium      5,394                  211        0.0%      3.9%
                              Low        10,831                  129        0.0%      1.1%

Sample Allocation
The pilot study has two primary sampling objectives. First, obtain 200 survey responses from
performers and 200 from funders. Second, allocate sufficient sample size to each stratum to
estimate the incidence of performers and funders by stratum to optimize the sample allocation for
the full study. On the basis of these two sampling objectives, we established two criteria:
1) Obtain 200 performer surveys and 200 funder surveys
2) Minimum sample sizes of 180 returned surveys for each density stratum and for research
organizations
In order to limit respondent burden on this pilot study, but still achieve a spectrum of responses
from nonprofit organizations, the maximum sample size that could be selected was constrained to
4,000 total organizations. For clarity, we will use the following terminology in describing the
sample allocation:
1) Selected sample: organizations selected from the sampling frame
2) Survey responses: survey responses from an organization that may or may not be a
performer or funder
3) Performer/funder surveys: survey responses from an organization that identifies as a
performer or funder
To determine the sample allocation for the pilot, we made some assumptions about the percentage
of performers and funders in each stratum. We assumed high incidence for the organizations
flagged as likely performers or funders; moderate incidence for those identified as research
institutes and those in the high-density stratum; and low incidence for those in the medium- and
low-density strata. On the basis of the assumed incidence of performers and funders, we estimated
the stratum population sizes for performers (P) and funders (F). The incidence assumptions and
estimated population sizes are presented in Table B.2.11.
We also used the assumed incidence to estimate the relative cost per performer/funder survey in
each stratum (C). All costs are scaled to the expected performer/funder surveys in the Likely
Performer and Funder stratum. In this stratum, we expect 1.5 performer/funder surveys per every
survey response (note that we allow an organization that funds and performs research to count
toward the performer and funder sample size). In the Likely Performer stratum, we expect 0.75
performer/funder surveys per survey response. Therefore, the cost ratio is 2:1. The cost ratios per
stratum are presented in Table B.2.11.
Table B.2.11: Pilot Stratification and Sample Allocation Assumptions
                                             Total on    Assumed     Assumed   Total            Total         Relative
Stratum                                      Frame (N)   Perform %   Fund %    Performers (P)   Funders (F)   Cost (C)
Likely funders and performers                       99       75.0%     75.0%               74            74        1.0
Likely performers (not funders)                  1,257       75.0%      0.0%              943             0        2.0
Likely funders (not performers)                    896        0.0%     75.0%                0           672        2.0
Hospitals                                        3,317        3.0%      0.0%              100             0       50.0
Research organizations                           2,074       20.0%      5.0%              415           104        6.0
Form 990 Organizations       High                8,207        7.5%      2.5%              616           205       15.0
                             Medium             21,360        1.5%      0.5%              320           107       75.0
                             Low                76,732        0.3%      0.3%              230           230      250.0
Form 990-PF Organizations    High                1,496        0.5%     20.0%                7           299        7.3
                             Medium              5,183        0.0%      5.0%                0           259       30.0
                             Low                10,702        0.0%      1.5%                0           161      100.0
Total                                          131,323                                  2,705         2,111
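
The relative costs in Table B.2.11 are consistent with dividing the 1.5 performer/funder surveys expected per response in the Likely Performer and Funder stratum by the sum of each stratum's assumed performer and funder percentages. That reading is inferred from the table values rather than stated as a formula, so the sketch below should be taken as an assumption; the names are hypothetical.

```python
# Expected performer/funder surveys per response in the reference stratum (0.75 + 0.75).
REFERENCE_YIELD = 1.5

# (assumed perform share, assumed fund share) from Table B.2.11.
ASSUMED_INCIDENCE = {
    "Likely funders and performers": (0.75, 0.75),
    "Likely performers (not funders)": (0.75, 0.0),
    "Likely funders (not performers)": (0.0, 0.75),
    "Hospitals": (0.03, 0.0),
    "Research organizations": (0.20, 0.05),
    "Form 990, high density": (0.075, 0.025),
    "Form 990, medium density": (0.015, 0.005),
    "Form 990, low density": (0.003, 0.003),
    "Form 990-PF, high density": (0.005, 0.20),
    "Form 990-PF, medium density": (0.0, 0.05),
    "Form 990-PF, low density": (0.0, 0.015),
}


def relative_cost(perform_share, fund_share):
    """Responses needed per performer/funder survey, scaled to the reference stratum."""
    return REFERENCE_YIELD / (perform_share + fund_share)


costs = {name: round(relative_cost(p, f), 1) for name, (p, f) in ASSUMED_INCIDENCE.items()}
```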

On the basis of the relative cost per performer/funder survey and the estimated performer and
funder population sizes, we allocated the sample to strata in three steps.
First, we optimally allocated the sample to the strata based on a combined measure of performers
and funders. We weighted funders by a factor of 1.5 because fewer of them are in the frame. We
allocated the sample to the strata to obtain a total of 370 performer/funder surveys. Based on our
assumptions, we expect this to result in approximately 200 performers and 200 funders (30 of
which do both).

The number of completed interviews allocated to stratum h is:
\[ n_h = 370 \times \frac{(P_h + 1.5 \times F_h)/\sqrt{C_h}}{\sum_h (P_h + 1.5 \times F_h)/\sqrt{C_h}} \]
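
As an illustration of this allocation, the sketch below applies the formula to the P, F, and C values from Table B.2.11. The function and variable names are hypothetical, and the results are left unrounded, so they differ slightly from the rounded allocations reported in Table B.2.12.

```python
from math import sqrt

# (P_h, F_h, C_h) by stratum, taken from Table B.2.11.
STRATA = {
    "Likely funders and performers": (74, 74, 1.0),
    "Likely performers (not funders)": (943, 0, 2.0),
    "Likely funders (not performers)": (0, 672, 2.0),
    "Hospitals": (100, 0, 50.0),
    "Research organizations": (415, 104, 6.0),
    "Form 990, high density": (616, 205, 15.0),
    "Form 990, medium density": (320, 107, 75.0),
    "Form 990, low density": (230, 230, 250.0),
    "Form 990-PF, high density": (7, 299, 7.3),
    "Form 990-PF, medium density": (0, 259, 30.0),
    "Form 990-PF, low density": (0, 161, 100.0),
}


def allocate(strata, total_surveys=370, funder_weight=1.5):
    """Allocate surveys in proportion to (P_h + w * F_h) / sqrt(C_h)."""
    measure = {
        name: (p + funder_weight * f) / sqrt(c)
        for name, (p, f, c) in strata.items()
    }
    denominator = sum(measure.values())
    return {name: total_surveys * m / denominator for name, m in measure.items()}


allocation = allocate(STRATA)
```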

Second, we calculated the sample size required to obtain the number of completed interviews based
on the assumed performer and funder incidences in each stratum and a response rate of 65%. Although
we strive to maximize response to the survey (refer to B.3), we used a conservative response rate of
65% for planning the sample for the pilot survey. Because this is a new data collection, one of the
objectives of this pilot survey is to measure stratum incidences and response rates. Using a conservative
response rate increases the probability of achieving or exceeding the sample size goals.
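
A minimal sketch of this second step follows. It assumes that the selected sample equals the target number of performer/funder surveys divided by the product of the stratum incidence and the response rate, rounded up; how the performer and funder percentages are combined into a single incidence, and any rounding convention used for the pilot, are not specified here and are treated as inputs.

```python
from math import ceil


def required_sample(target_surveys, incidence, response_rate=0.65):
    """Organizations to select so expected performer/funder surveys reach the target.

    incidence is the assumed share of responding organizations that perform or
    fund R&D in the stratum (an input to this sketch).
    """
    return ceil(target_surveys / (incidence * response_rate))


# Illustrative inputs only.
example_high_incidence = required_sample(110, 0.75)
example_low_incidence = required_sample(10, 0.05)
example_lower_response = required_sample(10, 0.05, response_rate=0.41)
```

A lower planning response rate yields a larger selected sample for the same target.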
Finally, we adjusted the sample sizes to provide smaller strata with additional sample to measure
the percentage of performers and funders. For the Research Organization stratum and the
high-density 990-PF stratum, we increased the sample size to target 180 survey responses. To
counterbalance the increase in these strata, we reduced the sample size in the likely performer and
likely funder strata. We also decreased the sample size for the low- and medium-density strata of
the Form 990 Organizations to obtain 400 completed surveys in each. The number of organizations in these
strata is large, but the expected percentage of performers and funders is low. Reducing the sample
size allows us to measure response rate and incidence, yet not consume resources where the yield
of funders and performers is expected to be low. Table B.2.12 presents the sample allocation.
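Table B.2.12: Optimal Allocation and Expected Number of Performer/Funder Surveys

Stratum                           | Optimal performer/funder | Sample selection             | Survey responses
                                  | allocation (nh)          | Optimal | Adjustment | Total | Total | Performer | Funder
----------------------------------+--------------------------+---------+------------+-------+-------+-----------+-------
Likely funders and performers     |                       30 |      60 |            |    60 |    40 |        30 |     30
Likely performers (not funders)   |                      105 |     210 |        -10 |   200 |   130 |       100 |
Likely funders (not performers)   |                      110 |     225 |        -25 |   200 |   130 |           |    100
Hospitals                         |                        2 |     115 |            |   115 |    75 |         2 |
Research organizations            |                       35 |     220 |         60 |   280 |   180 |        35 |      9
Form 990 Organizations, High      |                       35 |     560 |            |   560 |   365 |        25 |      9
Form 990 Organizations, Medium    |                        9 |     655 |        -40 |   615 |   400 |         6 |      2
Form 990 Organizations, Low       |                        5 |   1,435 |       -820 |   615 |   400 |         1 |      1
Form 990-PF Organizations, High   |                       25 |     195 |         85 |   280 |   180 |         1 |     35
Form 990-PF Organizations, Medium |                       10 |     335 |            |   335 |   220 |           |     10
Form 990-PF Organizations, Low    |                        4 |     380 |            |   380 |   245 |           |      4
Total                             |                      370 |   4,390 |       -750 | 3,640 | 2,365 |       200 |    200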
Notes:
1) Selected sample: organizations selected from the sampling frame
2) Survey responses: survey responses from an organization that may or may not be a performer or funder
3) Performer/funder surveys: survey responses from an organization that identifies as a performer or funder

Sample Selection
The sample will be a systematic (1-in-k) random sample of organizations within each stratum, with
organizations selected with equal probability. Before the systematic sample is selected, the
organizations in each stratum will be sorted by total expenditures. This sort implicitly stratifies
the frame by size and ensures the sample is proportionally distributed by size.
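The selection within one stratum might look like the sketch below. It assumes a fractional sampling interval k = N/m with a random start, which is one common way to draw an equal-probability systematic sample when N is not a multiple of m; the function name, record layout, and example frame are hypothetical.

```python
import random

def systematic_sample(units, m, size_key, rng=None):
    """Draw m units with equal probability using a systematic sample.
    Sorting by size first provides implicit stratification by size."""
    if not 0 < m <= len(units):
        raise ValueError("m must be between 1 and the number of units")
    rng = rng or random.Random()
    ordered = sorted(units, key=size_key)   # implicit stratification
    k = len(ordered) / m                    # (fractional) sampling interval
    start = rng.random() * k                # random start in [0, k)
    last = len(ordered) - 1
    # min() guards against a floating-point overshoot on the final pick.
    return [ordered[min(int(start + j * k), last)] for j in range(m)]

# Hypothetical stratum of 100 organizations with made-up expenditures.
frame = [{"id": i, "expenditures": (i * 37) % 101} for i in range(100)]
sample = systematic_sample(frame, 10, lambda u: u["expenditures"],
                           rng=random.Random(2016))
```

Because the picks step through the sorted list at a fixed interval, the sample spans the full range of organization sizes in the stratum.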
Estimation
Weighting

Estimates from the NPRA Pilot Survey will be weighted to be representative of nonprofit
organizations in scope for the survey. These weights will apply to all estimates derived from the
survey data, such as total expenditures on R&D. The weighting process will begin with the
computation of sampling weights for all selected organizations. The sampling (base) weight is calculated
as the number of organizations in the frame ($N_h$) divided by the number of organizations selected
($m_h$) in stratum h, or the reciprocal of the probability of selection, $W_1 = N_h / m_h$. The base weight
corrects for differential sampling rates for the strata.
We will then adjust for non-response within each stratum. The stratum non-response adjustment
will be a ratio adjustment where the respondents (r) will be weighted to account for the
non-respondents (nr). The base weight is used to calculate the adjustment factor,
$f_1 = \left(\sum_{r+nr} W_1\right) / \left(\sum_{r} W_1\right)$. The base weight is adjusted by the non-response factor,
$W_2 = W_1 \times f_1$.
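A small worked sketch of the base weight and the stratum non-response factor follows. The stratum counts (1,000 organizations on the frame, 100 selected, 65 responding) and the function names are hypothetical.

```python
def base_weight(frame_count, selected_count):
    """W1 = N_h / m_h, the reciprocal of the selection probability."""
    return frame_count / selected_count

def nonresponse_factor(weights, responded):
    """f1 = (sum of W1 over all sampled units) / (sum of W1 over respondents)."""
    total = sum(weights)
    respondent_total = sum(w for w, r in zip(weights, responded) if r)
    if respondent_total == 0:
        raise ValueError("stratum has no respondents")
    return total / respondent_total

# Hypothetical stratum: 1,000 on the frame, 100 selected, 65 respond.
w1 = [base_weight(1000, 100)] * 100
responded = [True] * 65 + [False] * 35
f1 = nonresponse_factor(w1, responded)
# Adjusted weights for the responding organizations.
w2 = [w * f1 for w, r in zip(w1, responded) if r]
```

After the adjustment, the respondents' weights again sum to the stratum's frame count of 1,000.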
We will do a second non-response adjustment to increase the sample representativeness relative to
the data on the frame (i.e., classification codes and financial variables). This will be based on a
propensity score adjustment. For this adjustment, we will first build a logistic regression model with
survey response as the outcome (1=respond, 0=no response). The outcome will be modeled based
on the frame data. Respondents and non-respondents will be grouped into quintiles on the basis of
their response propensity score. Within each quintile, the respondents (r) will be weighted to
account for the non-respondents (nr), $f_2 = \left(\sum_{r+nr} W_2\right) / \left(\sum_{r} W_2\right)$. The non-response adjusted
weight is $W_3 = W_2 \times f_2$. The non-response adjustment will be conducted for Form 990 and Form
990-PF separately.
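The quintile step can be sketched as follows, assuming the propensity scores have already been estimated by a separate logistic regression (not shown) and that an input weight is available for every sampled organization, respondent or not, as the formula for the second factor requires. The function name and the ten-unit example are hypothetical.

```python
def quintile_adjust(weights, propensity, responded, groups=5):
    """Within each propensity-score quintile, inflate respondents' weights
    by f2 = (sum of weights over all units) / (sum over respondents).
    Non-respondents receive a final weight of zero."""
    n = len(weights)
    order = sorted(range(n), key=lambda i: propensity[i])
    cell = [0] * n
    for rank, i in enumerate(order):
        cell[i] = rank * groups // n        # near-equal-sized cells 0..groups-1
    adjusted = [0.0] * n
    for g in range(groups):
        members = [i for i in range(n) if cell[i] == g]
        total = sum(weights[i] for i in members)
        resp = sum(weights[i] for i in members if responded[i])
        if resp == 0:
            raise ValueError("cell has no respondents; collapse cells first")
        f2 = total / resp
        for i in members:
            if responded[i]:
                adjusted[i] = weights[i] * f2
    return adjusted

# Hypothetical inputs: ten sampled organizations with unit weights.
scores = [0.05 + 0.1 * i for i in range(10)]
resp_flags = [True, False] * 5
w3 = quintile_adjust([1.0] * 10, scores, resp_flags)
```

In this toy example each quintile holds one respondent and one non-respondent, so every respondent's weight doubles and the weighted total is preserved.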
Finally, the weights will be poststratified and ratio adjusted to match the total number of nonprofit
organizations on the frame. NTEE codes will be the primary poststratification variable. With this
approach, the adjustments will ensure that the analysis weights sum to known frame population
totals by NTEE code.
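A minimal sketch of the poststratification ratio adjustment is shown below. The category labels and frame counts are made up for illustration; in practice the cells would be defined by NTEE code and the totals taken from the frame.

```python
from collections import defaultdict

def poststratify(weights, categories, frame_totals):
    """Scale weights so that they sum to the frame total in each category."""
    weighted_sums = defaultdict(float)
    for w, c in zip(weights, categories):
        weighted_sums[c] += w
    return [w * frame_totals[c] / weighted_sums[c]
            for w, c in zip(weights, categories)]

# Hypothetical example with two poststratification cells.
cats = ["B", "B", "E", "E"]
final_weights = poststratify([10, 10, 20, 5], cats, {"B": 30, "E": 50})
```

Each cell's weights are multiplied by a single ratio, so relative weights within a cell are unchanged while the cell total matches the frame.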
Estimators

We will use the pilot data to evaluate estimators of research expenditures. The estimators will build
off the weighted estimate of the total using the final weights from above.

Let
i = 1, …, n index the sample of responding organizations
$W_i$ = Final weight
$y_i$ = Research expenditures from survey response
$x_i$ = Auxiliary variable available on the frame (e.g., total expenditures)
An estimate of the total research expenditures is expressed as $\hat{Y} = \sum_{i=1}^{n} W_i \cdot y_i$. Building on this,
we will compute a ratio estimator, $\hat{Y}_R = (\hat{Y}/\hat{X}) \cdot X$, where $\hat{X}$ is the weighted total of the auxiliary
variable for the sample of organizations and $X$ is the frame total of the auxiliary variable. Similarly,
we will compute a regression estimator, $\hat{Y}_{LR} = \hat{Y} + b(X - \hat{X})$, where $b = \mathrm{cov}(x, y)/\mathrm{var}(x)$. We
will examine these estimates based on the variance, bias, and mean squared error.
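The three estimators can be sketched as below. The slope b is computed here from unweighted sample moments, which is an assumption since the text does not say whether the covariance and variance are weighted; the function names and the three-organization example are hypothetical.

```python
def weighted_total(weights, values):
    """Weighted estimate of a total: sum of W_i * value_i."""
    return sum(w * v for w, v in zip(weights, values))

def ratio_estimate(weights, y, x, frame_total_x):
    """Y_R = (Y_hat / X_hat) * X."""
    return (weighted_total(weights, y) / weighted_total(weights, x)
            * frame_total_x)

def regression_estimate(weights, y, x, frame_total_x):
    """Y_LR = Y_hat + b * (X - X_hat), with b = cov(x, y) / var(x).
    Assumption: b uses unweighted sample moments."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y)
                 for xi, yi in zip(x, y)) / (n - 1)
    var_x = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    b = cov_xy / var_x
    return (weighted_total(weights, y)
            + b * (frame_total_x - weighted_total(weights, x)))

# Hypothetical data in which research spending is exactly twice x.
w = [10.0, 10.0, 10.0]
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
X = 70.0   # hypothetical frame total of the auxiliary variable
y_hat = weighted_total(w, y)
y_ratio = ratio_estimate(w, y, x, X)
y_reg = regression_estimate(w, y, x, X)
```

When y is exactly proportional to x, as in this toy data, both the ratio and regression estimators recover twice the frame total, while the simple weighted total depends on how well the sample happens to represent x.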
Variance estimation

We will use successive difference replication (SDR) (Fay, Train, 1995), a replication variance
method based on a variance estimator for systematic samples presented by Wolter (1985):
$$v_2 = \frac{1 - f}{2n(n - 1)} \sum_{j=2}^{n} (y_j - y_{j-1})^2$$
SDR will result in R sample replicates where each organization is weighted by a replicate factor
of 1, 1.7, or 0.3. For each sample replicate, we will construct replicate weights as described above.
The variance estimate for a survey estimate (e.g., total research expenditures) is
$\hat{v} = (4/R) \sum_{r=1}^{R} (\hat{Y}_r - \hat{Y})^2$, where $\hat{Y}_r$ is the estimate based on replicate r.
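Both variance formulas are straightforward to compute once their inputs are available. The sketch below takes the values in selection order for the successive-difference estimator and takes the replicate estimates as given; constructing the replicate factors themselves is not shown. Function names and example numbers are hypothetical.

```python
def successive_difference_variance(y, sampling_fraction):
    """v2 = (1 - f) / (2n(n - 1)) * sum over j of (y_j - y_{j-1})^2,
    with y listed in the order the units were selected."""
    n = len(y)
    if n < 2:
        raise ValueError("need at least two observations")
    squared_diffs = sum((y[j] - y[j - 1]) ** 2 for j in range(1, n))
    return (1 - sampling_fraction) / (2 * n * (n - 1)) * squared_diffs

def replicate_variance(full_estimate, replicate_estimates):
    """v_hat = (4 / R) * sum over r of (Y_r - Y_hat)^2."""
    R = len(replicate_estimates)
    return 4 / R * sum((yr - full_estimate) ** 2
                       for yr in replicate_estimates)

# Hypothetical inputs.
v2 = successive_difference_variance([1.0, 3.0, 2.0, 5.0], 0.1)
v_hat = replicate_variance(100.0, [101.0, 99.0, 102.0, 98.0])
```

The successive-difference form rewards a well-sorted frame: when neighboring units in the sort order are similar, the squared differences and therefore the variance estimate are small.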
B.3.

Methods to Maximize Response Rates and to Deal with Issues of Nonresponse

The methodology includes features that promote increased response to surveys, including multiple
contact attempts and personalized communications, the two features most effective in increasing
response rates (Cook, Heath, Thompson, 2000). In addition, the methodology includes multiple
modes of contact (e-mail, mail, telephone); tailoring and varying the message of contact attempts;
short and to-the-point communications; clear and detailed instructions for responding to the
survey; and easy survey access for multiple respondents within the organization. These features
are all associated with increased response rates (Dillman, Smyth, Christian, 2008).
The initial contact for the selected nonprofit organizations will be a personalized prenotification
letter sent to the head of the organization. This initial correspondence allows respondents to take
immediate action by going to the survey URL and entering their organization’s unique
identification number without having to wait for additional communication from NSF. This
immediate call to action may improve initial response and overall organizational engagement.
For authenticity of the study, the letter will be on National Science Foundation letterhead and
signed by the Director of the National Center for Science and Engineering Statistics. Enclosed in
the letter will be an information sheet with definitions that describe the types of information we
will be asking for, as well as examples of persons in the organization who might assist with the
completion of the instrument (e.g., finance department, research department).
A survey invitation will be sent by e-mail approximately 1 week after the prenotification letter.
The invitation will contain a unique login for the organization. The primary contact will be able to
create his or her username and password for easy access to the survey. In addition, the organization
can create unique logins for up to two additional people (three logins per organization in total)
to facilitate data collection from multiple staff members. The invitation assures the
respondent that the information will be protected on a secured system.
We will send multiple reminders by mail and e-mail. Each reminder will contain a link and
instructions for accessing the survey. To maximize response, each reminder will have a slightly
different appeal to the respondent to participate. The first reminder will be by e-mail with a version
for non-responders and a second version for those who have started, but not completed the survey.
The focus of both of these reminders is that we “need your help.” The second reminder will be a
postcard sent by mail. The postcard will be the picture of a calendar with the deadline date
highlighted, appealing to respondents that time is running out. The third reminder will be an e-mail
with a version to non-responders and a version to those who have started, but not completed the
survey. The appeal to the non-responders is that we are trying to “offer you the opportunity to
participate.” The appeal for those who have started, but not completed the survey is to thank them
for the help so far and request that they complete it by the deadline. The final reminder will be by
e-mail, focusing on the pending deadline and the importance of the information “to develop
comprehensive statistics on research expenditures in the U.S.” The last reminder will also bear the
signature of the NSF project officer in an effort to add to the urgency and authenticity of the request.
After the initial deadline, NSF and its contractor, ICF International, will examine the performance
of the sample strata. This phase will include multiple contact attempts by telephone and e-mail.
The first follow-up attempt will be by telephone, asking organizations that have partially completed
the survey, as well as those that have not responded, to log in and complete the web survey. Project staff
will use a five-attempt protocol to reach each of these organizations, leaving a voice mail message
on the second or third attempt. That telephone call will be followed by an e-mail signed by the
NSF project officer with a link and instructions for accessing the survey. A third and final
follow-up attempt will be made by telephone, using the same five-attempt protocol.
Nonresponse Adjustment
To mitigate the risk of non-response bias, we will develop weighting adjustments to increase the
sample representativeness relative to the data on the frame. The information on the frame includes
classification codes (e.g., NAICS and NTEE) and financial variables (e.g., total expenses and
revenues). As part of the non-response adjustments, we will conduct an analysis of frame variables
that are related to non-response. This analysis will be based on response propensity scores using a
logistic regression model with survey response as the outcome (1=respond, 0=no response). The
outcome will be modeled based on the frame data. This analysis provides an evaluation of
representativeness based on the frame data, which will be quantified in the form of an R-indicator
as described by Schouten et al. (2009). The R-indicator measures the variability of the propensity
scores ($p$), $R = 1 - 2S(p)$, where $S(p)$ is the standard deviation of the propensity scores. Values
close to 0 indicate weak representativeness and values close
to 1 indicate strong representativeness, relative to the independent variables used in the model.
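Given estimated response propensities, the R-indicator is a one-line calculation. The sketch below uses the unweighted sample standard deviation, which is an assumption; a design-weighted version may be more appropriate for the actual survey. The function name and propensity values are hypothetical.

```python
from statistics import stdev

def r_indicator(propensities):
    """R = 1 - 2 * S(p), with S(p) the sample standard deviation of the
    estimated response propensities (unweighted, as an assumption)."""
    return 1 - 2 * stdev(propensities)

# Hypothetical propensity scores from a fitted response model.
r_uniform = r_indicator([0.6, 0.6, 0.6, 0.6])   # identical propensities
r_varied = r_indicator([0.2, 0.5, 0.7, 0.9])    # propensities that vary
```

When every organization has the same response propensity the indicator equals 1; the more the propensities vary across the frame variables, the further it falls below 1.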
Debriefing Interviews
After the data collection period is over, we will conduct 40 debriefing interviews. Twenty of these
interviews will be conducted with organizations that have completed the questionnaire and focus
on the survey content and process. The other 20 interviews will be conducted with organizations
that chose not to respond. These non-respondent interviews will examine why the decision was
made not to respond. Specific issues that will be covered are general attitudes toward the survey;
whether the organization received the contacts and, if not, which modes it did not receive;
whether the contacts were sent to the right person; and the frequency of attempts. Information from
the debriefing interviews will be used to refine the data collection strategy for the main survey.
B.4.

Tests of Procedures and Methods

The purpose of the pilot study is to evaluate the planned procedures for the main survey. No
additional methodological tests are planned for the pilot.
B.5.

Individuals Consulted on Technical and Statistical Aspects of Design

The NPRA Survey will be conducted by NSF's National Center for Science and Engineering
Statistics. ICF International is the contractor in charge of data collection. The names, titles,
affiliations, and telephone numbers of those consulting on the project are below.
John Jankowski
Program Director, RDS Statistics
National Science Foundation
(703) 292-7781

Randal ZuWallack
Sr Statistician
ICF International
(802) 264-3724

Ronda Britt
Project Manager, NPRA Survey
Survey Statistician
National Science Foundation
(703) 292-7765

Dr. William Bryan Higgins
Sr Methodologist
ICF International
(703) 934-3498

Jock Black
Mathematical Statistician
National Science Foundation
(703) 292-7802

Adam Lee
Statistician
ICF International
(301) 572-0814

Rebecca Morrison
Survey Statistician
National Science Foundation
(703) 292-7794

Arlen Rosenthal
Project Manager
ICF International
(301) 572-0222

Michael Gibbons
Survey Statistician
National Science Foundation
(703) 292-4590

Dr. Ronaldo Iachan
Sr Statistician
ICF International
(301) 572-0538

Bibliography
Cochran WG. 1977. Sampling Techniques. New York: John Wiley & Sons.
Cook C, Heath F, Thompson R. 2000. A Meta-Analysis of Response Rates in Web- or Internet-Based Surveys. Educational and Psychological Measurement, 821–836.
Dillman DA, Smyth JD, Christian LM. 2008. Internet, Mail, and Mixed-Mode Surveys: The
Tailored Design Method. Hoboken, New Jersey: John Wiley & Sons.
Fay RE, Train GF. 1995. Aspects of survey and model-based postcensal estimation of income
and poverty characteristics for states and counties. Proceedings of the Section on
Government Statistics, pp. 154–159. American Statistical Association.
Pettijohn SL. 2013. The nonprofit sector in brief: Public charities, giving, and volunteering,
2013. Available at http://www.urban.org/research/publication/nonprofit-sector-brief-public-charities-giving-and-volunteering-2013.
Salamon LM. 2015. A profile of the nonprofit sector in the United States. In National Research
Council (C. House, H. Rhodes, & E. Sinha, Rapporteurs, Committee on National
Statistics, Division of Behavioral and Social Sciences and Education). Measuring
Research and Development Expenditures in the U.S. Nonprofit Sector: Conceptual and
Design Issues, Summary of a Workshop, pp. 9–26. Washington, DC: The National
Academies Press.
Schouten B, Cobben F, Bethlehem J. 2009. Indicators for the representativeness of survey
response. Survey Methodology, 101–113.
Wolter KM. 1985. Introduction to Variance Estimation. New York: Springer-Verlag.

Attachments
A. 2016 Pilot Survey of Nonprofit Research Activities
B. Pilot Survey Correspondence
• Contact 1: Prenotification Letter with 1-Page Handout [Mailing]
• Contact 2a: Invitation [E-mail] High Likelihood
• Contact 2b: Invitation [E-mail] Moderate/Low Likelihood
• Contact 3a: Reminder 1 [E-mail] Follow-up/Non-responders
• Contact 3b: Reminder [E-mail] Reminder to Complete
• Contact 4: Reminder 2 [Mailing]
• Contact 5a: Reminder 3 [E-mail] Non-responders
• Contact 5b: Reminder 3 [E-mail] Reminder to Complete
• Contact 6: Final Reminder 4 [E-mail]
• Contact 7a: Incomplete Follow-up 1 [Telephone]
• Contact 7b: Non-response Follow-up 1 [Telephone]
• Contact 8: Follow-up 2 [E-mail]
• Contact 9: Final Follow-up 3 [Telephone]
• Contact 10: Follow-up 4 [E-mail]
• Contact 11: Follow-up 5 [Telephone]
• Contact 12: Final Follow-up 6 [E-mail]
• Contact 13: Submittal Acknowledgement [E-mail]
• Contact 14: Thank you [Letter]
C. Comment Letter from Andrew Reamer, George Washington University
D. Debriefing Interview Protocols
• Protocol for Debriefing Interview with Respondents
• Protocol for Debriefing Interview with Non-respondents
E. Debriefing Interview Correspondence
• Contact 1a: E-mail to NPRA Respondent
• Contact 1b: E-mail to NPRA Non-respondent
• Contact 2a: NPRA Debriefing Interview with Respondents Recruitment Script
• Contact 2b: NPRA Debriefing Interview with Non-respondents Recruitment Script
• Contact 3a: Respondents Confirmation Letter
• Contact 3b: Non-respondents Confirmation Letter
• Contact 4a: Respondent Electronic Meeting Invitation
• Contact 4b: Non-respondent Electronic Meeting Invitation
• Contact 5: Thank you letter on NSF Letterhead
F. NCCS Core File Description
• Variables – Private Charities and Other
• Financial Means – Private Charities and Other
• Variables – Private Foundations
• Financial Means – Private Foundations
G. Likely Performers and Funder Sources
H. Propensity Models for Performer and Funder Stratification

File Type	application/pdf
Author	Jolene Smyth
File Modified	2016-05-05
File Created	2016-05-05