Supporting Statement

0693-0033 TREC Supporting Statement_revised 11-5-09.doc

Generic Clearance for Program Evaluation Data Collections

OMB: 0693-0033

NIST-RTI Economic Impact Assessment of the
Text Retrieval Conferences (TREC)

1. Explain who will be surveyed and why the group is appropriate to survey.

RTI International¹ will conduct a survey of members of the information retrieval (IR) researcher community to collect data to estimate the economic impact of the NIST TREC Program. The target population for this survey is computer engineers and scientists who are conducting IR research in commercial IR firms and academic institutions. These individuals are the appropriate group to survey because they would be the most able to determine whether the technical accomplishments of TREC have had an impact on either the research they conduct or the set of products and services that use IR research. TREC was initiated in 1992 as a means to foster IR research, and its technical accomplishments include the following:

Creating large “test collections”: creating large data files to be used for evaluating IR systems; TREC created larger test collections than had been available in the past, thus lowering the cost of evaluation and increasing the quality of IR systems.
Developing standardized IR evaluation methods: developing rigorous IR evaluation methodologies and processes to enable higher quality evaluation methods at a lower cost.
Hosting TREC workshops: offering a venue for IR researchers to compete against one another in narrowly defined forums and receive feedback on the relative performance of their IR systems; as a result, researchers were able to identify much more quickly which IR techniques worked and which did not, and information exchange was facilitated.

RTI will survey members of the IR community because only they will be able to estimate the impact of these NIST investments accurately.

2. Explain how the survey was developed including consultation with interested parties, pretesting, and responses to suggestions for improvement.

RTI’s data collection effort will consist of two components: an Internet survey and an in-depth interview. First, the Internet survey is expected to take approximately 15 to 20 minutes to complete and is designed to determine how much IR researchers value the technical accomplishments of TREC that were listed above. The survey instrument to be used is attached. In general, it is composed of several types of questions:

Background Questions: The survey will ask individuals several background questions, such as how much their employer spent on R&D in the previous year. This information will not be published and will primarily be used to extrapolate data collected from the Internet survey to nationwide impact estimates (this will be discussed in more detail under Question 4 below).
Yes or No Usage/Adoption Questions: The survey instrument includes several questions about whether certain aspects of TREC have benefited the IR research conducted by the respondent. These questions can be answered quickly with a yes or no response.
TREC Valuation Questions: These questions will ask respondents to explicitly estimate the value that using TREC’s resources has had on their IR research. The data they provide will be the foundation for RTI’s quantitative assessment of the benefits generated by TREC.

The survey instrument was developed internally at RTI by a team of technology economists and engineers with background in IR research. To ensure that the members of the target population would be able to answer the questions included in the survey, drafts of the survey instrument were pretested with several IR researchers located in commercial IR firms and academic institutions.

In addition to an Internet survey, RTI will collect data through informal interviews. Interview subjects will primarily be identified through the Internet survey, which will ask respondents if they are willing to participate in a 20-minute informal interview that will dig deeper into the survey responses they provided. The purpose of these interviews is to collect quantitative and qualitative information on the value of TREC accomplishments that cannot be obtained through the Internet survey. Potential discussion topics include the following:

What types of benefits has the use of TREC resources provided you? (e.g., have you experienced cost savings or improvements in the quality of research?)
When did the benefits of using TREC resources accrue? (Did the benefits from using these resources only accrue during the time you used these resources or were they spread out over time?)
How did TREC affect the timing of product and service offerings in areas such as Web and enterprise search?

3. Explain how the survey will be conducted, how customers will be sampled if fewer than all customers will be surveyed, expected response rate, and actions your agency plans to take to improve the response rate.

As previously stated, the survey will be conducted over the Internet. At this time, RTI is coding the Internet survey. Once the survey is operational, it will be uploaded to a Web site that will be housed on RTI’s encrypted servers. Our security policy will ensure that information provided by respondents is secure. The survey URL will be http://TRECsurvey.rti.org and should be ready to go live in mid-November.

In order to ensure that the sample for this survey provides the minimum mean square error estimate, RTI developed a cut-off sample of the largest companies and academic institutions conducting IR research. This cut-off sample represents over 80% of the total R&D expenditures, as suggested by OMB’s data collection methods. The discussion below describes how the universe for this sample was created, how data on R&D expenditures were collected, and how the response rates required to meet the sample cut-off will be achieved.

Universe Creation

To create a universe of potential respondents, RTI identified the major organizations conducting IR research, which included both firms and academic institutions. RTI used two primary sources to identify these organizations:

NIST provided a list of academic institutions and public companies that participated in TREC conferences between 1992 and 2009.
RTI used publicly available information on the proceedings of the Association for Computing Machinery (ACM) Special Interest Group on Information Retrieval (SIGIR) to identify organizations presenting IR research at SIGIR’s annual conference.

RTI supplemented these sources by looking at information on attendance at other IR-related conferences (such as the 2009 Enterprise Search Summit) and by looking at industry reports for several submarkets related to IR (such as Gartner’s 2008 report on the market for information access technologies and software).

Data Collection

After a universe of potential respondents was compiled, RTI collected data on the R&D expenditures for the organizations included. These data were collected from different sources based on whether the organization was an academic institution or a private firm:

For academic institutions, data on R&D expenditures in computer sciences were obtained from the 2007 Survey of Research and Development Expenditures at Universities and Colleges (National Science Foundation [NSF], 2008). R&D expenditures in computer sciences were used because academics conducting computer science research are the most likely to use TREC resources.
For firms, RTI used two methods to obtain estimates of total 2007 R&D expenditures. First, RTI collected data on public companies reporting R&D expenditures through the S&P North American COMPUSTAT database provided through the Wharton Business School. Second, R&D expenditures were estimated for public companies that did not report R&D expenditures by multiplying the sales of these companies (reported in the COMPUSTAT database) by the average R&D-to-sales ratio for companies that did report R&D expenditures. The same method was used to estimate R&D expenditures for private firms, except that sales data for these companies were collected from MANTA.com.²
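
The estimation step for firms that do not report R&D can be sketched in a few lines of Python. This is a minimal illustration rather than RTI's actual procedure: the firm names and dollar figures are invented, and the sketch assumes that the "average" ratio is the simple mean of firm-level R&D-to-sales ratios among reporting companies (a sales-weighted ratio would be an equally plausible reading of the text).

```python
from statistics import mean

# Hypothetical firms ($ million); rd=None marks a firm that did not report R&D.
firms = [
    {"name": "Firm A", "sales": 1000.0, "rd": 250.0},
    {"name": "Firm B", "sales": 400.0, "rd": 50.0},
    {"name": "Firm C", "sales": 800.0, "rd": None},
]


def impute_rd(firms):
    """Return (firms with R&D filled in, average R&D-to-sales ratio)."""
    reporters = [f for f in firms if f["rd"] is not None]
    # Assumption: simple (unweighted) mean of firm-level ratios.
    avg_ratio = mean(f["rd"] / f["sales"] for f in reporters)
    filled = []
    for f in firms:
        missing = f["rd"] is None
        rd = f["sales"] * avg_ratio if missing else f["rd"]
        filled.append({**f, "rd": rd, "imputed": missing})
    return filled, avg_ratio


filled_firms, avg_ratio = impute_rd(firms)
for f in filled_firms:
    flag = "estimated" if f["imputed"] else "reported"
    print(f"{f['name']}: R&D ${f['rd']:.1f}M ({flag})")
```

Keeping an explicit flag on estimated values makes it easy to check later how much of the ranking depends on estimates rather than reported figures.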

Cut-Off Sample Calculations

After R&D data were collected for each of the 125 organizations included in the universe (81 academic institutions and 44 firms), these organizations were ranked by the amount of R&D conducted in each. Next, RTI identified the largest organizations that together accounted for at least 80% of the R&D being conducted in each type of organization. The results of this analysis are presented in Table 1. In total, RTI has identified 34 organizations that represent over 80% of the research expenditures. Although the majority of these organizations are academic institutions (27), the largest share of R&D expenditures is accounted for by the commercial firms (7).

Table 1. Estimated Total Respondents Calculation

Stakeholder Group	Total Number of Organizations	2007 U.S. R&D Expenditures ($million)	Number of Organizations Needed to Achieve 80% Sample Coverage	Estimated Total R&D Expenditures of Surveyed Organizations in 2007 ($million)	Share of R&D Expenditures Covered by Surveyed Organizations
Academic Institutions	81	$1,049	27	$837	80%
Public and Private Firms Conducting IR Research	44	$31,569	7	$25,724	81%
Total	125	$32,618	34	$26,561	81%
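
The cut-off selection can be illustrated with a short Python sketch. It is a hedged example, not RTI's code: the function name `cutoff_sample` and the five-organization universe are hypothetical, and the selection rule (rank by R&D, then add organizations until the cumulative share reaches the threshold) is inferred from the description above. The second half recomputes the coverage shares from the dollar figures reported in Table 1.

```python
def cutoff_sample(rd_by_org, threshold=0.80):
    """Pick the largest organizations until cumulative R&D share >= threshold."""
    total = sum(rd_by_org.values())
    ranked = sorted(rd_by_org.items(), key=lambda item: item[1], reverse=True)
    selected, covered = [], 0.0
    for name, rd in ranked:
        if covered / total >= threshold:
            break
        selected.append(name)
        covered += rd
    return selected, covered / total


# Hypothetical universe ($ million), used only to exercise the function.
example = {"Org A": 500, "Org B": 300, "Org C": 120, "Org D": 50, "Org E": 30}
selected, coverage = cutoff_sample(example)
print(selected, f"{coverage:.0%}")

# Coverage shares recomputed from the Table 1 figures: (total R&D, surveyed R&D).
table1 = {
    "Academic Institutions": (1049, 837),
    "Firms": (31569, 25724),
    "Total": (32618, 26561),
}
shares = {group: surveyed / total for group, (total, surveyed) in table1.items()}
for group, share in shares.items():
    print(f"{group}: {share:.1%}")
```

Recomputed this way, the academic row comes to about 79.8 percent, which rounds to the 80% shown in the table.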

RTI expects to solicit participation from representatives of all 34 organizations. The next section describes how RTI intends to achieve this level of response.

Expected Response Rate

In order to collect information from the 34 organizations that represent over 80% of the IR R&D being conducted today, RTI is advertising its survey through authoritative information outlets that are monitored by large portions of the IR research community:

RTI will advertise its survey through the Friends of TREC e-mail list. This list is estimated to reach approximately 500 individuals spread across multiple IR research organizations. It consists of individuals who have participated in previous TREC workshops and who will therefore be most likely to be interested in the subject of this survey.
RTI is also coordinating with the Association for Computing Machinery (ACM) Special Interest Group on Information Retrieval (SIGIR) to advertise the survey to its membership through its moderated electronic newsletter. This newsletter is regularly e-mailed to approximately 1,700 individuals employed by academic institutions and firms conducting IR research.

Although there is likely some overlap between the two lists, RTI expects these combined efforts to reach approximately 2,000 individuals spread across IR organizations. RTI is targeting a response rate of approximately 5 percent, meaning that 5 percent of these individuals will provide complete, usable survey responses. This anticipated response rate is based on two primary factors:

RTI’s previous experience in conducting similar surveys has proven that surveying individuals involved in R&D can be very difficult (as described below).
A relatively large, but unknowable, portion of these 2,000 individuals will be located outside the United States and therefore outside the scope of this survey (lowering the number of potential U.S. respondents).

Based on this anticipated response rate, RTI expects up to 100 people to respond to the survey, many of whom will be individuals located at the same organization. However, of these 100 respondents, only individuals located at the 34 organizations that represent over 80% of IR R&D will be selected for follow-up interviews. This implies that 34 individuals will complete the survey and interview (a total 40-minute burden per respondent), while up to 66 individuals will only complete the survey (a 20-minute burden per respondent).
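
The respondent counts and the implied total burden follow from simple arithmetic, sketched below in Python. The constants are taken from the text; the only choices made here are to use the 20-minute upper bound for the survey and to treat the "up to" figures as exact, so the result is an upper-bound estimate rather than a forecast.

```python
# Figures stated in the text; the survey time uses the 20-minute upper bound.
ADVERTISED = 2000            # approximate number of individuals reached
TARGET_RESPONSE_RATE = 0.05  # targeted share of complete, usable responses
INTERVIEWED = 34             # one respondent per cut-off sample organization
SURVEY_MINUTES = 20
INTERVIEW_MINUTES = 20

respondents = round(ADVERTISED * TARGET_RESPONSE_RATE)
survey_only = respondents - INTERVIEWED

burden_minutes = (
    INTERVIEWED * (SURVEY_MINUTES + INTERVIEW_MINUTES)
    + survey_only * SURVEY_MINUTES
)
burden_hours = burden_minutes / 60

print(f"Respondents: {respondents} ({INTERVIEWED} interviewed, {survey_only} survey only)")
print(f"Upper-bound burden: {burden_minutes} minutes ({burden_hours:.1f} hours)")
```

Under these figures the upper-bound burden works out to 2,680 minutes, or roughly 45 hours across all respondents.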

Efforts to Achieve/Improve Response Rates

In order to ensure a sufficient response rate, RTI is pursuing several of the widely acknowledged procedures for improving response rates that are outlined in the 2006 “Guidance on Agency Survey and Statistical Information Collections” (OMB, 2006). In particular, RTI will do the following:

Coordinate with professional societies and research organizations that are well respected in the IR research community to promote awareness of the survey. Specifically, RTI will work with TREC and SIGIR to advertise its survey, as previously described.
Design an on-line questionnaire that is both easy to access and easy for respondents to use. This questionnaire will be formatted to be user-friendly and compatible across popular Internet browsers.
Allow users to complete the survey on a hard copy or via phone, at their request.
Effectively communicate to respondents that their information will be held confidential. Specifically, RTI will inform respondents that their responses will be kept on secure computers at RTI International and that RTI will not share their individual responses with any third party, including NIST.

However, in spite of these efforts, RTI is still concerned about generating a sufficient response from members of the IR research community. This concern is based on RTI’s previous experience, which has found that surveying individuals involved in R&D can be difficult. For example, in RTI’s 2007 study “Economic Analysis of the Technology Infrastructure Needs of the U.S. Biopharmaceutical Industry,” RTI sent advertisements to approximately 10,000 members of the Biotechnology Industry Organization but received only 58 survey responses that were sufficiently complete to be included in its analysis (RTI, 2007).

Therefore, in order to further ensure that a sufficient response rate is achieved, RTI will offer a nonmonetary incentive to survey respondents. Specifically, respondents will be automatically entered into a drawing to win a free MP3 player ($400 value) as a result of participating in the survey.³ Because each person has an equal chance of winning the prize, a drawing offers an equitable means of providing incentive for respondents to participate in the survey.

In addition to improving coverage of a specialized group of respondents, the decision to use incentives is justified by the following factors, which are identified by OMB (2006) as principles that should be considered when justifying the use of incentives:

Past experience: Numerous empirical studies have shown that incentives can significantly increase response rates (e.g., Abreu & Winters, 1999; Shettle & Mooney, 1999; Singer et al., 1999; Watt, 1999). Specifically, Watt (1999) provides evidence from several studies, including a Web survey of individual investors conducted for a private investment firm, that illustrates how offering raffle incentives is an effective means of boosting response rates in Internet surveys. In addition, studies conducted by RTI International have found that using raffles increased response rates. For example, RTI conducted a survey of new mothers as part of its study on the Pregnancy Risk Assessment Monitoring System (a surveillance project of the Centers for Disease Control and Prevention and state health departments) that used a raffle incentive and resulted in a response rate of over 70%. Table 2 summarizes these studies and the response rates they achieved.
Reduced survey costs: In previous studies, such as the 2009 “Retrospective Economic Impact Assessment of the NIST Combinatorial Methods Center,” RTI has sought to improve response rates of its surveys by devoting considerable resources to non-response follow-ups. RTI believes that providing an incentive could help avoid many of these costs by encouraging respondents to complete the survey without the need for continual follow-ups. In particular, given that the incentive being offered is valued at $400, it would only have to save RTI a handful of hours in order to pay for itself. Assuming that each successful follow-up might require 2 to 3 hours, based on RTI’s previous experience, the incentive would only have to spare RTI from following up with 1 to 2 respondents to reduce survey administration costs.
Data Quality/Complex study design: As previously described, the survey RTI will be fielding includes questions on valuing TREC resources. Based on discussions with several members of the IR research community, these questions can sometimes be difficult for respondents to answer. Therefore, providing an incentive may help to make respondents more engaged in the survey process and result in higher data validity by giving them a material stake in completing the survey.
Burden on the respondent: IR researchers are a specialized group of technological professionals whose time is valuable. As a result, there are many other demands competing with our survey for their attention. By providing an incentive, we recognize the time burden being asked of them and convey our appreciation for their participation.
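
The break-even reasoning under "Reduced survey costs" can be made explicit with a small Python sketch. The $400 incentive value and the 2 to 3 hours per follow-up come from the text; the hourly labor cost does not appear in this statement, so the $150 figure below is a purely hypothetical placeholder chosen for illustration, and the function name is likewise invented.

```python
import math

INCENTIVE_COST = 400.0        # stated value of the prize, in dollars
HOURS_PER_FOLLOW_UP = (2, 3)  # stated range of staff hours per successful follow-up
HOURLY_COST = 150.0           # hypothetical loaded labor rate, NOT from the source


def break_even_follow_ups(incentive, hours_each, hourly_cost):
    """Smallest number of avoided follow-ups whose labor cost covers the incentive."""
    return math.ceil(incentive / (hours_each * hourly_cost))


for hours in HOURS_PER_FOLLOW_UP:
    needed = break_even_follow_ups(INCENTIVE_COST, hours, HOURLY_COST)
    print(f"{hours} h per follow-up: break even after {needed} avoided follow-up(s)")
```

The break-even count depends on the assumed labor rate, so a different rate would shift the number of avoided follow-ups needed.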

Table 2. Studies Involving Respondents Receiving Raffle Incentives and Corresponding Response Rates

Study	Population	Incentive Provided	Response Rate Achieved
Study for private client (Watt, 1999)	Individual Investors	Participation in a raffle for a $450 cash award	≥88%
Pregnancy Risk Assessment Monitoring System (PRAMS)	New mothers	Participation in a raffle for a cash award	≥70%

4. Describe how the results of the survey will be analyzed and used to generalize the results to the entire customer population.

Respondents will be asked in the Internet survey to provide estimates of how much they value TREC’s technical accomplishments. These responses will be used as the primary indicator of the total benefits generated by TREC for research conducted in commercial firms and academic institutions. To obtain national impact estimates, these responses will be combined with secondary data sources (e.g., NSF, Bureau of Labor Statistics, Census, Gartner) on industry size, R&D, and labor rates as appropriate. RTI will use these secondary data to extrapolate impact estimates to the industry level. The end result will be the calculation of the costs and benefits of TREC between 1992 and 2008 as received by both commercial firms and academic institutions.

References

Abreu, D. A., & Winters, F. (1999). Using monetary incentives to reduce attrition in the survey of income and program participation. In Proceedings of the Survey Research Methods Section of the American Statistical Association. http://www.amstat.org/sections/SRMS/proceedings/. Last updated on May 24, 2007.

Gartner. (2008). Magic Quadrant for Information Access Technology. http://mediaproducts.gartner.com/reprints/microsoft/vol6/article4/article4.html. As obtained on August 27, 2009.

National Science Foundation (NSF). (2008). 2007 Survey of Research and Development Expenditures at Universities and Colleges. http://www.nsf.gov/statistics/srvyrdexpenditures/. As obtained on August 26, 2009.

Office of Management and Budget (OMB). (2006). Guidance on Agency Survey and Statistical Information Collections. Memorandum from OMB Administrator John Graham for the President’s Management Council. http://www.whitehouse.gov/OMB/inforeg/pmc_survey_guidance_2006.pdf.

RTI International. (2007). Economic analysis of technology infrastructure needs of the U.S. biopharmaceutical industry. Prepared for the National Institute of Standards and Technology (NIST). http://www.nist.gov/director/prog-ofc/report07-1.pdf

RTI International. (2009). Retrospective Economic Impact Assessment of the NIST Combinatorial Methods Center. Planning Report 09-1 prepared for the National Institute of Standards and Technology. http://www.nist.gov/director/prog-ofc/report09-1.pdf

Shettle, C., & Mooney, G. (1999). Monetary incentives in U.S. government surveys. Journal of Official Statistics, 15, 231-250.

Singer, E., Van Hoewyk, J., Gebler, N., Raghunathan, T., & McGonagle, K. (1999). The effect of incentives in interviewer-mediated surveys. Journal of Official Statistics, 15, 217-230.

Watt, J. H. (1999). Internet systems for evaluation research. In G. Gay & T. Bennington (Eds.), Information technologies in evaluation: Social, moral, epistemological and practical implications (pp. 23-44). San Francisco: Jossey-Bass.

1 RTI International is the trade name of the Research Triangle Institute.

2 Since the aim of this study is to estimate only the U.S. economic contributions of TREC, the ideal variable to use when creating the cut-off sample for private companies would be total U.S. R&D expenditures in the computer sciences. However, since most companies do not report R&D expenditures by location or research area, RTI must use total company R&D expenses as a proxy for preparing the sample.

3 The amount of the incentives being used in this survey was determined through discussions with IR researchers employed by RTI who have experience with completing similar types of surveys and the incentives they typically offer. A smaller incentive would not appear sufficiently attractive to IR researchers.

File Type	application/msword
File Title	Assessing the Biopharmaceutical Industry’s
Author	amicar
Last Modified By	dyonder
File Modified	2009-11-06
File Created	2009-11-06