OMB_package_supporting_statement for CISS JAN 2015 - Final Sec B GT


Crash Investigation Sampling System (CISS)

OMB: 2127-0706


SUPPORTING STATEMENT

FOR

P.L. 89-663, Title 1, Sections 106, 108, and 112 - COLLECTION OF CRASH DATA

OMB Control Number: None

B. COLLECTIONS OF INFORMATION EMPLOYING STATISTICAL METHODS

Describe the potential respondent universe and any sampling or other respondent selection methods to be used.

The CISS universe, or sample frame, is the set of police-reported motor vehicle crashes on a trafficway that involve a passenger vehicle, in which a passenger vehicle is towed from the scene, and that result in a police accident report (PAR). The estimated CISS population size is about 1.9 million crashes a year. CISS samples this frame through a stratified multi-stage cluster design, as follows:

Divide the country into geographic units called Primary Sampling Units (PSUs). A PSU is a county or a group of adjacent counties. To ensure efficient data collection, PSUs were formed so that their end-to-end distance is no more than 65 miles in urban areas and 130 miles in rural areas. PSUs were also formed so that there is a 90% chance of at least 5 fatal crashes involving a passenger vehicle in a given year. PSU formation respects region and urbanicity boundaries. Some outlying areas of Alaska and small islands of Hawaii were excluded. There are a total of 1,784 PSUs in the PSU frame.
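The 90% fatal-crash criterion can be translated into a minimum expected count if one assumes, purely for illustration (the source does not say how the chance was computed), that a PSU's annual number of fatal passenger-vehicle crashes is Poisson distributed. The sketch below finds by bisection the smallest Poisson mean that gives at least a 90% chance of 5 or more such crashes; the function names are hypothetical.

```python
import math


def prob_at_least(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), via the complement of P(X <= k - 1)."""
    cdf = sum(math.exp(-lam) * lam**x / math.factorial(x) for x in range(k))
    return 1.0 - cdf


def min_mean_for_chance(k: int = 5, chance: float = 0.90, tol: float = 1e-9) -> float:
    """Smallest Poisson mean with P(X >= k) >= chance (assumed Poisson model).

    P(X >= k) increases with lam, so bisection on a bracketing interval works.
    """
    lo, hi = 0.0, 1.0
    while prob_at_least(k, hi) < chance:  # widen until hi satisfies the criterion
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if prob_at_least(k, mid) >= chance:
            hi = mid
        else:
            lo = mid
    return hi


lam_min = min_mean_for_chance()
print(f"minimum expected fatal crashes per year: {lam_min:.2f}")
```

Under this assumption a PSU needs just under 8 expected fatal passenger-vehicle crashes a year; the distance and boundary rules above constrain PSU formation as well.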

The PSU frame is then stratified into 8 primary PSU strata by two variables: region (Northeast, West, South, and Midwest) and urbanicity (urban and rural). Within each primary stratum, PSUs are further stratified by secondary stratification variables such as road type, total crashes, and total vehicle miles traveled. PSUs with similar characteristics were grouped into secondary strata with approximately equal total MOS and minimal within-stratum variance. As a result, a total of 24 PSU strata were formed.

A composite measure of size (MOS) is assigned to each PSU in the frame. PSU MOS is the sum of the PSU's share of injury crashes and its share of new-vehicle crashes in the population, each weighted by a predetermined sampling rate, so that PSUs with more severe crashes and more new-vehicle crashes have a better chance of being selected. A probability proportional to size (PPS) sample of 2 PSUs per stratum is selected systematically from each of the 24 PSU strata. The resulting sample of 48 PSUs serves as the first-phase PSU sample.
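A minimal sketch of this step for one stratum, assuming the composite MOS is the rate-weighted sum of each PSU's injury-crash share and new-vehicle-crash share, and that the 2 PSUs are drawn by ordinary systematic PPS from the stratum in its listed order. The crash counts and the 0.6/0.4 weights are hypothetical, not CISS values, and the function names are illustrative.

```python
import random
from itertools import accumulate


def composite_mos(injury, new_vehicle, w_injury, w_new):
    """Rate-weighted sum of each PSU's share of injury and new-vehicle crashes."""
    tot_inj, tot_new = sum(injury), sum(new_vehicle)
    return [w_injury * a / tot_inj + w_new * b / tot_new
            for a, b in zip(injury, new_vehicle)]


def systematic_pps(mos, n, rng):
    """Systematic PPS draw of n units from one stratum, in the order given.

    Returns the selected indices and each unit's inclusion probability n*M/sum(M).
    """
    cum = list(accumulate(mos))
    total = cum[-1]
    interval = total / n
    if max(mos) > interval:
        raise ValueError("unit larger than the sampling interval: treat it as certainty")
    start = rng.random() * interval          # random start in [0, interval)
    selected, unit = [], 0
    for k in range(n):
        point = start + k * interval
        # Advance to the first unit whose cumulative MOS exceeds the point.
        while unit < len(mos) - 1 and cum[unit] <= point:
            unit += 1
        selected.append(unit)
    probs = [n * m / total for m in mos]
    return selected, probs


mos = composite_mos([120, 80, 60, 150, 40, 90], [30, 25, 10, 45, 5, 20],
                    w_injury=0.6, w_new=0.4)
picked, pi = systematic_pps(mos, n=2, rng=random.Random(2015))
print(picked, [round(p, 3) for p in pi])
```

Because every unit's MOS is below the sampling interval, each unit covers at most one selection point, so its inclusion probability is exactly 2M/ΣM.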

Because of budget constraints, we anticipate that fewer than 48 PSUs will actually be used for CISS data collection. To cope with uncertainty about future budget levels, a sequence of nested PSU sub-samples was selected from the 48-PSU first-phase sample. For example, the second-phase PSU sample has 40 PSUs in 20 PSU strata. To select it, 4 first-phase PSU strata were collapsed with other strata to form a total of 20 second-phase strata, and then 2 PSUs were selected from the collapsed first-phase PSU sample in each of the 20 second-phase strata. A total of 5 phases of PSU samples were selected, each a sub-sample of the previous phase, with sample sizes ranging from 16 to 48 plus one certainty PSU (see figure and table below). The PSU collapsing order is predetermined. Each phase can be used as the PSU sample, and the resulting PSU selection probabilities are PPS. This approach produces a flexible PSU sample and minimizes the impact on the existing PSU sample when the sample size changes. Typically, the selection probability of a non-certainty PSU is:

$$\pi_{hi} = \frac{2\,M_{hi}}{\sum_{i'} M_{hi'}}$$

Here h is the PSU stratum, i is the PSU, $M_{hi}$ is the PSU MOS, and the summation is over the non-certainty PSUs in stratum h.


Scenario	Number of PSU Strata	Number of Certainty PSUs	Total Number of Sampled PSUs	Sampled PSUs per Stratum
1	24	1	49	2
2	20	0	40	2
3	16	0	32	2
4	12	0	24	2
5	8	0	16	2
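The statement that every nested phase is still PPS can be checked in a simple case. Assume, for illustration only (this is not necessarily NHTSA's collapsing procedure), that first-phase strata g and h, each with 2 PPS-selected PSUs, are collapsed into one stratum and 2 of the 4 selected PSUs are retained. If a PSU from stratum h is retained with conditional probability S_h/(S_g + S_h), where S is a stratum's total MOS, its overall probability is (2M/S_h)(S_h/(S_g + S_h)) = 2M/(S_g + S_h), which is PPS in the collapsed stratum, and the 4 conditional probabilities sum to 2 as a fixed-size sub-sample requires. The function name and MOS values are hypothetical.

```python
from fractions import Fraction


def collapse_two_strata(mos_g, mos_h, picked_g, picked_h):
    """Probabilities when phase-1 strata g and h (2 PPS picks each) are collapsed
    and 2 of their 4 selected PSUs are retained (hypothetical retention rule).

    Returns {(stratum, psu): (phase-1 prob, conditional prob, overall prob)}.
    """
    s_g, s_h = sum(mos_g), sum(mos_h)
    out = {}
    for label, mos, picked, s in (("g", mos_g, picked_g, s_g),
                                  ("h", mos_h, picked_h, s_h)):
        for i in picked:
            p1 = 2 * mos[i] / s            # phase-1 PPS probability in own stratum
            cond = s / (s_g + s_h)         # retention probability given phase 1
            out[(label, i)] = (p1, cond, p1 * cond)
    return out


mos_g = [Fraction(x) for x in (3, 4, 2, 1)]
mos_h = [Fraction(x) for x in (4, 6, 5, 5)]
probs = collapse_two_strata(mos_g, mos_h, picked_g=[0, 1], picked_h=[1, 2])
for key, (p1, cond, overall) in probs.items():
    print(key, p1, cond, overall)
```

Exact fractions make the PPS identity visible without rounding. Actually drawing the 2 retained PSUs with these conditional probabilities (for example, systematically) is a separate step not shown here.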

Within each PSU, PARs are grouped by the police jurisdictions (PJs) where the PARs are available, and the PJs become the second-stage sampling units. A composite MOS is assigned to each PJ in the selected PSUs: PJ MOS is the sum of the PJ's PAR domain crash count percentages in the PSU, each weighted by the predetermined PAR domain sampling rate, so PJs with a larger share of the desired crash types are selected with larger probabilities. Apart from certainty PJs, PJs are stratified into two PJ strata by MOS (the largest 50% vs. the rest). A PJ sample is then selected from each PJ stratum using sequential Poisson sampling.

The sequential Poisson sampling method (see Ohlsson, Esbjörn (1998): Sequential Poisson Sampling, Journal of Official Statistics, Vol. 14, No. 2, pp. 149–162) produces an approximately PPS sample while handling frame changes and minimizing changes to the existing sample.

Sequential Poisson sampling is applied to PJ sample selection within each non-certainty PJ stratum (the large-MOS or small-MOS stratum) of each sampled PSU, as follows:

Generate a permanent uniform random number for each PJ in the PJ frame.
Identify certainty PJs by the condition:

$$\frac{n_i\,M_{ij}}{\sum_{j'=1}^{N_i} M_{ij'}} \ge 1$$

Here $n_i$ is the PJ sample size and $N_i$ is the PJ frame size for a PJ stratum within PSU i, and $M_{ij}$ is the PJ MOS. The identified certainty PJs are set aside, and the process is repeated on the remaining PJs with the reduced PJ sample size until no more certainty PJs are found. Let the total number of certainty PJs be $n_i^c$.

For the remaining non-certainty PJs in the frame, divide each PJ's permanent random number $u_{ij}$ by its MOS to obtain the transformed random number $\xi_{ij} = u_{ij}/M_{ij}$. Then sort the transformed random numbers from smallest to largest:

$$\xi_{i(1)} \le \xi_{i(2)} \le \cdots \le \xi_{i(N_i - n_i^c)}$$

Thus, the $n_i^c$ certainty PJs plus the first $n_i - n_i^c$ non-certainty PJs on the sorted list (the sample size remaining after certainty PJs are set aside) form the PJ sample for a PJ stratum within PSU i.
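The steps above can be sketched as follows. This is a minimal Python illustration of sequential Poisson sampling for one PJ stratum, assuming positive MOS values and permanent uniform random numbers stored with the frame; the function name and the example MOS values are hypothetical.

```python
import random


def sequential_poisson_sample(mos, n, u):
    """Sequential Poisson sampling (Ohlsson 1998) for one PJ stratum.

    mos: positive measure of size for each PJ in the frame.
    n:   PJ sample size for the stratum.
    u:   permanent uniform(0, 1) random numbers, one per PJ.
    Returns (certainty PJs, selected non-certainty PJs, remaining ranked PJs).
    """
    remaining = list(range(len(mos)))
    certainty = []
    # Repeatedly set aside PJs with (n - n_c) * M_j / sum(M) >= 1.
    while len(certainty) < n and remaining:
        n_left = n - len(certainty)
        total = sum(mos[j] for j in remaining)
        new_cert = {j for j in remaining if n_left * mos[j] / total >= 1}
        if not new_cert:
            break
        certainty.extend(sorted(new_cert))
        remaining = [j for j in remaining if j not in new_cert]
    # Rank the rest by xi_j = u_j / M_j and take the first n - n_c.
    ranked = sorted(remaining, key=lambda j: u[j] / mos[j])
    n_noncert = max(n - len(certainty), 0)
    return certainty, ranked[:n_noncert], ranked[n_noncert:]


rng = random.Random(7)
mos = [40.0, 5.0, 12.0, 3.0, 8.0, 20.0, 2.0, 10.0]
u = [rng.random() for _ in mos]    # drawn once and stored with the frame
cert, sample, reserve = sequential_poisson_sample(mos, n=3, u=u)
print("certainty:", cert, "sample:", sample)
```

Here PJ 0 becomes a certainty (3 × 40/100 ≥ 1) and the other 2 PJs come from the ranking. Because u is permanent, adding a PJ to the frame leaves the relative order of existing PJs unchanged (as long as their MOS values do not change), which is what keeps changes to the existing sample small; the `reserve` list is the ordered pool of replacements.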

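Sequential Poisson sampling is approximately PPS. The selection probability of PJ j in PSU i is approximately:

$$\pi_{j|i} = \begin{cases} 1, & \text{if } j \text{ is a certainty PJ},\\[4pt] \dfrac{(n_i - n_i^c)\,M_{ij}}{\sum_{j'} M_{ij'}}, & \text{otherwise}. \end{cases}$$

Here j indexes the PJ, $n_i$ is the PJ sample size and $n_i^c$ is the number of certainty PJs for the PJ stratum in PSU i, and $M_{ij}$ is the PJ MOS. The summation is over all non-certainty PJs in that PJ stratum of the selected PSU.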
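A companion sketch computes these approximate PJ inclusion probabilities for one PJ stratum: certainty PJs (found by the repeated n·M/ΣM ≥ 1 rule) get probability 1, and each other PJ gets (n - n_c)·M_j divided by the total MOS of the non-certainty PJs; the design weights are the inverses. The function name and MOS values are hypothetical.

```python
def pj_inclusion_probs(mos, n):
    """Approximate PJ inclusion probabilities and design weights for one stratum.

    Certainty PJs get probability 1; every other PJ gets
    (n - n_c) * M_j / sum(M over non-certainty PJs).
    """
    remaining = set(range(len(mos)))
    n_c = 0
    while n_c < n and remaining:
        total = sum(mos[j] for j in remaining)
        cert = {j for j in remaining if (n - n_c) * mos[j] / total >= 1}
        if not cert:
            break
        remaining -= cert
        n_c += len(cert)
    total = sum(mos[j] for j in remaining)
    probs = [(n - n_c) * mos[j] / total if j in remaining else 1.0
             for j in range(len(mos))]
    weights = [1.0 / p for p in probs]
    return probs, weights


probs, weights = pj_inclusion_probs([40.0, 5.0, 12.0, 3.0, 8.0, 20.0, 2.0, 10.0], n=3)
print([round(p, 3) for p in probs])
```

The probabilities sum to the stratum sample size, 3, which is a quick consistency check on the certainty handling.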

PARs are selected on a weekly basis; in other words, the crashes over the year are stratified by week. Each week in each selected PSU, technicians visit the selected PJs and list the PARs recorded at each jurisdiction since the last visit. During listing, each PAR is assigned to one of 10 analysis domains. In some large PJs with many PARs, only a systematic sample of PARs is listed. If one of every $r_{ij}$ PARs is sub-listed in PJ j of PSU i, the sub-listing probability for every sub-listed PAR is:

$$\pi^{(s)}_{ij} = \frac{1}{r_{ij}}$$
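A minimal sketch of systematic sub-listing in one large PJ, assuming the week's PARs are processed in a fixed order (for example, by report date) and that r is the sub-listing interval; the function name and PAR identifiers are hypothetical.

```python
import random


def sublist_pars(par_ids, r, rng):
    """Systematic sub-listing: keep every r-th PAR after a random start.

    Every PAR is kept with probability exactly 1/r, whatever the list length,
    because the PAR at position k is kept iff k % r equals the random start.
    """
    start = rng.randrange(r)        # random start in 0..r-1
    return par_ids[start::r], 1.0 / r


week_pars = [f"PAR-{k:03d}" for k in range(100)]   # hypothetical PAR identifiers
listed, p_sub = sublist_pars(week_pars, r=4, rng=random.Random(3))
print(len(listed), p_sub)
```

With 100 PARs and r = 4, 25 PARs are listed, and each carries a sub-listing factor of 4 into the PAR MOS.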

After PARs in all sampled PJs are listed, all listed PARs in the same PSU are pooled together, and a PAR MOS is assigned to every listed PAR. PAR MOS is the product of a PAR domain MOS factor, the PJ design weight, and the sub-listing factor (the inverse of the sub-listing probability). The PAR domain MOS factors are determined by simulation to ensure that the predetermined PAR domain sample sizes can be achieved. PARs in rarer domains and/or in smaller PJs (which have larger PJ design weights) receive larger MOS.
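The PAR MOS product can be sketched as below. The record layout and the domain factors are hypothetical (in CISS the factors come from simulation); the example shows a rare-domain PAR and a sub-listed PAR both receiving a larger MOS than an ordinary PAR from an unsub-listed PJ.

```python
from dataclasses import dataclass


@dataclass
class ListedPar:
    par_id: str
    domain: int          # one of the 10 analysis domains (1..10)
    pj_weight: float     # PJ design weight: inverse of the PJ selection probability
    sublist_prob: float  # 1.0 if the PJ was not sub-listed


def par_mos(par, domain_factor):
    """PAR MOS = domain MOS factor x PJ design weight x (1 / sub-listing prob)."""
    return domain_factor[par.domain] * par.pj_weight / par.sublist_prob


# Hypothetical domain factors: domain 10 stands in for a rare domain.
factors = {d: 1.0 for d in range(1, 11)}
factors[10] = 6.0
pars = [
    ListedPar("A-001", 1, pj_weight=1.25, sublist_prob=0.25),
    ListedPar("B-014", 10, pj_weight=4.0, sublist_prob=1.0),
    ListedPar("C-203", 3, pj_weight=4.0, sublist_prob=1.0),
]
mos = {p.par_id: par_mos(p, factors) for p in pars}
print(mos)
```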

A PAR sample averaging 1.75 crashes per data collector per week is selected in each PSU using sequential Poisson sampling. Sequential Poisson sampling produces a scalable sample, which allows us to replace non-responding cases (cases with key vehicle information missing) and to better handle workload changes. Under sequential Poisson sample selection, the PAR selection probability is approximately:

$$\pi_{l|ij} = \frac{m_i\,M_{ijl}}{\sum_{j',l'} M_{ij'l'}}$$

Here $m_i$ is the weekly PAR sample size for PSU i and $M_{ijl}$ is the MOS of PAR l in PJ j. The summation is over all listed non-certainty PARs of the week in the PSU.
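Because sequential Poisson sampling ranks every listed PAR, a replacement for a non-responding case can be taken as the next PAR in the ranking. A minimal sketch, assuming the week's listed PARs already carry MOS values and permanent random numbers, that certainty PARs are handled separately (omitted here), and that `is_responding` is a hypothetical check for whether the key vehicle information is available:

```python
import random
from typing import Callable, Sequence


def weekly_par_sample(par_ids: Sequence[str], mos: Sequence[float],
                      u: Sequence[float], m: int,
                      is_responding: Callable[[str], bool]) -> list:
    """Select m responding PARs, replacing non-responders with the next ranked PAR.

    PARs are ranked by the transformed random number u / MOS (smallest first).
    The result can be shorter than m if the week's listing runs out.
    """
    order = sorted(range(len(par_ids)), key=lambda k: u[k] / mos[k])
    sample = []
    for k in order:
        if len(sample) == m:
            break
        if is_responding(par_ids[k]):   # skipping a non-responder = replacing it
            sample.append(par_ids[k])
    return sample


rng = random.Random(11)
ids = [f"P{k:02d}" for k in range(10)]
mos = [1.0, 2.0, 0.5, 3.0, 1.5, 1.0, 2.5, 0.8, 1.2, 2.0]
u = [rng.random() for _ in ids]
missing = {"P03"}                       # hypothetical case with key vehicle missing
sample = weekly_par_sample(ids, mos, u, m=3, is_responding=lambda pid: pid not in missing)
print(sample)
```

Increasing m for a week with more staff time simply takes further PARs from the same ranking, which is the scalability the text describes.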

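The overall selection probability of a PAR is the product of the PSU, PJ, sub-listing, and PAR selection probabilities:

$$\pi_{hijl} = \pi_{hi}\;\pi_{j|i}\;\pi^{(s)}_{ij}\;\pi_{l|ij}$$

The design weight is the inverse of the overall selection probability, $w_{hijl} = 1/\pi_{hijl}$.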
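As a worked illustration with made-up stage probabilities (not CISS values), the overall probability is the product of the four stage probabilities and the design weight is its inverse:

```python
def overall_probability(pi_psu, pi_pj, pi_sublist, pi_par):
    """Overall PAR selection probability: product of the stage probabilities."""
    return pi_psu * pi_pj * pi_sublist * pi_par


# Hypothetical stage probabilities for one selected PAR.
pi = overall_probability(pi_psu=0.05, pi_pj=0.5, pi_sublist=0.25, pi_par=0.04)
weight = 1.0 / pi      # design weight: number of universe crashes this PAR represents
print(f"pi = {pi:.6f}, weight = {weight:.1f}")
```

Here the PAR represents about 4,000 crashes before the nonresponse and calibration adjustments described below.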

PSU, PJ, and PAR sample sizes were determined by an optimization that minimizes variance subject to cost, assuming three-stage simple random sampling without replacement.

The optimization model consists of the objective function, the cost constraint, and the variance constraints, with the following notation.

$d$: Subscript of the identified key estimate, $d = 1, \ldots, D$. Here $D = 7$.

$\hat p_d$: Identified key proportion estimate.

$n_1, n_2, n_3$: Optimal sample sizes of PSUs, PJs (per PSU), and cases (PARs, per PJ) to be determined.

$N_1$: Population size of PSUs.

$\bar N_2$: Average population size of PJs (per PSU).

$\bar N_3$: Average population size of PARs (per PJ).

$V(\hat p_d)$: Variance of the identified key estimate $\hat p_d$.

$S^2_{1d}, S^2_{2d}, S^2_{3d}$: Variance components at the PSU, PJ, and case level.

$C, C_0, C_1, C_2, C_3$: Total, fixed, PSU-, PJ-, and crash-level cost coefficients.

$V_0(\hat p_d)$: Variance of the identified key estimate in the current system (NASS CDS).

$L$: Known case load.
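For reference, a standard way to write such a model for three-stage simple random sampling without replacement is sketched below. It is an assumed form, not necessarily the exact specification NHTSA used: $n_1$ PSUs, $n_2$ PJs per PSU, and $n_3$ cases per PJ; population sizes $N_1$, $\bar N_2$, $\bar N_3$; variance components $S^2_{1d}, S^2_{2d}, S^2_{3d}$ for key estimate $d = 1, \ldots, D$; cost coefficients $C_0, \ldots, C_3$ and budget $C$; CDS variances $V_0(\hat p_d)$; and the case load $L$ taken here as a cap on cases per PSU.

$$
\begin{aligned}
\min_{n_1, n_2, n_3}\quad & \sum_{d=1}^{D} V(\hat p_d) \\
\text{subject to}\quad & C_0 + C_1 n_1 + C_2 n_1 n_2 + C_3 n_1 n_2 n_3 \le C, \\
& V(\hat p_d) \le V_0(\hat p_d), \qquad d = 1, \ldots, D, \\
& n_2 n_3 \le L,
\end{aligned}
$$

$$
V(\hat p_d) = \Bigl(1 - \frac{n_1}{N_1}\Bigr)\frac{S^2_{1d}}{n_1}
+ \Bigl(1 - \frac{n_2}{\bar N_2}\Bigr)\frac{S^2_{2d}}{n_1 n_2}
+ \Bigl(1 - \frac{n_3}{\bar N_3}\Bigr)\frac{S^2_{3d}}{n_1 n_2 n_3}.
$$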

Under the current CDS budget and cost components, and with 2 technicians per PSU, the optimal sample allocation was determined to be about 24 PSUs, 8 PJs per PSU, and 200 cases per PSU.

PSU Stratum	PSU Sample Size	PSU Population Size
1	2	34
2	2	61
3	2	63
4	2	61
5	2	76
6	2	318
7	2	57
8	2	254
9	2	614
10	2	47
11	2	22
12	2	177
Total	24	1784

Standard errors of seven key estimates under the current CDS were used as constraints in the optimization model above, so that the precision of the corresponding CISS estimates will be at least as good as under CDS.
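A brute-force sketch of the allocation step, assuming the three-stage SRSWOR variance, a linear cost function, and a cap on cases per PSU. Every input in the usage lines (variance components, costs, budget, grid) is a placeholder rather than a CDS value, so the result says nothing about the actual CISS allocation; the function names are hypothetical.

```python
from itertools import product


def three_stage_var(n1, n2, n3, N1, N2, N3, s1, s2, s3):
    """Variance of an estimated proportion under three-stage SRSWOR."""
    return ((1 - n1 / N1) * s1 / n1
            + (1 - n2 / N2) * s2 / (n1 * n2)
            + (1 - n3 / N3) * s3 / (n1 * n2 * n3))


def total_cost(n1, n2, n3, c0, c1, c2, c3):
    """Fixed cost plus PSU-, PJ-, and case-level costs."""
    return c0 + c1 * n1 + c2 * n1 * n2 + c3 * n1 * n2 * n3


def best_allocation(budget, pop, var_comps, costs, grid, v_caps=None, case_load=None):
    """Grid search for (n1, n2, n3) minimizing the summed variance of the key
    estimates, subject to the budget, optional variance caps, and case load."""
    best = None
    for n1, n2, n3 in product(*grid):
        if total_cost(n1, n2, n3, *costs) > budget:
            continue
        if case_load is not None and n2 * n3 > case_load:
            continue
        vs = [three_stage_var(n1, n2, n3, *pop, *vc) for vc in var_comps]
        if v_caps is not None and any(v > cap for v, cap in zip(vs, v_caps)):
            continue
        if best is None or sum(vs) < best[0]:
            best = (sum(vs), (n1, n2, n3))
    return best


# All inputs below are placeholders, not CDS or CISS values.
result = best_allocation(
    budget=2e7,
    pop=(1784, 20, 500),                                   # N1, mean N2, mean N3
    var_comps=[(0.002, 0.01, 0.2), (0.001, 0.02, 0.15)],   # (S1^2, S2^2, S3^2) per estimate
    costs=(1e6, 3e5, 2e4, 3e3),                            # C0, C1, C2, C3
    grid=(range(8, 49, 8), range(2, 13, 2), range(5, 51, 5)),
    case_load=200,
)
print(result)
```

A grid search is used only because it is transparent; with real inputs, a continuous optimizer followed by rounding would be the more usual choice.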

Once crashes are selected, a detailed investigation is conducted on each selected crash.

There are 340 PJs in the sampled PSUs that were not selected into the PJ sample. NHTSA collects crash counts from these non-sampled PJs to improve the accuracy of the national estimates.

The following table shows the estimated number of police-reported crashes to be listed and selected at the 24 CISS PSUs for calendar year 2016. “National Police-Reported Crashes” is an estimate of the total number of police-reported crashes in the United States from the General Estimates System.

Crash Category	Estimated Number of Crashes
National Police-Reported Crashes	1,859,000
Police-Reported Crashes in 24 CISS PSUs	92,661
Police-Reported Crashes in Sampled CISS Police Jurisdictions	64,746
Crashes Selected for CISS Investigation	4,878
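The counts in the table above can be read as a selection funnel. This short calculation uses only the table's figures to give the share of national crashes occurring in the 24 PSUs, the share of those in sampled PJs, the selection rate among crashes in sampled PJs, and the average number of investigations per PSU.

```python
counts = {
    "national": 1_859_000,
    "in_24_psus": 92_661,
    "in_sampled_pjs": 64_746,
    "selected": 4_878,
}

share_psus = counts["in_24_psus"] / counts["national"]       # PSUs' share of national crashes
share_pjs = counts["in_sampled_pjs"] / counts["in_24_psus"]   # sampled PJs' share within PSUs
sel_rate = counts["selected"] / counts["in_sampled_pjs"]      # selection rate in sampled PJs
per_psu = counts["selected"] / 24                             # investigations per PSU per year
print(f"{share_psus:.1%}  {share_pjs:.1%}  {sel_rate:.1%}  {per_psu:.0f}")
```

The results are roughly 5.0%, 69.9%, 7.5%, and 203 cases per PSU; the last is in line with the approximately 200 cases per PSU from the optimization.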

Once a crash is selected for the CISS sample, it remains in the sample. Because the crash was identified through a police report, the report itself provides some of the data needed for each component of the CISS investigation. Expected completion rates for the additional investigation stages are: scene inspection, 98%; vehicle inspection, 85%; occupant interview, 83%; and occupant injury, 88%. If the key vehicle is missing, a replacement PAR is selected for investigation from the listed PARs in their sequential Poisson sort order. This dramatically increases the effective sample size and improves data quality.

After design weights are calculated, the weights need to be adjusted for the following reasons:

Refusal/nonresponse adjustment at all 3 stages;
Frame coverage bias correction;
Matching marginal totals to other data sources, for example, total fatalities to FARS;
Trimming of large weights.

Calibration will be used as the adjustment method. Potential auxiliary information for calibration includes FARS, Census population counts, and PSU-level total crash counts.

A calibration adjustment method that handles all of the above is implemented in the SUDAAN 11 WTADJX procedure, which will be used to create the final analysis weights.
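WTADJX is the tool this plan names, and its calibration model is not reproduced here. To illustrate what matching weights to external control totals does, the sketch below substitutes simple raking (iterative proportional fitting), a basic special case of calibration, on two hypothetical margins (region and a crash-severity class). The control totals stand in for sources such as FARS or PSU-level crash counts and are made up.

```python
def rake(weights, margins, iters=100, tol=1e-10):
    """Raking (iterative proportional fitting) of weights to several margins.

    margins: list of (groups, totals) pairs, where groups[k] is record k's
    category on that margin and totals maps each category to its control total.
    """
    w = list(weights)
    for _ in range(iters):
        max_change = 0.0
        for groups, totals in margins:
            sums = {c: 0.0 for c in totals}
            for wk, g in zip(w, groups):
                sums[g] += wk
            for k, g in enumerate(groups):
                factor = totals[g] / sums[g]
                max_change = max(max_change, abs(factor - 1.0))
                w[k] *= factor
        if max_change < tol:     # every margin already matched
            break
    return w


region = ["NE", "NE", "S", "S", "S", "W"]
severity = ["fatal", "nonfatal", "fatal", "nonfatal", "nonfatal", "fatal"]
base_w = [10.0, 20.0, 15.0, 25.0, 30.0, 12.0]
w = rake(base_w, [(region, {"NE": 40, "S": 80, "W": 15}),
                  (severity, {"fatal": 45, "nonfatal": 90})])
print([round(x, 2) for x in w])
```

Unlike this sketch, a production calibration also bounds the adjustment factors, which is one way large-weight trimming and calibration are handled together.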

Missing values of some key items will be imputed. Several imputation methods will be considered, depending on the missing variable and the available information. The imputation methods include, but are not restricted to, logical imputation, regression imputation, and hot deck imputation.
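Of the methods listed, hot deck is the simplest to sketch. The example below is a generic random hot deck within imputation classes, assuming records are dictionaries with None marking a missing item; the variable names (`veh_class`, `speed`) and the class definition are hypothetical, not CISS variables or rules.

```python
import random
from collections import defaultdict


def hot_deck(records, target, class_vars, rng):
    """Random hot-deck imputation of `target` within imputation classes.

    records: list of dicts; None marks a missing value.
    class_vars: keys whose values define the imputation classes.
    Returns new dicts with the value filled in and a `<target>_imputed` flag.
    """
    donors = defaultdict(list)
    for rec in records:
        if rec[target] is not None:
            donors[tuple(rec[v] for v in class_vars)].append(rec[target])
    out = []
    for rec in records:
        new = dict(rec)
        new[f"{target}_imputed"] = False
        if new[target] is None:
            pool = donors.get(tuple(new[v] for v in class_vars))
            if pool:                          # leave missing if the class has no donor
                new[target] = rng.choice(pool)
                new[f"{target}_imputed"] = True
        out.append(new)
    return out


cases = [
    {"veh_class": "car", "speed": 35},
    {"veh_class": "car", "speed": None},
    {"veh_class": "truck", "speed": 50},
    {"veh_class": "truck", "speed": None},
    {"veh_class": "car", "speed": 40},
]
filled = hot_deck(cases, "speed", ["veh_class"], random.Random(1))
```

The imputation flag keeps imputed values distinguishable from reported ones in the analysis file.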

The actual CISS PSU sample size is 24, with 2 PSUs selected from each of the 12 PSU strata. Strata contain about 149 PSUs on average (1,784/12), so the PSU sampling rate is low. We therefore expect that PSU selection can be approximated as with-replacement sampling. Standard survey software, such as the SAS SURVEY procedures and SUDAAN procedures, can be used for CISS data analysis.
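The low-sampling-rate argument can be checked directly against the 12-stratum PSU table: the sketch computes each stratum's fraction 2/N_h from the listed population sizes.

```python
# PSU population sizes by stratum, from the 12-stratum table (2 PSUs sampled in each).
pop_sizes = [34, 61, 63, 61, 76, 318, 57, 254, 614, 47, 22, 177]
n_per_stratum = 2

fractions = [n_per_stratum / N for N in pop_sizes]
largest = max(fractions)
print(f"frame size: {sum(pop_sizes)}")                        # 1784
print(f"mean PSUs per stratum: {sum(pop_sizes) / len(pop_sizes):.1f}")
print(f"overall fraction: {n_per_stratum * len(pop_sizes) / sum(pop_sizes):.2%}")
print(f"largest stratum fraction: {largest:.1%} (stratum {fractions.index(largest) + 1})")
```

The largest per-stratum fraction is 2/22 (about 9%, stratum 11) and the overall fraction is 24/1,784 (about 1.3%), so finite population corrections are close to 1; ignoring them, as a with-replacement variance approximation does, tends to overstate variances slightly rather than understate them.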

Describe collection of information procedures.

Once a crash has been selected for investigation, the CISS team initiates several activities. Researchers locate, visit, measure, and photograph the crash scene; locate, inspect, and photograph all involved vehicles; conduct a telephone or personal interview with each involved person or a surrogate; and record injury information from hospitals or emergency rooms for all injured victims. During each activity, researchers record information on the crash, vehicle, and occupant forms as appropriate.

Describe methods to maximize response rates and to deal with issues of non-response.

CISS has a three-stage sample design. The first-stage sampling units are counties or groups of counties. A PSU becomes a non-responding PSU only if all selected police jurisdictions (PJs) within the PSU are non-responding. In CISS, PJ samples are selected using sequential Poisson sampling, so the whole PJ frame in the PSU can serve as a replacement sample. Therefore, a PSU becomes a non-responding PSU only if all PJs in the frame are non-responding, and by design it is unlikely that there will be any non-responding PSUs in CISS.

The second-stage sampling units of CISS are PJs. A sampled PJ becomes a non-responding PJ if it refuses to cooperate. To improve the PJ cooperation rate, NHTSA plans to visit each selected PJ and meet with local law enforcement officers to gain cooperation. In CDS and in the first 5 PSUs of CISS, we have not experienced any refusals, and we do not expect any non-responding PJs in the remaining CISS PSUs.

At the third stage, all police accident reports (PARs) in the selected PJs are first listed, and then a sequential Poisson sample of PARs is selected. Because the sampling units are police reports and a PAR must be available before it is listed, by design there is no unit non-response at the third stage of CISS sampling.

Item response rates in CDS vary. For example, scene inspection variables have item response rates around 98%; vehicle inspection, 85%; occupant interview, 83%; and occupant injury, 88%. Based on CDS experience, we expect CISS to have similar response rates.

The CISS quality control system will be designed to produce the most accurate, reliable, and complete database possible within the limits of available resources. All data will be computerized and edited by an algorithm that checks for inconsistencies and questionable items. A sample of crashes will be given a thorough review by an experienced researcher at a Data Quality Control Zone Center. Zone Center personnel will visit each PSU regularly to observe the team’s investigation activities and to discuss systematic problems revealed in edit checks and Zone Center reviews of the team’s cases.

Because the interview is vital to a complete case, CISS teams will make special efforts to complete an interview whenever possible. Occupants will be contacted by telephone. CISS researchers will call at varying hours (often in the evenings or on weekends) until they have located the person sought. When that person is unavailable, other passengers or witnesses are contacted. If the person cannot be located by telephone, researchers use personal visits or mail questionnaires. Each CISS researcher will receive special training in interviewing, which increases the likelihood that persons will cooperate once they have been located and contacted. Based on the results of these procedures in our legacy program (NASS), we anticipate that CISS teams will complete more than three-quarters of all occupant interviews. Proposed interview forms for CISS are displayed in Attachment 7.

As a final check on CISS data, approximately 5% of those interviewed will be recontacted by Zone Center personnel to establish that they had in fact been interviewed and to verify some of their responses. This type of interview takes approximately 5 minutes.

Describe any tests of procedures or methods to be undertaken.

NHTSA will test new data collection procedures for six (6) months. The test will include: gathering police crash reports and identifying qualified crashes to investigate, conducting the interviews to assess interview questions and procedures, entering data from scenes and vehicles with electronic devices, analyzing data, and monitoring for quality control.

The attached spreadsheet (Attachment 3) shows the list of data elements to be collected from the detailed investigation of the selected crashes. The electronic forms and protocols are being developed to collect this information on tablet computers.

Provide the name and telephone number of individuals consulted on statistical aspects of the design and the name of the agency unit, contractor(s), grantee(s), or other person(s) who will actually collect and/or analyze the information for the agency.

Ms. Chou-Lin Chen, National Center for Statistics and Analysis, NHTSA, 202-366-1048 is responsible for CISS survey design.

NHTSA has decided to undertake a basic redesign of the National Automotive Sampling System that will attempt to meet new and diverse requirements by expanding its scope and making it more responsive to changing needs. Accordingly, NHTSA contracted with Westat (contract DTNH22-12-F-00389) to lead the survey modernization effort, and NHTSA also participated jointly with Westat in developing the new Crash Investigation Sampling System (CISS). The CISS contractors are Calspan Corporation (contract DTNH22-14-D-00363) and KLD Associates, Inc. (contract DTNH22-14-D-00366).


File Title	SUPPORTING STATEMENT
Author	Ruth Isenberg
File Created	2021-01-25