INFORMATION COLLECTION
SUPPORTING JUSTIFICATION
Survey of Northeast Regional and Intercity Household Travel Attitudes and Behavior
This Information Collection Request (ICR) seeks Office of Management and Budget (OMB) clearance for the information collection entitled “Survey of Northeast Regional and Intercity Household Travel Attitudes and Behavior.”
The Federal Railroad Administration (FRA) proposes to collect information from the public to determine current intercity and regional travel behavior of northeast residents. The information collected will include frequency of trips, origin and destination, modes of travel (and class of service if applicable), trip purpose, party size, trip costs, and other trip characteristics. It will also ask for travel preferences under alternative choice scenarios that include different and new modes, classes of service, costs, and amenities.
1. Explain the circumstances that make the collection of information necessary. Identify any legal or administrative requirements that necessitate the collection. Attach a copy of the appropriate section of each statute and regulation mandating or authorizing the collection of information.
a. Circumstances making the collection necessary
The Northeast region has one of the most extensive multi-modal passenger and freight transportation systems in the world—highways, airports, ports, intercity and commuter rail, and public transit serving all major cities and many intermediate markets. However, despite significant investment over decades in all modes, the region still faces major congestion and capacity constraints. These constraints, if not addressed, have the potential to curtail future mobility and economic growth and place the Northeast at a competitive disadvantage to other regions of the U.S. and the world.
With these issues in mind, FRA awarded a collective grant to the northeast states through which the Northeast Corridor (NEC) passes to plan investments in the NEC. FRA is the designated lead agency for the resulting planning program, called NEC FUTURE. The goal of the NEC FUTURE program is to prepare a Passenger Rail Corridor Investment Plan (PRCIP) for the northeast region. The PRCIP consists of a Service Development Plan (SDP), which articulates the overall scope and approach for proposed service, and a Tier 1 Environmental Impact Statement (EIS). When complete, it will define an integrated, comprehensive passenger rail transportation solution for the Northeast. The purpose of this solution is to improve mobility, effectively serve travel demand due to population and job growth, support economic development, reduce growth in carbon emissions and dependence on foreign oil, and contribute to improved land utilization and investment in both urban and non-urban communities in the region.
The results of the survey will be used to develop a new model for forecasting future travel behavior in response to future services provided by different modes of travel in the Northeast. The primary use of the model will be to analyze the ridership impacts of alternative rail investment plans for the Northeast Corridor (NEC) as part of the aforementioned PRCIP for the northeast region. In addition, the model and the data underlying the model will be available to FRA for use in future projects involving the NEC.
The NEC FUTURE planning program has three phases, as shown in Figure A.1.
The recently completed Phase 1 involved early service planning and the development and evaluation of alternatives. Technical work completed in Phase 1 includes: data collection, development of the project’s Purpose and Need, preparation of a Public/Stakeholder Involvement Plan, analysis of existing ridership and currently available ridership forecasts, performance of operations analysis, identification and evaluation of service alternatives, identification of infrastructure requirements, and the initiation of Early Engagement with Resource Agencies and of the National Environmental Policy Act (NEPA) scoping process. The market analysis and forecasting supporting the identification and coarse screening of alternatives in Phase 1 were based on existing available market data, forecast results from prior studies, and the use of existing models to forecast ridership and revenue results from alternatives.
Phase 2, which has just started, will involve further refinement of the alternatives and the preparation of the Draft Tier 1 EIS and the Draft SDP. Detailed alternatives analyses will be conducted in Phase 2, supported by ridership and revenue forecasting based on the new survey data and new model developed as part of this project. Phase 3 results in the preparation of the Final Tier 1 EIS and Record of Decision, as well as the final SDP.
The project team has divided Phase 2 into two subparts. During Phase 2A, the analysis of alternatives will be supported by existing data and models. Phase 2B will begin once the data collection is complete and the new model has been developed and is available for the detailed analysis of alternatives. Originally, survey work was planned and programmed to start in mid-June 2013, and surveys were subsequently put into the field in August, beginning with an initial sample of 300 respondents constituting a “pilot phase”. The results of the “pilot phase” were summarized for review in November 2013. Based on the observed “pilot phase” results, the survey has been restructured as a single-phase phone survey. Pending OMB approval of the revised survey in February 2014, it is expected that field work will commence in March 2014 and continue for about 10 weeks to generate the specified number of completed surveys. This schedule will allow for the data to be available (after processing) by June 2014, which in turn will allow model estimation to occur in early Summer 2014. The alternatives developed in Phase 2B can then be evaluated during late Summer and Fall of 2014. Further slippage in schedule would lead to an equivalent delay of the project, which would delay the EIS process and the selection of an alternative, and ultimately affect when the investments can start. Alternatively, the NEC FUTURE study would proceed on its current schedule without the full benefit of the new model.
Further development and implementation of a number of major Northeast Corridor projects must await completion of the NEC FUTURE Tier 1 NEPA Record of Decision. This includes advancing portions of the Gateway project, perhaps the most important program on the Northeast Corridor at this time, where resolution of key issues impacting the number and location of platforms at Penn Station must await completion of NEC FUTURE. The size and alignment for various other critical improvement projects currently pending on the NEC – including the Susquehanna Bridge replacement in Maryland – similarly are dependent on the findings coming from the NEC FUTURE project. Final design and construction of these projects must hold until completion of NEC FUTURE.
FIGURE A.1 PROJECT STRUCTURE
FRA has the authority to conduct this information collection through the National Environmental Policy Act (NEPA) (42 U.S.C. § 4321 et seq.) and 49 U.S.C. § 103 (FRA’s authorizing statute). FRA’s authorizing statute gives the FRA Administrator the authority to “support rail intermodal development and high-speed rail development, including high speed rail planning” and “ensure that the programs and initiatives . . . benefit the public and work toward achieving regional and national transportation goals.” 49 U.S.C. 103(j)(5)-(6). NEPA directs Federal agencies to study and consider the environmental impacts of any “major Federal actions significantly affecting the quality of the human environment.” 42 U.S.C. § 4332(2)(C). Federal agencies shall “utilize a systematic, interdisciplinary approach which will insure the integrated use of the natural and social sciences and the environmental design arts in planning and in decision-making which may have an impact on man’s environment” in any NEPA study. 42 U.S.C. § 4332(2)(A); 40 C.F.R. 1501.2(a) and 1502.6.
This information collection is instrumental to FRA’s rail service planning efforts and Tier 1 environmental impact statement (EIS) for the Northeast Corridor (NEC), known as the NEC FUTURE program. The survey will provide the FRA with information on where and how people travel along and within the NEC as well as their preferences for how they would like to travel in the future. FRA will then use the gathered information to develop a new model to forecast travel demand within the NEC by mode (rail, air, intercity bus and car). The travel demand modeling tool will provide FRA with valuable information on the characteristics of NEC travelers (e.g., age, income or vehicle ownership), trip purpose (e.g., business, leisure or commute), detail on party size, trip origins and destinations, and traveler preferences. FRA will use this information to make better planning and investment decisions on a regional basis by providing the tools to evaluate changes in travel demand generated by the way in which train service is operated on the Northeast Corridor and by the types of investments made to enhance the NEC infrastructure. In addition, the information collection will provide the data necessary for FRA to ensure that its planning initiative and Tier 1 EIS benefit the public and achieve regional and national transportation goals, as set forth in FRA’s authorizing statute. See 49 U.S.C. § 103(j)(5)-(6). Moreover, the information collection is exactly the sort of “interdisciplinary approach” required by NEPA. See 42 U.S.C. § 4332(2)(A); 40 C.F.R. 1501.2(a) and 1502.6. The survey data will provide insights into travel patterns and traveler preferences that will inform the definition of alternatives, ensuring that a sufficiently broad spectrum of alternatives is considered in the Tier 1 EIS. These data will further be valuable input to the analysis of several environmental factors including noise and vibration, air quality, and transportation.
The information collection will support the development of a new model to forecast travel demand by mode within the Northeast Corridor (NEC) as part of the NEC FUTURE program for developing a Passenger Rail Corridor Investment Plan (PRCIP) for the northeast region. The FRA is named as the lead agency for the effort. The PRCIP consists of a Service Development Plan (SDP), which articulates the overall scope and approach for proposed service, and a Tier 1 Environmental Impact Statement (EIS). When complete, it will define an integrated, comprehensive passenger rail transportation solution for the Northeast.
NEC FUTURE requires a travel demand forecasting model capable of forecasting future travel demand by mode, including different intercity and regional/commuter modes of travel, within the NEC. None of the existing available travel demand models maintained by Amtrak, regional agencies, or others within the NEC address the full extent of the NEC geography and the existing and future available intercity and regional/commuter modes of travel. For this reason, the NEC FUTURE study scope includes developing a new travel demand forecasting model for the NEC.
Specifics of the Model
The new NEC model will be a trip-based model similar in structure to other existing models, including Amtrak’s NEC travel demand forecasting model. The new NEC model consists of two major components, addressing:
Total Travel Market Size, based on existing travel market size and estimated growth
Mode Shares, based on the characteristics of the competing modes
These two components are actually applied in reverse order (i.e., mode share before total travel) so that the mode share model results can be incorporated within the total demand model structure. This linkage provides the total travel model with sensitivity to changes in the level of service provided by all modes, where each mode’s contribution to the overall “impedance” or “ease of travel” within a geographic market is effectively weighted by mode share.
Total Travel demand forecasts define the total market size to which the modal shares are applied to forecast demand by mode. In general, there are two distinct types of factors that influence total travel demand between geographic areas:
Population growth and changes in economic activity in the geographic areas
Changes in the modal levels of service provided between the geographic areas
Measures used to represent the impacts of these changes respectively include:
Socio-economic data and forecasts, provided by Moody’s:
Population
Household Income
Employment
Composite modal level of service, defined by the mode share model structure and equivalent to summing across all top level choices, as follows:
i - mode
Ui - utility of choice i
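The composite level of service equation is not reproduced in this text. As a minimal sketch, assuming it takes the standard logsum form (the natural log of the sum of exponentiated utilities over the top-level choices), it could be computed as follows. The function name and the utility values are hypothetical and are not taken from the NEC FUTURE model.

```python
import math

def composite_los(utilities):
    """Return ln(sum_i exp(U_i)) over the top-level choices (assumed logsum form)."""
    peak = max(utilities.values())  # subtract the maximum for numerical stability
    return peak + math.log(sum(math.exp(u - peak) for u in utilities.values()))

# Hypothetical top-level utilities for one zone pair (illustrative values only).
example_utilities = {"auto": -1.2, "air": -2.5, "bus": -3.0, "rail": -1.8}
los = composite_los(example_utilities)
```

Under this form, each mode contributes to the composite measure roughly in proportion to its share, and improving any single mode's utility raises the composite value.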
The total travel volumes are estimated using a ratio formulation that relates total travel market growth to growth in the independent variables, computed as the ratio of the forecast year to the base year values. This is illustrated by the following equation:
where:
TRP - trips
POP - population
INC - household income
EMP - employment
LOS - level of service
F - future year
B - base year
i - home zone
j - attraction zone
That is, inter-zonal trips (TRP) are projected to grow in proportion to population changes in the home zone (POPi), adjusted for its estimated effect, a; in proportion to population changes in the attraction (non-home) zone (POPj), adjusted for its estimated effect, b; in proportion to changes in household income (INCi) in the home zone, adjusted for its estimated effect, c; in proportion to the employment changes in the attraction zone (EMPj), adjusted for its estimated effect, d; and in proportion to changes in the overall level of service (LOS), adjusted for its estimated effect, e.
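The ratio formulation described above can be sketched in a few lines. This is an illustration only: it assumes each "estimated effect" enters as an exponent on the future-to-base ratio of its variable, and the effect values, zone data, and function name are hypothetical.

```python
def forecast_trips(base_trips, base, future, effects):
    """Scale base-year trips by the product of (future / base) ** effect.

    Assumes every variable, including the level of service measure,
    is expressed as a positive number so the ratio is well defined.
    """
    growth = 1.0
    for name, effect in effects.items():
        growth *= (future[name] / base[name]) ** effect
    return base_trips * growth

# Hypothetical effects a, b, c, d, e and zone-pair data (illustrative only).
effects = {"pop_home": 0.9, "pop_attr": 0.4, "inc_home": 0.6, "emp_attr": 0.5, "los": 1.1}
base = {"pop_home": 1.00e6, "pop_attr": 2.00e6, "inc_home": 70000.0, "emp_attr": 9.0e5, "los": 2.0}
future = {"pop_home": 1.10e6, "pop_attr": 2.10e6, "inc_home": 77000.0, "emp_attr": 9.9e5, "los": 2.1}

trips_future = forecast_trips(10000.0, base, future, effects)
```

If no input changes between the base and future years, every ratio equals one and the forecast returns the base-year trips unchanged.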
The total travel demand models will be estimated from base year travel data, using non-linear regression techniques, with total trips as the dependent variable and population, income, employment, and level of service as the independent variables. Given that this is cross-sectional data from a single point in time, the actual model form to be estimated is given by:
where:
TRP - trips
POP - population
INC - household income
EMP - employment
LOS - level of service
i - home zone
j - attraction zone
The estimated constant drops out when the model is applied in the ratio formulation, shown earlier. Separate models will be estimated for each of the trip purpose market segments reflected in the mode share models, described below.
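The cancellation of the constant can be checked numerically. The sketch below assumes a multiplicative cross-sectional form (a constant times each variable raised to its estimated effect), which is consistent with the ratio description but is not reproduced in this text. All names and values are hypothetical.

```python
import math

def cross_sectional_trips(constant, zone_data, effects):
    """Assumed form: constant * product of (variable ** effect)."""
    trips = constant
    for name, effect in effects.items():
        trips *= zone_data[name] ** effect
    return trips

def growth_ratio(constant, base, future, effects):
    """Future-to-base trip ratio implied by the cross-sectional model."""
    return (cross_sectional_trips(constant, future, effects)
            / cross_sectional_trips(constant, base, effects))

# Hypothetical effects and zone data (illustrative only).
cs_effects = {"pop_home": 0.9, "emp_attr": 0.5, "los": 1.1}
cs_base = {"pop_home": 1.0e6, "emp_attr": 9.0e5, "los": 2.0}
cs_future = {"pop_home": 1.1e6, "emp_attr": 9.9e5, "los": 2.1}

# The ratio is the same whatever value the estimated constant takes.
ratio_small_constant = growth_ratio(0.5, cs_base, cs_future, cs_effects)
ratio_large_constant = growth_ratio(7.0, cs_base, cs_future, cs_effects)
```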
An essential input to the total travel model estimation and application is provided by base year travel demand – existing trips by geographic market. NEC FUTURE will develop estimates of existing trips by market and by mode from a variety of sources, including:
Passenger car/truck/van trips, from a parallel study by the NEC Commission, which is collecting auto data from toll records and surveys to build an NEC auto mode trip table
Plane trips, from the FAA 10% ticket sample
Rail trips, from:
Amtrak, for intercity rail
Commuter Rail agencies, for regional/commuter rail
Bus trips, from current service offerings and estimated load factor (although the NEC study continues to seek other sources of intercity bus data, if they exist) and from regional agencies that also provide longer haul bus service
In addition to the above modal data sources, the new NEC FUTURE survey will collect data from a mode-neutral household-based NEC sample, which will be utilized to provide additional market detail, such as a basis for allocating trips by trip purpose, which is not provided by many of the above modal sources.
The Mode Share component estimates the share of total person travel by mode. This model component addresses travel by the following modes:
Passenger car/truck/van
Plane
Bus
Train, addressing the following types of train service separately:
Premium high-speed rail (similar to Amtrak’s Acela train service)
Regional intercity rail (similar to Amtrak’s Regional train service)
Commuter rail (similar to the train service provided by MBTA, MNR, LIRR, NJ Transit, SEPTA, MARC, and VRE within specific regions)
Metropolitan rail, which would provide a new type of service to a mix of longer distance commuter and shorter distance intercity markets with amenities and pricing between existing commuter and regional intercity rail service; this service would be offered as a one-seat ride or with a required connection
The new model will estimate shares among these as a function of the following key independent variables describing the service characteristics:
Travel time
Travel cost or fare, taking account of the cost implications of travel by group and individuals and also including parking charges
Schedule of service provided by air, rail, and bus
Mode specific constants reflecting the differences between modes not directly measured by other independent variables in the model (factors and traveler perceptions such as the comfort and convenience provided by each mode would be reflected here)
Income and/or occupation variables to account for differences in value of time, possibly included as modifiers of the trip cost to account for differences in sensitivity to travel costs across traveler income levels
The mode share model will reflect the following trip purpose market segmentation:
Business trips
Non-business/non-work trips
Commute (journey to work) trips
Models of travel choice can be based on revealed preference (RP) or stated preference (SP) data. Each type of data provides certain advantages over the other. Combining the two sets of data to estimate a single model can produce a model that retains the advantages of both RP and SP models and eliminates or dramatically reduces the disadvantages of each. The NEC FUTURE travel survey is intended to collect data which can be used to study travel patterns and travel behavior along the Northeast Corridor. This information will be used to estimate a forecasting model of travel mode choice in the Northeast Corridor.
Over a number of years, it has been recognized that a combination of revealed preference (RP) and stated preference (SP) questions can yield a rich dataset capable of retaining the advantages of both types of data while minimizing their limitations. The resulting models obtain a strong connection with real behavior, as the RP data take account of the real world constraints the respondents face and the SP data take account of the wider range of alternatives and alternative attributes. The SP data also enable the survey designer to create alternatives in which the explanatory variables have a larger range of variability within and between alternatives, and to break the correlation between explanatory variables within each alternative (Louviere, et al, 230). In addition, there are several situations where SP data can provide insight into a future traveler’s choice behavior. These situations include the introduction of a new choice alternative with new attributes, and cases where estimation results are limited because a subset of explanatory variables shows little variability or is highly collinear, such as time and cost (Louviere, et al, 21).
Models estimated solely from SP data require careful calibration to match base conditions in order to produce reliable results. A major advantage to utilizing SP data in addition to RP data is that the survey design can dictate much wider ranges of attributes and control the relationships between attributes, which increases the robustness of models estimated with SP data over models estimated with RP data (Louviere, et al, 231). Including stated preference questions in a survey also increases the number of records in the survey dataset substantially, as revealed preference by definition is one response per respondent, while stated preference questions can return multiple responses per respondent, enhancing the dataset with the ability to test for individual tipping points (Louviere, et al, 24). In the NEC FUTURE survey, each respondent will answer 6 SP questions, in addition to the questions about their actual trip.
One way to improve the realism of the SP data responses is to use the pivot method (Train and Wilson, 192). This method specifically references the RP response in the SP questions to encourage the respondent to think about their response with a higher level of real world constraints, ensuring that the alternatives are similar to what the respondent might actually experience. In the NEC FUTURE survey, the SP questions use the same origin and destination as the RP response, one of the three modes being tested is the chosen mode from the respondent’s own choice set, and the characteristics of these modes are based on the current service offerings for that market. This ensures that the modes being tested are viable options for the respondent and that the stated response will more likely mimic what they would actually choose in a real-world situation.
Unlike other choice situations where stated preference data may have been used unsuccessfully, transportation mode choice is an exercise that respondents are familiar with as they consider the available options to make choices that satisfy their own individual travel requirements. We believe that the stated preference experiments in the survey mimic this process by providing respondents with information similar to what they may find from published schedules or timetables, published fares or pricing, and their own experiences about travel by their chosen mode.
While models can be estimated with each type of data separately, the most robust models combine RP and SP data in order to take advantage of the unique characteristics of each type. The NEC FUTURE mode choice models will use a nested logit structure, to reflect the differential substitution that exists between different modes of travel. There will be three models in total, one for each trip type: journey to work, business, and other non-work/non-business trips.
The nested logit structure is preferable to the multinomial logit (MNL) model for mode choice modeling because of the Independence of Irrelevant Alternatives (IIA) property inherent in the MNL model. Under IIA, any change or addition to the alternatives causes a proportional change in the probabilities of all other alternatives. In other words, there is no differentiation among choices to account for similarities between modes and the potentially higher propensity for respondents to switch to a similar alternative. The nested logit model, on the other hand, allows similar alternatives to be grouped, so that they compete more closely with each other within the nest than with alternatives outside the nest.
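The IIA property can be illustrated with a small numerical sketch. The utilities and mode names below are hypothetical; the point is only that adding a new alternative to an MNL model leaves the ratio between the existing alternatives' probabilities unchanged, however similar the new alternative is to one of them.

```python
import math

def mnl_probabilities(utilities):
    """Multinomial logit: P_i = exp(V_i) / sum_j exp(V_j)."""
    peak = max(utilities.values())  # subtract the maximum for numerical stability
    exp_u = {mode: math.exp(v - peak) for mode, v in utilities.items()}
    total = sum(exp_u.values())
    return {mode: e / total for mode, e in exp_u.items()}

# Hypothetical utilities before and after a new rail service is introduced.
before = mnl_probabilities({"auto": -1.0, "rail": -1.5})
after = mnl_probabilities({"auto": -1.0, "rail": -1.5, "new_rail": -1.5})

# Under MNL the auto-to-rail odds are identical in both cases, even though
# the new service would plausibly draw more heavily from existing rail.
odds_before = before["auto"] / before["rail"]
odds_after = after["auto"] / after["rail"]
```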
Ultimately, the survey data itself will dictate the nested logit structure used in the final mode choice model, but examples of possible nested logit structures to be tested in the model estimation are shown in Figure A.2. These nesting structures show that common carrier modes have similarities to each other while auto is quite different in very important ways such as accommodating larger group sizes with minimal additional cost, flexible departure times, and flexible stopping patterns for multi-destination trips. The appropriate nesting structure for the rail alternatives will be an important consideration and will be data driven.
FIGURE A.2 CANDIDATE MODE CHOICE NESTED LOGIT STRUCTURES
The utility equations for the nested logit structure for the RP data all follow the same formulation, which is shown below with the anticipated variables to be tested.
The formulation of each mode’s probability is dependent on its location in the nesting structure. The following formulas show the probabilities for the first nesting structure in Figure A.2.
Final Probabilities – RP Model:
Conditional Probabilities, Logsums, and Logsum Parameters – RP Model:
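The probability equations themselves are not reproduced in this text. As a minimal sketch of the general technique, the code below computes two-level nested logit probabilities for a hypothetical structure with auto as a stand-alone alternative and a single common carrier nest. The nesting, the logsum parameter value, and the utilities are assumptions for illustration and do not represent the structures in Figure A.2.

```python
import math

def nested_logit(v_auto, v_carriers, theta):
    """Two-level nested logit with one nest of common carrier modes.

    theta is the logsum (nesting) parameter, assumed to lie in (0, 1].
    """
    # Conditional probabilities within the nest use utilities scaled by 1/theta.
    scaled = {mode: v / theta for mode, v in v_carriers.items()}
    peak = max(scaled.values())
    denom = sum(math.exp(s - peak) for s in scaled.values())
    conditional = {mode: math.exp(s - peak) / denom for mode, s in scaled.items()}

    # Logsum of the nest, carried up to the top level multiplied by theta.
    logsum = peak + math.log(denom)
    top_auto = math.exp(v_auto)
    top_nest = math.exp(theta * logsum)
    p_nest = top_nest / (top_auto + top_nest)

    probabilities = {"auto": 1.0 - p_nest}
    for mode, cond in conditional.items():
        probabilities[mode] = p_nest * cond  # final = nest share * conditional share
    return probabilities

# Hypothetical utilities and logsum parameter (illustrative only).
carriers = {"air": -2.0, "bus": -2.6, "rail": -1.4}
nl_probs = nested_logit(-1.0, carriers, theta=0.6)
```

When theta equals 1 the nest collapses and the result matches the MNL probabilities; smaller values imply stronger substitution among the modes inside the nest.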
The combined RP-SP model can use the same kinds of structure as a model estimated from a single data type. For example, a nested logit structure can be used with either RP or SP data alone, as shown in Figure A.2. Because of the differences between the two types of data, though, this model structure has to be modified to combine RP and SP data. A scaling factor applied to the SP data allows the choice model to be estimated jointly while accounting for greater uncertainty and possible biases in the SP data (Ben-Akiva, et al, 339). Figure A.3 shows a revised structure which incorporates the scaling factor between the RP and SP models.
Because each respondent provides multiple SP responses, the SP questions are typically weighted at a lower value than the RP questions (Ben-Akiva, et al, 345). Judgmental approaches can use weights ranging from equal weight between each SP and RP question to equal weight between the set of SP questions and each RP question. Because the SP questions offer a more limited choice set than the RP, the SP question weights will probably not be set at 1/6 of the revealed preference weight, but will still be much lower per question than the RP. In addition to the RP/SP weighting, each record will also receive the respondent survey weight, which adjusts for any over/under-sampling of specific subgroups.
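One way to picture how the two weights combine is a weighted log-likelihood in which each record's contribution is multiplied by both its RP/SP weight and its survey weight. This is a sketch only: the SP weight of 0.3 is a hypothetical placeholder between 1/6 and 1, not a value chosen for the NEC FUTURE model, and the function names are illustrative.

```python
import math

RP_CHOICE_WEIGHT = 1.0
SP_CHOICE_WEIGHT = 0.3  # hypothetical value; the actual weight is not specified

def record_weight(is_sp, survey_weight, sp_choice_weight=SP_CHOICE_WEIGHT):
    """Combine the RP/SP weight with the respondent's survey weight."""
    choice_weight = sp_choice_weight if is_sp else RP_CHOICE_WEIGHT
    return choice_weight * survey_weight

def weighted_log_likelihood(records):
    """records: iterable of (is_sp, survey_weight, probability_of_chosen_alternative)."""
    return sum(record_weight(is_sp, weight) * math.log(prob)
               for is_sp, weight, prob in records)

# One RP record and one SP record with hypothetical model probabilities.
example_ll = weighted_log_likelihood([(False, 2.0, 0.5), (True, 1.0, 0.5)])
```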
The choices present in the RP nest versus the SP nest can differ based on the choices presented to the respondents (e.g., a new mode would only be present in the SP data nest). The scaling factor is attached to the SP data nest and is estimated during the model estimation process. Other model parameters can be constrained to be equal between the RP and SP data.
For the NEC FUTURE model, it is anticipated that some initial testing will be done using the RP data alone, which can utilize the nesting structure shown in Figure A.2. Most estimation testing though, including the final model estimation runs, will use the richer SP data and combined RP-SP data, which requires modifications to the model structure to accommodate the differences between RP and SP data.
FIGURE A.3 EXAMPLE NESTING STRUCTURE WITH SCALING FACTOR
The utility equations for the RP modes do not change for the combined RP-SP model. In addition to the original utility equations, there is now a separate equation for each SP mode, with the only difference being an additional alternative-specific constant. The other coefficients are the same and are thus jointly estimated from the RP and SP. This can be seen in the equations below.
Utility Equations – RP-SP model
The probabilities for the RP modes are the same as described for the previous structure, with the addition of the SP nest term in the denominator. The SP probabilities all incorporate the scaling factor into the probability equations, with the new terms shown below.
Final Probabilities – RP-SP Model:
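The RP-SP probability equations are not reproduced in this text. As a simplified sketch of the scaling idea only, the code below uses a flat logit for each data type rather than the nested structure of Figure A.3: RP and SP records share the same utility coefficients, each SP mode may carry an additional alternative-specific constant, and SP utilities are multiplied by a scale factor before probabilities are computed. All values and names are hypothetical.

```python
import math

def logit(utilities, scale=1.0):
    """Logit probabilities with utilities multiplied by a scale factor."""
    scaled = {mode: scale * v for mode, v in utilities.items()}
    peak = max(scaled.values())
    exp_u = {mode: math.exp(v - peak) for mode, v in scaled.items()}
    total = sum(exp_u.values())
    return {mode: e / total for mode, e in exp_u.items()}

def rp_sp_probabilities(shared_utilities, sp_constants, sp_scale):
    """Return (RP probabilities, SP probabilities) from jointly shared utilities."""
    rp = logit(shared_utilities)
    sp_utilities = {mode: v + sp_constants.get(mode, 0.0)
                    for mode, v in shared_utilities.items()}
    sp = logit(sp_utilities, scale=sp_scale)
    return rp, sp

# Hypothetical shared utilities, SP constants, and scale factor.
shared = {"auto": -1.0, "rail": -1.6, "bus": -2.4}
rp_probs, sp_probs = rp_sp_probabilities(shared, {"rail": 0.2}, sp_scale=0.5)
```

A scale factor below one flattens the SP probabilities, which is how greater unexplained variation in the SP responses would show up in a joint estimation.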
As described in Whitehead, et al (37), there are two tests which can be done to test the predictive validity of the combined RP-SP model. These are the within sample test (testing how well the model predicts behavior of respondents in the model) and the out-of-sample test (testing how well the model predicts behavior of individuals outside of the model). Whitehead, et al (38) finds that in the literature jointly estimated RP-SP models have much greater predictive validity than independently estimated RP or SP models with the within sample test, and that the out-of-sample test tends to show similar results among the three model types.
As described above, the proposed model structure for the NEC mode choice model is the Nested Logit (NL) model. If, during model estimation, there is difficulty obtaining nesting parameters or convergence for the models, or if key parameters do not have statistically significant coefficients, it may be necessary to investigate alternative model forms. Several discrete choice model formulations offer additional flexibility relative to the MNL or NL models. Three of these are:
The Paired Combinatorial Logit (PCL) model, a two level model that includes nests equal to the number of pairs of alternatives. Each nest includes a distinct pair of two alternatives; each alternative appears in the number of nests equal to the number of alternatives less one. In addition to the utility function parameters, these models require estimation of a nesting parameter for each pair. The alternatives are equally proportionally assigned to each nest in which they appear (i.e. they are equally similar to each alternative they share a nest with). [F.S. Koppelman and C-H Wen, The Paired Combinatorial Logit Model: Properties, Estimation and Application, Transportation Research-B, V.34, N.2, 2000, pp.75-89.]
The Cross Nested Logit Model (CNL) is a model derived from the Generalized Extreme Value (GEV) model, like the others discussed here, which allows alternatives to belong to multiple nests. The CNL allows different proportions of each alternative to be assigned to nests, relaxing the constraint of the PCL, but each nest has the same structural parameter, unlike the PCL model. [P. Vovsha, Application of Cross-Nested Logit Model to Mode Choice in Tel Aviv, Israel, Metropolitan Area, Transportation Research Record, V. 1607, 1997, pp.6-15.]
The Generalized Nested Logit (GNL) is also a two level model but it includes nests selected by the analyst, which combines the flexibilities of the PCL and CNL models. In addition to the utility function parameters, and the nesting parameter for each nest, a set of allocation parameters is estimated to represent the degree to which each alternative is allocated to each nest. This allows maximum competitive flexibility among pairs of alternatives. [C-H Wen and F. S. Koppelman, The Generalized Nested Logit Model, Transportation Research-B, v. 35, N. 7, 2001, pp. 627-641.] It includes all two level extreme value (logit type) models as special cases.
A further level of model flexibility can be obtained by adopting the Network Generalized Extreme Value Model (NGEV) [A. Daly and M. Bierlaire. A general and operational representation of generalized extreme value models. Transportation Research Part B: Methodological, 40:285–305, 2006] which is a multi-level extension of the Generalized Nested Logit Model Structure. The use of this model has been limited, to date, as its complexity may overwhelm its advantages. Newman (2008) identified and demonstrated the need for constraints to ensure that the model is identifiable. [J. P. Newman. Normalization of network generalized extreme value models. Transportation Research Part B: Methodological, 42(10):958–969, 2008]. The constraint imposed is arbitrary; that is, the choice of which constraint to impose does not change the goodness of fit of the model; but the nesting interpretation differs depending on the constraint adopted.
Additional model flexibility in any of the above models can be obtained by combining the concept of the Mixed Logit Model [D. McFadden and K. Train, Mixed MNL Models for Discrete Response, Journal of Applied Econometrics, 15:447-470 (2000)] with any Logit Model (MNL, NL, PCL, GNL, and NGEV). The concept of the mixed logit model is to add random parameters to the already structured model. While originally developed for use with the multinomial logit model, it can be used with any member of the logit family or Generalized Extreme Value model. There are four different ways in which random parameters can be used. The first two contribute to an improved estimate of the behavior under study; the last two can be used to offset problems associated with the use of SP data based models or combined SP and RP data based models. The four methods, all of which are relevant to the study of intercity travel, are:
The first use of mixed logit is to take account of variation across travelers in the value of a level of service attribute. For example, in the case of variability in the value of travel time, the equations
V_1t=⋯+β_ζ x_1tζ+⋯
V_2t=⋯+β_ζ x_2tζ+⋯
V_3t=⋯+β_ζ x_3tζ+⋯
can be modified to add a random value as follows,
V_1t=⋯+(β_ζ+δ_ζ) x_1tζ+⋯
V_2t=⋯+(β_ζ+δ_ζ) x_2tζ+⋯
V_3t=⋯+(β_ζ+δ_ζ) x_3tζ+⋯
This allows the parameter β_ζ to be shifted by a draw from the random distribution of δ_ζ. Any distribution can be used, but it is generally desirable to use a bounded distribution; this can be accomplished by truncating a distribution with an infinite tail or tails.
The second use of mixed logit is to take account of random preference bias for a particular alternative. This might be appropriate in the case when a new alternative is considered and it is expected that the people will have variable values of the new alternative. E.g., the constant for a new alternative can be modified from
V_1t=α_1t+⋯
V_2t=α_2t+⋯
V_3t=α_3t+⋯
to
V_1t=α_1t+⋯
V_2t=α_2t+γ_2t+⋯
V_3t=α_3t+⋯
in the case of alternative 2 being a new alternative.
The third use of mixed logit is to represent correlation between sets of common alternatives in a series of experiments presented to a single individual. It is possible that an individual’s preferences among alternatives are common for the series of experiments. This can be represented by introducing random parameters, added to the constant, for each alternative. For example, the bias parameter changes to
V_1t=α_1t+⋯
V_2t=α_2t+η_2t+⋯
V_3t=α_3t+η_3t+⋯
or
V_2t=α_2t+η_2t+⋯
V_3t=α_3t+η_3t+⋯
V_4t=α_4t+⋯
Finally, the fourth use of mixed logit is to represent the expected bias in favor of the chosen alternative in the reported trip by adding a common random variable to that alternative in every case in which the real chosen alternative is an alternative in the stated preference experiment. Thus, the basic equation
V_1t=α_1t+⋯
V_2t=α_2t+⋯
V_3t=α_3t+⋯
can be modified to
V_1t=α_1t+λ_1t+⋯
V_2t=α_2t+⋯
V_3t=α_3t+⋯
because alternative one was chosen in the RP case.
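As an illustration of the first use described above, the following minimal sketch simulates mixed logit choice probabilities by averaging multinomial logit probabilities over draws of δ_ζ from a truncated normal distribution. The function names, the choice of a truncated normal, and all numeric values are assumptions made for illustration; they are not the specification or the estimates of the NEC model.

```python
import math
import random

def mnl_probs(utilities):
    """Multinomial logit probabilities for one choice situation."""
    m = max(utilities)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in utilities]
    total = sum(exps)
    return [e / total for e in exps]

def truncated_normal(rng, sd, bound):
    """Draw from N(0, sd), rejecting draws outside [-bound, bound]."""
    while True:
        d = rng.gauss(0.0, sd)
        if abs(d) <= bound:
            return d

def mixed_logit_probs(alphas, beta, x, sd, bound, n_draws=2000, seed=0):
    """Average MNL probabilities over random draws of delta added to beta."""
    rng = random.Random(seed)
    sums = [0.0] * len(x)
    for _ in range(n_draws):
        delta = truncated_normal(rng, sd, bound)
        v = [a + (beta + delta) * xi for a, xi in zip(alphas, x)]
        for j, p in enumerate(mnl_probs(v)):
            sums[j] += p
    return [s / n_draws for s in sums]

# Hypothetical inputs: three alternatives, x is travel time in hours.
probs = mixed_logit_probs(alphas=[0.0, 0.2, -0.1], beta=-0.8,
                          x=[3.0, 2.5, 4.0], sd=0.3, bound=0.6)
print([round(p, 3) for p in probs])
```

The truncation bound keeps β_ζ+δ_ζ negative in this example, which reflects the stated preference for bounded distributions. The other three uses follow the same pattern, with the random term added to an alternative-specific constant rather than to a coefficient.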
Regardless of the model structure used in estimation, the data requirements are the same, as they belong to the same family of models. The proposed NEC survey will provide adequate data to estimate any of the model structures discussed above.
Questions particularly relevant to mode choice of the respondent include those about the specific one-way trip (questions 1-24), which provide trip details to estimate revealed preference models, and mode choice trade-off stated preference (SP) questions (questions 25-30), which provide data that can be used to estimate the model.
Model development does not end with model estimation, however. The estimated model will be implemented within an NEC model application package that includes procedures for processing input data and summarizing output results. Current input data will be applied to the model and outputs compared to actual observed current travel volumes to validate the model and confirm its accuracy. This will include reviewing not only overall NEC-wide results, but also individual markets to confirm that the model is addressing the broad range of different markets that exist in the NEC.
As may be necessary, the model will be further adjusted to better match the existing data. However, significant adjustments are not expected to be necessary because the model estimation was based in part on observed RP data addressing all modes. This is not always the case in other corridors throughout the US where, for example, rail service and travel volumes may not currently be significant.
The Travel Model Improvement Program (TMIP) has produced a Travel Model Validation and Reasonableness Checking Manual,3 which provides guidance on validation techniques. Disaggregate validation is typically performed on any models that are estimated (as opposed to asserted), such as the new NEC model. This is typically done by applying the estimated model to a data set with known choice results (such as a revealed-preference survey data set) and checking the results by one or more segmentation variables (income level, vehicle availability, and geographic or trip-length segments). Ideally, the application data set should be independent of the model estimation data set, but this is typically difficult because small sample sizes require all records to be used in estimation. Because of this, the Manual does not give criteria or guidelines for disaggregate validation checks, but focuses instead on reasonableness (of the parameter estimates) and sensitivity testing as high priorities for validation.
For this project, disaggregate validation will be done by utilizing the survey dataset itself. Eighty percent of the clean dataset will initially be used for estimating the new model, with twenty percent of the survey dataset set aside for validation. Once the model is estimated, it will be applied on the twenty percent of remaining survey records, and checked against the actual choice from the survey record.
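The holdout procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: the record fields (`income`, `chosen`), the fixed random seed, and the placeholder model that returns constant probabilities are all hypothetical, and the actual survey processing may differ.

```python
import random

def split_holdout(records, holdout_share=0.2, seed=42):
    """Shuffle records and return (estimation_set, validation_set)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_share)
    return shuffled[n_holdout:], shuffled[:n_holdout]

def compare_shares(records, predict, segment_key):
    """Return {segment: {mode: (observed_share, predicted_share)}}.

    predict(record) must return a dict of mode -> choice probability.
    """
    totals = {}
    for rec in records:
        seg = totals.setdefault(rec[segment_key], {"n": 0, "obs": {}, "pred": {}})
        seg["n"] += 1
        seg["obs"][rec["chosen"]] = seg["obs"].get(rec["chosen"], 0) + 1
        for mode, p in predict(rec).items():
            seg["pred"][mode] = seg["pred"].get(mode, 0.0) + p
    return {
        name: {mode: (seg["obs"].get(mode, 0) / seg["n"], seg["pred"][mode] / seg["n"])
               for mode in seg["pred"]}
        for name, seg in totals.items()
    }

# Hypothetical survey records; a real run would use the cleaned survey file.
modes = ["rail", "air", "car"]
survey = [{"id": i, "income": "low" if i % 2 else "high", "chosen": modes[i % 3]}
          for i in range(100)]
estimation, validation = split_holdout(survey)

# Placeholder for the model estimated on the 80 percent estimation set.
placeholder_model = lambda rec: {"rail": 0.3, "air": 0.2, "car": 0.5}
report = compare_shares(validation, placeholder_model, "income")
print(len(estimation), len(validation))
```

Comparing observed and predicted shares within each segment, rather than only in total, corresponds to the segmentation checks (income level, vehicle availability, trip length) mentioned in the Manual.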
In addition to the disaggregate validation method of matching the survey records, aggregate checks will also be done on the new model. One of the key checks is to match the observed aggregate mode shares in key markets along the corridor, which will be determined from several available data sources, such as FAA and Amtrak ticket sales. In the combined RP-SP model estimation process, the model is scaled to the revealed preference dataset, so it should already adequately represent the observed shares in the estimation database.
Elasticities and sensitivity testing of key variables such as time and cost will also be done, to ensure that the estimated model parameters behave as expected and fall into acceptable ranges. The elasticities can be calculated using the following formula:
where:
Sensitivity testing will be done by making large changes in key variables and determining the impact on the mode share. For example, travel time by train will be doubled, and the impact on both train mode share as well as the other mode shares will be analyzed for acceptable behaviors. Cost is also a key variable for sensitivity testing.
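A sensitivity test of this kind can be sketched as follows, using a simple multinomial logit with hypothetical coefficients, times, and costs. The elasticity shown is the midpoint (arc) elasticity, which is one common definition and is an assumption here; it may differ from the formula used by the project.

```python
import math

def mode_shares(utilities):
    """Multinomial logit shares from a dict of mode -> utility."""
    m = max(utilities.values())
    exps = {k: math.exp(v - m) for k, v in utilities.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

def systematic_utilities(times, costs, asc, beta_time, beta_cost):
    """Linear utility: constant + time term + cost term for each mode."""
    return {m: asc[m] + beta_time * times[m] + beta_cost * costs[m] for m in times}

def arc_elasticity(share_before, share_after, x_before, x_after):
    """Midpoint (arc) elasticity of a share with respect to a variable."""
    ds = (share_after - share_before) / ((share_after + share_before) / 2)
    dx = (x_after - x_before) / ((x_after + x_before) / 2)
    return ds / dx

# Hypothetical inputs (hours, dollars); not estimated NEC parameters.
asc = {"rail": 0.0, "air": -0.5, "car": 0.3}
times = {"rail": 3.0, "air": 2.5, "car": 4.0}
costs = {"rail": 80.0, "air": 150.0, "car": 60.0}
beta_time, beta_cost = -0.6, -0.01

base = mode_shares(systematic_utilities(times, costs, asc, beta_time, beta_cost))

# Sensitivity test: double the rail travel time and recompute all shares.
doubled = dict(times, rail=2 * times["rail"])
after = mode_shares(systematic_utilities(doubled, costs, asc, beta_time, beta_cost))

rail_time_elasticity = arc_elasticity(base["rail"], after["rail"],
                                      times["rail"], doubled["rail"])
print(base, after, rail_time_elasticity)
```

An acceptable result in this sketch is a lower rail share, higher shares for the competing modes, and a negative elasticity of rail share with respect to rail travel time.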
In addition, the results of the model estimation will be explored at a market level and checked for reasonableness.
In contrast to existing available models, the new NEC model will provide NEC FUTURE with a demand forecasting tool that addresses the full range of the available modes of rail travel, including intercity and regional commuter services. Although Amtrak and regional agencies maintain forecasting models that collectively address the Northeast, they do not address the interaction of the intercity and regional commuter services. For example, Amtrak’s model focuses exclusively on intercity services that operate between regions, even though some such trips may be served by commuter rail. Further, most regional models ignore intercity service that may serve local markets too.
The new model is designed to address NEC travel markets served by future integrated operations and service planning alternatives under development in NEC FUTURE, encompassing both intercity and commuter services along the NEC, including shared markets (where intercity and commuter services operate in parallel) and linked markets (where intercity and commuter services connect to each other, with the commuter service essentially serving as a feeder/distributer from/to local markets not directly served by the intercity service). As described above, none of the existing models used by Amtrak and/or regional agencies within the NEC adequately addresses all of these potential markets and services, which go beyond the existing and historical institutional and geographic limitations of Amtrak and commuter rail services.
Because it will address a broad range of modes and geography throughout the Northeast, the development of a new NEC model will require new surveys designed to address these dimensions. These new surveys include questions addressing existing travel by intercity and regional commuter modes of travel between and within the Northeast (see footnote 1). All of the existing available survey data is tied to specific existing models developed by Amtrak and regional agencies that address more limited geography and/or modes of travel. For example, Amtrak’s survey data focuses exclusively on intercity travel modes and survey data collected by regional agencies is limited to a specific region and does not address intercity modes. Although they collectively address all of the major NEC markets, these existing data and models do not provide a consistent integrated analysis and forecasting basis that spans all geographies and modes throughout the NEC.
Data Collection for Model
The original design of data collection efforts called for a two-phase survey approach. The recruit survey was conducted by telephone via computer-assisted telephone interviewing (CATI) using a dual-frame sample of both landlines and cell phones. The follow-up survey was conducted mostly via self-administration by respondents on the Internet. Respondents without Internet access were able to complete the follow-up survey by viewing a mailed packet of survey visuals and then providing answers to follow-up questions via a telephone interview.
To test general operational and content issues with the survey, a pilot effort was initiated to obtain 307 completed surveys. While the pilot results showed that the survey was able to obtain the necessary information for modeling, the cumulative 2-phase response rate of 4% (9% in recruitment and 49% in the follow-up) was lower than expected. As such, the data collection approach was reconsidered and revised as follows:
Changed from a two-phase survey to a one-phase survey;
Shortened the survey from an average length of 22 minutes in the pilot to an estimated 18 minutes;
Increased incentives from $5 to $10; and
Increased the number of attempts per sample from 5 to 10.
With these changes in place, the response rate is expected to increase to 12-15%. This represents a significant increase from the 4% two-phase rate obtained in the pilot.
The questionnaire is titled “Survey of Northeast Regional and Intercity Household Travel Attitudes and Behavior” and is included in this ICR package. It is estimated that the survey will take about 18 minutes to complete.
The following text explains in further detail what information the questionnaire collects and how it will be used.
Survey Questions: Screener
Question C.1 allows for cell phone respondents to be called later at a better, safer time.
Questions S.1- S.1b identify a random member of the household to participate in the survey.
Question S.1c asks cell phone users for their home area (this is already known for land lines).
Questions S.2A-S.2E ask if the respondent’s regular commute trip is to an eligible out-of-state location and, if so, how many times in a typical week they make the trip, by mode. Eligible areas exclude the respondent’s home area, other nearby areas (typically less than 50 miles away from the home), and areas where the trip would be entirely outside of the NEC.
Questions S.3A-S.3E ask if the respondent has taken any non-commute trips to eligible out-of-state locations and, if so, how many times in the past 12 months by mode and trip purpose. Again, eligible areas exclude the respondent’s home area, other nearby areas (typically less than 50 miles away from the home), and areas where the trip would be entirely outside of the NEC.
If no trips were found in the above series of questions, the survey skips to collect demographic information in Questions D-1 through D-12. In this case, the respondent is not counted as a completed survey.
Next, the survey randomly selects a specific mode and trip purpose from those identified above. This is the “reference trip” for the respondent. Then, Question S.4 asks if a round trip was taken and, if so, a specific direction is randomly selected for the reference trip.
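The selection of the reference trip can be illustrated with the sketch below. It assumes an equal-probability draw among the mode and purpose combinations for which the respondent reported at least one trip; the actual CATI selection rule (for example, any weighting by trip frequency) is not specified here, and the function and field names are hypothetical.

```python
import random

def select_reference_trip(trip_counts, round_trip, rng=None):
    """Pick a reference trip from reported (mode, purpose) trip counts.

    trip_counts maps (mode, purpose) -> number of reported trips.
    Returns None when the respondent reported no eligible trips.
    """
    if rng is None:
        rng = random.Random()
    candidates = [key for key, n in trip_counts.items() if n > 0]
    if not candidates:
        return None  # respondent skips to the demographic questions
    mode, purpose = rng.choice(candidates)
    # A direction is drawn only when a round trip was taken (Question S.4).
    direction = rng.choice(["outbound", "return"]) if round_trip else "one-way"
    return {"mode": mode, "purpose": purpose, "direction": direction}

# Hypothetical screener answers for one respondent.
reported = {("train", "business"): 3, ("car", "leisure"): 0, ("bus", "leisure"): 1}
print(select_reference_trip(reported, round_trip=True, rng=random.Random(7)))
```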
Survey Questions: Main Questionnaire
Question 1 asks for the type of train service that was used (if respondent’s “reference trip” was by train). This is important since we need to distinguish among different types of train service in the model and the survey. Questions 4A, 5A, and 6A apply only if respondent’s “reference trip” was by train and ask for the rail station used to board the train, the mode of access used to get from the origin to the station, and the time spent at the station prior to boarding the train. Again, this provides more detailed information necessary for a more precise definition of the trip and its key characteristics and will be used in estimation of the RP portion of the model.
Questions 4B, 5B, and 6B apply only if respondent’s “reference trip” was by plane and ask for the airport used to board the plane, the mode of access used to get from the origin to the airport, and the time spent at the airport prior to boarding the plane. Again, this provides more detailed information necessary for a more precise definition of the trip and its key characteristics and will be used in estimation of the RP portion of the model.
Question 5C applies only if respondent’s “reference trip” was by bus and asks for the mode of access used to get from the origin to the bus. Again, this provides more detailed information necessary for a more precise definition of the trip and will be used in estimation of the RP portion of the model. However, given that there are many bus stops, and many are not at formal terminals like airports and rail stations, we do not attempt to ask about specific stops (thus, there is no Question 4C or 6C).
Questions 9A, 10A, 11A, 12A, and 13A apply only if respondent’s “reference trip” was by train and ask for the rail station used to get off the train, the mode of access used to get from the station to the destination, if a connection from one train to another was required, and information about the fare paid by the respondent for the train. This self-reported fare will be used as the base fare value for use in SP choice exercises. If respondent does not remember the fare paid, or if the value they provide is unreasonable, default values will be used based on published train fares for travel between the origin place and destination.
Questions 9B, 10B, 11B, 12B, and 13B apply only if respondent’s “reference trip” was by plane and ask for the airport used to get off the plane, the mode of access used to get from the airport to the destination, if a connection from one plane to another was required, and information about the fare paid by the respondent for the plane. This self-reported fare will be used as the base fare value for use in SP choice exercises. If respondent does not remember the fare paid, or if the value they provide is unreasonable, default values will be used based on published air fares for travel between the origin place and destination. Since plane fare information is collected in the screener for phone follow-up respondents, 12B and 13B are skipped in the phone follow-up.
Questions 10C, 11C, and 12C apply only if respondent’s “reference trip” was by bus and ask for the mode of access used to get from the bus to the destination, if a connection from one bus to another was required, and information about the fare paid by the respondent for the bus. As for the origin end, given that there are many bus stops, and many are not at formal terminals like airports and rail stations, we do not attempt to ask about specific stops (thus, there is no Question 9C). This self-reported fare will be used as the base fare value for use in SP choice exercises. If respondent does not remember the fare paid, or if the value they provide is unreasonable, default values will be used based on published bus fares for travel between the origin place and destination.
In the pilot survey phase, most self-reported rail, air and bus fares by respondents were reasonable when compared to published fares.
Questions 14A, 14B, 14C and 14D apply only if respondent’s “reference trip” was by passenger car/truck/van and ask for the estimated one-way travel time, from a range of network times, and the estimated cost for tolls, parking, and fuel. This self-reported cost estimate will be used as the base travel cost value for use in SP choice exercises for internet follow-up respondents. To the extent that the respondent does not remember these costs or if they are unreasonable, default values will be used based on published tolls likely to be encountered along the highway network between the origin and destination and per-mile costs applied to the trip distance along the highway network.
Question 15 asks for the overall purpose of the respondent’s trip. Trip purpose is one of the most important trip characteristics defining different market segments.
Questions 16 through 19 ask if the trip was made alone or with others and, if it was not alone, collect information on the composition of the group. Mode choice is often impacted by group size and composition. Group size and composition also impact travel costs.
Questions 24 through 24B ask if the respondent made a round trip and, if so, the overall duration of the trip (expressed as nights away from home). The duration of the trip can also have important implications with respect to scheduling and pricing. Respondents making day trips or single overnight trips tend to have less schedule flexibility. Trip duration can also impact costs.
Questions 25 through 30 are the six Stated Preference (SP) choice exercises that form the “core” of the survey and provide the primary basis for estimating the new mode choice model. These SP questions ask respondents to choose from among three modes of travel, each with specific characteristics. These modal characteristics vary across the questions, with values developed from an experimental design that minimizes correlations among variables, which can be problematic when seeking to estimate sensitivities to variables independently. Supporting Statement B provides details on the experimental design.
Questions D-1 through D-12 collect demographic information from respondents who completed the entire survey and also from those who live in the study region but did not qualify based on their trip-making behavior. This information serves several purposes. It provides a demographic profile of the respondents which can be compared to other information to confirm the survey sample is representative of the NEC. Some of these specific question responses, like income, are used to account for differences in value of time and are included as modifiers of the trip cost to specifically account for the differential sensitivity of travelers’ income levels to travel costs in the estimated model. This information also provides a demographic profile of those without qualifying trips, which will be compared to demographic profile of respondents and the NEC.
3. Describe whether, and to what extent, the collection of information involves the use of automated, electronic, mechanical or other technological collection techniques or other information technology. Also describe any consideration of using information technology to reduce burden.
The data will be collected electronically through the use of Computer Assisted Telephone Interviewing (CATI). The CATI system allows a computer to perform a number of functions prone to error when done manually by interviewers, including:
Providing correct question sequence;
Automatically executing skip patterns based on prior answers to questions (which decreases overall interview time and consequently the burden on respondents);
Recalling answers to prior questions and displaying the information in the text of later questions;
Providing random rotation of specified questions or response categories (to avoid bias);
Ensuring that questions cannot be skipped; and
Rejecting invalid responses or data entries.
The CATI system lists questions and corresponding response categories automatically on the screen, eliminating the need for interviewers to track skip patterns and flip pages. Moreover, the interviewers enter responses directly from their keyboards, and the information is automatically recorded in the computer’s memory.
CATI systems typically include safeguards to reduce interviewer error in direct key entry of survey responses. CATI also allows the computer to perform a number of critical assurance routines that are monitored by survey supervisors, including tracking average interview length, refusal rate, and termination rate by interviewer; and performing consistency checks for inappropriate combination of answers.
Presently there is no information on consumer preference available regarding the feasibility of High Speed Rail in the Northeast Corridor. To FRA’s knowledge, data do not exist anywhere dealing with the specific new rail service options that are being explored as alternatives for the NEC FUTURE project and certainly no data which will offer the degree of specificity which will be obtained from a data collection effort of this size and scope. A sample of up to 12,500 respondents will allow the FRA to analyze the data by subgroups and regions which will not be possible using smaller sample sizes.
The collection of information involves randomly selected individuals, not small businesses.
6. Describe the consequence to Federal program or policy activities if the collection is not conducted or is conducted less frequently, as well as any technical or legal obstacles to reducing burden.
Without this information from travel surveys, the new NEC travel demand forecasting model cannot be developed as described above. The new model has been designed to address the needs of NEC FUTURE by providing a basis for forecasting the response to a broad range of intercity and regional modes of travel throughout the Northeast. The survey program has been designed specifically to support the development of a new NEC forecasting model. Without this new model, there will be inadequate travel demand forecasting capabilities to fully address the range of future intercity and commuter services to be evaluated as part of the NEC FUTURE Program. As described above, none of the existing models address the availability of both intercity and commuter services, which do compete with each other in some NEC markets. Instead, these transportation modes are currently addressed by separate models maintained by Amtrak and the regional agencies respectively.
No special circumstances require the collection to be conducted in a manner inconsistent with the guidelines in 5 CFR 1320.6.
As required by the Paperwork Reduction Act of 1995, FRA published a 60-day notice in the Federal Register on September 20, 2012, soliciting comments on the particular collection of information. See 76 FR 7116. FRA received no comments from the railroad industry, the general public, or any other interested party regarding this information collection.
On November 21, 2012, FRA published in the Federal Register a 30-day Notice Regarding Collection Information from the Public to Determine Current Intercity and Regional Behavior of Northeast Residents. See 77 FR 225.
Respondents will receive a $10 check for their participation in the survey. The $10 check is mentioned during the introduction and awarded after the respondent completes the survey. The $10 is a token of appreciation for the respondents’ effort and will help maximize the response rate. The pilot phase study offered a $5 incentive for a two-phase survey effort. Given that the survey has been reduced in total length to 18 minutes and can now be completed as a one-phase study, we believe an increased incentive of a $10 check should be sufficient. Up to 500 respondents who did not respond to the survey will be targeted for a non-response follow-up (NRFU) survey. These respondents will receive a $20 check as a token of appreciation for their time and to help maximize response to the NRFU survey.
In the survey’s introduction, respondents are informed that participation is voluntary, and that their answers will be kept private and will be used only for statistical purposes. The only personal information that will be collected is name and address, so the $10 check can be mailed to the responding household. Name and address, along with phone number, will be stripped from the data file that the FRA will receive.
The survey does not contain any questions related to matters that are commonly considered sensitive or private. The survey questions are directed at consumer preference for traveling in the Northeast Corridor.
Data collection will involve a survey of 12,500 respondents. The survey is expected to take 18 minutes to complete. The total estimated burden for the main survey is, therefore, 3,750 hours. Additionally, there will be a 5-minute non-response follow-up (NRFU) survey with a sub-sample (n=500) of households that were refusals (not screened), qualified refusals, or qualified callbacks. In Table 1, below, we show the maximum expected number of responses and calculate the total burden hours based on these assumptions.
TABLE 1
ESTIMATED BURDEN HOURS
Phase | Minutes | Respondents | Burden Hours |
Survey (CATI) | 18 min. | 12,500 | 3,750 |
Non-Response Follow-Up (CATI) | 5 min. | 500 | 42 |
Total | | | 3,792 |
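The burden hours in the table follow from multiplying respondents by minutes per response and dividing by 60. The short sketch below reproduces that arithmetic; the variable and function names are illustrative only.

```python
def burden_hours(respondents, minutes_each):
    """Total respondent burden in hours for one survey phase."""
    return respondents * minutes_each / 60

# (respondents, minutes per response) taken from the table above.
phases = {
    "Survey (CATI)": (12_500, 18),
    "Non-Response Follow-Up (CATI)": (500, 5),
}

# Each phase is rounded to whole hours before totaling, as in the table.
rows = {name: round(burden_hours(n, m)) for name, (n, m) in phases.items()}
total = sum(rows.values())
print(rows, total)  # main survey 3750, follow-up 42, total 3792
```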
There are no record keeping or reporting costs to respondents. Respondents will be contacted randomly, and asked questions about their recent travel as well as preferences for travel throughout the Northeast Corridor. All responses are provided at the time of the survey; no prior preparation is needed. Respondents do not incur: (a) capital and startup costs, or (b) operation, maintenance, and purchase costs as a result of participating in the survey.
The estimated cost to the government for conducting the survey is as follows:
Number of completed interviews: 12,500 main surveys / 500 non-response follow-up surveys
Total estimated cost of conducting survey: $1,680,000
Cost per completed interview: $134.40
This estimate is based on the total cost of the updated survey contract divided by the specified number of completed main survey interviews (12,500). Costs of conducting the survey will be concentrated within a one-year period, making the annual cost to the government the full $1,680,000.
The table below presents the estimated cost for federal oversight of this project while it is in the field.
Position | Grade/Step | Cost per Hour (Pay and Benefits) | Hours | Cost |
Project Manager | 15/6 | $ 110 | 8 | $ 880 |
Deputy Project Manager | 13/6 | $ 75 | 16 | $ 1,200 |
Technical Reviewer | 13/6 | $ 75 | 40 | $ 3,000 |
Total | | | | $ 5,080 |
Thus, the total annual cost to the government, including both conducting the survey ($1,680,000) and federal oversight ($5,080), will be $1,685,080.
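The cost figures above can be checked with the following sketch, which works in whole cents to avoid rounding error. It uses only the per-interview cost, interview count, hourly rates, and hours stated in this section; the variable names are illustrative.

```python
# Survey cost: $134.40 per completed interview times 12,500 interviews.
cost_per_interview_cents = 13_440
completed_interviews = 12_500
survey_cost = cost_per_interview_cents * completed_interviews // 100

# Federal oversight: (hourly cost in dollars, hours) for each position.
oversight = {
    "Project Manager": (110, 8),
    "Deputy Project Manager": (75, 16),
    "Technical Reviewer": (75, 40),
}
oversight_cost = sum(rate * hours for rate, hours in oversight.values())

total_cost = survey_cost + oversight_cost
print(survey_cost, oversight_cost, total_cost)  # 1680000 5080 1685080
```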
This is a revision to a previously approved collection to conduct a pilot survey under this collection. As such, it requires a program change to add the estimated 3,750 hours for the new information collection to existing burden.
Weighted frequencies will be computed for each of the questions in the survey. Cross-tabular analyses of the survey data by population subgroups and key analytical variables will also be conducted. The key analysis activity, the model development itself, will rely on a maximum likelihood estimation procedure for estimating parameter values from the survey data. Simply stated, a maximum likelihood estimator is the value of the parameters, on the independent variables, for which the observed choice, the dependent variable, is most likely to have occurred. Several statistical tests are used to evaluate the significance of the estimators, including “t” tests of individual parameter significance and likelihood ratio tests used to evaluate a set of parameters.
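The estimation and testing steps described above can be illustrated with a deliberately small example: a binary logit with a constant and one coefficient, fitted by Newton-Raphson on synthetic data. This is a sketch under stated assumptions, not the estimation software or model specification to be used for the NEC model; the data, true parameter values, and function names are hypothetical.

```python
import math
import random

def log_likelihood(a, b, data):
    """Log-likelihood of a binary logit with utility a + b*x."""
    ll = 0.0
    for x, y in data:
        v = a + b * x
        ll -= math.log1p(math.exp(-v if y else v))
    return ll

def score_and_information(a, b, data):
    """Gradient of the log-likelihood and the 2x2 information matrix."""
    g_a = g_b = h_aa = h_ab = h_bb = 0.0
    for x, y in data:
        p = 1.0 / (1.0 + math.exp(-(a + b * x)))
        g_a += y - p
        g_b += (y - p) * x
        w = p * (1.0 - p)
        h_aa += w
        h_ab += w * x
        h_bb += w * x * x
    return g_a, g_b, h_aa, h_ab, h_bb

def fit_binary_logit(data, iterations=25):
    """Newton-Raphson maximum likelihood; returns estimates and std. errors."""
    a = b = 0.0
    for _ in range(iterations):
        g_a, g_b, h_aa, h_ab, h_bb = score_and_information(a, b, data)
        det = h_aa * h_bb - h_ab * h_ab
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    _, _, h_aa, h_ab, h_bb = score_and_information(a, b, data)
    det = h_aa * h_bb - h_ab * h_ab
    # Standard errors from the diagonal of the inverse information matrix.
    return (a, b), (math.sqrt(h_bb / det), math.sqrt(h_aa / det))

# Synthetic data: x is a hypothetical time difference, y = 1 if rail chosen.
rng = random.Random(1)
data = []
for _ in range(500):
    x = rng.uniform(-2.0, 2.0)
    p = 1.0 / (1.0 + math.exp(-(0.2 + 0.8 * x)))
    data.append((x, 1 if rng.random() < p else 0))

(a_hat, b_hat), (se_a, se_b) = fit_binary_logit(data)
t_b = b_hat / se_b  # "t" test of the individual coefficient

# Likelihood ratio test of b = 0 against the constant-only model.
share = sum(y for _, y in data) / len(data)
ll_restricted = log_likelihood(math.log(share / (1.0 - share)), 0.0, data)
lr_stat = 2.0 * (log_likelihood(a_hat, b_hat, data) - ll_restricted)
print(a_hat, b_hat, t_b, lr_stat)
```

With one restriction, the likelihood ratio statistic is compared with the chi-square critical value for one degree of freedom (3.84 at the 5 percent level), and the t statistic with approximately 1.96.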
Findings will be disseminated through internal briefings to FRA managers who must make strategic planning decisions, as well as through printed technical reports distributed to stakeholders at the national, State and local levels.
FRA will display the expiration date for OMB approval on the Web survey instrument and the hard copy survey instrument. The interviewer will provide the OMB number, if requested by the respondent, during Phase I data collection.
No exceptions to the certification are made.
1 The Northeast includes all MSAs along the existing rail transportation corridor linking Washington, Baltimore, Philadelphia, Newark, New York, New Haven, Providence, and Boston – known as the NEC spine – as well as MSAs that can be reached by train directly or via a single transfer to connecting corridors from the NEC spine, including Richmond, Harrisburg, Albany, and Springfield (see Supporting Statement B Table B.1 for a detailed listing). This definition of the relevant market for NEC service was developed by the project team based on the NEC FUTURE alternatives under consideration.
2 Field work for a new data collection sponsored by the Northeast Corridor Commission (NECC) has recently been completed. This data collection is supporting the development of estimates of the current number of intercity trips taken by automobile in the northeast and will contain the necessary details of those trips.
3 http://media.tmiponline.org/clearinghouse/FHWA-HEP-10-042/FHWA-HEP-10-042.pdf