Applying the American Customer Satisfaction Index (ACSI) Technology to the Management of Government Services: Rationale, Rigor and Results
By Russ Merz, Ph.D.
Research Consultant, CFI Group

© 2006 CFI Group.
All rights reserved.


Introduction
Background: In the private sector the American Customer Satisfaction Index (ACSI) has a proven relationship with customer spending1, shareholder value2,3, cash flows4, business performance5 and GDP growth6. The technology upon which it is based is backed by over 70 years of rigorous scientific inquiry in the fields of consumer psychology and psychometrics, coupled with advanced analytic techniques from statistics, econometrics, and chemometrics7. While the applicability of the ACSI technology to the management of commercial product and service companies has been repeatedly demonstrated in the literature8,9, this paper addresses how the management of government services can also benefit from its unique analytic power.
High performance private companies in the competitive marketplace rely on frequent feedback measures that tell
them whether they are “winning or losing.” Corporations have the bottom line of profit or loss, but they also measure
many other aspects of their performance such as customer satisfaction and retention. In fact, Tom Peters and Bob
Waterman, the authors of In Search of Excellence, characterized top companies as “measurement-happy and
performance-oriented.”10 The best companies refuse to fly in the dark or drive by looking in the rearview mirror alone.
All organizations with a customer (or citizen) orientation recognize that performance measures based on customer satisfaction not only tell them where they have been, but more importantly, what to expect in the future. Without that ability they cannot control their destiny—others will11.
Unlike commercial product and service companies, government agencies are not providing services in a turbulent competitive marketplace. With some exceptions12, federal law mandates the services provided by government agencies, and they are typically the sole provider (i.e., a monopoly) of those services. This means that the citizen or business users of those services have few or no alternatives—e.g., taxpayers must interact with the IRS. Obviously users of government services want their needs met in the most effective, efficient and caring way possible. If those needs are not met, the user cannot simply switch providers.

1 Claes Fornell and Roland Rust, "The Effect of Customer Satisfaction on Consumer Spending Growth," under review, 2005.
2 Claes Fornell, Sunil Mithas, Forrest Morgeson, and M. S. Krishnan, "Customer Satisfaction and Stock Prices: High Returns, Low Risk," under review, 2005.
3 Eugene Anderson, Claes Fornell and Sanal Mazvancherryl (2004), "Customer Satisfaction and Shareholder Value," Journal of Marketing, Vol. 68, No. 4 (October), 172.
4 Gruca, Thomas S., and Lopo L. Rego (2005), "Customer Satisfaction, Cash Flow, and Shareholder Value," Journal of Marketing, Vol. 69 (July), 115-130.
5 Morgan, Neil and Lopo Rego (forthcoming 2006), "The Value of Different Customer Satisfaction and Loyalty Metrics in Predicting Business Performance," Marketing Science.
6 Claes Fornell, Paul Damien, Marcin Kacperczyk, and Michel Wedel, "The Empirical Relationship between Buyer Satisfaction and GDP Growth under Parameter and Distributional Uncertainty," under review, 2004.
7 The main difference between econometrics and chemometrics is that while both are focused on prediction, chemometric methods do a superior job of identifying and separating components from the underlying "noise" in a measurement system. The ACSI technology utilizes a type of chemometric analysis to extract meaning from the inter-correlations between predictors in structural equation models. See Svante Wold's article "Chemometrics: what do we mean with it, and what do we want from it?" in Chemometrics and Intelligent Laboratory Systems, 30 (1995), 109-115, for more details.
8 Fornell, Claes, Michael D. Johnson, Eugene W. Anderson, Jaesung Cha and Barbara Everitt Bryant (1996), "The American Customer Satisfaction Index: Nature, Purpose and Findings," Journal of Marketing, Vol. 60 (October), 7-18.
9 Anderson, Eugene W., Claes Fornell and Roland T. Rust (1997), "Customer Satisfaction, Productivity and Profitability: Differences Between Goods and Services," Marketing Science, Vol. 16, No. 2 (Summer), 129-145.
10 Thomas J. Peters and Robert H. Waterman, Jr., In Search of Excellence: Lessons from America's Best-Run Companies (New York: Harper & Row, 1982), p. 240.
11 Best, Roger, Market-Based Management: Strategies for Growing Customer Value and Profitability, 4th Edition (2005), Prentice Hall, chapters 1 and 2.
12 For instance, the GSA, the Federal government's largest purchaser and provider of products and services, has increasing competition from both private sector purveyors of products and services as well as other government agencies.

The Government Performance and Results Act (GPRA) of 1993 (P.L. 103-62) mandates that government agencies
institute performance measurement programs to help managers improve the delivery of services to citizens. Effective
implementation of the GPRA’s requirements relies upon measurement systems that provide internal and external
feedback to managers in a way that helps them make improvements. One key component is the importance of citizen
and Federal agency customer satisfaction measurement. Managers faced with constrained budgetary resources and
charged with making the service quality measurements mandated in the GPRA require analytic tools with the power
to:
• Handle the multiple, often conflicting, objectives confronting decision-makers responsible for building relationships with citizens and Federal agency customers,
• Sort out the best, most cost-effective courses of action that can be taken to improve service delivery and increase citizen satisfaction,
• Provide prescriptive guidance for the most effective marginal allocation of resources,
• Predict the effects of program and policy changes, and ultimately,
• Ensure that citizens and Federal customers can feel confident about relying on the government to meet their needs.

The ACSI Technology is the best method available for meeting these multiple requirements. In a world where there
are many competing measurement methodologies a manager can choose, the ACSI technology is superior to the
diagnostic and prescriptive tools offered by other firms. It is also much more than a single statistical method – it is a
system for assessing the current state and future behaviors of the citizen/customer base and for directing the most
effective allocation of resources budgeted for the management of that base.
Purpose: The purpose of this paper is to provide the reader with an overview of how the ACSI technology as
delivered by CFI Group meets the performance measurement requirements of the GPRA and improves the
management and delivery of government services to citizens and customers. This is accomplished in four parts:
• The paper begins with a discussion of the rationale behind the need for citizen-centered performance measurements. This is done by briefly reviewing the GPRA mandate for a citizen-centered performance measurement approach; the kinds of objectives performance measurement programs can help government organizations achieve; and the characteristics of quality performance measurement data.
• Next the paper describes the methodology used by CFI Group to harness the power of the ACSI technology. This section focuses on highlighting the critical elements of the methodology that provide highly accurate measurement coupled with sensitive diagnostic and powerful prognostic capability. This section is accompanied by a comparison of the ACSI technology base with some of the more common alternative methodologies offered by competing firms. The material is conceptual in nature and does not require an in-depth knowledge of statistics to understand.
• Finally the paper concludes by summarizing the benefits that government users of the technology can realize. This is supported by a compendium of case studies that illustrate many of the points made throughout the paper.
• The paper also includes detailed technical appendices that provide in-depth discussions of the ACSI technology as it is implemented by CFI Group.

Rationale—Why is the ACSI Technology Needed by Government Agencies?
Legal Requirement: In August 1993, Congress passed the Government Performance and Results Act (GPRA). Under
GPRA, leadership in the public sector was legally obligated to address issues such as performance planning and
management—as well as report on the results of those efforts. Many felt, erroneously as it turns out, that government
management was "different," that the rules of performance management and measurement that applied to the private
sector could not apply to the public. After all, government agencies don’t have a bottom line or profit margin.
Recent efforts have shown, however, that not only do the basic concepts apply to the public sector; they can also be
used to create a successful organization. For example, agencies may not have a financial bottom line, but they do
have goals and outcomes that can indicate success (e.g., reduction in pollution)13.
Other concepts apply as well, as was borne out by Executive Order 12862, signed by President Clinton in September
1993. This order requires federal agencies to determine from their customers the kind and quality of service they
seek. In the same way that the private sector experienced noticeable changes by measuring beyond business
results, government agencies have also begun to balance a greater constellation of measures by incorporating
customer needs and expectations into their strategic planning processes. This balanced approach to performance
planning, measurement, and management is helping government agencies achieve results Americans—whether customers, stakeholders, employees, or others—actually care about.14
Implementation: The U.S. General Accounting Office has provided guidelines to federal agencies for implementing the GPRA.15 The recommendations fall into three key steps, each with a number of critical supporting practices.

A number of the practices embodied in the GAO's guidelines are critical to ensure that a useful performance measurement system is implemented. In particular, practices 4 and 6, if not executed well, can result in a less than adequate measurement system, especially where citizens are concerned. These two practices align with activities

13 Many agencies do have a bottom line in the sense that, if they are industrially funded (i.e., not funded through annual congressional appropriations), they have to recoup their cost of operations somehow. For instance, in the GSA's case, it is via sales. Even if they are appropriation funded, they still will have "bottom line" targets for cost control.
14 National Partnership for Reinventing Government, Balancing Measures: Best Practices in Performance Management, August 1999.
15 U.S. General Accounting Office, Executive Guide: Effectively Implementing the Government Performance and Results Act (Washington, D.C.: 1996).

that lie at the core of what the ACSI technology is based on—valid and reliable measures, embedded in a web of
interrelated cause and effect paths, constructed in a manner that reflects management objectives. If Government
managers charged with GPRA implementation are using measurement systems without these features, then it is doubtful that they have a clear picture of how well they are meeting the needs of the American public.
Information Source: The value of ACSI based performance measurement to public sector organizations lies in its
usefulness as an information source for management and policy decisions and in its significance as a tool of
accountability. In general, performance measurement methods such as customer/citizen satisfaction (CS) programs can be used as management information sources in eight key areas:
Accountability: Well-designed CS performance measures document progress towards achievement of goals
and objectives thereby facilitating government fulfillment of their accountability obligations to their citizens,
clients, elected officials, etc.
Strategic Planning: CS measurement supports strategic planning and goal setting by gauging progress towards
established goals (i.e. citizen satisfaction levels). Many observers feel that without such a mechanism to “hold
government’s feet to the fire,” it is unlikely that a mere plan will lead to meaningful change.
Program Management and Service Quality: Once CS measures have been agreed upon, progressive
organizations may choose to give managers greater flexibility in determining how to achieve the desired results.
Expanded operational authority enables them to respond more rapidly to changing conditions and needs while
still ensuring accountability. Not only can CS performance measures identify problem areas that need attention,
but they can also bring to light approaches that are working particularly well and which might warrant replication
in other settings.
Budgeting and Resource Allocation: The use of CS performance measurement in the budget process links
financial costs to program results. This leaves policymakers better prepared to assign priorities, expand or
reduce programs, and more accurately assess the costs of achieving desired results.
Contract Monitoring: As governments increasingly contract out the provision of services to private vendors or
other governments or nonprofit agencies, CS performance measurement becomes a critical tool in controlling
risks and ensuring service quality. In short, contract monitors need performance measures to know whether or
not contractors are fulfilling their performance obligations.
Personnel Management: CS performance measures can increase employee motivation and provide an
objective means of assessing the achievement of group and/or individual targets. In fact, the establishment of
clear departmental expectations and goals alone can go a long way towards increasing the motivation of
managers and employees, many of whom otherwise see little direct connection between their efforts and any
long-term goals.
Interdepartmental Collaboration: By providing a clear direction for efforts in a particular functional area, CS
performance measurement can promote interdepartmental communication and collaboration.
Communication with the Public: Public reporting of performance measures can enhance citizens’
understanding and support of public programs. Moreover, a government that reports its own performance to citizens, rather than totally relinquishing that task to the media, has far more control over the manner in which information is disclosed and greater opportunity to describe its response to particular problems.16
According to the Reason Public Policy Institute (RPPI), a critical element of government’s success is being receptive
and responsive to the needs and wants of citizens as measured by citizen satisfaction. Citizens, as the recipients of
government services, can best identify which areas of government are functioning well and which areas need
improvement. They can also be instrumental in identifying how best to improve quality and efficiency. To this end,
customer satisfaction measurement programs are valuable tools available to policymakers.
Citizens are demanding results—they want to know how their money is being spent, why it’s being spent that way,
and how much they’re getting for their money. Pressure has been thrust upon policymakers to continually strive for
better, more efficient service delivery. Strategic planning, performance-measurement, budgeting, and citizen
satisfaction surveys provide the framework for a government to be efficient, effective, and responsive to its citizenry.

16 Adapted from Paul Epstein, Using Performance Measurement in Local Government (New York: Van Nostrand Reinhold Co., 1984), and U.S. General Accounting Office, Executive Guide: Effectively Implementing the Government Performance and Results Act (Washington, D.C.: 1996).

In order for performance-measurement systems to work, several different types of data need to be collected17.
In the absence of a single overriding metric such as earnings or shareholder value, governments and their citizens
need to look at five different types of data to get the total picture. The five main categories are:
Input indicators;
Output/Workload indicators;
Intermediate outcomes;
End outcome/Effectiveness indicators; and
Explanatory information.
An emphasis on end outcomes forces the organization to focus there first and, going backward, derive all means for
production or services from the desired result, as in the performance measurement model below.

Performance Measurement Model

Inputs: Amount of resources devoted to a program activity. e.g., total operating expenditures; total full-time equivalencies (FTEs); total capital expenditures.

Outputs: Tabulation, calculation or recording of activity or effort, expressed in a quantitative or qualitative manner. e.g., total number of responses; number of education programs/participants.

Intermediate Outcomes: Direct influences and impact that the outputs of an agency have on short-term, leading indicators. e.g., average response time; cost per response; time spent per response.

End Outcomes: Assessment of the results of a program activity compared to its intended purpose. e.g., perceptions of quality; citizen overall satisfaction; likelihood to use again.

Adapted from: Reason Public Policy Institute (RPPI), Policy Report No. 292
Measurement Quality: For performance measurement systems based on citizen satisfaction to work well, high quality
measures are needed. A review of the major writing on this topic suggests that quality performance measures include
characteristics such as:18
Meaningfulness: The measures are directly related to the organization’s mission and goals and provide
information that is valuable to both policy and program decision makers.
Comprehensiveness: The measures capture the most important aspects of an agency’s performance.
Valid and Reliable: The indicators measure what they purport to measure and they do so consistently,
exhibiting little variation due to subjectivity.
Understandable: Policymakers, practitioners, citizens, and other stakeholders easily understand the measures.
Timely: The measures can be compiled and distributed promptly enough to be of value to operating managers
or policymakers.
Resistant to undesired behavior: The development of a performance measure raises the profile of whatever is
being measured. A higher profile sometimes brings unintended consequences or even strategies designed to
“beat the system”-for instance, a focus on more highly educated clients if training programs are measured solely
on job placement rates, or overzealous traffic ticket-writing if the police department is measured by that activity
alone. The best sets of performance measures have little vulnerability to such actions because they have been
devised carefully and also because they typically include multiple measures that address performance from
several dimensions and thereby hold potentially perverse behavior in check.

17 Geoffrey Segal and Adam Summers, "Citizen Budget Reports: Improving Performance and Accountability in Governments," Reason Public Policy Institute (RPPI), Policy Report No. 292 (March 2002).
18 The characteristics of good sets of performance measures identified in this section have been drawn from Hatry, 1980; Bens, 1986; Hatry et al., 1992; Ammons, 1996; and Geoffrey Segal and Adam Summers, "Citizen Budget Reports: Improving Performance and Accountability in Governments," Reason Public Policy Institute (RPPI), Policy Report No. 292 (March 2002).

Non-redundant: The best sets of performance measures limit information overload by avoiding the use of any
two measures that focus on virtually the same aspect of performance. Each measure should contribute
something distinctive.
Sensitive to data collection costs: Although most dimensions of government performance can be measured
either directly or through proxies, data collection expenses for some indicators can occasionally reach levels that
exceed their value. Good sets of performance measures include the best choices from among practical
measurement options.
Focused on sphere of influence: Good sets of performance measures emphasize outcomes or facets of
performance that are influenced by policy initiatives or management action. At the same time, few measures are
completely under the control of a single agency or program. The inclusion of explanatory information in
performance reports is therefore of critical importance.
The ACSI technology as delivered by CFI Group meets the quality measurement tests outlined above. How this is
accomplished and why it is superior is the subject of the next section.

Rigor—What is the CFI Group Advantage?
The simple essence of the CFI Group’s implementation of the ACSI technology is measurement, diagnosis and
prognosis.
Building upon the knowledge developed from 70 years of social psychology research, CFI Group measures the three
levels of a customer’s thought process resulting from an experience with a product or service:
• Perceptions of the performance delivered by the various facets of the product and/or service experience,
• Overall attitudinal evaluation of the experience, and
• Future behavioral intentions towards the product or service in question.

These measures are embedded in a diagnostic model of cause and effect linkages that helps quantify the measures
while at the same time empirically connects the three measurement levels; i.e., how do perceptions affect evaluation,
and how does evaluation affect future intentions. The linkages quantify the changes that are necessary at one level to
effect the greatest amount of change in the subsequent measurement level.

[Schematic: Perceived Performance ("Service Delivery") -> Attitudinal Evaluation ("Satisfaction") -> Future Intentions ("Use Service Again")]

Finally, the diagnostic framework is then used to provide prognoses about how best to invest resources in programs, practices and procedures that affect the perceived performance levels of products or services, and what can be expected (in terms of evaluation and future intentions) as a result of the investments.
For commercial enterprises this powerful set of metrics, with their cause and effect linkages, gives a company an unequaled ability to manage the economic or relationship value of its customer base by providing marginal resource allocation guidance for product and service quality19. Government agencies may have different outcomes as objectives, but the same principles apply. In the following sections each of these elements (measurement, diagnosis and prognosis) is described in detail.

19 The marginal resource allocation concept is sometimes called "derived" importance. It should be noted that in the cause and effect measurement networks executed by CFI Group, all experience facets are fundamentally "important" to the customer/citizen. However, from a prognosis perspective the concern centers on how to achieve the greatest amount of change in a desired outcome (e.g., satisfaction), so the issue is the most efficient marginal allocation of resources—not the reallocation of resources. An efficient allocation of resources is an allocation that satisfies the rule marginal benefit = marginal cost for each area of investment.

Measurement
Good measurement requires reliability, validity and sensitivity.
• Reliability: Reliability is the quality of a measurement tool that allows it to obtain similar results over time and across situations (this is also referred to as the internal consistency of a measure). It is the degree to which measures are free from random error and therefore yield consistent results.
  o Example: a rifle that is fired at a target the same way each time by the same rifleman should produce the same pattern of hits each time it is fired. If it does, the rifle is considered reliable. If it does not, there may be a flaw in the construction of the rifle (e.g., the sights are loose) that prevents it from being consistent.
• Validity: Validity is the quality of a measurement tool to measure what we intend it to measure. In other words, extending the rifle analogy, does the rifleman hit the bull's-eye of the target? It is the degree to which measures are free from measurement error and reveal the truth about an object or quality of an object.
  o For example, in measuring "intention to buy", if a question is not worded correctly there could be a systematic bias to identify brands "I wish I could afford" rather than the brand usually purchased.
• Sensitivity: The sensitivity of a measurement tool is important, particularly when changes in attitude, or other hypothetical constructs, are under investigation. Sensitivity refers to the ability of an instrument to identify variability in stimuli or responses over successive measurement occasions or between groups (power to detect change).
  o The sensitivity of a scale based on a single question or item can be increased by adding additional questions or items.
  o In other words, because index measures allow for a greater range of possible scores, they are more sensitive than single-item scales (see the sketch after this list).
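The sensitivity point can be illustrated with a small simulation. The sketch below is purely illustrative (invented data, noise levels and sample sizes, not CFI Group code or ACSI data): it compares how readily a single item versus a multi-item index detects the same small, true improvement.

```python
# Illustrative sketch only (not CFI Group code): why a multi-item index is more
# sensitive than a single item. Two groups differ by a small, true 0.3-point
# shift in satisfaction; each survey item observes that truth with noise.
# Averaging several items shrinks the noise, so the same shift is easier to detect.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 200                                   # respondents per group

def simulate_index(true_mean, n_items):
    truth = rng.normal(true_mean, 1.0, size=n)                        # latent satisfaction
    items = truth[:, None] + rng.normal(0, 1.5, size=(n, n_items))    # noisy items
    return items.mean(axis=1)                                         # index = item average

for n_items in (1, 3, 5):
    before = simulate_index(7.0, n_items)   # scores before an improvement
    after = simulate_index(7.3, n_items)    # same population after a 0.3-point true shift
    t, p = stats.ttest_ind(after, before)
    print(f"{n_items}-item index: t = {t:.2f}, p = {p:.4f}")
# Typically the 3- and 5-item indices yield larger t statistics (smaller p-values)
# than the single item for exactly the same underlying change.
```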

Reliability and Validity
[Figure: three rifle targets illustrating reliability versus validity.] Reliability, although necessary for validity, is not in itself sufficient. Target A illustrates low reliability (shots are ungrouped) and low validity (very few hit the target—high error). Target C illustrates high reliability (tightly grouped) with no validity (none hitting the intended target). Target B shows high reliability (tight grouping) and validity (most hitting the intended target—low error).

The ACSI technology implemented by CFI Group is based upon an advanced measurement and analysis system that combines best practices from psychometric science with an advanced causal modeling algorithm that ensures potent levels of precision (validity combined with reliability) and power (sensitivity—the ability to detect change).

What are the salient characteristics of the CFI Group system that make it superior to competitive measurement
approaches?
• The use of "voice of the customer" (VOC) techniques to discover the true meaning of a customer's experience and convert the customer's "voice" into survey questions20. VOC techniques are far superior to alternative methods for developing questionnaires that rely upon the judgment or experience of researchers.
• Reduction of measurement error through the use of multiple measures of important experience factors and satisfaction levels. It is a well documented scientific fact that multiple item measures are far superior to single items for capturing the underlying "truth" of customer experiences and satisfaction. Multiple item measures are the best way to measure intangible psychological concepts such as performance perceptions and attitudes, since a single measure has a very high probability of "missing the target."
• The derivation of optimal measure weights, based on the cause and effect relationships between experiences, evaluations and intentions, for combining the multiple measures into a single index.
How the CFI Group Measurement System Realizes Precision and Power: The ACSI technology relies upon advanced
psychometric science as the basis for developing valid and reliable measures. Fundamentally the main focus in
measurement should be on insuring measure validity. While there are different types of validity the most important is
construct validity—i.e., does the measure actually measure what it purports to measure.
Construct validity is often violated by CFI Group competitors. For example, while there are a number of ways to measure satisfaction, most firms make the mistake of treating satisfaction as a simple binary concept. Simple in the sense that only one question is used; binary in the sense that customers are categorized as either satisfied or dissatisfied (a so called "Top Box" approach) – often in percentage terms (e.g., we have 80% satisfied customers) or frequency counts. This approach is flawed because it does not provide sufficiently valid information in a reliable manner21. This is because there is more measurement error in "Top Box" measures and a lower likelihood of detecting a change in customer satisfaction. Given the low quality of the resulting metric it is not surprising that many firms fail to find any relationship between quality and satisfaction and between satisfaction and profit.
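To illustrate the precision difference described here (and quantified in footnote 21), the short simulation below compares the sampling error of a top-box percentage with that of a multi-item mean index. The data, the 0-100 rescaling and the sample size are invented for illustration; this is not ACSI or CFI Group output.

```python
# Illustrative sketch only: comparing the sampling error of a "Top Box" proportion
# with that of a multi-item mean index, both reported on a 0-100 scale.
import numpy as np

rng = np.random.default_rng(7)
n = 250                                    # respondents in one hypothetical study

# Simulate 10-point responses to three correlated satisfaction items
truth = rng.normal(7.5, 1.2, size=n)
items = np.clip(np.round(truth[:, None] + rng.normal(0, 1.3, size=(n, 3))), 1, 10)

# (a) Top Box metric: share of respondents rating the first item 9 or 10
top_box = (items[:, 0] >= 9).mean()
se_top_box = np.sqrt(top_box * (1 - top_box) / n) * 100      # on a 0-100 scale

# (b) Multi-item index: average of the three items, rescaled to 0-100
index = (items.mean(axis=1) - 1) / 9 * 100
se_index = index.std(ddof=1) / np.sqrt(n)

print(f"Top Box score    : {top_box*100:5.1f}  (95% CI half-width ~ {1.96*se_top_box:4.1f})")
print(f"Multi-item index : {index.mean():5.1f}  (95% CI half-width ~ {1.96*se_index:4.1f})")
# In runs like this, the Top Box interval is typically about two or more times wider,
# which is the pattern footnote 21 describes.
```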
As an illustration, compare satisfaction, as a concept, to intelligence. Both are “multidimensional” (i.e., they possess
many different aspects), and they are not directly observable (i.e., one cannot “see” intelligence or satisfaction by
observing somebody). Any attempt to measure intelligence by a simple question (are you dumb or smart?) is not
likely to yield useful information. It is not reasonable to think that one can assess a person’s intelligence by a single
question (or by a single test question). Likewise, it is not reasonable to assume that one can capture the concept of
satisfaction by a single overall question (what if the target is missed? There is no “perfect” measure.).
The same logic also applies to the many different experiences that customers have with products or services. Each
experience is multi-faceted. To get a “true” unbiased picture of what customers are experiencing requires a number of
questions (3 to 5 is usually sufficient) to triangulate on the essence or truth of the experience. This is essential to
have a valid measurement tool. As illustrated below, the more overlapped (and highly correlated) the individual measures are, the more valid (or true) the resulting combined measure is likely to be—i.e., the greater the likelihood of hitting the target22.

20 Griffin, Abbie and John Hauser, "The Voice of the Customer," Marketing Science, Winter 1993, 12 (1), 1.
21 Binary or dichotomous measures (also known as nominal scales) have 2 to 3 times the amount of error around the estimated population parameter (which is a proportion) than measures based on 10-point interval scaled measures (usually means) at the same confidence level.
22 It is important to note that just because a measure uses multiple indicators does not ipso facto result in a "valid" measure. It depends on how the indicators were developed. Questionnaire items that are based on the judgment or guesswork of the researcher may be completely unrelated to the concept being measured. The result will be a flawed multi-item measure that may give reliable results—but completely "miss" the target. Only by using VOC qualitative methods can one be reasonably confident that the customer measures are valid.

[Figure: Decomposition of Observed Score. Observed Score = True Score + Measurement Error; in the illustration roughly 66% of the observed score reflects the true score and 34% reflects measurement error. Source: Institute for Social Research.]

Clearly all measurement involves some degree of error. Ryan, Buzas and Ramaswamy (1995) found that the CFI Group measurement system leads to an increase in precision (expressed as confidence intervals) over traditional methods of 20-30%. This can lead to a direct reduction in sample size requirements of, on average, 22% while still obtaining the same precision as conventional methods. Also, the explanatory power with respect to the consequences of satisfaction (e.g., behavioral intentions) is 56% better than with conventional methods. This is a result of using multiple measures for overall satisfaction23,24. The increase in measurement precision implies that smaller samples can be used with the same measurement precision as traditional methods, which results in very high cost savings for the client (or, alternatively, in higher precision with the same sample size).
Without enough measurement precision in the satisfaction index, the achievement of a performance outcome (such as retention or repeat purchase) will suffer25. The reason is that lack of precision shows up as random variation in the measure. As a result, it will be much more difficult to identify how satisfaction changes as management institutes quality improvements. Overall, the importance of the gain in precision that the CFI Group system offers can hardly be overstated. In most cases, it means that the cost (to the client) of using CFI Group should be substantially lower than using a system from anybody else. On average, about 50% of the CFI Group cost is data collection, and the size of the sample has a direct impact on precision.
The schematic below illustrates the relationships between precision, power and prediction error as a function of the
type of measurement used. For more details about the identification of the appropriate questionnaire items see the
VOC discussion in Appendix A. For an explanation about why 10-point scales are preferred in customer satisfaction
measurement programs see Appendix B.

23 Fornell, Rhee, and Yi, "Direct Regression, Reverse Regression, and Covariance Structure Analysis," Marketing Letters, 1991, 309-320.
24 Ryan, Michael J., Thomas Buzas and Venkatram Ramaswamy (1995), "Making Customer Satisfaction Measurement a Power Tool," Marketing Research, Vol. 7, No. 3, Summer, 11-16.
25 Hauser, John R., Simester, Duncan I., and Wernerfelt, Birger, "Internal Customers and Internal Suppliers," Journal of Marketing Research, Aug. 1996, Vol. 33, Iss. 3, p. 268.

[Figure: The Measurement "Pyramid." From bottom to top: Top Box approaches; single item, 5-point scale; single item, 10-point scale; multiple item scale with equal weights; multiple item scale with "optimal" weights. Moving up the pyramid narrows the score confidence interval (precision), increases the ability to detect change (power), and narrows the prediction error interval.]

How Multiple Measures are Optimally Weighted: After good measures have been identified, a major issue in
measurement is how best to combine the multiple measures into their respective indices—the formation of what is
known as a “measurement model”. The method chosen can have important effects on the analysis results, especially
if the results will be used for diagnosis and prognosis.
The typical CFI Group measurement system is based upon a network of multi-dimensionally measured concepts that
are linked together in a cause and effect framework. The scores of the various experience indices, the customer satisfaction index and the performance outcomes are a function of the simultaneous optimization of the entire
framework. This empirical process is superior to any other method for ensuring diagnostic and prognostic power.
Competitors use methods that are piecemeal replicas by comparison.
For example, some firms, in developing a satisfaction index, use relative weights derived from the factor analysis26 of a number of questions about different aspects of product or service quality. The resulting index is simply a
consequence of the shared aspects (correlation) of the questions without regard to some optimizing criterion such as
a dependent variable like customer retention or other desired behavioral outcome. A particularly debilitating drawback
of this approach is that if there are more questions about a particular attribute, that attribute will have a
disproportionate representation in the index and can bias the resulting score. The fact that quality aspects correlate
among themselves often has little to do with customers’ satisfaction levels, yet some firms persist in using this
confounded measure by mixing a customer’s experience with their satisfaction levels—the causes are lumped
together with the effects. Since the weights applied to the variables to create the satisfaction index are based on the
inter-correlations among the quality measures themselves, there is little reason to expect that the resulting indices
have any relationship with performance outcomes such as customer retention. Thus, this weighting scheme is based
on an irrelevant criterion (inter-correlations as opposed to optimizing on an objective criterion). To be useful, a performance index or a satisfaction index must be based on a more relevant criterion (such as repurchase or willingness to pay, for example)27.
26 The purpose of factor analysis is to discover simple patterns in the relationships among the variables. In particular, it seeks to discover whether the observed variables can be explained largely or entirely in terms of a much smaller number of variables called factors.
27 Other firms use even less sophisticated methods for combining individual items into a satisfaction index, relying upon summing or averaging of the ratings on the various questionnaire items.

The CFI Group system relies on a measurement model that empirically produces a system of optimally weighted indices. It is optimal because the weights for the product and service quality experience measures are derived based on the maximization of relationships (i.e., the correlations) between the various experience measures with customer satisfaction and future behavior. The way the system works is that the weights for all of the measures in the measurement model are "adjusted" so that the correlations between the variables along the cause and effect pathways in the measurement system are maximized. The simple two-component model shown below schematically illustrates the process28.

[Schematic: a simple two-component measurement model. Manifest variables MV1, MV2 and MV3 form latent variable LV1 with weights w1, w2 and w3; manifest variables MV4, MV5 and MV6 form latent variable LV2 with weights w4, w5 and w6. The weights are adjusted to maximize the correlation between LV1 and LV2 (Corrmax), then the scores are calculated:
LV1 Score = w1MV1 + w2MV2 + w3MV3, and
LV2 Score = w4MV4 + w5MV5 + w6MV6]
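The weighting logic in the schematic can be sketched in a few lines of code. The following is a generic, textbook-style alternating scheme that maximizes the correlation between two composites; it is only an illustration of the idea, not CFI Group's proprietary, unstandardized and multicollinearity-controlled algorithm, and the data, item blocks and function name are invented.

```python
# Illustrative sketch only: adjust the weights of two blocks of survey items so
# that the correlation between the two composite scores (LV1, LV2) is maximized,
# in the spirit of the schematic above. Generic alternating scheme, NOT CFI Group's method.
import numpy as np

def two_block_weights(X, Y, n_iter=200, tol=1e-10):
    """X: n x p items forming LV1; Y: n x q items forming LV2."""
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)                 # center each item
    v = np.ones(Yc.shape[1])
    for _ in range(n_iter):
        lv2 = Yc @ v
        w, *_ = np.linalg.lstsq(Xc, lv2, rcond=None)      # weights for LV1 items
        lv1 = Xc @ w
        lv1 /= np.linalg.norm(lv1)
        v_new, *_ = np.linalg.lstsq(Yc, lv1, rcond=None)  # weights for LV2 items
        if np.linalg.norm(v_new - v) < tol:
            v = v_new
            break
        v = v_new
    lv1, lv2 = Xc @ w, Yc @ v
    corr_max = np.corrcoef(lv1, lv2)[0, 1]                # the maximized correlation ("Corrmax")
    return w, v, corr_max

# Simulated 10-point responses: three "experience" items (LV1 block) and three
# "satisfaction" items (LV2 block) driven by the same latent factor.
rng = np.random.default_rng(1)
latent = rng.normal(size=400)
X = np.clip(np.round(6 + 1.5 * latent[:, None] + rng.normal(0, 1, (400, 3))), 1, 10)
Y = np.clip(np.round(7 + 1.2 * latent[:, None] + rng.normal(0, 1, (400, 3))), 1, 10)

w, v, corr_max = two_block_weights(X, Y)
# The scores follow the schematic: LV1 = w1*MV1 + w2*MV2 + w3*MV3, and so on for LV2.
print("LV1 weights:", np.round(w, 3), " LV2 weights:", np.round(v, 3), " Corrmax:", round(corr_max, 3))
```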

The weighting process used in the development of the measurement model is the first critical part of the CFI Group measurement system.
(maximization of the relationships or correlation) is used to optimally weight the various measures in the
product/service quality and customer satisfaction indices. Since the weights are determined based on the
performance-satisfaction-behavior relationships in the model, this minimizes the common problem (experienced by
competitors using less sophisticated weighting schemes) that an increase in a precursor index (e.g., service quality)
does not lead to an increase in a successor index (e.g., customer satisfaction).

Diagnosis
Impacts versus Importance: As discussed above, the connective pathways between the experience indices, customer
satisfaction and behavioral intentions play an important role in the determination of the weights used for score
calculation. But these paths also provide the backbone for the second key feature of the CFI Group measurement
system—impacts.
The most fundamental task of any organization (commercial or government) is the efficient allocation of scarce
resources needed to accomplish desired performance outcomes. The CFI Group system quantifies the impact of
experience changes on satisfaction and, in turn, the impact of satisfaction on future behavior. Managers can then use

28 Note: The correlation is not the same as an impact. The correlation coefficient is simply used as the criterion for adjusting the weights in a manner that ensures the strongest relationships between the concepts in the model (LV1 and LV2 in the schematic) given the available information in the individual measures (MV1…MV6).

this information for efficient resource allocation. What are the properties of the CFI Group system that make this possible29?
CFI Group's system is a cause-and-effect system that isolates the effects of a change in an experience on the change in customer satisfaction (and the subsequent change in desired behavioral outcomes). It is also characterized by a "simultaneous" treatment of all its components (i.e., quality, satisfaction, profit). All of these aspects make it different from other competitive approaches.
It is not well understood, but a cause-and-effect assumption is made every time a management decision is made ("if we do x, y will happen"). Unfortunately, managers often base their decisions on hunches, cross-tabs or correlation coefficients that do not support any sort of causal inferences. The CFI Group system is different. It supports causal inferences based on considerable scientific backing.
The reasons for this are several. The first is somewhat technical. The logic is the same as in path analysis and
covariance structure analysis: the decomposition of correlations into causal paths. This involves a comparison of the
empirical correlations in the data and the correlations imposed by the model (expected correlation matrix). If those
sets of correlations are identical (within sampling error), there is evidence for the causal structure imposed by the CFI
Group model (e.g., experience component x leads to customer satisfaction).
The second important point concerns what is meant by “effect”. The CFI Group system defines this as the marginal
effect of component x on y when other components are held constant—i.e., the effect of a change in x on y. If we
graph x on the horizontal axis and y on the vertical axis, it is represented by the slope of the function as illustrated in
the schematic below.

[Figure: Same Correlation But Different Slopes. Two scatterplots (Case "A" and Case "B") with X on the horizontal axis and Y on the vertical axis; the correlation is represented by the oval around the points and the slope by the "tilt" of the fitted line. X has the same correlation with Y in both cases, but in Case A a change in X will have a larger effect on Y than in Case B (Slope = ΔY/ΔX).]

It is critical to understand this concept because it is different from what most other competitors provide and the results
may be different from what seems intuitive to the client. Market research firms, for example, often talk about
“importance” and use correlation coefficients as measures of importance. But a high correlation does not imply that a
change in x will cause a change in y.
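A brief simulation with invented data makes the distinction concrete: two cases can show the same correlation between X and Y while implying very different marginal effects (slopes).

```python
# Illustrative sketch only: two data sets with the SAME correlation between X and Y
# but DIFFERENT slopes, so a one-unit change in X implies different changes in Y.
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=1000)
noise = rng.normal(size=1000)

y_a = 2.0 * x + 2.0 * noise      # Case A: true slope 2.0
y_b = 0.5 * x + 0.5 * noise      # Case B: true slope 0.5, same signal-to-noise ratio

for label, y in (("Case A", y_a), ("Case B", y_b)):
    slope = np.polyfit(x, y, 1)[0]          # fitted regression slope (change in Y per unit X)
    r = np.corrcoef(x, y)[0, 1]             # correlation coefficient
    print(f"{label}: correlation = {r:.2f}, slope = {slope:.2f}")
# Both cases print essentially the same correlation (about 0.71 here), yet the effect
# of a one-unit improvement in X on Y is four times larger in Case A than in Case B.
```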

29 The reader will find a more technical discussion of the CFI method for calculating impacts in Appendix A.

Other firms use "stated" importance measures, but these are equally flawed for the measurement of customer satisfaction. For example, Allen and Rao (2000) state that: "Few, if any, consultants advocate the stated importance framework today. Its shortcomings have been illustrated with the airline safety example in which stated and derived importance metrics lead to disparate conclusions."30 In addition, such methods increase the length of the questionnaire by requiring shadow importance measures for every perceived performance or experience item included on the questionnaire. If ranking or constant sum scaling methods are used instead, then some kind of reduction of measures needs to be performed, since respondents cannot meaningfully rank or allocate points over more than 5-7 measures. Furthermore, this approach is not based on the sound psychometric principles of multiple measures and error reduction described above. Thus practitioners advocating stated importance methods are basically offering measures that have high or unknown levels of error in them, which is then exacerbated when the perceived performance/importance pairs are manipulated, either by multiplying or subtracting the measures, to arrive at some confounded indication of "effect" or focus. Resource allocations targeted for the management of customer satisfaction and retention based on measures of this nature are akin to using a dartboard for decision-making and are ultimately doomed to failure31.
For management to allocate resources efficiently, managers need to know what will happen if there are changes
(usually improvements) in a certain aspect of the customers’ experiences – this is what CFI Group’s system provides.
It also means that the use of the term “important” in this context refers to what will happen as a result of a change in
something – not what is important per se. For example, both price and quality can be highly correlated to satisfaction,
but a change in one of them may produce a greater effect in terms of changing satisfaction than the other.
Quantifying Effects—Standardized or Unstandardized Measures?: The proper use of analysis tools is critical when
quantifying effects. Other satisfaction analysts usually miss this point. For example, some firms in Europe use some
of the same theoretical foundations (LV-PLS) as CFI Group, but do not understand that the core LV-PLS program is
unsuitable without the CFI Group modifications. Basically, the problem is this: In order to solve the unknowns in
equations with latent variables, some restrictions have to be put on the system – otherwise there would be too many
unknowns. One set of restrictions, that are quite common in psychology, is to set all variances to unity and all means
to zero – that is to standardize all variables. However, in terms of quantifying effects, standardization renders the
results useless and destroys comparability between samples. What is then interpreted as importance is the impact of
quality x on the spread (standard deviation) of satisfaction. This makes no sense and is, of course, very different from
the CFI Group system (which does not rely on standardization). In practice, it turns out that our results are quite
different from what the generic LV-PLS program provides. The modifications by CFI Group to the LV-PLS algorithm
are proprietary and highly technical. They involve a solution to the multicollinearity problem and a rescaling method to
insure comparability of results (see Appendix A for more detail).
The schematic below illustrates the problem with using standardized measures. The example shows two models for two different business units in the same company. The bolded (red) quantities are the unstandardized measures (component means and impacts), while the italicized (blue) quantities are the standardized measures (means and impacts). Using unstandardized measures is straightforward—for business unit 1, a 1 unit (point) change in the Autonomy score yields a 0.22 change in the JobAtt score. Using the standardized measures is less intuitive—for business unit 1, a 1 unit (standard deviation) change in Autonomy yields a 0.27 standard deviation change in JobAtt. Notice also the rather large differences in the standardized scores of the variables both within each business unit model (within unit 1, Autonomy has a standardized score of 0.06 while Recognition is 0.23, reflecting the different variances for each component), as well as across business units (Recognition in unit 1 is 0.23, and in unit 2 it is 0.04).
This illustrates that because standardized measures depend on the variation (or spread) in the data, which can differ from sample to sample, comparability is lost. For this reason, it is best not to compare groups using standardized means or impacts.
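The comparability problem can be reproduced with a toy example. In the sketch below (invented data, not the business-unit results shown in the figure), two groups share the same true unstandardized impact, yet their standardized impacts differ simply because the spread of the driver differs.

```python
# Illustrative sketch only (invented data): the same underlying effect of a driver on
# satisfaction looks identical in unstandardized form but different in standardized
# form when two groups have different spreads in the driver.
import numpy as np

rng = np.random.default_rng(11)

def fit(x, y):
    b = np.polyfit(x, y, 1)[0]                      # unstandardized slope (impact)
    beta = b * x.std(ddof=1) / y.std(ddof=1)        # standardized coefficient
    return b, beta

# Both units share the same true impact: +0.25 outcome points per point of "Autonomy"
for unit, spread in (("Unit 1", 12.0), ("Unit 2", 4.0)):     # different variances by unit
    autonomy = rng.normal(70, spread, size=1500)
    outcome = 60 + 0.25 * autonomy + rng.normal(0, 8, size=1500)
    b, beta = fit(autonomy, outcome)
    print(f"{unit}: unstandardized impact = {b:.2f}, standardized impact = {beta:.2f}")
# The unstandardized impacts come out roughly equal (about 0.25) across units, while
# the standardized impacts differ simply because the spread of Autonomy differs.
```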

30 Allen, Derek and Tanniru Rao, Analysis of Customer Satisfaction Data, ASQ Quality Press, 2000, p. 70.
31 One customer perceived value (CPV) practitioner advocates the misguided use of a perceived performance/stated importance measurement framework for the management of "customer loyalty" for all customers, regardless of whether they are current customers or new customers. Why the concept of loyalty is germane to new customers is in itself puzzling. That aside, it is well known that retention strategies are quite different from acquisition strategies, both in terms of content and costs. Consequently, the guidance dispensed from this confused measurement approach will certainly result in a mal-allocation of scarce resources for those who have unfortunately bought into this method.

[Figure: Differences Between Standardized and Unstandardized Means and Impacts. Two models for two business units in the same company, each with three components (Autonomy, Recognition, Stress) driving JOBATT; unstandardized values are listed first and standardized values second.

Business Unit 1 (n=1726): means (unstandardized/standardized): Autonomy 69.6/0.06, Recognition 57.8/0.23, Stress 50.1/0.03, JOBATT 77.0/0.09; impacts on JOBATT (unstandardized/standardized): Autonomy 0.22/0.27, Recognition 0.24/0.30, Stress 0.06/0.09.

Business Unit 2 (n=561): means (unstandardized/standardized): Autonomy 71.4/0.13, Recognition 53.1/0.04, Stress 52.8/0.13, JOBATT 73.9/-0.05; impacts on JOBATT (unstandardized/standardized): Autonomy 0.28/0.30, Recognition 0.31/0.32, Stress 0.09/0.10.]

Multicollinearity: A very difficult problem in impact estimation is the isolation of the individual effect of each experience
component from other components. This is because respondents tend to see many components as inter-related to
some extent. This “halo” can contribute to high correlations between the components resulting in what is known as
multicollinearity. No statistical technique is equipped to handle such multicollinearity and the result is misleading
diagnosis. Normal LV-PLS and some other structural equation modeling techniques can help in reducing
multicollinearity, but not enough to overcome the problem. The CFI Group system, however, is (1) able to extract the
cause of multicollinearity and (2) apply a solution from the field of chemometrics to solve the problem.
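The instability that multicollinearity creates, and the kind of relief a chemometrics-style method provides, can be illustrated with a small simulation. The sketch below uses scikit-learn's generic PLSRegression as a stand-in; it is not CFI Group's proprietary solution, and the data are invented.

```python
# Illustrative sketch only: unstable OLS impacts under multicollinearity versus a
# PLS regression (a chemometrics-style technique). Generic demo, NOT CFI Group's method.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(5)

def one_sample(n=300):
    base = rng.normal(size=n)
    x1 = base + rng.normal(0, 0.2, n)      # two experience components that respondents
    x2 = base + rng.normal(0, 0.2, n)      # see as nearly the same thing (the "halo")
    y = 1.0 * x1 + 1.0 * x2 + rng.normal(0, 1, n)   # satisfaction outcome
    return np.column_stack([x1, x2]), y

ols_coefs, pls_coefs = [], []
for _ in range(200):                       # repeat the estimation over many samples
    X, y = one_sample()
    design = np.column_stack([np.ones(len(y)), X])
    ols_coefs.append(np.linalg.lstsq(design, y, rcond=None)[0][1:])
    pls = PLSRegression(n_components=1).fit(X, y)
    pls_coefs.append(np.ravel(pls.coef_))

print("OLS impact std. dev. across samples:", np.std(ols_coefs, axis=0).round(2))
print("PLS impact std. dev. across samples:", np.std(pls_coefs, axis=0).round(2))
# The OLS impacts for the two collinear drivers swing widely from sample to sample,
# while the PLS estimates are far more stable (at the cost of some bias).
```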
Other consulting firms either ignore the problem at worst, or conduct a factor analysis of the experience components (thus grouping them together) and then regress customer satisfaction, or some other dependent variable, on the factor analysis groups. The problems with this approach are so serious that it is virtually impossible to make sense of the results.
• First, it destroys the meaning of the variables as they were originally conceived and measured; the resulting factors must be interpreted post-hoc by the analyst—raising questions of validity.
• Second, the imposed correlational structure among the factors is highly artificial and far removed from how the respondents perceived things. The most common way is to force all the factors to be independent from each other (i.e., constrain the factors to have zero correlations with one another). This is most certainly wrong and very different from how the respondents perceived them—the "halo" effect.
• Third, usually the first factor extracted in a factor analysis solution will be totally overwhelming in terms of information (variance) content, which makes it necessary to use some sort of rotation scheme (introducing yet another artificial device) so the results can be interpreted by the analyst.
• Fourth, factor analysis plus regression represents a piecemeal two-step approach. Any errors existing in the first step are magnified by the second step—an optimal index cannot be constructed under this scenario. The post-hoc interpreted factors may not resemble those quality components that have maximal impact on satisfaction (and subsequent behaviors).

Two other approaches that are often used by firms to analyze satisfaction are stepwise regression and conjoint
analysis. Stepwise regression assumes that absolutely nothing is known beforehand and everything is left to a
sample of data points. In other words, the solution is an artifact of the data. As the name implies, stepwise regression

is a technique for including "important" variables in a regression in a stepwise manner. The limitations of stepwise regression are:
• Notoriously unstable results;
• High likelihood of omitting a key variable;
• An inferior methodology if any theory exists;
• The results of stepwise regression cannot be evaluated by statistical significance testing; and
• The regression coefficients are biased.
Stepwise regression will almost never be used in articles published by respectable scientific journals (for the reasons given above).
Conjoint analysis is a different matter. In contrast to stepwise regression, conjoint analysis is a useful scientific
method. The problem is that it is not well suited to the measurement and diagnosis of customer satisfaction. The
basic problems are that it cannot handle many attributes and that there has to be a “level” of each quality attribute
that the respondent is asked to evaluate. Conjoint analysis is more suitable for new product (service) development, in
which respondents are asked to evaluate different prototypes (on paper) that have different levels of each attribute.
For CFI Group, conjoint analysis can be used if a client is interested in finding out what customer satisfaction would
be, if certain attributes were added to the product (service) and what the importance of each attribute would be. A
nice benefit of conjoint analysis in this context is that it can be done on a single customer. The contrast between
causal modeling methods and conjoint analysis is detailed in Appendix A.

Prognosis
The ultimate proof of a good measurement system is its ability to make accurate predictions. The models built on the principles described above provide managers with measurement-based tools for better management of intangible assets (like customers)32. With the patented process used in the development of CFI Group measurement systems, managers in commercial and public service organizations alike can be assured that they are getting valid, reliable and sensitive measures within a cause and effect framework that allows them to evaluate their decisions before they make them.
Once an initial model is built, the resultant component scores and impacts provide managers with high-powered
metrics for determining the best courses of action they can take for accomplishing desired outcomes. Competing
measurement systems “statically” compare self-reported importance measures against current performance
measures. The CFI Group performance measurement approach provides a “dynamic” tool that tells managers what
changes are important in affecting desired outcomes (e.g., increases in customer satisfaction). This distinction is a
critical one for the success of resource allocation decisions that managers make daily. Without the knowledge of
“what to expect” when executing a plan, decision-making devolves to a mere guessing game.
Most traditional approaches to market research either confuse comparison of levels (e.g., current performance and
levels of importance as provided by customers) with marginal contributions (e.g., what should be changed), or fail to
make the connections to desired performance outcomes (such as economic returns), or both. As discussed above,
the CFI Group system allows for all of these features—the perceived performance comparisons, the impact of quality
components on satisfaction and, the impact of satisfaction on future behaviors, and the use of this information for
efficient resource allocation.
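As a simple illustration of this "dynamic" use of the model, the sketch below runs a what-if calculation over a hypothetical impact chain; all impact values and component names are invented for illustration and are not estimated from any real model.

```python
# Illustrative sketch only (hypothetical numbers): once impacts have been estimated
# along the experience -> satisfaction -> behavior chain, a simple "what if"
# calculation predicts the payoff of a planned improvement before it is made.
impacts_on_satisfaction = {        # satisfaction points per point of experience change
    "Ease of process": 0.4,        # hypothetical impact values, not from any real model
    "Timeliness": 0.3,
    "Staff courtesy": 0.1,
}
impact_sat_on_reuse = 0.6          # hypothetical: "use again" points per satisfaction point

planned_changes = {"Ease of process": 3.0, "Timeliness": 1.0}   # planned score improvements

delta_satisfaction = sum(impacts_on_satisfaction[k] * v for k, v in planned_changes.items())
delta_reuse = impact_sat_on_reuse * delta_satisfaction

print(f"Predicted satisfaction gain: {delta_satisfaction:+.1f} points")
print(f"Predicted gain in likelihood to use again: {delta_reuse:+.1f} points")
# Here the plan yields +1.5 satisfaction points and +0.9 points of future-use intention,
# so the manager can compare the expected payoff of alternative investments in advance.
```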
The CFI Group approach provides specific and quantifiable information about the levels of service and quality and the
marginal contribution, to both customer satisfaction and profits, which will result from a change in a process, service,
aspect of quality, etc. Unlike other consulting firms, CFI Group utilizes a cause-and-effect system that isolates the
effects of a change in a quality component on the change in customer satisfaction, and the subsequent change in
economic returns. This is very different from focusing on what customers deem “important”. It is also characterized by
a “systems” treatment of all its components (i.e., quality, satisfaction, profit). All of these aspects make it different
from other approaches.

32 United States patent number 6,192,319; visit www.uspto.gov for more information.

Summary Table
The following table provides a basic summary of many of the key points made in the foregoing discussion.

Elements of ACSI Technology Implementation

Objective
  Measurement: Reliable (precision); Valid; Sensitive (power to detect change).
  Diagnosis: Impact or Key Driver Analysis ("To improve customer satisfaction, what matters the most?").
  Prognosis: Change Prediction ("How do changes in experiences affect changes in satisfaction and retention?").

Characteristics
  Measurement: "Voice of the customer" (VOC) based; multiple measures optimally weighted based on the strength of relationships in the measurement network; reduced measurement error; reduced confidence intervals; uses unstandardized performance scores.
  Diagnosis: Calculated within the context of a complex cause and effect network; based on unstandardized slopes, not correlation; optimized with regard to key management objectives (i.e., CS or behaviors); control of multicollinearity provides more reliable impact estimation.
  Prognosis: "What if" predictive tool; quantifies the effects of changes across multiple nodes (experience to evaluation to intention); future effects are comparable across time, location or segment given planned investment levels.

Benefits
  Measurement: Accurate; meaningful—tied directly to customer experience; comprehensive—incorporates all aspects of customer experiences; understandable—simple scoring method; comparable—by using unstandardized scores.
  Diagnosis: Prioritizes improvement efforts; provides impacts that are additive in nature and comparable across groups; allows for more efficient allocation of resources based on the economic concept of marginality.
  Prognosis: Focuses on the "dynamic" quantification of change; increased ability to envision future change in key performance outcomes.

Competitive Comparisons
The following table provides a comparison of the ACSI technology as implemented by CFI Group with three classes
of competitors—Primitive, Naïve, and Pseudo-Sophisticated.
Primitive competitors are research suppliers that compete largely on the basis of price, supplying survey information that uses "canned" questionnaires. They may be able to provide results quickly, but the results lack any diagnostic or prognostic capability. They appeal to buyers of consumer research who are unconcerned with information quality and may simply be looking to meet an organizational requirement that customers be surveyed. However, the usefulness and incorporation of the results into decision-making is rudimentary.
Comparative Criteria — The ACSI Technology/CFI Group vs. Types of Competitors: Primitive ("Price Based"), Naïve ("Simple Minded Solutions") and Pseudo-Sophisticated ("Faulty Science")

Measurement

Uses VOC qualitative methods
- The ACSI Technology/CFI Group: Yes—all measures used by CFI are based on VOC methods.
- Primitive: No—use "canned," uncustomized surveys based on researcher judgment.
- Naïve: No—use "canned," uncustomized surveys based on researcher judgment.
- Pseudo-Sophisticated: Maybe—some may use qualitative methods.

Customized measures to ensure validity
- The ACSI Technology/CFI Group: Yes—customized measures are recommended to ensure validity.
- Primitive: No—repetitively use the same set of questions for all clients; validity is likely low.
- Naïve: No—repetitively use the same set of questions for all clients; validity is likely low.
- Pseudo-Sophisticated: Some—usually use canned surveys to save time.

Use multiple-item scales to minimize measurement error
- The ACSI Technology/CFI Group: Yes—three to five items are necessary to ensure high reliability standards.
- Primitive: No—use single-item nominal or categorical measures; reliability very low, large confidence intervals.
- Naïve: No—use single-item nominal, categorical or 5-point Likert scaled measures; low reliability, large confidence intervals.
- Pseudo-Sophisticated: Some—tighter confidence intervals, but still 30% bigger than the CFI method.

Optimal weighting for deriving scores
- The ACSI Technology/CFI Group: Yes—weights based on the cause and effect network between components.
- Primitive: No—only report item scores, usually as percentages or proportions (i.e., "top-box").
- Naïve: No—only report item scores as proportions or means.
- Pseudo-Sophisticated: No—usually compute averages, sums or factor scores.

Sample sizes required
- The ACSI Technology/CFI Group: Small (≤ 150-200).
- Primitive: Very large—samples based on the number of cells that need to be filled in a cross-tab table.
- Naïve: Large—needed to get any kind of estimation precision.
- Pseudo-Sophisticated: Large—needed to get any kind of estimation precision.

Driver/Impact Identification

Cause and effect network with impacts based on slopes
- The ACSI Technology/CFI Group: Yes.
- Primitive: NA—no cause and effect networks are used.
- Naïve: No—use correlations, difference gaps or stated importance; some may use simple regression.
- Pseudo-Sophisticated: Quasi—usually stepwise regression, or in some cases factor regression.

Control of multicollinearity
- The ACSI Technology/CFI Group: Yes—proprietary PLS regression based method for allocating the "halo" in perceptual measures.
- Primitive: NA—no estimates provided.
- Naïve: No—usually ignore the existence of multicollinearity.
- Pseudo-Sophisticated: Mostly no—some may factor analyze predictors to control inter-correlations before using them in a multiple regression model.

Unstandardized estimates
- The ACSI Technology/CFI Group: Yes.
- Primitive: NA—no estimates provided.
- Naïve: No.
- Pseudo-Sophisticated: No.

Other: Proprietary patented analysis system
- The ACSI Technology/CFI Group: Yes—United States patent number 6,192,319.
- Primitive, Naïve and Pseudo-Sophisticated: No—not available; all analyses use software packages that are purchased from third-party vendors (e.g., SPSS, SAS).
Naïve suppliers will often use similar data sources and measures as those of Primitive research suppliers, but add some intuitively appealing analytic paraphernalia—such as performance-importance matrices, difference gap analysis, etc. Unfortunately, most of these so-called analytic approaches actually increase error and provide a muddled picture of reality rather than clarifying it. In addition, their ubiquitous use of a "stated performance" measurement methodology needlessly increases questionnaire length with no additional diagnostic value. They may appeal to users of research who are looking for more sophistication from their measurement suppliers than can be provided by Primitive suppliers. Unfortunately, such users are being hoodwinked by glib answers and simplistic solutions to complex questions of human behavior. Naïve suppliers are very dependent upon the inability of their customers to discriminate between what they are peddling and the kind of sound methods espoused by CFI Group.
Pseudo-Sophisticated suppliers may provide upgraded measurement and diagnostic capabilities. They do this by the
“piece-meal” application of multiple measurement approaches along with limited use of multivariate statistical
techniques often found in common statistical packages. These kinds of suppliers add a veneer of science to the types
of measurement services provided by Primitive and Naïve suppliers.
What is often unknown to most users of the services supplied by these vendors is that the common sequential use of multivariate methods (such as using factor analysis or cluster analysis to create component scores which are then used in a multivariate regression procedure) essentially magnifies the weaknesses of both methods and creates interpretational problems that are not easily overcome. For instance, if the factor scores of a three-factor solution to a data reduction problem that explains 50% of the variance in the data are then regressed against a fourth variable, achieving an R² (variance explained) value of 0.5, then what does the analysis tell the user? Never mind what the regression coefficients mean. Unfortunately, many Pseudo-Sophisticated vendors are not sufficiently cognizant of these weaknesses to adequately educate their customers about the frailties in the seemingly scientific methods they advocate. As indicated earlier in this document, users of this kind of analytic product run the danger of being seriously misled in their decision-making.
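The interpretational problem can be seen in a few lines of code. The sketch below, on simulated data and using principal components as a stand-in for the factor step, shows that the headline R² applies to arbitrary factor scores rather than to the original survey items; all names and figures are hypothetical.

```python
# Illustration of the sequential factor-then-regression problem: the R-squared
# refers to the factor scores, not the original survey items, so it is hard to
# interpret. Simulated data; PCA used as a stand-in for the factor step.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n, n_items = 400, 9
# Items with a low-rank common structure plus noise
items = rng.normal(size=(n, 3)) @ rng.normal(size=(3, n_items)) + rng.normal(size=(n, n_items))

pca = PCA(n_components=3).fit(items)
factor_scores = pca.transform(items)
share_of_item_variance = pca.explained_variance_ratio_.sum()

# Hypothetical outcome regressed on the factor scores
outcome = factor_scores @ np.array([1.0, 0.5, 0.2]) + rng.normal(size=n)
r2 = LinearRegression().fit(factor_scores, outcome).score(factor_scores, outcome)

print(f"share of item variance retained by the factors: {share_of_item_variance:.0%}")
print(f"R-squared of outcome on factor scores: {r2:.2f}")
# The regression says nothing about the discarded item variance, and its
# coefficients are in units of arbitrary factor scores, so they cannot be
# read as impacts of the original survey questions.
```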

Results—How Have CFI Group Clients Benefited from the ACSI Technology?
In this section three examples of how the CFI Group/ACSI technology was applied to U.S. Government agencies are reviewed.
Internal Revenue Service (IRS):
Before working with CFI Group, the IRS suffered from:
• Disgruntled employees
• Dissatisfied taxpayers
• Declining, low ACSI scores

[Chart—Case Study: Internal Revenue Service. Departmental performance lagged far behind: IRS ACSI scores declined from 55 in 1994 to about 50 in 1997, while the national ACSI average slipped only from roughly 74 to 71 over the same period. © CFI Group]

This situation resulted in a 1997 Senate hearing that labeled the IRS as a "tax agency out of control." This finding was supported by witnesses and commentators making statements such as the following:
"As only one taxpayer representative out of thousands across the country, I have seen dozens of taxpayers severely damaged and even made homeless by the IRS collection division." (Anonymous Witness #1, IRS Employee, Senate IRS Hearings, 1997)
"The long list of IRS horribles included arbitrary collection decisions, sale of taxpayer lien property far below value, and the cavalier mistreatment of taxpayers." (Bob Zelnick, ABC Good Morning America, September 26, 1997)

CFI Group began working with the IRS in 1999. An initial assignment discovered that satisfaction levels for taxpayers filing on-line (eFilers) were 30 points higher than those of taxpayers submitting paper returns (see the first chart below). As a consequence of this finding, the IRS instituted a strategy of encouraging filers to use the on-line submission process. The result was a steady improvement in overall IRS customer satisfaction scores (see the second chart below). These findings demonstrate the power of the strategic guidance provided by CFI Group to improve decision-making and subsequent customer satisfaction.

[Chart—Case Study: Internal Revenue Service, A Key Discovery (ACSI Study 1999). eFilers were vastly more satisfied—fewer errors, quick problem resolution, earlier refunds, status tracking. From 1999 to 2003, E-Filer satisfaction ranged from about 74 to 78, roughly 30 points higher than Paper Filer satisfaction, which rose from the mid-40s to the low 50s. © CFI Group]

[Chart—Case Study: Internal Revenue Service, A Systematic Improvement. The IRS heard the voice of the customer: a commitment to customer service ("kinder, gentler") and increased awareness and usage of eFiling. Overall customer satisfaction rose 12 points, from 51 in 1999 to 63 in 2003—a faster trade-up to electronic filing and, potentially, faster access to tax revenues. © CFI Group]

Federal Aviation Administration (FAA):
When CFI Group began working with the FAA:
• Satisfaction had been low, typical for an agency with a largely regulatory/punitive function.
• CFI Group developed a model that measured three specific drivers of satisfaction:
  o Quality of air traffic services
  o The pilot certification process
  o FAA policies, standards and regulations
• The "Policies, standards and regulations" area was identified as the lowest scoring driver, but the one with the greatest impact on pilot satisfaction with the FAA.
• It was determined that pilots perceived "policies, standards and regulations" as poorly written and difficult to understand, thus failing to contribute to airline safety as well as they should.

As a result of CFI Group analysis and modeling insights, the FAA engaged in a significant overhaul of its policies, standards and regulations, rewriting much of the material in plain language.
Subsequent measurements showed that pilot satisfaction with the FAA increased 14%, a very large improvement in ACSI terms for a relatively short span of time.

[Chart—Case Study: Federal Aviation Administration (FAA). Pilot satisfaction raised, improved safety probable: commercial pilot satisfaction with the FAA rose approximately 14% between 1999 and 2003, from the mid-50s to the mid-60s—improved performance from a safety-critical agency. © CFI Group]
Bibliography
Performance Measurement in Government:
Ammons, David N. Municipal Benchmarks: Assessing Local Performance and Establishing Community Standards.
Thousand Oaks, CA: Sage Publications, 1996.
Ammons, David N., ed. Accountability for Performance: Measurement and Monitoring in Local Government.
Washington, DC: International City/County Management Association, 1995.
Bens, Charles K. “Strategies for Implementing Performance Measurement.” Management Information Service Report,
International City Management Association, November 18, 1986.
Hatry, Harry P. Performance Measurement Principles and Techniques: An Overview for State Governments.
Washington, DC: The Urban Institute, 1983.
Hatry, Harry P. “The Status of Productivity Measurement in the Public Sector,” Public Administration Review
January/February, 1978, vol. 38, pp. 28-33.
Hatry, Harry P. “Performance Measurement Principles and Techniques: An Overview for Local Government,” Public
Productivity Review, December 4, 1980, pp. 312-339.
Hatry, Harry P., Louis H. Blair, Donald M. Fisk, John M. Greiner, John R. Hall, Jr., and Philip S. Schaenman How
Effective Are Your Community Services? Procedures for Measuring Their Quality. Washington, DC: The Urban
Institute and International City/County Management Association, 1992.
Hatry, Harry P., James R. Fountain, Jr., Jonathan M. Sullivan, and Lorraine Kremer eds. Service Efforts and
Accomplishments Reporting—Its Time Has Come: An Overview. Norwalk, CT: Governmental Accounting Standards
Board, 1990.
Mayne, John, and Eduardo Zapico-Goñi, eds. Monitoring Performance in the Public Sector, 1997.
Miller, Thomas I. and Michelle A. Miller. Citizen Surveys: How to Do Them, How to Use Them, What They Mean.
Washington, DC: International City/County Management Association, 1991.
National Partnership for Reinventing Government, Balancing Measures: Best Practices in Performance Management
August 1999.
Osborne, David and Ted Gaebler Reinventing Government: How the Entrepreneurial Spirit is Transforming the Public
Sector. Reading, Mass: Addison-Wesley Publishing Co., 1992.
Peters, Thomas J. and Robert H. Waterman, Jr. In Search of Excellence: Lessons from America’s Best-Run
Companies. New York: Harper & Row, 1982.
Segal, Geoffrey and Adam Summers, “Citizen Budget Reports: Improving Performance and Accountability in
Governments,” Reason Public Policy Institute (RPPI), Policy Report No.292 (March 2002).
Tigue, Patricia and Dennis Strachota. The Use of Performance Measures in City and County Budgets. Chicago: Government Finance Officers Association, 1994.
U.S. General Accounting Office, Executive Guide: Effectively Implementing the Government Performance and
Results Act (Washington, D.C. 1996).
Van Houten, Therese and Harry P. Hatry How to Conduct a Citizen Survey. Chicago: American Planning Association,
1987.
Webb, Kenneth and Harry P. Hatry. Obtaining Citizen Feedback: The Application of Citizen Surveys to Local
Government. Washington, DC: Urban Institute, 1973.

ACSI Related
Anderson, Eugene, Claes Fornell and Sanal Maznancheryl (2004), "Customer Satisfaction and Shareholder Value," Journal of Marketing, (October), Vol. 68, No. 4, 172.
Anderson, Eugene W., Claes Fornell and Roland T. Rust (1997), "Customer Satisfaction, Productivity and
Profitability: Differences Between Goods and Services," Marketing Science, Vol. 16, No. 2, 129-145, Summer.
Fornell, Claes and Roland Rust, “The effect of customer satisfaction on consumer spending growth,” under review,
2005.
Fornell, Claes, Sunil Mithas, Forrest Morgeson, and M. S. Krishnan, ”Customer Satisfaction and Stock Prices: High
Returns, Low Risk” under review, 2005.
Fornell, Claes, Paul Damien, Marcin Kacperczyk, and Michel Wedel, “The Empirical Relationship between Buyer
Satisfaction and GDP Growth under Parameter and Distributional Uncertainty,” under review, 2004.
Fornell, Claes, Michael D. Johnson, Eugene W. Anderson, Jaesung Cha and Barbara Everitt Bryant, (1996), "The
American Customer Satisfaction Index: Nature, Purpose and Findings," Journal of Marketing, Vol. 60, October, 7-18.
Gruca, Thomas S., and Lopo L. Rego (2005) “Customer Satisfaction, Cash Flow, and Shareholder Value,” Journal of
Marketing,(July) Vol.69, 115-130.
Morgan, Neil and Lopo Rego (forthcoming 2006), "The Value of Different Customer Satisfaction and Loyalty Metrics in Predicting Business Performance," Marketing Science.

Analytic Methods
Andrews, Frank M. (1984), “Construct Validity and Error Components of Survey Measures: A Statistical Modeling
Approach”, Public Opinion Quarterly, p.404-442.
Allen, Derek and Tanniru Rao, Analysis of Customer Satisfaction Data, ASQ Quality Press 2000.
Cox, E. P. (1980) “The optimal number of response alternatives for a scale: a review,” Journal of Marketing
Research, 17, 407.
Falk, R. Frank and Nancy B. Miller, A Primer for Soft Modeling, The University of Akron Press, 1992.
Fornell, Claes and Jaesung Cha (1994), "Partial Least Squares," in Richard Bagozzi (ed.), Advanced Methods of Marketing Research, 52-78.
Fornell, Claes and David F. Larcker, ”Evaluating Structural Equation Models with Unobserved Variables and
Measurement Error,” Journal of Marketing Research, (February) 1981, 18, 1, 39.
Fornell, Claes and David F. Larcker, "Structural Equation Models with Unobserved Variables and Measurement Error: Algebra and Statistics," Journal of Marketing Research, (August) 1981, 18, 3, 382.
Fornell, Claes, B.D. Rhee, and Y. Yi, "Direct Regression, Reverse Regression, and Covariance Structure Analysis," Marketing Letters, 1991, 2(3), 309-320.
Hauser, John R, Simester, Duncan I, Wernerfelt, Birger. “Internal customers and internal suppliers,” Journal of
Marketing Research, Aug 1996. Vol. 33, Iss. 3; p. 268
Lohmoeller, Jan-Bernd, Latent variable Path Modeling with Partial Least Squares, New York: Springer-Verlag, 1989.

Louviere, Jordan J. and Towhidul Islam, "A Comparison of Importance Weights/Measures Derived from Choice-Based Conjoint, Constant Sum Scales and Best-Worst Scaling," Centre for the Study of Choice (CenSoC), University of Technology, Sydney, Working Paper No. 04-003, September 30, 2004.
Peterson, Robert A. and William R. Wilson (1992), “Measuring Customer Satisfaction: Fact and Artifact,” Journal of
Academy of Marketing Science, 20 (Winter), 61-71
Preston, Carolyn C. and Andrew M. Colman, “Optimal number of response categories in rating scales: reliability,
validity, discriminating power, and respondent preferences,” Acta Psychologica 104 (2000) 1.
Ryan, Michael J., Thomas Buzas and Venkatram Ramaswamy (1995), "Making Customer Satisfaction Measurement
a Power Tool," Marketing Research, Vol. 7, No. 3, Summer, 11-16.
Verlegh, Peeter W.J., Hendrik N.J. Schifferstein, Dick R. Wittink, “Range and Number-of-Levels Effects in Derived
and Stated Measures of Attribute Importance,” Marketing Letters; Feb 2002; 13,
Wold, H. (1975). Soft Modelling by Latent Variables: The Nonlinear Iterative Partial Least Squares Approach. In:
Perspectives in Probability and Statistics, Papers in Honour of M.S. Bartlett, ed. J. Gani, Academic Press, London.

Appendix A: A Technical Summary of the CFI Group Analytic System
CFI Group’s process takes place in four stages to ensure maximum reliability, validity and inclusion of essential
issues:
1) Secondary Research
2) Management Interviews
3) “Voice of the Customer” (VOC) Investigations
4) Quantitative Analysis
1) Secondary Research
Some firms might argue against the necessity of this stage, stating that vast quantities of such research have already been performed, oftentimes yielding no more information than was available before. However, one reason firms often do not benefit from such research is that its focus tends to be scattered. One study might look at concepts of customer loyalty, while another looks at current attitudes of store personnel, and still another asks customers to focus on aspects of in-store shopping. Our purpose in performing secondary research is to build upon and synthesize prior research, thereby gaining the maximum information available from it.
2) Management Interviews
Interviewing management personnel across relevant areas of businesses is also critical to synthesizing useful
information, which might otherwise remain isolated. These interviews aid in:
• Understanding a heterogeneous customer base;
• Identifying current business issues viewed as relevant by management personnel;
• Developing a substantive knowledge of the competitive environment;
• Designing the qualitative interview guidelines for in-depth interviews with customers; and
• Determining how performance measures will be represented in the subsequent model.
3) “Voice of the Customer” (VOC) Investigations
The need to talk with customers to uncover issues salient to them has become increasingly obvious over the past
several years. What has not become obvious, however, are the techniques needed to uncover such issues accurately
and in-depth. CFI Group’s system utilizes qualitative one-on-one customer interviews specifically designed to cover
both issues identified as relevant by management personnel and to allow customers to voice their opinions, concerns
and desires which might otherwise be left unknown to management.
While management would likely be able to predict a large percentage of the components and issues salient to
customer satisfaction, there is still a reasonable amount of information to be gained from customers, which would go
unsaid if customer interview structures were too rigid.
Further, management personnel might also be unaware of the language that customers tend to use (i.e., the voice of the customer) when discussing such issues or, quite importantly, of all the aspects of a particular issue—even one correctly identified by management—that are relevant to the customer.
CFI Group’s qualitative system applies a combination of current social-psychological techniques whose power and
scope exceed common research methods utilized by other firms. CFI Group’s system employs the following
techniques33:
One-on-one interviews: While focus groups can be useful in certain cases, typically what happens in such settings is
that one or two strong voices emerge only to be followed by the rest of the group. The resulting information is highly
biased and skewed toward the more vocal customers in the group. Although interviewers often try to avoid such
biases by requiring focus group attendees to talk “in turn”, they may still miss subtle (and not-so-subtle) pressures,
which come from group meetings. Valuable information may be lost in such settings where the interview is highly
structured.
Open ended, semi-structured interview approach: This approach allows us to ask customers about issues mentioned
in secondary research and management interviews, while still leaving the opportunity for each customer to discuss
“top-of-mind” issues during the course of the interview, thereby identifying salient factors which might otherwise go
undetected.
Metaphors and narrative accounts: By giving customers the opportunity to tell stories and use metaphors to describe the various experiences they have had, we also encourage the identification of new and valuable information.
33 Griffin, Abbie and John Hauser, "The Voice of the Customer," Marketing Science, Winter 1993, 12, 1, 1.

Given innovative social-psychological research techniques and a more conversational interview style, customers can relax and converse as they might with a friend during the interview. A skilled interviewer can keep a respondent focused on the relevant topics while still allowing them to recall experiences that could be very useful to management and other personnel. Similarly, simply asking someone "why" they like or dislike some aspect of a product will not get at the real ways in which people think about things and make purchase decisions. CFI Group's qualitative system utilizes techniques which help customers to identify and discuss issues relevant to their purchasing behaviors, unlike most other consulting firms where customers are asked only to confirm or rank pre-identified and ultimately incomplete factors relevant to decision making.
Customer interviews performed by CFI Group are recorded and transcribed verbatim, ensuring maximum reliability and validity in performing the analysis. Qualitative research techniques are then applied to the subsequent analysis of each transcript as well as the transcripts as a group. Unlike other firms who rely on "frequency of response" coding to identify relevant factors (thereby only increasing interviewer-created bias), CFI Group's system relies on a "narrow lens approach"—a social-psychological analysis process which allows us to identify and categorize salient factors and re-group all relevant information into a subsequent model, thereby maximizing the information gained from the interviews.
CFI Group’s qualitative analysis allows a specification of a preliminary model of customer satisfaction, and makes
certain that attributes of each component are preserved utilizing the language of the customer. The subsequently
developed questionnaire is based on the voice of the customer and helps ensure that the information gathered with it
is valid.
4) Quantitative Analysis
Ultimately, the power and precision of the preliminary model is proven in the quantitative phase of CFI Group's system, which is built upon three distinct points:
A. Estimating Importance, Utility, and Impact
B. Estimating Derived Importance
C. Causal Models: comparing covariance structure analysis (e.g., LISREL) and latent variable partial least squares (e.g., Wold's LV-PLS system), the two major approaches to causal models.
The objective is to identify those quality dimensions whose improvement offers the greatest returns, as measured in customer satisfaction, retention rate (and potentially related measures of individual behavior, such as spending level) and corporate financial performance. That is, if the level of performance on a quality attribute improves by a given amount, how much will satisfaction (and, subsequently, customer retention or financial performance) improve? In evaluating a methodology, the most important criterion is whether a method can quantify the return on quality.
A. Estimating Importance, Utility, and Impact
Table 1: Four Approaches to Estimating Importance, Utility and Impact of Quality Improvements
Class of Methods
Explicit Self Reported Importance
Derived Self-Reported Importance
Conjoint Methods
Derived Importance Methods

Quantifies Change
No
No
Yes
Yes

In methods assessing Explicit Self-Reported Importance, respondents directly state or rate the importance of an attribute. If respondents are asked to "Rate the importance of price on a scale from 1 to 5," attributes can only be compared in terms of their mean importance ratings. Methods of Derived Self-Reported Importance ask respondents to compare attributes in terms of their importance. If the question is "Which is more important to you, price or on-time delivery?" a rank order or a derived importance scale can be calculated. Constant sum scales can also be used in this way. However, none of these approaches simultaneously calibrates the relationship from performance on a quality attribute to a consequent change in satisfaction, retention or financial performance. The best they can do is indicate, on an attribute-by-attribute basis, how important each attribute might be for satisfying the customer. But this assumes that each attribute importance measure is perfect and without error. Moreover, the exercise would have to be repeated separately for any additional dependent variables, extending the questionnaire length and increasing respondent fatigue.

Thus, there is no way to compare the returns from quality improvements and set priorities using these methods. This is one of the main reasons that few if any consultants advocate a stated importance measurement framework.34
Conjoint Methods ask respondents to rate or choose between alternative profiles of products or services. The
products/services are described in terms of levels of objective quality. From a pattern of preferences, we can derive
part-worths or utilities of different levels of an attribute (for example, the utility of a “professional and polite employee”
versus “warm, friendly and polite employees”). Conjoint analysis is thus able to quantify the relationship between the
level of an attribute and the level of preference.
Conjoint methods, it should be noted, create a model of the individual. As a result, the ability to generalize part-worths to the population depends on the sampling method. A conjoint study of 30 respondents selected by a non-probability sampling method, such as convenience sampling or quota methods, cannot be generalized to the population. Confidence intervals on the aggregate part-worths depend on sample size and method.
Table 2: Approaches to Estimating Impacts

                          Conjoint Methods?   Derived Importance Methods?
Individual Level Model    Yes                 No
Population Level Model    No                  Yes

The length of a conjoint questionnaire increases exponentially with the number of attributes and the number of levels of each attribute. There are some techniques for reducing the burden on the respondent, but in general the questionnaires are quite lengthy. Conjoint methods work best on product attributes with discrete, concrete levels, such as colors or package designs. Conjoint is much more difficult when attributes are more subjective, such as the Employee Courtesy example, which does not identify a clear, discernible difference between "professional and polite" and "warm, friendly and polite." Conjoint methods cannot be recommended for determining impacts.
Derived Importance Methods estimate the impact of improvement directly from the relationship between a quality
factor and the level of Satisfaction. For example, the level of Satisfaction can be regressed against the levels of
attributes. The regression coefficient, or impact, quantifies the relationship between Satisfaction and the level of an
attribute. For example, a change of x units on an attribute results in a change of y units in Satisfaction.
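A minimal sketch of this kind of derived-importance regression is shown below: satisfaction is regressed on attribute ratings and the unstandardized coefficients are read as impacts. The data and attribute names are hypothetical, and this is plain OLS rather than the CFI Group system discussed later (which addresses the measurement-error and multicollinearity problems raised below).

```python
# Derived importance via ordinary multiple regression: the coefficient on each
# attribute is the expected change in Satisfaction per 1-point change in that
# attribute. Hypothetical data and attribute names.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 300
courtesy = rng.normal(7, 1.5, n)        # 10-point attribute ratings
timeliness = rng.normal(6, 1.5, n)
satisfaction = 1 + 0.6 * courtesy + 0.3 * timeliness + rng.normal(0, 1, n)

X = sm.add_constant(np.column_stack([courtesy, timeliness]))
fit = sm.OLS(satisfaction, X).fit()

for name, b in zip(["courtesy", "timeliness"], fit.params[1:]):
    print(f"{name}: a 1-point improvement is associated with "
          f"{b:.2f} points of Satisfaction")
```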
The advantages and disadvantages of different methods of determining derived importance will be discussed later. For now, derived importance should be considered a model of the population rather than the individual. That is, derived importance indicates the return from improving the level of an attribute for the population rather than for an individual respondent. In contrast, conjoint measures utility at the individual level and then infers population utilities using sampling statistics.
B. Estimating Derived Importance
B. Estimating Derived Importance
Methods such as correlation or simple regression examine the relationship between two variables. It is assumed that the system is not affected by any variables other than the two selected for analysis. In virtually all cases, this is an unreasonable assumption. The correlation coefficient says nothing about impact. Two variables can have the same correlation with satisfaction, but different effects, because the slope of the relationship differs. This is the difference between correlation and regression.
Multiple regression analyzes the relationship between multiple variables, such as quality issues, and a single dependent variable. Basic to multiple regression is that each independent variable, each quality issue, measures a different thing. Multiple regression, as well as simple regression and correlation, assumes that all independent variables are measured perfectly, without error. Again, this is an unrealistic assumption. Error in measurement typically amounts to 30% in survey data. This error is often much greater than the sampling error (Andrews).35
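The consequence of that measurement error can be illustrated with a short simulation: when roughly 30% of the observed variance in a predictor is error, the estimated slope is pulled toward zero and stays there no matter how large the sample. The numbers below are made up for illustration.

```python
# Attenuation bias from measurement error: with ~30% error variance in the
# observed predictor, the OLS slope is biased toward zero (and inconsistent).
import numpy as np

rng = np.random.default_rng(3)
n = 100_000                       # large n: sampling error is negligible
true_slope = 1.0

x_true = rng.normal(0, 1, n)
y = true_slope * x_true + rng.normal(0, 0.5, n)

# Observed x contains measurement error; error is ~30% of observed variance
x_obs = x_true + rng.normal(0, np.sqrt(0.3 / 0.7), n)

slope_obs = np.cov(x_obs, y, ddof=1)[0, 1] / np.var(x_obs, ddof=1)
print(f"true slope: {true_slope:.2f}, estimated from error-laden x: {slope_obs:.2f}")
# Expected attenuation factor is var(x_true)/var(x_obs) ~ 0.7, so the estimate
# settles near 0.7 rather than 1.0 even with very large samples.
```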

34 Allen, Derek and Tanniru Rao, Analysis of Customer Satisfaction Data, ASQ Quality Press, 2000, p. 70.
35 Andrews, Frank M. (1984), "Construct Validity and Error Components of Survey Measures: A Statistical Modeling Approach," Public Opinion Quarterly, p. 404-442.

Table 3: Single Equation Systems vs. Causal Models

                                         Bivariate    Single Equation    Causal
                                         Methods?     Systems?           Models?
Measurement Model (Multiple measures)    No           No                 Yes
Multiple Constructs                      No           No                 Yes
Multiple Objectives (Dependent variables) No          Some               Yes
Complex Systems                          No           No                 Yes

Measurement error introduces bias and inconsistency into the estimation of importance. That is, the estimates of importance are incorrect in the sense that the expected value of the regression estimate does not equal the true importance (bias), and the regression estimates do not converge to the correct values with larger samples (inconsistency). The amount of bias and inconsistency varies in proportion to the amount of error.
Another serious problem is that many quality variables are highly related to one another (multicollinearity). This causes estimates of impacts to be imprecise with multiple regression. It should be noted that the close association between variables is a result of the nature of satisfaction: a firm's customers tend to rate the firm high or low on everything due to a strong halo effect. The problem of multicollinearity renders multiple regression results essentially useless. Lastly, multiple regression allows only one dependent variable (i.e., one objective such as Satisfaction or Retention, but not both). Thus, multiple regression is inappropriate with multiple objectives and complex systems.
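The following sketch illustrates the halo problem with simulated ratings that share a common "likes everything" factor; the variance inflation factor (VIF) gives a rough sense of how unstable the resulting regression impacts become. Numbers are illustrative only.

```python
# Multicollinearity ("halo") illustration: two quality ratings that mostly
# reflect the same underlying halo are nearly interchangeable in a regression,
# which inflates the variance of their estimated impacts.
import numpy as np

rng = np.random.default_rng(4)
n = 300
halo = rng.normal(0, 1, n)                     # shared "likes everything" factor
q1 = halo + 0.3 * rng.normal(0, 1, n)          # two ratings dominated by the halo
q2 = halo + 0.3 * rng.normal(0, 1, n)

r = np.corrcoef(q1, q2)[0, 1]
vif = 1.0 / (1.0 - r ** 2)                     # variance inflation factor (2 predictors)
print(f"correlation between ratings: {r:.2f}, VIF: {vif:.1f}")
# A VIF in the 5-10 range means coefficient variances are inflated 5-10x
# relative to uncorrelated predictors, so which attribute "drives"
# satisfaction cannot be distinguished from ordinary regression output.
```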
C. Causal Models
Causal models have all the features necessary to estimate impact. Causal models accept multiple measures to
control measurement error, allow multiple objectives, and allow complex, multi-level systems of relationships.
The two major approaches to causal models are covariance structure models, typified by LISREL, and predictive-causal systems, typified by LV-PLS. The differences between LV-PLS and LISREL are summarized below in Tables 4 and 5. CFI Group uses a further development of the LV-PLS approach. With LV-PLS, weights and impacts are estimated to predict key variables. That is, LV-PLS will maximize our ability to predict Satisfaction or Retention. In contrast, LISREL attempts to account for covariance and maximizes the fit to the covariance matrix among all variables. Consequently, correlations between all variables are treated as equally important. The CFI Group impact is the expected (average) change on an individual score given a five-point change in a quality or experience component. Because this is the mean prediction, the prediction applies to the aggregate as well.
LISREL produces an estimate of an effect, which is meant to represent the causal effect of an unobservable variable
onto another unobservable variable. However, because the unobservable variables in LISREL are unobserved, their
scales are arbitrary. That is, a scale and origin must be assigned to each unobservable. Usually, the origin is set at
zero and one of the measures of each unobservable is given a weight of one – all results are then calibrated relative
to the assignments.
An alternative is to assume the unobservable variables are standardized, (i.e., have mean zero and unit variance).
This is generally possible for dependent variables, but not for independent variables. The arbitrary nature of the scale
assignments means that it is difficult to interpret or compare effects. That is, a unit on one unobservable may not be
the same as a unit on another and hence, the effects cannot be compared directly. For example, if the dependent
unobservable variables are standardized, then a change of one unit on independent variable one will produce x%
change in standard deviations on the dependent variable. Similarly, a change of one unit on a different independent
variable would have a different y% change in standard deviations on the dependent variable. However, comparing
these x% and y% impacts is difficult since the scales of the independent variable may differ. Of course, without
comparing results, it is impossible to prioritize improvements.
LV-PLS relies upon Ordinary Least Squares for estimation. OLS makes no distributional assumptions. Statistical
testing in LV-PLS is accomplished via jack-knifing and blindfolding. These methods are empirical and based upon
case level data. In particular, these techniques do not require distributional assumptions.
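As a rough illustration of the jack-knife idea, the sketch below re-estimates a single unstandardized slope leaving out one case at a time and uses the spread of those estimates as a standard error. It uses plain OLS on made-up data and is not the specific blindfolding/jack-knifing routine implemented in LV-PLS software.

```python
# Jackknife standard error of a slope: re-estimate leaving one case out at a
# time, then apply the standard jackknife variance formula. No distributional
# assumptions are required. Illustrative data only.
import numpy as np

rng = np.random.default_rng(5)
n = 200
x = rng.normal(7, 1.5, n)
y = 2 + 0.6 * x + rng.normal(0, 1, n)

def ols_slope(x, y):
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

full = ols_slope(x, y)
loo = np.array([ols_slope(np.delete(x, i), np.delete(y, i)) for i in range(n)])

se = np.sqrt((n - 1) / n * np.sum((loo - loo.mean()) ** 2))
print(f"slope: {full:.3f}, jackknife standard error: {se:.3f}")
```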

Table 4: PLS vs. LISREL – Managerial Issues

Purpose (Which objective is more meaningful for managers – better prediction or best fit to the covariance structure?)
- PLS: Minimize prediction error.
- LISREL: Maximize fit to the covariance matrix.

Priority given to key objectives
- PLS: Yes, to dependent variables.
- LISREL: No, all variables are treated equally.

Component-level scores available for benchmarking and tracking
- PLS: Yes.
- LISREL: No; component scores cannot be calculated because scores are indeterminate.

Case-level scores for further analysis, such as segmentation, descriptives, ANOVA
- PLS: Yes.
- LISREL: No; case scores are indeterminate.

Indices
- PLS: Can measure one construct or form a composite, such as Overall Quality.
- LISREL: Measures within a component must measure one and only one construct. This is restrictive when one wants to construct a managerially useful index.

Sample size
- PLS: 200 is typical but can be less (PLS fits each part of the model separately, thereby reducing the number of cases required).
- LISREL: 500+ (fits the entire model at one time, thereby requiring more cases).

Table 5: PLS vs. LISREL – Statistical Issues

Estimation method
- PLS: Least squares.
- LISREL: Typically, maximum likelihood.

Assumptions
- PLS: Assumes a linear conditional expectation between independent and dependent variables (x is a cause of y, the expected residual is zero, the residual is uncorrelated with the conditional variable) and linear measurement relationships.
- LISREL: Assumes linear relationships among constructs and linear measurement relationships. In addition, typically assumes multivariate normal (or related) distributions and independent observations.

Minimum specification requirements
- PLS: Must specify all predictors of a dependent variable and group manifest variables into components.
- LISREL: Must specify all predictors of a dependent variable and group manifest variables into components. In addition, must specify all other relationships among all variables and constructs.

Feasibility of use for analysis of complex relationships
- PLS: Yes.
- LISREL: Yes.

Efficiency of estimates
- PLS: Predictions are consistent with minimum variance.
- LISREL: Yes (parameter estimates are efficient if assumptions are met).

Consistency of estimates
- PLS: Estimates of impacts are consistent. Estimated component scores are consistent at large.
- LISREL: Yes (if assumptions are met).

Identification (can estimate all parameters)
- PLS: Not an issue.
- LISREL: Can be problematic. To be able to estimate certain parameters, it may be necessary to make assumptions about relationships about which we have no knowledge (i.e., the covariance between residuals).

LISREL uses several methods to estimate parameters to fit (reproduce) the covariance matrix, including unweighted
least squares (ULS), generalized least squares (GLS), and maximum-likelihood (ML). Statistical tests are derived
under distributional assumptions and come directly from the fitting functions rather than from case level information.
In particular, under the assumption that the observed variables are distributed multivariate normal, GLS and ML
provide large sample estimates of the standard errors for statistical testing. Standard errors must be used with care
when the assumptions of normality are not met. ULS can be justified without distributional assumptions. However,
standard errors and statistical tests are unavailable for ULS.
LV-PLS makes no further assumptions about the distribution of the variables or the error terms. In particular, PLS is
insensitive to non-normality of the error terms, heteroscedasticity of the error terms, and autocorrelation of the error
terms. Specifically, the LV-PLS estimates are unbiased estimates. LISREL makes many more assumptions, including
multivariate normality of the variables. Violations of the assumptions are generally viewed as problematic for LISREL.
In summary, the most important difference between LV-PLS and LISREL is how relationships are established. LV-PLS produces scores both overall and for individual cases, while LISREL does not. LV-PLS makes no distributional assumptions, while LISREL requires strong distributional assumptions. For these reasons, and for others presented in the tables above, it is not possible to use a covariance structure method such as LISREL for estimating impacts. Further, scores on customer satisfaction and other components cannot be computed from a LISREL approach (they can only be estimated with the introduction of yet another source of error).
The CFI Group system is an advancement of LV-PLS. LV-PLS estimates are consistent at large.36 That is, as the sample size increases and the number of measures increases, the scores approach their true values. Consequently, close association among related quality variables is an advantage rather than a disadvantage.37 Moreover, convergence in measurement implies that the estimates of importance are also unbiased and consistent.38 That is, the expected value of the impact is equal to the true importance (unbiased), and the estimates of impact converge to the true values as sample size increases (consistent).

Thus the CFI Group system is better able to detect the true association between experience quality and satisfaction, more able to explain satisfaction, and able to do so with greater accuracy than alternative methods of analysis. Whereas the basic LV-PLS is more suitable than other methods for the analysis of customer satisfaction data, it is not sufficient. In particular, it does not handle the problems of multicollinearity and standardization well. The contribution of the CFI Group to the basic LV-PLS method is threefold:
• It reduces multicollinearity by (a) using the qualitative work in model specification and (b) extracting and isolating any remaining excess collinearity in the quantitative analysis.
• It retains the original scale values in analysis (the basic LV-PLS method does not do this).
• It reduces the necessary sample size by putting the variables in the context of a comprehensive system that is estimated iteratively rather than simultaneously.
In summary, we contend that it is both cost-effective (data collection costs will be lower) and revenue effective to
adopt CFI Group’s system. It generates better information at lower cost than any other approach. The total cost
reduction also implies a shift in the budget such that a proportionally smaller amount is spent on data collection and a
larger amount on data analysis. We are eager to discuss these benefits relative to any other system.

Appendix B: Use of 10-point Scales
CFI Group’s use of 10-point scales over commonly used 5-point scales is based on a number of statistical and
managerial criteria as discussed below.
A common basis for recommending 5-point scales often rests on the assumed inability of people to reliably discriminate more than 5 levels on a scale, where offering more than 5 levels would introduce error into the measurement and offer weaker correlations and lower explanatory power. Research has clearly shown that people can handle more than 5 pieces of information at one time, particularly depending on their experience and ability in a given area. A 10-point scale is within the capabilities of most people with little experience, and in areas of professional expertise people are able to and will make much finer distinctions.

36 Wold, H. (1982), "Soft Modeling: The Basic Design and Some Extensions," in K.G. Joreskog and H. Wold (Eds.), Systems under Indirect Observation: Causality, Structure, Prediction (Vol. 2, pp. 1-54), Amsterdam: North Holland.
37 Fornell, Claes, Byong-Duc Rhee and Youjae Yi (1991), "Direct Regression, Reverse Regression and Covariance Structure Analysis," Marketing Letters, Vol. 2, No. 3, p. 309-320.
38 Fornell, Claes and Jaesung Cha (1992), "Partial Least Squares," Handbook of Marketing Research.
Because customer satisfaction data are negatively skewed (customers less frequently use the lower ends of scales), a 5-point scale is really closer to a 3-point scale, and a 10-point scale behaves more like a 7-point scale. Since most customers don't really use the lower ends of scales (values 1 and 2 on a 5-point scale) and mostly use values 3, 4, and 5, a 5-point scale offers little opportunity to differentiate positive responses. This negative skewness introduces error into the measurement process and a loss of critical, meaningful information compared with a 10-point scale.
Societal norms and the fact that customers typically “like” companies they do business with tend to limit the number
of customers who use the very lower ends of response scales. In most cases, if a customer is so completely
dissatisfied as to have the need to use the lower ends of the scale, they will leave and stop doing business with the
company. As a result, the 5-point scale effectively turns into a 2- or 3-point scale due to limited response at values 1
and 2.
This “compression effect” also militates against the common assumption that 5-point scales offer a mid-point that can
be considered as the “average response”, a characteristic not present in 10-point scales. The mid-point argument is
only valid if respondents use, or at least contemplate, all points of the scale, and as discussed above, they do not,
and responses are consequently negatively skewed.
The use of 10-point scales significantly enhances the information that is transmitted in the surveying process. The
increased information content yields:
• Greater precision of results, thereby providing the opportunity to reduce sampling costs while maintaining the same precision obtained using 5-point scales – OR – the ability to reduce the number of questions on the questionnaire (which also reduces sampling costs due to reduced questionnaire length) while maintaining the same measurement reliability offered by 5-point scales.
• Greater ability to link Satisfaction results to internal performance measures or measures of employee satisfaction due to the gains in reliability and precision.

Another critical benefit of the use of 10-point scales is the increased explanatory power (as measured by R²) gained.
• The gain in R-squared from using the 10-point scale is an important component of accurately identifying the drivers of Satisfaction and predicting the economic returns associated with improving Satisfaction. In addition, for businesses which have inherently small populations, the use of 10-point scales may make the difference between being able to discern these linkages or not.
• Further, the gain is valuable within the context of linking employee compensation to CSI. Higher correlation (R-squared) within the model ensures that targeted employee actions will be reflected in the CSI measure and will produce less error within the compensation system (i.e., reducing Type I and Type II errors, where employees are not rewarded when CSI really did change, or where employees are rewarded when CSI did not really change).

There is one area in which 10-point scales are not appropriate relative to 5-point scales – that is when there is a
desire to label each response point within the scale (e.g. 1=poor, 2=not so good, 3=satisfactory, 4=good,
5=outstanding). There are several arguments for not attaching labels to response categories, most notably 1) added
error due to violation of the interval/ratio data assumption, where it can no longer be assumed that the distance
between 1 and 2 is the same as the distance between 2 and 3, and so forth, and 2) respondent burden and increased
questionnaire length.

Criteria for evaluating scales and supporting evidence
Cox39 has reported the statistical benefits of 10- vs. 5-point scales.

39 Cox, Eli P. (1980), "The Optimal Number of Response Alternatives for a Scale: A Review," Journal of Marketing Research, XVII (November), 407-422.

• Information content
As the figure below illustrates, more information is transmitted with 10- vs. 5-point scales – approximately 2.4 bits on a 5-point scale vs. 3.4 bits on a 10-point scale.

Figure: Relationship Between the Number of Response Alternatives and Transmitted Information Found by Bendig
and Hughes (1953)
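As a back-of-the-envelope version of that argument, the snippet below computes the theoretical ceiling of log2(k) bits for a k-category scale and the lower figure obtained when responses pile up in the top categories; the Bendig and Hughes values cited above come from empirical transmitted-information estimates rather than this simple bound.

```python
# Information capacity of a k-category rating scale, in bits. The ceiling is
# log2(k); a skewed response distribution transmits less than the ceiling.
import numpy as np

def max_bits(k: int) -> float:
    """Upper bound on information per response for k equally used categories."""
    return float(np.log2(k))

def entropy_bits(probs) -> float:
    """Shannon entropy (bits) of an observed response distribution."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

print(f"5-point ceiling:  {max_bits(5):.2f} bits")
print(f"10-point ceiling: {max_bits(10):.2f} bits")
# A hypothetical skewed usage pattern (most answers in the top categories):
print(f"skewed 5-point:   {entropy_bits([0.02, 0.03, 0.20, 0.40, 0.35]):.2f} bits")
```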

• Explainability and predictability (R-squared)
The figure below illustrates the significant added benefit of increasing R-squared, which we have defined as explainability (the ability of the quality components to explain changes in Satisfaction) and predictability (the ability of Satisfaction to explain changes in performance measures).

Figure: Gain in R² Obtained by Using a More Refined Scale

As the chart shows, the largest increased returns are achieved when employing 4- or 5-point scales, but 10-point
scales continue to strengthen and tighten the relationships of the entire model.

• Mean squared correlations
The table below provides strong evidence that the use of 10-point scales increases the reliability and accuracy of measures over 5-point scales. Specifically, using correlations as the benchmark (where higher correlations are better, indicating greater reliability), three items on a 10-point scale provide reliability (0.785) comparable to that of four items on a 5-point scale (interpolating between 0.759 and 0.813, roughly 0.786).
Table 1: Mean Squared Correlations Between Observed and True Composites by the Number of Items and Response Alternatives Found by Jenkins and Taber (1977)40

Items \ Categories     2      3      5      7      9      10     14
2                    .551   .657   .718   .736   .744   .747   .752
3                    .604   .702   .759   .776   .783   .785   .790
5                    .680   .766   .813   .827   .833   .835   .839
7                    .725   .804   .845   .857   .863   .865   .868
9                    .756   .828   .865   .876   .880   .882   .885
10                   .769   .839   .874   .885   .889   .890   .893
14                   .810   .868   .899   .907   .911   .912   .915
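The comparison drawn from this table can be checked directly; the snippet below transcribes only the cells needed and interpolates the missing four-item row on the 5-category scale. Values are taken from the table above.

```python
# Check of the "3 items x 10 categories vs. ~4 items x 5 categories" comparison
# using the Jenkins and Taber (1977) values transcribed from the table above.
jenkins_taber = {
    (3, 10): 0.785,   # (items, categories): mean squared correlation
    (3, 5): 0.759,
    (5, 5): 0.813,
}

three_items_10pt = jenkins_taber[(3, 10)]
four_items_5pt = (jenkins_taber[(3, 5)] + jenkins_taber[(5, 5)]) / 2  # interpolated

print(f"3 items x 10 categories: {three_items_10pt:.3f}")
print(f"~4 items x 5 categories: {four_items_5pt:.3f}")
```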

More recently, Preston and Colman41, in a study using ratings of service quality in restaurants and stores, found:
• The rating scales that yielded the least reliable scores turned out to be those with the fewest response categories.
• According to the indices of validity and discriminating power examined, the scales with relatively few response categories performed worst.
• There was no corroboration of the contention that the reliability and validity of scores are independent of the number of response categories and that nothing is gained by using scales with more than two or three response categories.
• Statistically, scales with small numbers of response categories yield scores that are generally less valid and less discriminating than those with six or more response categories.
• Scales with 5, 7, and 10 response categories were rated as relatively easy to use. Shorter scales with two, three, or four response categories were rated as relatively quick to use, but they were rated extremely unfavorably on the extent to which they allowed respondents to express their feelings adequately; according to this criterion, scales with 10, 11 and 101 response categories were much preferred.
• On the whole, taking all three respondent preference ratings into account, scales with two, three, or four response categories were least preferred, and scales with 10, 9, and 7 were most preferred.
• From the multiple indices of reliability, validity, discriminating power, and respondent preferences used in the study, a remarkably consistent set of conclusions emerged.
In general, it was found that scales with two, three, or four response categories yielded scores that were clearly and unambiguously the least reliable, valid, and discriminating. The most reliable scores were those from scales with between 7 and 10 response categories; the most valid and discriminating were from those with nine or more. The results regarding respondent preferences showed that scales with two, three, or four response categories once again generally performed worst and those with 10, 9, or 7 performed best. Taken together, the results reported above suggest that rating scales with 7, 9, or 10 response categories are generally to be preferred.

40 Jenkins, C. Douglas, Jr., and Thomas Taber, "A Monte Carlo Study of Factors Affecting Three Indices of Composite Scale Reliability," Journal of Applied Psychology, (August) 1977, 62, 4, 392.
41 Preston, Carolyn C. and Andrew M. Colman, "Optimal number of response categories in rating scales: reliability, validity, discriminating power, and respondent preferences," Acta Psychologica 104 (2000) 1.

Appendix C: Why Do the ACSI and CFI Group Use Three Measures of Customer Satisfaction?
Managers must carefully evaluate the multitude of measurement options offered in the marketplace to ensure they secure the most accurate, reliable and valid measurements of customer satisfaction.
To squarely address these concerns, CFI Group has developed measures of satisfaction that blend state-of-the-art customer satisfaction research theory from leading universities with leading-edge statistical technologies. Specifically, CFI Group has used the following three concepts to measure customer satisfaction and to explicitly assess its distinct dimensions. [The "Kano" model referred to below is an oft-cited and well-accepted conceptual model of customer satisfaction.]
• Overall satisfaction – This dimension assesses a customer's overall evaluation, quantifying what the Kano model characterizes as evaluation of "Performance" or "Spoken" attributes. [This dimension of satisfaction encompasses those attributes for which customers "reward" high performance and "punish" low performance in their satisfaction ratings.]
• Meeting Expectations – This dimension provides specific evaluation of what the Kano model characterizes as "Basic" or "Expected" attributes [i.e., those attributes that must be present as a condition for a person to be satisfied; a good example is "safety" on airplanes]. It addresses the disconfirmation theory of customer satisfaction, which states that an individual's satisfaction level with a product or service is strongly related to how well their experience either confirms or disconfirms what the customer thought they would experience. [The expectations dimension of satisfaction concerns those attributes where customers punish low performance with lower ratings, but do not necessarily reward performance beyond their minimum requirements for satisfaction.]
• Being Ideal – This question provides specific evaluation of what the Kano model characterizes as "Surprise" or "Delight" attributes [i.e., those aspects of the product or service that are unexpected and add value for the customer]. The ideal measure accounts for the fact that customers likely refer to a benchmark or standard when evaluating their experiences with a company's product or service. The ideal measure provides a more absolute evaluation of satisfaction and is based on the collection of experiences an individual has had over time and across industries. Of particular importance is that the ideal dimension complements expectations and helps explain loyalty. For example, "ideal" is why individuals don't always eat fast food. Fast food may be "satisfying" and meet "expectations," but may not always be "ideal." [This dimension of satisfaction encompasses people's attitudes toward attributes where low or absent performance is not punished, but high performance is greatly rewarded through high satisfaction ratings.]

Questions based on these three concepts are used to build a composite, or multiple-item, measure of customer satisfaction which, in addition to its conceptual rigor, offers superior reliability [freedom from measurement error], validity, and precision [of score estimates] over other traditional measures [especially single-item "overall" measures].42 Three questions are necessary to achieve these benefits because, as discussed, satisfaction is made up of multiple dimensions. Asking only one question severely limits measurement coverage of customer satisfaction and subjects the measurement to bias and measurement error.
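As an illustration of how three 10-point items might be combined into a single 0–100 index, the sketch below uses equal weights; in the ACSI/CFI Group system the weights are derived from the PLS measurement model rather than fixed in advance, so this is only a simplified stand-in.

```python
# Simplified composite of the three satisfaction questions (overall, meeting
# expectations, comparison with ideal), rescaled from the 1-10 response range
# to a 0-100 index. Equal weights are used purely for illustration.
import numpy as np

def satisfaction_index(overall, expectations, ideal, weights=(1/3, 1/3, 1/3)):
    """Weighted composite of three 10-point items, rescaled to 0-100."""
    items = np.array([overall, expectations, ideal], dtype=float)
    w = np.array(weights, dtype=float)
    score_1_to_10 = float(items @ w)
    return (score_1_to_10 - 1) / 9 * 100   # map 1..10 onto 0..100

# One hypothetical respondent
print(f"composite satisfaction: {satisfaction_index(8, 7, 6):.1f}")
```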
Another important point is that by consistently employing these three questions, we can validly make comparisons
across different items, market segments, companies and even industries (through comparison with scores from the
ACSI and soon the “EUCSI”, which use the same satisfaction measure). This ability is invaluable to clients seeking a
valid and relevant basis upon which to benchmark their customer satisfaction scores.

42 Ryan, Michael, Tom Buzas, and Venkatram Ramaswamy (1995), "Making CSM a Power Tool," Marketing Research, 7(3), 11-16.

[Figure: CFI Group's Measure of Satisfaction – The Kano Model. Customer Satisfaction (from Very Dissatisfied to Very Satisfied) is plotted against Degree of Achievement (from Not At All Achieved to Fully Achieved) for three curves: Surprise and Delight attributes, Performance or Spoken attributes, and Basic or Expected attributes. © CFI Group]
