Evaluating Qualifiers in Rating Scales
Thursday 4:00 PM – 5:30 PM
July 18, 2019
Room D22
Morgan Earp
Jean Fox
Robin Kaplan
U.S. Bureau of Labor Statistics • bls.gov
Overview
Background
Motivation
MTurk Study
Case Studies
Conclusions
Limitations
Background
We often use surveys to collect data on things like attitudes,
experiences, and expectations using rating scales.
Can collect data from a lot of people in a systematic way
Lots of research about writing good survey questions
It’s easy to write surveys, but hard to write good surveys.
One of the many challenges is deciding on the response options
for rating scales.
Selecting Rating Scale Options
You want the options to:
Be appropriate conversational answers to the question asked
Cover the full range of situations
Be equally distributed across the full range of the construct
Our research explores whether and when varying response options
cover the full scale, and how the response options are
distributed
Definitions
Qualifiers in scales
Strength/Intensity (e.g., Not at all, Somewhat, Very)
Frequency (e.g., Never, Sometimes, Often)
Evaluation (e.g., Bad, Good, Great)
Bipolar vs. unipolar
Focusing on unipolar here
Motivation
Explore the “quantity” that commonly used qualifiers represent
Explore the relative values of closely related qualifiers to
understand how they compare to one another
MTurk Study
Participants (N = 355)
Online study with participants from MTurk
Mean age = 35.2 (SD = 10.7)
Education:
High school: 14.8%
Some college: 19.5%
Associate’s/Bachelor’s: 57.9%
Graduate degree: 7.8%
Gender:
59.3% male, 40.4% female
Slider Task
Participants rated, on a scale from 0 to 100, “how much” each of
the terms meant
15 Quality terms (e.g., Excellent, Good, Average, Poor)
18 Amount terms (e.g., Completely, Very, Moderately, A little)
22 Frequency terms (e.g., Often, Frequently, Occasionally, Rarely)
Terms were presented in randomized order
Selected commonly used terms for the task
Example
Comparing Quantifiers
[Table: descriptive statistics for each term (N, Median, Std. Deviation, Z value); per-term values are not recoverable from this extraction]
N per term ranged from 390 to 397; medians ranged from 0 to 100; z values ranged from -1.78 to 1.61
Grand Mean = 52.54
Std. Dev = 29.53
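The slides do not state how the z values were computed. A minimal sketch, assuming each term's z value is its median slider rating standardized against the grand mean and standard deviation above, is shown here. Under that assumption, medians of 0, 42, and 100 reproduce z values that appear on the slides (-1.78, -0.36, and 1.61). The function name `z_value` is illustrative, not from the presentation.

```python
GRAND_MEAN = 52.54  # grand mean across all terms, from the slide
GRAND_SD = 29.53    # standard deviation across all terms, from the slide


def z_value(rating: float) -> float:
    """Standardize a 0-100 slider rating against the grand mean and SD.

    Assumption: the slides' z values are (median - grand mean) / SD.
    """
    return (rating - GRAND_MEAN) / GRAND_SD


# Medians that appear in the descriptive statistics table
for median in (0, 42, 100):
    print(f"median {median:>3} -> z = {z_value(median):.2f}")
```

A z value near 0 marks a term rated close to the middle of all terms tested; large positive or negative values mark terms near the ends of the 0 to 100 range.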
All Terms
Completely
Extremely
Outstanding
Excellent
Continually
Very often
Most of the time
Great
Strongly
Highly
Very much
Quite often
Frequently
Usually
Very
Often
Good
Quite a bit
Pretty often
Favorable
Fairly often
Quite
Generally
Somewhat often
Satisfactory
Fine
Reasonably
OK
Fair
Moderately
Fairly
Sometimes
Neutral
Average
Periodically
Occasionally
Now and then
Somewhat
Mildly
Less often
Not too often
Infrequently
Slightly
Not often
Seldom
Poor
Not very
A little
Bad
Rarely
Very little
Hardly ever
Terrible
Horrible
Not at all
[Chart: terms listed from highest to lowest; horizontal axis runs from -2 to 2]
Frequency Terms
Continually
Very often
Most of the time
Quite often
Frequently
Usually
Often
Pretty often
Fairly often
Generally
Somewhat often
Sometimes
Periodically
Occasionally
Now and then
Less often
Not too often
Infrequently
Not often
Seldom
Rarely
Hardly ever
[Chart: terms listed from highest to lowest; horizontal axis runs from -1.5 to 1.5]
Quality Terms
Excellent
Great
Good
Favorable
Satisfactory
Fine
OK
Fair
Neutral
Average
Poor
Bad
Terrible
Horrible
[Chart: terms listed from highest to lowest; horizontal axis runs from -2 to 2]
Amount terms
Completely
Extremely
Strongly
Highly
Very much
Very
Quite a bit
Quite
Reasonably
Moderately
Fairly
Somewhat
Mildly
Slightly
Not very
A little
Very little
Not at all
[Chart: terms listed from highest to lowest; horizontal axis runs from -2 to 2]
Paired Comparisons
Selected similar terms and asked participants to select the one
that suggests “more” of that construct
14 Quality pairs (e.g., Excellent vs. Outstanding)
19 Amount pairs (e.g., Completely vs. Extremely)
17 Frequency pairs (e.g., Often vs. Usually)
Presented one at a time, grouped by construct
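As a rough illustration of how paired-comparison responses can be summarized, the sketch below tallies which term in a pair respondents chose and reports each term's share. The function name `preference_shares` and the response data are hypothetical; they are not taken from the study.

```python
from collections import Counter


def preference_shares(choices: list[str]) -> dict[str, float]:
    """Return the percentage of respondents who chose each term in a pair."""
    counts = Counter(choices)
    total = sum(counts.values())
    return {term: round(100 * n / total, 1) for term, n in counts.items()}


# Hypothetical responses for one pair; not data from the study.
responses = ["Often"] * 12 + ["Usually"] * 8
print(preference_shares(responses))
```

A split close to 50/50 would suggest respondents do not distinguish the two terms reliably, while a lopsided split suggests a clear ordering.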
Example
Case Studies
Case Studies
We solicited previous internal studies that might have useful
data using a variety of response scales
Needed enough responses
Wanted unipolar data only
Items had to have good item fit in relation to the construct they were
specified to measure
Case Studies
We found 7 studies with data we could use as case studies
Measured 10 constructs
Burden
Concern
Confidence
Frequency
Importance
Likelihood
Persuasiveness
Sensitivity
Trust
Usefulness
Case Studies
We found 7 studies with data we could use
Measured 10 constructs using multiple scales
Burden
Concern
Confidence
Frequency
Importance
Likelihood
Persuasiveness
Sensitivity
Trust
Usefulness
Case Study Response Scales
Case Study 1: Persuasive (n=…)
Case Study 2: Concern (n=…)
Case Study 3a: Burden (n=…)
Case Study 3b: Burden (n=…)
Not at all | Not at all | Not at all | Not at all
A little | A little | A little
Somewhat | Somewhat
Moderately | Moderately | Moderately
Very | Very | Very
Extremely | Extremely | Extremely
“Persuasive”
Very
Case Study Response Scales
Case Study 1
(Persuasive)
Not at all
A little
Somewhat
“Persuasive”
Very
Extremely Low
Normal Range
Extremely High
Amount terms
Completely
Extremely
Strongly
Highly
Very much
Very
Quite a bit
Quite
Reasonably
Moderately
Fairly
Somewhat
Mildly
Slightly
Not very
A little
Very little
Not at all
[Chart: terms listed from highest to lowest; horizontal axis runs from -2 to 2]
Very, 0.83
Somewhat, -0.36
A little, -1.27
Not at all, -1.78
Persuasive
Persuasive
Persuasive
Persuasive
Very, 0.83
Somewhat, -0.36
A little, -1.27
Not at all, -1.78
Extremely, 1.44
Very, 0.83
Somewhat, -0.36
A little, -1.27
Not at all, -1.78
Very vs. Extremely
MTurk Comparison
Very vs. Extremely
MTurk Comparison
Which word suggests more, or a greater quantity?
92%
8%
Very
Extremely
Extremely, 1.44
Very, 0.83
Somewhat, -0.36
A little, -1.27
Not at all, -1.78
Case Study Response Scales
Case Study 1 (Persuasive)
Case Study 2 (Concern)
Case Study 3a (Burden)
Case Study 3b (Burden)
Not at all | Not at all | Not at all | Not at all
A little | A little | A little
Somewhat | Somewhat
Moderately | Moderately | Moderately
Very | Very | Very
Extremely | Extremely | Extremely
“Persuasive”
Very
Extremely, 1.44
Very, 0.83
Somewhat, -0.36
A little, -1.27
Not at all, -1.78
Extremely, 1.44
Very, 0.83
Moderately, -0.04
A little, -1.27
Not at all, -1.78
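One way to read these values against the goal that options be equally distributed across the construct is to look at the gaps between adjacent options. The sketch below does that for the two five-point scales shown above, using the z values from the slides. Treating z-value differences as a measure of spacing is an illustrative assumption, not the authors' stated analysis, and the helper name `gaps` is hypothetical.

```python
# z values as shown on the slides
Z = {
    "Not at all": -1.78,
    "A little": -1.27,
    "Somewhat": -0.36,
    "Moderately": -0.04,
    "Very": 0.83,
    "Extremely": 1.44,
}


def gaps(scale: list[str]) -> list[float]:
    """Return the z-value distance between each pair of adjacent options."""
    values = [Z[term] for term in scale]
    return [round(b - a, 2) for a, b in zip(values, values[1:])]


with_somewhat = ["Not at all", "A little", "Somewhat", "Very", "Extremely"]
with_moderately = ["Not at all", "A little", "Moderately", "Very", "Extremely"]

print(gaps(with_somewhat))    # [0.51, 0.91, 1.19, 0.61]
print(gaps(with_moderately))  # [0.51, 1.23, 0.87, 0.61]
```

In both scales the gaps are uneven: the two lowest options and the two highest options sit closer together than the options around the midpoint, and swapping "Somewhat" for "Moderately" shifts where the largest gap falls.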
Case Study 1
(Persuasive)
Case Study 2
(Concern)
Concern
Concern
Concern
Concern
Concern
Concern
Case Study 2
(Concern)
Case Study 3
(Burden)
Burden
Case Study 3a
(Burden)
Case Study 3b
(Burden)
Burden
Conclusions
The response option probability distributions tended to follow the same
order we observed in the MTurk study
Specific findings
“Very” as an endpoint may not capture the full range of responses, but
adding “Extremely” may suppress use of “Very”
Looking at “a little” vs “somewhat,” the value assigned to a qualifier by a
respondent may depend on the other responses in the scale.
Conclusions
However, the data in the case studies did not always match the expectations
set by the values from the MTurk study
Some scales that should have been well-distributed based on the MTurk findings
were not, and
Some scales that should not have been well-distributed were.
Factors that may impact the interpretation of individual scale items
The construct
The other response items used in the scale
The context of the survey item
The respondent population
Limitations
We did not test every possible response option in our MTurk
study, so we were limited in the case studies we could examine
as a follow-up
While we identified some interesting patterns between the
MTurk study and the case studies we had available, the number of
case studies and constructs was extremely limited
Next Steps
We would like to dig deeper into this, but we need more
data to determine whether there are consistent effects across contexts
Constructs
Response options
Populations
Do you have publicly available data that uses some of the
response options we assessed in the MTurk study?
Please contact Jean Fox [email protected]
Morgan Earp
[email protected]