My example involves the very common practice of doing surveys, asking people to rate something on a 1 to N scale, then reporting results using an “average” value for that something.
Really, there are two bad things at work here, from a statistics and survey perspective, I believe. Let me deal with the one that is not the main point of this post, but important for people to consider. Once again, Wikipedia covers the broad subject, but I’ll just explain it briefly.
The Scale Problem
Most 1 to N scales have nothing numeric about them, except for the fact that they use numbers as symbols for points on the scale. It would be better to use words to indicate what the points on the scale mean, since the scale is really ordinal, at best.
That is, reading from left to right (or right to left), each position is considered higher or lower (better or worse) in some “value” than its neighbors. But there is no guarantee the gaps between positions are equal, so it is not even an interval scale, where adding and subtracting are legitimate, let alone a ratio scale, where multiplying and dividing are legitimate.
And that’s the point: using numbers to represent an ordinal scale, then adding up the values and dividing to get an average is, technically, meaningless. (You can count instances of each point on the scale and report how many results you got for each.)
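As a rough sketch of that counting alternative, here is a bit of Python that tallies how many responses landed on each point of a 1 to 5 scale. The `responses` list is made up purely for illustration; it is not real survey data.

```python
from collections import Counter

# Hypothetical responses on a 1-to-5 scale (invented for this example).
responses = [4, 2, 5, 4, 3, 4, 1, 5, 4, 2]

counts = Counter(responses)

# Report every scale point, including any that nobody picked.
distribution = {point: counts.get(point, 0) for point in range(1, 6)}

for point, n in distribution.items():
    print(f"{point}: {n} response(s)")
```

Nothing here adds or divides the scale values; it only counts them, which is an operation an ordinal scale does support.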
Of course, this is done all the time on customer surveys, conference feedback forms, the ratings on Amazon, etc. And all such examples seem to end up with an average rating for the questions on the survey. (Amazon, at least, shows you the counts for each “star” value as well as the whole feedback statement from those doing the rating.)
Another issue with ordinal scales is that there is no way to be sure one person’s 3 really is the same as another person’s because the surveys often do not place any substantive interpretation on the points to help you judge where your sense of evaluation for that question would fit. But, enough of that…you get the idea.
The Results Representation Problem
This is the more serious issue with the mononumerosis question. Let’s even say you could legitimately do adding and dividing and get an average. Does that tell you an accurate story about what all the respondents taken together felt? Or does it represent some imaginary respondent’s evaluation?
Here are a few examples.
Let’s say you have 3 data sets of 10 responses each on a 1 to 5 scale. The five numbers listed for each data set are the counts of 1s, 2s, 3s, 4s and 5s, in that order:
Bell - 1, 2, 4, 2, 1
Flat - 2, 2, 2, 2, 2
Camel - 0, 5, 0, 5, 0
This will give you an average of “3” for each.
I’m sure you can see from the data itself that “3” isn’t the same as “3” isn’t the same as “3” when it comes to the actual sense of what responses to the question would mean.
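If you want to check the arithmetic yourself, here is a small Python sketch using the three data sets above. The helper name `expand` and the idea of also counting how many respondents actually answered 3 are my own additions for illustration.

```python
from statistics import mean

# Counts of 1s, 2s, 3s, 4s and 5s for each data set, as listed above.
counts_by_dataset = {
    "Bell": [1, 2, 4, 2, 1],
    "Flat": [2, 2, 2, 2, 2],
    "Camel": [0, 5, 0, 5, 0],
}

def expand(counts):
    """Turn per-point counts into the list of individual responses."""
    return [point for point, n in enumerate(counts, start=1) for _ in range(n)]

summary = {}
for name, counts in counts_by_dataset.items():
    responses = expand(counts)
    # Store the average and how many people actually answered 3.
    summary[name] = (mean(responses), responses.count(3))
    print(name, "average:", summary[name][0], "answered 3:", summary[name][1])
```

All three averages come out to 3, yet 4 people answered 3 in the Bell set, 2 in the Flat set and nobody at all in the Camel set. In that last case the “average” respondent is entirely imaginary.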
Here they are graphed in two ways (both showing the same thing, but one might be more meaningful for you than the other):
My point is that it matters how data is represented based on its scale and distribution. So watch out for mononumerosis (and scales) when you are given survey results.