In my previous two posts in this series, I discussed a couple of reasons why review scores from individual sources can be problematic. But what happens when the scores from all the different sources are aggregated into a single, average score? Fortunately, in most cases the effects of these problems tend to cancel out rather than compound, so aggregate scores are somewhat more trustworthy than individual scores. A few problems remain, however, that deserve mention.

When averaging a bunch of different scores, it is important to know not just the grand average but also the distribution of the scores. Metacritic does a good job of this by providing a “Metascore” along with a list of all the individual review scores ordered from highest to lowest, but unfortunately it doesn’t calculate the standard deviation (a quantitative measure of spread) of those scores. Knowing the distribution is important for understanding how much reviewers agree or disagree on the merits and shortcomings of a game. If the distribution is narrow, the reviewers largely agree on how good or bad the game is; if it is wide, the reviews are more of a mixed bag. This distinction matters most for games scoring in the 70-85% range, which tend to fall into two main categories: fundamentally mediocre and love-it-or-hate-it. The average score alone cannot tell the two apart.
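To make the point concrete, here is a minimal sketch using made-up review scores for two hypothetical games. Both score sets average out to exactly 79, but the standard deviation immediately separates the "everyone agrees it's decent" game from the divisive one:

```python
import statistics

# Hypothetical review scores (out of 100) for two games with the
# same average but very different levels of reviewer agreement.
mediocre = [78, 80, 79, 81, 77, 80, 78, 79]  # narrow spread
divisive = [95, 60, 98, 55, 96, 62, 97, 69]  # wide spread

for name, scores in [("mediocre", mediocre), ("divisive", divisive)]:
    mean = statistics.mean(scores)
    spread = statistics.pstdev(scores)  # population standard deviation
    print(f"{name}: mean={mean:.1f}, stdev={spread:.1f}")
# mediocre: mean=79.0, stdev=1.2
# divisive: mean=79.0, stdev=17.9
```

An aggregator that published the second number alongside the first would let you spot a love-it-or-hate-it game at a glance.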

Another problem with aggregate scores is the need to convert individual scores to a common scale. Both GameRankings and Metacritic use simple formulas to convert letter and numerical scores into percentages, but these formulas prove to be problematic once you realize that two different scores which convert to the same percentage may not have the same meaning. For example, a 50% from one site might mean average, whereas a 50% at a different site could mean failing. Trying to compare those two scores based on the numbers alone is like comparing apples to oranges. It doesn’t work.

One last thing to be aware of when dealing with aggregate scores is how the average scores are calculated. Are they just straight-up averages? Are they weighted averages (and if so, how are the weights assigned)? Are the highest and lowest scores dropped before computing the averages? This may be a more subtle point, but the more information you have the better. You don’t want to use numbers that you don’t understand when making your next video game purchase.
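The three schemes above can give noticeably different answers for the same set of reviews. A minimal sketch, using hypothetical scores and weights (no real aggregator's method is being reproduced here):

```python
def straight_mean(scores):
    """Plain average: every review counts equally."""
    return sum(scores) / len(scores)

def weighted_mean(scores, weights):
    """Weighted average: weights might reflect an outlet's perceived
    importance; how they're assigned is exactly the kind of detail
    aggregators rarely publish."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

def trimmed_mean(scores):
    """Drop the single highest and lowest score, then average the rest."""
    s = sorted(scores)[1:-1]
    return sum(s) / len(s)

scores = [95, 88, 84, 80, 40]  # one harsh outlier at 40
print(straight_mean(scores))                   # 77.4
print(weighted_mean(scores, [3, 2, 1, 1, 1]))  # 83.125
print(trimmed_mean(scores))                    # 84.0
```

Same five reviews, three different "average" scores spanning almost seven points, which is why it pays to know which method produced the number you're looking at.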

While aggregate review scores are less susceptible to some of the flaws found in the scores issued by individual reviewers, there are still a few things to keep in mind when working with them. Just remember that you can never get all of the information you need to make an informed purchase with only a single number.
