Should the skill of competitive Hearthstone players be primarily assessed on results?

2015 Hearthstone World Champion Sebastian “Ostkaka” Engwall with his trophy.

Are results really the best way to gauge the skill of competitive Hearthstone players, given the amount of RNG in the game? We asked two expert players who feel passionately about the subject to argue it out.

NO... says Modernleper

will "modernleper" bindloss

William 'Modern Leper' Blindloss

Will is a regular top 100 Legend Hearthstone player who previously played for the ManaLight team and finished 3rd/4th at DreamHack Cluj. He also loves to write about the game, having contributed several articles to Team Archon’s site. You can find him on Twitch and on Twitter.

Before I begin I’d like to make something clear: Hearthstone is a very skillful game. As my Twitter feed during a major tournament never fails to demonstrate, we are extremely far from the point where there is consensus among the pros regarding what a player should do each turn. It is important to clarify that from the beginning, because I don’t want people mistaking the rest of my argument as making the case for Hearthstone being an unskilled game. But while there is plenty of skill in Hearthstone, the degree to which it affects the outcomes of games is much smaller than in other esports, or indeed in most competitive pursuits in general.

The crucial point is this: it is not uncommon for the best player to lose in a game of Hearthstone. This should come as no shock to most people familiar with the competitive scene, or indeed the “Never Lucky” narrative adopted by many of the saltier residents of Twitch’s Hearthstone section. But what a lot of people don’t quite understand is the sheer frequency at which the better player loses. Winrates at Legend rank—the highest tier of Hearthstone’s ladder system—provide the strongest example of this, with even 60% over a reasonable sample of matches being the mark of a very strong player.
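
To make that concrete: a player who wins 60% of individual games is a genuinely strong favorite, yet still loses a best-of-five roughly a third of the time. Here is a quick sketch of the arithmetic (Python, assuming independent games at a fixed per-game winrate):

```python
from math import comb

def match_win_prob(p, best_of):
    """Probability that a player with per-game winrate p
    wins a first-to-(best_of // 2 + 1) match."""
    need = best_of // 2 + 1
    # Sum over k, the number of games the opponent takes (0..need-1).
    # The favorite wins the final game, so the preceding need-1+k
    # games can be ordered in comb(need-1+k, k) ways.
    return sum(comb(need - 1 + k, k) * p**need * (1 - p)**k
               for k in range(need))

print(f"{match_win_prob(0.60, 5):.1%}")  # -> 68.3%
```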

So, if we are to use results as the primary metric for evaluating the skill of Hearthstone players, then the sample size has to be large enough to give us reasonable confidence that it reflects a player’s skill, and not just a short-term upswing or downswing that can be accounted for by variance. It would obviously be ridiculous to claim a sample of five games is an accurate representation of a player’s skill. Equally, a sample size of one million games would give a near-perfect read on how good a player is overall.

The problem is, the sample sizes used right now are simply too small. Over half of the top 100 Hearthstone players on the GosuGamers ranking have under 100 games recorded. In that sort of territory, the 95% margin of error is close to 10 percentage points. This means that if a player has a 60% winrate over those 100 games, we can only be 95% confident of his or her true winrate lying between 50% and 70%, which is not tremendously helpful. It is somewhat worrying that someone with a true winrate of 50% could find themselves in the top 10 of the most widely recognized ranking, but this is the situation we find ourselves in.
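
For anyone who wants to check the numbers, here is a minimal sketch using the standard normal approximation to the binomial (z = 1.96 being the critical value for 95% confidence); the “close to 10” margin falls straight out of the formula:

```python
import math

def winrate_interval(wins, games, z=1.96):
    """Approximate 95% confidence interval for a true winrate,
    via the normal approximation to the binomial."""
    p = wins / games
    margin = z * math.sqrt(p * (1 - p) / games)
    return p - margin, p + margin

lo, hi = winrate_interval(60, 100)
print(f"60/100: true winrate between {lo:.1%} and {hi:.1%}")
# -> between 50.4% and 69.6%, a margin of nearly 10 points

# Games needed to pin a ~60% winrate down to +/- 2 points:
n = 1.96**2 * 0.6 * 0.4 / 0.02**2
print(math.ceil(n))  # -> 2305
```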

When players like [Lifecoach] are betrayed by a bad run of variance, the need for an alternative ranking system becomes obvious.

There are some players whose analytical prowess becomes evident simply by talking to them while they play the game: I would put Lifecoach, Ostkaka and Purple in this category. I remember being in a Skype call with Purple during the Miracle Rogue era, and being blown away by how much better he understood the game than I did at the time. And quite often you don’t need to be a high-level player yourself to see this: people watch Lifecoach’s stream because his skill is clear just from listening to him talk. When players like this are betrayed by a bad run of variance, the need for an alternative ranking system becomes obvious.

My solution to assessing the skill of players, therefore, is to poll the pros. The logic rests on two assumptions: that people are more likely to be correct than incorrect in their judgement of a player’s skill, and that of all people involved with Hearthstone, the best players are the best at judging skill. Find a sample of players who meet some criterion generally accepted as correlating positively with skill—perhaps a threshold of HWC points earned might be a reasonable start—and ask them to create a ranking of players. It is important to note that the players polled do not necessarily have to be the world’s very best: indeed, how would we know which players are the best in the absence of a fair ranking system? Nor would they need to be, because the quantity of people polled would even out the errors made by individuals, thanks to an effect known as ‘the wisdom of the crowd’.
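
By way of illustration, one simple aggregation is a mean-rank (Borda-style) count: each polled player submits an ordered ballot, and the final ranking sorts players by their average position across all ballots. The names and ballots below are entirely hypothetical:

```python
from collections import defaultdict

# Each ballot ranks players from best to worst.
# Names and orderings are illustrative, not a real poll.
ballots = [
    ["PlayerA", "PlayerB", "PlayerC", "PlayerD"],
    ["PlayerB", "PlayerA", "PlayerD", "PlayerC"],
    ["PlayerA", "PlayerC", "PlayerB", "PlayerD"],
]

positions = defaultdict(list)
for ballot in ballots:
    for rank, player in enumerate(ballot, start=1):
        positions[player].append(rank)

# Individual misjudgements tend to cancel out as the number of
# voters grows: the "wisdom of the crowd" effect at work.
ranking = sorted(positions,
                 key=lambda p: sum(positions[p]) / len(positions[p]))
print(ranking)  # -> ['PlayerA', 'PlayerB', 'PlayerC', 'PlayerD']
```

A real system would also need criteria for who gets a ballot and a rule for handling partial ballots, but the aggregation step itself is that simple.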

It is entirely possible that, in the future, tournaments will become so numerous, and our ranking systems so precise, that we will be able to determine player skill purely on the basis of results. I sincerely hope that this is the case. Until that time, however, an aggregation of player opinion is the best of our options, simply because it offers a means of measuring player skill not hopelessly stunted by the inadequacy of tiny sample sizes.

YES... says Shevek

Shevek

"Shevek" has established himself in the Hearthstone semi-professional scene by writing a strategy column for Team Archon. He is a caster and player for RudeHouse Gaming, streaming several days a week here. You can also follow him on Twitter here.

The game of Hearthstone has one goal, toward which every decision aims: winning. Modernleper, I think, imagines that wins and losses, bit by bit, reveal a player’s ineffable “true skill.” Skill, however, does not exist apart from actual wins. All skill means is the likelihood a player will defeat a given opponent.

When he says that there aren’t enough games played for competitive results to accurately reflect skill, what he means is that some players have won more, and others less, than they “deserve.” Luck, he supposes, has not been fair, because it takes too many games to even out. Which may be true—especially if, as GosuGamers does, you exclude ladder results. Ladder comprises vastly more games, and top ladder results are accordingly more consistent.

If we accept Modernleper’s position, then it would seem to predict that over the next several years we will see the “undeserving” players lose while the “deserving” ones finally get their long-awaited wins. If so, he would have to agree that results are ultimately the best (I would say only) objective reflection of skill. The alternative position would be to argue that even with more and more games played, results will continue to be inadequate. But if that were true, it would mean that what pros call skill is actually something more like “beautiful play”, which doesn’t substantially affect competitive results. I can’t imagine he thinks that.

Using results as the sole measure of skill ignores the actual content of the game we spend so much energy studying, which is frustrating. I agree with Modernleper that better players are better judges of decision making. But if we are to take such judgments as an objective measure of skill, appealing to the authority of expertise is not enough. It must be demonstrated that the subjective judgment of experts predicts future wins and losses at least as well as win/loss results, and that it does so without bias.

If there were some sort of AI that could perfectly calculate every possibility in a game of Hearthstone, it would certainly be able to accurately judge skill—i.e. predict winrates in advance of games being played. The trouble is: there is no such Hearthstone AI. While bots can play simple decks high on ladder, Hearthstone in general, especially deckbuilding, is far too complicated to completely and definitively solve.

When pros assess the decisions of other players, they often talk as if they are checking against the “correct” play that a perfect AI would have made. Actually, though, they are judging each choice against what they would have done in the same situation. Often they’re right: tactical mistakes can be quite clear cut—and several players I talked to emphasized tactics in their criteria for judging skill. But choosing the right strategic line, deckbuilding, psychological and physical preparation—these are more art than science. A tactically sloppy player who can consistently do well in tournaments might actually be better than a polished player who chokes a lot or lacks inspiration.

No matter how much you predict a player will lose, you cannot take away the games she has already won. Skill is nothing more than a prediction of future wins.

The Hearthstone competitive scene is already notoriously conservative—to see that, look no further than the countless pros calling new cards trash whenever they don’t fit existing deck archetypes. I routinely hear players I respect dismiss unusual decks in the face of objective results (e.g. finishing rank 1 Legend). Relying on the subjective evaluations of respected pros—individually or in aggregate—can only serve to reinforce the biases prevalent in the competitive scene, making it harder for new names and new strategic approaches to gain respect.

Modernleper emphasizes deeply insightful players who would be undervalued by results alone, but his position inevitably also leads to questioning the validity of tournament winners if pros consider them undeserving. But no matter how much you predict a player will lose, you cannot take away the games she has already won. Skill is nothing more than a prediction of future wins. To suggest that a tournament winner did not deserve to win—that she won in spite of, not because of, her skill—is both rude and philosophically incoherent. I wish to emphasize this, because I see it happen far too often. Undermining the legitimacy of tournament winners robs them of fans, sabotages their careers, and saps professional competition of vital drama. I worry that some pros—to be clear, not Modernleper—use their expert commentary as a form of bullying, to make themselves feel better about their own skill at the expense of a healthy, positive scene.

Admiring and criticizing players is how we study and improve. It’s entirely appropriate to choose a practice partner or compliment an opponent on that basis. Subjective judgments should not, however, be used as a basis for supposedly objective player rankings. Who, for one thing, is to decide which players are qualified to judge skill? I have yet to meet a competitive player who does not include himself on that jury. If the judges are to be decided based on their ladder results, why not do away with the judges and let every player’s results speak for themselves?
