Tuesday, May 10, 2016

[GameDev/WoW] Player Feedback: Forums versus Metrics

Recently, Watcher AKA Ion Hazzikostas--a designer on World of Warcraft--crafted an illuminating forum post about the pitfalls of manual player feedback from an extremely diverse player base.

An excerpt:
Almost every facet of WoW is an activity that caters to a minority of the playerbase ... [WoW] is not a narrow game, but rather one that can be enjoyed in numerous different ways, by people with hugely diverse playstyles ... We are [listening] - just to many, many different voices. And it may be that a given change, feature, or reward is simply aimed at a different portion of the playerbase. Or we could be wrong and we haven't realized it yet.
World of Warcraft is a huge game. Some folks focus entirely on PvP; others, raiding; still others, pet battling. Then there's collecting mounts, transmog, small group content, questing and leveling, and more. Many of these activities can cause friction when game design conflicts arise between them. It also means that there are a lot of voices clamoring to be heard, and many of those voices conflict with each other, not to mention the oft-alluded-to silent majority.

An excellent discussion on Twitter suggested that metrics or polling are a better barometer of player activity and thoughts, and in some ways they are. In others, nothing beats having someone tell you what they're thinking.


Let's talk about metrics for a moment. Metrics, or analytics, are an impartial way to gather data about your program's usage in the wild--in this case, the program is a game. They're impartial because the programmers build the reporting into the code itself, so there's none of the self-reporting bias that forum posts suffer from.

Many commercial programs collect metrics, video games especially. Analytics and metrics are a huge business, with many 3rd party platforms that game devs can buy into and integrate into their game. The basics that most games are probably gathering are:
  • DAU/MAU (Daily Active Users, Monthly Active Users)
  • Sessions
  • Retention (percentage of users who open your game at least once in 1, 3, 7, 30 days)
  • Conversion (percentage of users who make in-app purchases)
  • Churn (player loss over time)
  • Source, Sink, Flow (in-game currency tracking, i.e., how much gold are you making and spending?)
  • Start, Fail, Complete (tracking task starts/success/failures)
MAU is something folks who're reading the recent Activision-Blizzard financial report will be familiar with: how many unique users logged in to or played your game in a month. For a game like WoW, with Battle.NET accounts, it's as simple as querying their database for the number of players who've logged into the game in the past month. For others, it becomes a little more of a guessing game based on identifiers like IP addresses or a device ID run through an anonymizing hash algorithm. Clearly the latter is not perfect: it's possible to have collisions, IPs get reassigned, device IDs may change if enough of the hardware changes on a PC, and so on. But it provides an estimate that's more than good enough for tracking trends.
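
As a rough illustration of the device-ID approach, here's a minimal Python sketch that counts DAU and MAU from anonymized login events. The salt, the `anonymize` and `active_users` helpers, and the login data are all made up for the example; this isn't how any particular analytics platform does it.

```python
import hashlib
from datetime import date, timedelta

SALT = b"example-salt"  # hypothetical; a real deployment would keep this secret


def anonymize(device_id: str) -> str:
    """Return a salted SHA-256 digest so the raw device ID is never stored."""
    return hashlib.sha256(SALT + device_id.encode("utf-8")).hexdigest()


def active_users(logins, end: date, days: int) -> int:
    """Count distinct anonymized IDs seen in the `days`-day window ending on `end`."""
    start = end - timedelta(days=days - 1)
    return len({uid for uid, day in logins if start <= day <= end})


# Made-up login events: (anonymized ID, login date).
logins = [
    (anonymize("device-a"), date(2016, 5, 10)),
    (anonymize("device-a"), date(2016, 5, 10)),  # second session, same day
    (anonymize("device-b"), date(2016, 5, 10)),
    (anonymize("device-c"), date(2016, 4, 20)),
    (anonymize("device-d"), date(2016, 3, 1)),   # outside the 30-day window
]

dau = active_users(logins, date(2016, 5, 10), days=1)
mau = active_users(logins, date(2016, 5, 10), days=30)
print(f"DAU={dau} MAU={mau}")
```

Counting distinct hashes means a player with two sessions in a day is only counted once, but a player whose device ID changes would be counted twice, which is exactly the imperfection described above.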

Retention, Conversion, and Churn are all clearly important for monetizing your game. Are lots of players purchasing things from the store? Are your players sticking around and logging in a lot? Are you losing players over time, and how long on average do they stick around? What a lot of those questions miss, however, is the why. They can indicate you have a problem or a success, but not why. That generally requires some further custom metrics--or perhaps some enterprising users making a blog or forum post.
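
To make those numbers concrete, here's a small Python sketch of day-N retention and conversion. It assumes one common definition of retention (a player counts as retained on day N if they played exactly N days after their first session); the function names and player data are invented for illustration.

```python
from datetime import date


def day_n_retention(first_seen, sessions, n):
    """Fraction of players who played again exactly n days after their first session."""
    if not first_seen:
        return 0.0
    retained = sum(
        1
        for user, start in first_seen.items()
        if any((day - start).days == n for day in sessions.get(user, []))
    )
    return retained / len(first_seen)


def conversion_rate(all_users, purchasers):
    """Fraction of players who made at least one purchase."""
    users = set(all_users)
    if not users:
        return 0.0
    return len(users & set(purchasers)) / len(users)


# Made-up cohort: everyone first played on May 1.
first_seen = {u: date(2016, 5, 1) for u in ("u1", "u2", "u3", "u4")}
sessions = {
    "u1": [date(2016, 5, 2), date(2016, 5, 8)],
    "u2": [date(2016, 5, 2)],
    "u3": [],
    "u4": [date(2016, 5, 4)],
}

d1 = day_n_retention(first_seen, sessions, 1)
d3 = day_n_retention(first_seen, sessions, 3)
d7 = day_n_retention(first_seen, sessions, 7)
conversion = conversion_rate(first_seen, ["u1"])
print(d1, d3, d7, conversion)
```

Note that nothing in here can say why u3 never came back; it only shows that they didn't.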

Source, Sink, Flow is good for tracking your in-game economy. If Flow is positive, your economy--single player or multiplayer--is inflating. If Flow is negative, your economy is contracting. Depending on the game and the currency (and whether you can pay real money for said currency), you may want a stable flow, a positive one, or a negative one. A good game designer will already have a model economy (probably in a giant Excel spreadsheet), and this data can confirm that in the wild the economy is working as intended.
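
A minimal Python sketch of that bookkeeping, assuming each event is a hypothetical (kind, currency, amount) tuple and that flow is simply total sources minus total sinks:

```python
from collections import defaultdict


def currency_flow(events):
    """Sum sources and sinks per currency; flow = source - sink."""
    totals = defaultdict(lambda: {"source": 0, "sink": 0})
    for kind, currency, amount in events:
        if kind not in ("source", "sink"):
            raise ValueError(f"unknown event kind: {kind!r}")
        totals[currency][kind] += amount
    return {
        currency: {**t, "flow": t["source"] - t["sink"]}
        for currency, t in totals.items()
    }


# Made-up events for illustration.
events = [
    ("source", "gold", 120),  # quest reward
    ("source", "gold", 30),   # loot sold to a vendor
    ("sink", "gold", 100),    # repair bill
    ("sink", "gems", 5),      # cosmetic purchase
]

report = currency_flow(events)
print(report)
```

In this toy data gold has a positive flow (inflating) and gems a negative one (contracting); whether either is a problem depends on what the designer's model economy expects.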

Finally, Start, Fail, Complete is good for tracking bottlenecks in your gameplay. Is there a specific level that people are constantly failing at? Are folks starting specific tasks but not completing them? These are the sorts of metrics that tell you what your players are doing in-game, and how well they're doing it. This can be a warning that certain features are under-utilized or too difficult (or, more accurately, a check on whether the feature's difficulty curve is where you want it to be). Again, though, this doesn't tell you why they failed.
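
Here's a rough Python sketch of how those events could be rolled up per task. It assumes each start ends in at most one fail or complete, so anything left over is treated as abandoned; the event format and level names are invented.

```python
from collections import Counter


def task_funnel(events):
    """Summarize start/fail/complete events per task."""
    counts = Counter(events)  # (task, outcome) -> number of events
    report = {}
    for task in sorted({task for task, _ in events}):
        starts = counts[(task, "start")]
        fails = counts[(task, "fail")]
        completes = counts[(task, "complete")]
        report[task] = {
            "starts": starts,
            "fail_rate": fails / starts if starts else 0.0,
            "complete_rate": completes / starts if starts else 0.0,
            # Started but never reported a fail or a complete.
            "abandoned": starts - fails - completes,
        }
    return report


# Made-up events: level_2 looks like a bottleneck.
events = (
    [("level_1", "start")] * 4
    + [("level_1", "complete")] * 4
    + [("level_2", "start")] * 4
    + [("level_2", "fail")] * 2
    + [("level_2", "complete")] * 1
)

funnel = task_funnel(events)
print(funnel["level_2"])
```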

(Image: a one-week snapshot of the average maximum wave reached in our Combat Arena.)

Of course, each game can have its own custom metrics. For example, in Eon Altar we track our Combat Arena usage, and have stats on the average wave players make it to before wiping. If that number were too high, we'd likely have to tune the difficulty up. If the number were too low, balancing it down would be the way to go.
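
As a sketch only (this is not our actual pipeline), a custom metric like that boils down to something like the following Python. The run data, the `average_max_wave` and `tuning_hint` helpers, and the target band of waves 4 to 6 are all hypothetical.

```python
from statistics import mean


def average_max_wave(runs):
    """Average, across runs, of the highest wave each run reached."""
    if not runs:
        raise ValueError("no runs recorded")
    return mean(max(waves) for waves in runs.values())


def tuning_hint(avg_wave, target_low, target_high):
    """Compare the average against a designer-chosen target band."""
    if avg_wave > target_high:
        return "tune difficulty up"
    if avg_wave < target_low:
        return "tune difficulty down"
    return "leave as is"


# Made-up anonymized runs: run ID -> waves reached before wiping.
runs = {
    "run-1": [1, 2, 3],
    "run-2": [1, 2, 3, 4, 5],
    "run-3": [1],
}

avg = average_max_wave(runs)
hint = tuning_hint(avg, target_low=4, target_high=6)
print(avg, hint)
```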

I don't know who got that far, and I don't know who failed early or why. The data is anonymized before it gets to me so I couldn't find out if I wanted to. I do know how many players they started with, but right now I don't know what classes they took--something I'd like to get eventually to help understand how our balance is, though I have a really good guess at how it'll shake out after watching many players at PAX Prime, PAX East, and Casual Connect.

Metrics can give you a broad snapshot or even trends over time of how features, levels, and players are doing. They can be an early alert system of sorts for problems in your game, but a lot of the time you can only guess as to why the problem exists even if you know you have one.

Direct Player Feedback

This is where other forms of feedback come into play: in-house playtesting, trawling Twitter and external forums for people's opinions, your own forums, watching Twitch or Let's Plays on YouTube, or direct conversations with your players.

Metrics can't tell you if players are having fun, or if they're just doing something because it's the most efficient route, whereas reading people's comments will definitely tell you how the vocal minority feels.

Sometimes the feedback is ridiculously overwhelming: take the Real ID fiasco a few years ago, where Blizzard was going to require everyone to post under their real full name on its official forums. I can't think of a time when feedback was so unanimously strong in gaming outside of maybe the Mass Effect 3 ending, and even then that wasn't as loud in my opinion.

Other times, it's heated on both sides, but there's no clear single answer. Flying mounts in World of Warcraft come to mind here. Pro, anti, and even apathetic arguments were made vociferously--the apathetic crowd mostly just wanting everyone else to stop arguing already.

Players can be wrong, too. As Ion said in his post, players have a small snapshot of the world around them. An example of this in World of Warcraft was Frost Mages in PvP in Mists of Pandaria. Many people complained that they were extremely overpowered and wanted them nerfed, but Blizzard's metrics showed that while they were a little OP in lower brackets, they were actually struggling in the higher brackets. Nerfing Frost Mages outright would've removed them from higher level PvP play entirely--clearly the wrong move once you saw the metrics. A different approach was needed there.
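
The general pattern here is that one aggregate number can hide very different stories per bracket. A toy Python sketch with invented match results (these are not Blizzard's numbers, and `win_rates` is a made-up helper):

```python
from collections import defaultdict


def win_rates(matches):
    """Return the overall win rate and the win rate per rating bracket."""
    if not matches:
        raise ValueError("no matches recorded")
    wins = defaultdict(int)
    games = defaultdict(int)
    for bracket, won in matches:
        games[bracket] += 1
        wins[bracket] += int(won)
    overall = sum(wins.values()) / sum(games.values())
    per_bracket = {bracket: wins[bracket] / games[bracket] for bracket in games}
    return overall, per_bracket


# Invented results for a single spec: (bracket, did they win?).
matches = (
    [("low", True)] * 60 + [("low", False)] * 40
    + [("high", True)] * 8 + [("high", False)] * 12
)

overall, per_bracket = win_rates(matches)
print(f"overall={overall:.2f}", per_bracket)
```

Looking only at the overall rate, the spec seems a bit strong; split by bracket, it's above average in low brackets and below average in high ones, which calls for a different fix than a flat nerf.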

Sometimes, players giving feedback are, well, I won't say lying per se, but their actions belie their posted words. A common example here is the CoD: Modern Warfare 2 backlash where folks were talking about boycotting in favour of dedicated servers, but that lasted all of a hot minute. And trust me, developers will notice: the metrics aren't biased. If sales actually plummet, or DAU metrics plummet, marketing will see it.

Direct player feedback is still incredibly important though, because you can get all sorts of context that's missing from soulless data. Maybe that boss' telegraph is really hard to see. Maybe something is really fun, but out of sight, out of mind so nobody noticed it. Maybe a feature that has a high engagement eats a lot of player energy, but isn't actually that much fun for a large part of the populace. Fun is a pretty nebulous concept, and data can't really tell you that.

Data is also impersonal. There is something to be said about the glow of seeing someone gush about your game in a forum post. The flip side, of course, is the terror of being on the receiving end of an internet lynch mob. For some, the threat of the flip side is too much, and so they'll only ever look at the data. For others, the personal experiences drive their satisfaction and design choices. It really depends on the developer.

Balancing the Two

I disagree with Wolfyseyes a little, in that metrics or polling aren't necessarily a better gauge of temperature than forums. I see metrics as being like looking out your window in the morning. You can tell if it's cold or hot, raining or cloudy, and maybe windy, but you can't tell the precise temperature or wind speed. You can't tell that the incoming rain is only going to get worse because it's the remnants of a tropical storm coming from Hawaii.

With player context you get a more complete picture, even if that picture is fractured between many different viewpoints. You may have to look at feedback from a specific angle or use a different lens. Sometimes that player isn't who you're building for--I wouldn't change Eon Altar to be more like a MOBA with real-time skill shots even if a bunch of people asked for it; that's just not our vision for our game. That's okay! For a game like WoW, that's a little bit of a harder sell because they have so many varied customers, so they need more lenses through which to interpret direct feedback.

As metrics improve, and analytics technology gets better at sifting through big data, we'll be able to answer more of the "why" without that direct player feedback. But I don't ever see direct player feedback going away. It'll be very important for many years to come.
#WoW, #GameDevelopment
