Destiny 2's inventory-deleting bug was caused by a remarkable run of bad luck

A couple of weeks ago, Bungie was forced to take Destiny 2 offline for the bulk of a day to fix a bug in an update that caused widespread loss of glimmer and rare enhancement materials. Yesterday, it happened again—it didn't take quite as long to fix (because developers knew what was going on this time) but it was still several hours of downtime for a major online game.

You might wonder how this kind of thing happens in a well-established game being run by a large, experienced game studio. If so, that large, experienced game studio is here to explain: With the game back online, Bungie posted an unusually deep and detailed explanation of what went wrong, and what it's doing to avoid that kind of gong show in the future.

It's a long and complicated (but also legitimately interesting) tale. The short version, as is so often the case, is that a bunch of small problems snowballed into a big problem, and then, well, mistakes were made. Then another, entirely unrelated problem caused the first one to re-emerge, which is how yesterday's mess came about.

It's all fixed up now, although not without another character rollback. (The second in Destiny 2's history, apparently—Bungie said that the rollback two weeks ago was the first.) To help prevent this particular issue (but not others, because that's the way she goes) from happening again, Bungie also specified seven "preventive measures" it's taking going forward:

  • We have added further safeguards to our process for “hot-patching” our servers to ensure that they cannot start with an unexpected version. This change is in place as we spin up the game today.
  • We have fixed the issue that caused a small fraction of WorldServers to crash on startup. This fix will be deployed with Season 10. 
  • The permanent fix for character corruption will be rolled into the next update as an executable change, removing the need for the configuration override. (Unfortunately the 2.7.1.1 Hotfix was too far along to benefit from this).
  • Looking ahead, we are investigating ways to speed up our rollback and recovery mechanisms.
  • In a future release, we will address the issue that can cause servers to skip loading configuration data.
  • We will also add more protections to the login-account clean-up code, to help prevent future bugs from being introduced into such a critical area.
  • We are updating our development methodologies to catch issues like this earlier in the release pipeline. 
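
Bungie didn't publish any code, so take the following as a purely illustrative sketch of what the first and fifth measures might look like in spirit: a server that refuses to start if it's running an unexpected version, or if its configuration data never loaded. Every name here (`ServerState`, `check_startup`, the `character_corruption_fix` key) is hypothetical and not taken from Bungie's systems; only the 2.7.1.1 version string comes from the post.

```python
from dataclasses import dataclass
from typing import Optional


class StartupError(RuntimeError):
    """Raised when a server must not be allowed to start."""


@dataclass(frozen=True)
class ServerState:
    # Hypothetical model of a server about to come online.
    build_version: str        # version of the executable actually running
    config: Optional[dict]    # None models "configuration data was skipped"


def check_startup(state: ServerState, expected_version: str,
                  required_keys: set) -> None:
    """Fail loudly instead of starting in an unexpected state."""
    if state.build_version != expected_version:
        raise StartupError(
            f"unexpected version {state.build_version!r}, "
            f"expected {expected_version!r}"
        )
    if state.config is None:
        raise StartupError("configuration data was not loaded")
    # Every required override must be present before the server goes live.
    missing = sorted(k for k in required_keys if k not in state.config)
    if missing:
        raise StartupError(f"missing configuration overrides: {missing}")


if __name__ == "__main__":
    required = {"character_corruption_fix"}  # assumed override name
    healthy = ServerState("2.7.1.1", {"character_corruption_fix": True})
    check_startup(healthy, "2.7.1.1", required)  # passes silently

    skipped = ServerState("2.7.1.1", None)  # config load was skipped
    try:
        check_startup(skipped, "2.7.1.1", required)
    except StartupError as err:
        print(f"refusing to start: {err}")
```

The idea is simply that a server in an "impossible" state stops at the door rather than going live without the fix it was supposed to carry.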

Downtime and rollbacks are frustrating, but the explainer gives us a little more insight than we normally get into how and why things can go so completely wrong, so quickly: from a "conceptually reasonable" fix several months ago that spun off "subtle side effects," to "what we thought was an impossible situation" that caused yesterday's problems.

"We know today’s outage and character rollback has been frustrating for you, especially with launch of Crimson Days, just as it’s been frustrating for us to realize that this is a problem we should have been able to avoid," Bungie said. "We’re sorry for the frustration and inconvenience this caused and will continue to work to prevent these kinds of things from happening again."

Andy Chalk

Andy has been gaming on PCs from the very beginning, starting as a youngster with text adventures and primitive action games on a cassette-based TRS-80. From there he graduated to the glory days of Sierra On-Line adventures and MicroProse sims, ran a local BBS, learned how to build PCs, and developed a longstanding love of RPGs, immersive sims, and shooters. He began writing videogame news in 2007 for The Escapist and somehow managed to avoid getting fired until 2014, when he joined the storied ranks of PC Gamer. He covers all aspects of the industry, from new game announcements and patch notes to legal disputes, Twitch beefs, esports, and Henry Cavill. Lots of Henry Cavill.