Fastly, one of the internet's largest content delivery networks (CDNs), went down this Tuesday, leaving millions of users unable to access certain websites. 85% of Fastly's network returned errors on Tuesday morning, and it was all caused by a single customer configuration update that uncovered a bug that had lain dormant in Fastly's infrastructure since mid-May.
In an interview with The Guardian, the head of infrastructure and engineering at Fastly, Nick Rockwell, explained what had brought down its services and apologized for the disruption. It feels rare for such a big company to be so transparent about this, but it's certainly something to welcome.
Content delivery networks operate on the principle that the internet is faster and more stable when content is served from servers physically closer to the users requesting it. This results in faster downloads, better security, and a host of other benefits.
It also means there's a single point of failure if something goes wrong, though, and that's exactly what happened on June 8.
“On May 12, we began a software deployment that introduced a bug that could be triggered by a specific customer configuration under specific circumstances,” Rockwell told The Guardian. “Early June 8, a customer pushed a valid configuration change that included the specific circumstances that triggered the bug, which caused 85% of our network to return errors.”
Essentially, a bug had been introduced into the system on May 12 but had lain dormant until a customer updated their settings on June 8, which triggered the flaw and took down a large chunk of the internet (including PC Gamer) for many users. Fastly spotted the problem within a minute, and “within 49 minutes, 95% of our network was operating as normal.”
There's a financial impact when users can't access sites, of course, and SEO agency Reboot estimates that the downtime cost Amazon $32M in sales. Gulp.