Meet Tranquility, the military-grade 2,500GHz monster that powers Eve Online
If this were the Discworld, the Tranquility server would be the Great A'Tuin, the Giant Star Turtle on which all existence rides. But this is Eve Online. Entombed within what is recognized as the gaming industry's largest supercomputer is the ruffian-riddled, single-shard universe of New Eden. According to its creators, CCP, the monstrous London-based server cluster has 3,936GB of RAM, 2,574GHz worth of processing power and even military-grade hardware.
Tranquility didn't start out this way. Over ten years it's grown from "a couple of computers" into a classroom-sized behemoth, swallowing up new tech and evolving to meet the needs of the half-million players that live in Eve's universe. CCP chief technology officer Halldor Fannar explains how one of gaming's biggest supercomputers came to be.
"We initially began designing [EVE Online] back in 2000 when latency was a huge issue across the Internet. To combat this, we designed these proxies that we were going to place around the world to cache information queries on market prices and so on, just so we could reduce latency."
The idea was sound but proved completely unnecessary. By the time CCP Games finally launched EVE Online in 2003, latency had become less of a killer, more of a nuisance. "We ended up putting all the proxies into the same data center as the server. It's kind of interesting how things changed even as we rolled out. We ended up with all the server nodes in the same location where before we had expected to place them around the world."
Things kept escalating. In the beginning, CCP Games made plans for a maximum of 100,000 subscribers, a number that translated to, roughly, 20,000 concurrent users. Needless to say, those designs were shattered. The number of players continued to climb beyond initial expectations.
"We had to change a lot of things," Halldor reminisces. "We had to take services that were tied together and living together on one node and break them apart so that we could run them independently and, maybe, dedicate a piece of hardware to running a specific service."
Making an example of the market nodes and the character nodes ("It didn't use to be like that but once you have so many characters and so many people that want to access their skills - "), Halldor explains that the practice now is to identify high-use services and break them out, a process that is more complex than it sounds.
"You don't want to break out a service that still has to talk a lot to another service. Because then, sure, you might have moved it out but you're going to have a communication bottleneck between those two anyway."
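The trade-off Halldor describes can be sketched in a few lines of Python. This is only an illustration, not CCP's tooling: the `ServiceStats` record, the call counts and the 20% threshold are all assumptions made for the example. The idea is to rank busy services for breakout, but skip any that spend most of their time talking to another service.

```python
from dataclasses import dataclass


@dataclass
class ServiceStats:
    """Hypothetical per-service counters; the fields are assumptions."""
    name: str
    local_calls: int   # requests handled without contacting another service
    remote_calls: int  # requests that had to talk to another service


def chattiness(service: ServiceStats) -> float:
    """Fraction of a service's requests that depend on another service."""
    total = service.local_calls + service.remote_calls
    return service.remote_calls / total if total else 0.0


def breakout_candidates(services, max_chattiness=0.2):
    """Return services worth moving to their own hardware, busiest first.

    A service that talks a lot to another service is excluded, because
    moving it would only relocate the bottleneck onto the network.
    """
    eligible = [s for s in services if chattiness(s) <= max_chattiness]
    return sorted(
        eligible,
        key=lambda s: s.local_calls + s.remote_calls,
        reverse=True,
    )


# Made-up numbers purely for illustration.
services = [
    ServiceStats("market", local_calls=9000, remote_calls=1000),
    ServiceStats("character", local_calls=4500, remote_calls=500),
    ServiceStats("combat", local_calls=1000, remote_calls=9000),
]
candidates = [s.name for s in breakout_candidates(services)]
```

With these invented figures the market and character services qualify, while the chatty combat service stays where it is.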
As the team continued to work on moving out 'high locality' services (services that did not require constant communication with other services) and implementing web technologies similar to those being used by Facebook, the server cluster itself continued to evolve. Like a living thing, Tranquility is perpetually assimilating new technologies and unburdening itself of obsolete parts - with the help of human hands, of course.
"When technology gets better, server hardware gets better. Sometimes, we are able to simplify things and replace a couple of computers with a single computer, one that has multiple processor sockets and multiple cores."
The biggest problem here, according to Halldor, is ensuring that improvements do not catalyze more issues down the road. "We also have to be careful because, sometimes, you can look at some of our problems and go, 'Hey! You should just go for that solution,' but that solution might turn out to be technologically complex or difficult to maintain. Our code base is constantly evolving and, really, maintenance is key here. If something is hard to maintain, we'll normally opt against that route and go with the one that's maybe only 80% efficient but easier to maintain."
One of the key components within Tranquility is its persistence layer, the backbone that underpins so much of New Eden's daily activity. At one point, this was held on solid-state drives purchased from Texas Memory Systems in 2009.
"The funny thing was that, at the time, the technology only existed in the military so we had to get military clearance to go into a bunker in Texas to evaluate the hardware because the company, back then, had only just started looking into commercializing this thing that they made for the US army. We were one of the first clients and they thought it was really funny that they went from building things for the army to something that's so completely light-hearted."
With a grin, Halldor adds, "Of course, we told them that the Internet spaceships are serious business."
Fast forward to the present day and there are few who would look askance at Tranquility. Halldor says that hardware partners have been excited to work with CCP Games as their environment is 'pretty unique' - an understatement, given that most supercomputers are traditionally reserved for job-oriented tasks. Complicated as all this might sound, it's this adherence to optimization that keeps EVE Online functioning like a well-oiled machine.
While other developers might have buckled under the weight of the Burn Jita event, CCP Games chose to migrate the solar system to a node of its own ("Our special snowflake node, as we call them," Halldor deadpans). Then, after installing a cap, they slowed down time based on the number of people present in the fight. The technique worked. EVE Online players found the war they wanted, and CCP Games found a technique they would later use again in the Battle of Asakai.
"That's an example of an improvement we made. The battle of Asakai couldn't have happened without time dilation."
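For the curious, slowing time based on the number of people present can be sketched roughly like this. It is a toy model, not CCP's code: the function names, the comfortable capacity of 1,000 players and the 10% floor are assumptions chosen for the example.

```python
def dilation_factor(players_present: int,
                    comfortable_capacity: int = 1000,
                    min_factor: float = 0.1) -> float:
    """Fraction of real time the simulation advances per tick.

    Below the (assumed) comfortable capacity, time runs at full speed.
    Above it, time slows in proportion to the overload, but never below
    the (assumed) floor.
    """
    if players_present <= comfortable_capacity:
        return 1.0
    return max(min_factor, comfortable_capacity / players_present)


def advance(sim_time: float, real_dt: float, players_present: int) -> float:
    """Advance the simulation clock by a dilated slice of real time."""
    return sim_time + real_dt * dilation_factor(players_present)


# One real second in a fight twice the comfortable size moves the
# simulation forward by half a second.
new_time = advance(0.0, 1.0, players_present=2000)
```

The point of the design is that every player in the fight slows down together, so the server gets more real time to process each simulated second instead of dropping commands.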
But CCP Games isn't satisfied yet. Though they have the proverbial racehorses stabled and ready, CCP Games are intent on keeping a step ahead of their own players. "One of the things we're working on right now is a predictive algorithm so we can analyze everything that happens in the world. If we have even 10 minutes forewarning, we can move the simulation and resume it on a different piece of hardware."
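What might such a forewarning look like? The sketch below is a deliberately naive guess, not CCP's algorithm: it assumes a node keeps timestamped player counts for a solar system and extrapolates the trend in a straight line. The function names, the sample format and the capacity figure are all hypothetical.

```python
def minutes_until_overload(samples, capacity):
    """Estimate minutes until the player count reaches capacity.

    samples: list of (minute, player_count) pairs, oldest first.
    Returns 0.0 if already at capacity, or None if there is too little
    data or the count is not growing.
    """
    if not samples:
        return None
    (t0, n0), (t1, n1) = samples[0], samples[-1]
    if n1 >= capacity:
        return 0.0
    if t1 == t0:
        return None
    rate = (n1 - n0) / (t1 - t0)  # players gained per minute
    if rate <= 0:
        return None
    return (capacity - n1) / rate


def should_migrate(samples, capacity, lead_minutes=10.0):
    """True if overload is projected within the forewarning window."""
    eta = minutes_until_overload(samples, capacity)
    return eta is not None and eta <= lead_minutes


# A system growing by 100 players a minute, with 800 seats left,
# is projected to fill up in 8 minutes.
growing_fast = [(0, 200), (5, 700)]
migrate_now = should_migrate(growing_fast, capacity=1500)
```

A real predictor would presumably weigh far more signals than headcount, but even this crude version shows why ten minutes of warning is enough to act on.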
And if Skynet ever comes to pass, we'll know exactly who to blame.