How a team of AI researchers took on all comers at StarCraft 2

Sundar Pichai
(Image credit: Google)

This article was originally published in PC Gamer issue 340.

Anyone who’s played a single-player videogame recently—and the chances are you have—will appreciate the very special kind of artificial stupidity needed to make you feel you’re getting a decent challenge, while still believing it was your finely honed skills, rather than pre-programmed flaws in the AI’s strategy, that finally allowed you to triumph. 

Putting up a stiff challenge without cheating—no reading the player’s inputs, no reacting on the first frame of an attack, no psychically knowing where their units are—is a difficult thing to code. But what if you wanted your AI to crush all its enemies and see them driven before it? All without cheating? That would be even more difficult. 

A team from Google has done it, however. The AlphaStar AI from DeepMind triumphed over puny humans in a series of blind StarCraft 2 matches on Battle.net (the meatbags had no idea they were playing against an AI). It did rather well, achieving Grandmaster rank and outperforming 99.8 percent of players on the European servers. 

StarCraft 2 has emerged by consensus as the next big challenge for AI now that it has mastered chess, Go, and Jeopardy. While IBM’s Project Debater takes on the Cambridge Union in a debate, the DeepMind team chose videogames, something perhaps unsurprising since its CEO is one Demis Hassabis, whom readers with long memories might recall. Having contributed some level design to Bullfrog’s Syndicate, Hassabis became co-designer and lead programmer on 1994’s Theme Park, which went on to sell ten million copies. He was 17 at the time. 

(Image credit: Google)

AlphaStar runs on Google’s proprietary tensor processing units (TPUs), application-specific integrated circuits (ASICs) developed specifically for neural network machine learning. These are the same chips that power the back-end of services such as Google Photos, where a single TPU can process over 100 million photos a day, and Google Street View, where, impressively, they extracted all of the text in the Street View database in under five days.

The use of the word 'tensor' naturally invites comparisons to the tensor cores that enable DLSS in Nvidia’s RTX GPUs. Compared with a graphics chip, a TPU works at lower numerical precision and lacks any hardware for texturing and rasterisation, but it can rattle through high volumes of computation at a remarkable rate. Google deploys its third-generation TPUs in pods of up to 1,024 chips. "Each of these pods is now well over 100 petaflops," said Sundar Pichai, CEO of Google parent company Alphabet, at the company’s annual I/O conference in Mountain View, California. "This is what allows us to develop better [machine learning] models, larger models, more accurate models and helps us tackle even bigger problems. These chips are so powerful that for the first time we’ve had to introduce liquid cooling in our data centres."
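Taken at face value, those numbers put each chip at just under 100 teraflops. A quick back-of-the-envelope check (this sketch assumes Pichai’s round figure of 100 petaflops per 1,024-chip pod; real throughput depends on precision and workload):

```python
# Rough per-chip throughput, assuming the quoted pod figures at face value.
pod_flops = 100e15        # "well over 100 petaflops" per pod
chips_per_pod = 1024      # third-generation pods of up to 1,024 chips
per_chip_tflops = pod_flops / chips_per_pod / 1e12
print(f"~{per_chip_tflops:.0f} teraflops per chip")  # prints "~98 teraflops per chip"
```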

Playing the field

(Image credit: Blizzard)

According to a paper published in the journal Nature by the DeepMind team, StarCraft "has emerged as an important challenge for artificial intelligence research" thanks to its "enduring status among the most difficult professional esports and its relevance to the real world in terms of its raw complexity and multi-agent challenges". Unlike in chess, the sheer number of 'pieces' in play in StarCraft poses an immense challenge to an AI. 

The game offers 10^26 possible legal actions at every time-step—the moments when you stop to think what to do next—and that, to use a technical term, is a lot. Nonetheless, the AI was limited to acting at human speed, forcing it to win by developing superior strategies rather than by blasting its human opponents with superhumanly fast manoeuvres. 
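The core mechanic behind such a restriction is simple rate limiting. Here’s a minimal sketch in Python; the class name, the rolling 60-second window and the 180 actions-per-minute cap are all illustrative stand-ins, not AlphaStar’s actual limits, which the Nature paper describes in far more detail:

```python
from collections import deque

class ActionRateLimiter:
    """Illustrative human-speed cap: allow an agent at most max_apm
    actions in any rolling 60-second window."""

    def __init__(self, max_apm: int = 180):
        self.max_apm = max_apm
        self.stamps = deque()  # timestamps of recent actions

    def allow(self, now: float) -> bool:
        # Forget actions that have fallen out of the rolling window.
        while self.stamps and now - self.stamps[0] >= 60.0:
            self.stamps.popleft()
        # Permit the action only if the cap hasn't been reached.
        if len(self.stamps) < self.max_apm:
            self.stamps.append(now)
            return True
        return False
```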

Indeed, the DeepMind team had to repeatedly change the usernames its AIs were hiding behind, because other players noticed that the three agents (one for each StarCraft 2 race) had played the same number of matches, and were carrying out actions with a speed and precision beyond that of mere synapse-based neural pathways. This led to further restrictions being placed on the AI, while the rotation of usernames kept opponents in the dark over who they were really playing against.

(Image credit: Blizzard)

AlphaStar began its training by watching anonymised human games released by Blizzard. It learned to imitate their strategies, and was soon able to defeat the game’s built-in AI at Elite level around 95 percent of the time. Of the game’s three races, Protoss was initially preferred, although eventually AlphaStar would take on the world as Zerg and Terran too. Different instances of the AI then began to play games against one another, with branches of successful instances being taken and reintroduced to the league as new players. It took only 44 days of this to train AlphaStar, although DeepMind estimates that it exposed each AI agent to up to 200 years of realtime StarCraft 2 play.
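In outline, that league-style self-play loop looks something like the sketch below. Everything here is schematic: the function names, the rating field and the branching step are stand-ins for the reinforcement learning machinery described in the Nature paper, not DeepMind’s actual code.

```python
import random

def train_league(league, generations, play_and_update, branch):
    """Schematic league self-play: agents play one another, learn from
    the results, and strong agents are cloned back in as new players."""
    for _ in range(generations):
        for agent in league:
            opponent = random.choice(league)   # sample a sparring partner
            play_and_update(agent, opponent)   # play matches, update weights/rating
        best = max(league, key=lambda a: a.rating)
        league.append(branch(best))            # reintroduce a successful branch
    return league
```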

Before the online matches, the AI was tested against Dario 'TLO' Wünsch, a pro StarCraft 2 player from Germany who plays Protoss at Grandmaster level. AlphaStar beat him 5-0. "I was surprised by how strong the agent was," he said on DeepMind’s blog. "AlphaStar takes well-known strategies and turns them on their head. The agent demonstrated strategies I hadn’t thought of before, which means there may still be new ways of playing the game that we haven’t fully explored yet."

DeepMind hopes its intelligent systems "will one day help us unlock novel solutions to some of the world’s most important and fundamental scientific problems", but taking on the world at StarCraft is a serious business in itself, with the prize pool at the 2019 WCS Global Finals tagged at $500,000. As we went to press, it had not yet emerged what the DeepMind AI intended to spend its prize money on.