How do you train an AI to play Minecraft? Feed it 70,000 hours of YouTube gameplay

Minecraft RTX on
(Image credit: Mojang, Nvidia)

As training regimes go, being forced to watch eight years' worth of someone else playing Minecraft feels pretty harsh. When the robot uprising comes, I fear OpenAI could be first against the wall after what it's put its latest AI through to get it to play the standard version of Minecraft.

I mean, OpenAI already created DALL-E, and is therefore responsible for the DALL-E mini homage by association. That's the famed AI now forced to create memes 24/7 for the internet at large.

For the historical record, I for one welcome our digital overlords and have never kicked a DARPA big dog across a parking lot, jumped in front of an autonomous car repeatedly, or bugged an AI assistant incessantly for larks. I'm on your side, bots.

With all that being said, I still think the fact the OpenAI neural network can now craft a diamond pickaxe off its own bat is actually pretty darned incredible.

The detailed blog post on the OpenAI site (via SingularityHub) explains how it managed to teach the network to play Minecraft, and it's fascinating stuff. Not least how, alongside those 70,000 hours of Minecraft gameplay footage, it paid $160,000 to a team of contractors to create and tag up 2,000 hours of footage with labels so the AI could understand what it was looking at and how that related to its actions in the game.

The method is called Video PreTraining (VPT), and OpenAI claims its model can learn to craft diamond tools, a task it says takes a proficient human around 20 minutes.

"Additionally, the model performs other complex skills humans often do in the game," states the OpenAI post, "such as swimming, hunting animals for food, and eating that food. It also learned the skill of “pillar jumping”, a common behavior in Minecraft of elevating yourself by repeatedly jumping and placing a block underneath yourself."

It's also worth noting that this uses the standard mouse and keyboard interface, not some special AI-focused build of the game.

If it was just watching the videos without context it would be extremely challenging to train a neural network, which is why OpenAI retained a pool of contractors to create a smaller dataset where they recorded both their video and the actions they took: keypresses and mouse movements. That 2,000 hours of tagged footage is then used to train something called an Inverse Dynamics Model (IDM), which can then go off and accurately tag the larger 70,000 hour dataset.
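For the curious, here's a toy sketch of that label-a-little, tag-a-lot idea in Python. It is not OpenAI's code: the real IDM is a neural network looking at video frames, whereas this stand-in assumes a "frame" is just a number (a position on a line) and uses a simple lookup table. Every function name and data value here is made up for illustration.

```python
from collections import Counter, defaultdict

def train_idm(labeled_clips):
    """Toy IDM: learn which action goes with each frame-to-frame change.

    labeled_clips stands in for the contractor data: (frames, actions) pairs,
    where actions[i] is what the player did between frames[i] and frames[i+1].
    """
    votes = defaultdict(Counter)
    for frames, actions in labeled_clips:
        for before, after, action in zip(frames, frames[1:], actions):
            votes[after - before][action] += 1
    # Keep the most common action recorded for each observed change.
    return {delta: counts.most_common(1)[0][0] for delta, counts in votes.items()}

def pseudo_label(idm, frames, default="noop"):
    """Tag unlabeled footage by guessing the action between consecutive frames."""
    return [idm.get(after - before, default)
            for before, after in zip(frames, frames[1:])]

def clone_behaviour(frames, actions):
    """Toy policy: for each frame seen, pick the action most often taken there."""
    votes = defaultdict(Counter)
    for frame, action in zip(frames, actions):
        votes[frame][action] += 1
    return {frame: counts.most_common(1)[0][0] for frame, counts in votes.items()}

# Small labeled set (the "2,000 hours"): frames plus recorded inputs.
contractor_clips = [([0, 1, 2, 2, 1], ["right", "right", "noop", "left"])]
idm = train_idm(contractor_clips)

# Large unlabeled set (the "70,000 hours"): frames only, no inputs.
youtube_frames = [0, 1, 2, 3, 3]
labels = pseudo_label(idm, youtube_frames)

# Train the actual agent on the footage the IDM just tagged.
policy = clone_behaviour(youtube_frames, labels)
print(labels)
print(policy)
```

The point of the split is that guessing which action happened between two frames is a much easier problem than deciding what to do next, so a small labeled set can be enough to unlock a far bigger unlabeled one.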

It's this tagged video content that is seemingly the key to training such complex and open behaviours as you'll find in Minecraft. VPT has kinda been proven, then, and its future as a training method is that, as OpenAI states, it "paves the path toward allowing agents to learn to act by watching the vast numbers of videos on the internet."

Though whether that's something to wonder at or to fear, I'm still not sure. I've seen YouTube, there's a lot of terrible stuff on there. There's a lot of me on there, for god's sake.

Dave James
Managing Editor, Hardware

Dave has been gaming since the days of Zaxxon and Lady Bug on the Colecovision, and code books for the Commodore Vic 20 (Death Race 2000!). He built his first gaming PC at the tender age of 16, and finally finished bug-fixing the Cyrix-based system around a year later. When he dropped it out of the window. He first started writing for Official PlayStation Magazine and Xbox World many decades ago, then moved onto PC Format full-time, then PC Gamer, TechRadar, and T3 among others. Now he's back, writing about the nightmarish graphics card market, CPUs with more cores than sense, gaming laptops hotter than the sun, and SSDs more capacious than a Cybertruck.