GitHub has stored a copy of every active open-source repository from 2020 deep inside a frozen mine

You may have heard of various projects to back up the Earth's best bits should anything catastrophic happen. The Global Seed Vault in Svalbard, for example, keeps a 'backup' of over a million seeds to prevent any from being eradicated. Just down the road lies another all-important backup: every active GitHub repository up to 2020, buried deep down in the permafrost.

Inspired by the Global Seed Vault, the Arctic World Archive (AWA) is a collection of containers filled with data, stored in a decommissioned coal mine between 250 and 300 metres underground. Each container holds reel upon reel of data, which look much like the film reels of an old cinema.

GitHub has 188 hardened film reels down the mine in what it calls the Arctic Code Vault. Together they contain every active repository as of February 2, 2020. So any open-source software on the website at that time is now filed safely away for future generations, or for aliens to discover once we're all gone, including even incredibly niche code for outdated servers.

Each reel holds 65,000 frames, and apart from a few frames at the start of each reel, every one of them contains QR codes. In total, the reels hold roughly 21 trillion bytes of data.
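Those figures allow a quick back-of-envelope check. The short Python calculation below divides the quoted total by the number of frames. It assumes, purely for illustration, that the data is spread evenly across every frame, and it ignores the few non-data frames at the start of each reel, so treat the result as a rough average rather than a description of how GitHub actually lays out its data.

```python
# Back-of-envelope arithmetic using the figures quoted in the article.
# Assumption: data is spread evenly across every frame; the handful of
# non-data frames at the start of each reel are ignored.
REELS = 188
FRAMES_PER_REEL = 65_000
TOTAL_BYTES = 21 * 10**12  # "roughly 21 trillion bytes"

total_frames = REELS * FRAMES_PER_REEL
bytes_per_frame = TOTAL_BYTES / total_frames

print(f"{total_frames:,} frames")                     # 12,220,000 frames
print(f"~{bytes_per_frame / 1e6:.2f} MB per frame")   # ~1.72 MB per frame
```

That works out to a little over 12 million frames, each averaging roughly 1.7 MB of data.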

There's a video over on TikTok from World Heritage on Svalbard that shows what the actual GitHub container looks like. Check out the excellent little tagline inside the door: "Open Source has won."

The idea of the Arctic Code Vault is that this open-source information, produced by thousands upon thousands of humans working together, is worth preserving for future generations. 


GitHub explains it more eloquently in the introduction buried with the data.

"This is primarily an archive of software. Software is a series of commands used to control the actions of a computer. A computer is a device which can automatically perform mathematical functions so much faster than a human mind that it has powers far beyond us. Our computers are used to help explore the secrets of the universe, to connect all of humanity in an omnipresent web of information, to manipulate signals fast enough to transmit sounds and project detailed moving images onto electrical screens, and to control enormously powerful machinery which far exceeds both the capacity and precision of human labor.

"A computer without software can do none of these things. A computer is an extraordinary and marvelous thing, but without software, all its power is useless. The purpose of this archive is to pass what we know about software on to you."


Take a peek at the QR codes used to store the data inside the Code Vault. I certainly wouldn't want to try to access it without a computer. (Image credit: GitHub)

GitHub has included instructions on how to uncompress the data, though it's keen to point out that decoding and making sense of it will itself take a computer. As the company notes: "Reading, decoding, and uncompressing this data will require considerable computation itself. In theory it could be done without computers, but it would be very tedious and difficult."

In other words, this data will only really be useful to someone with a computer built on the same foundations as our own. That's why GitHub also includes something called the Tech Tree.

The Tech Tree is uncompressed, unencoded, and easily read by a human. It contains information on the basics of computing and software, which might one day help someone rebuild computers from the ground up. That is, of course, if anyone ever finds it: you'll have to travel to snowy Svalbard, past the Arctic Circle, and deep into the mine to reach it. Ideally, though, there are enough pointers elsewhere around the globe to lead the way.

You can find full instructions for reading one of these files in GitHub's Code Vault guide, should you ever need them. Here's hoping no one ever does, though it's a wonderful idea to value and preserve all this open-source software in case the worst ever happens.

Jacob Ridley
Senior Hardware Editor

Jacob earned his first byline writing for his own tech blog. From there, he graduated to professionally breaking things as hardware writer at PCGamesN, and would go on to run the team as hardware editor. Since then he's joined PC Gamer's top staff as senior hardware editor, where he spends his days reporting on the latest developments in the technology and gaming industries and testing the newest PC components.