
The GitHub code vault - humanity's safeguard against devastation

08:01 am on 17 August 2020

Svalbard is a remote, frozen archipelago midway between Norway and the North Pole.

Svalbard, Norway. Photo: Unsplash / Chris Marquardt

Polar bears outnumber humans there, yet the archipelago is arguably the biggest insurance policy the world holds against global technological devastation.

And we just took out a fresh policy.

For the first time ever, open-source code that forms the basis of most of our computerised devices has been archived in a vault that should protect it for 1000 years.

The code vault

If you're thinking that an Arctic code vault sounds like a high-tech library crossed with a Bond villain's lair, you're not far off.

Svalbard is remote, home to the world's northernmost town, and is protected by the century-old International Svalbard Treaty.

Crucially, it's already home to the successful Global Seed Vault, which saves seeds in case entire species ever get wiped out by disease or climate change.

The Svalbard Global Seed Vault. Photo: AFP

Just down the road, the GitHub Archive Program found space in a decommissioned coal mine run by the Arctic World Archive, which already houses and preserves historical and cultural data from several countries.

All put together, the barren archipelago makes the perfect place to seal something you want to protect in a steel vault 250 metres under the permafrost.

The Arctic Code Archive aims to serve as a time capsule for the future, saving huge amounts of open-source computer code alongside a range of data including a record of Australia's biodiversity and examples of culturally significant works.

If you were to make your way into the mine and crack the large steel vault, you'd find 186 film reels inside, each a kilometre long, covered in tiny dots.

It's not just miniaturised text, though. To squeeze in as much as possible, the code is stored in tiny QR codes that pack the information densely.

You run into open-source code every day without even knowing it. In fact, you're probably using some to read this article right now.

"Open-source" means the code is shared freely between developers around the world and can be used for any application.

That means a little piece of coding could end up in anything from your TV to a Mars mission.

The concept fosters collaborative software engineering around the globe.

It's incredibly important, and it spans a range of complexity - from huge algorithms that mine Bitcoin to single lines of code that determine whether a number is odd or even.
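
For a sense of how small the small end of that range is, here is a minimal sketch of an odd-or-even check. It is written in Python purely for illustration; the function name is made up for this example and is not taken from any particular open-source project.

```python
def is_odd(number: int) -> bool:
    # A whole number is odd when dividing it by two leaves a remainder.
    return number % 2 != 0


print(is_odd(7))   # True
print(is_odd(10))  # False
```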

Archiving all of that work means it won't have to be re-invented if it is ever lost, saving time and money.

The archive reels hold a combined 21 terabytes of code. That may not seem like much if you have a hard drive at home that holds 2 terabytes.

But we're not storing your photos or movies here - each character in a line of code takes up a tiny bit of space.

If someone who types at about 60 words a minute sat down and tried to fill up all that space, it would take 111,300 years - and that's if they didn't get tired or need any breaks.
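
The typing figure can be roughly reproduced with some back-of-the-envelope arithmetic. The sketch below is not the article's own calculation; it assumes one byte per character, six keystrokes per word (five letters plus a space), decimal terabytes and non-stop typing through 365-day years.

```python
# Back-of-the-envelope check of the typing estimate.
# Assumptions (not stated in the article): one byte per character,
# six keystrokes per word (five letters plus a space),
# and non-stop typing through 365-day years.

ARCHIVE_BYTES = 21 * 10**12        # 21 terabytes, decimal
WORDS_PER_MINUTE = 60
KEYSTROKES_PER_WORD = 6            # assumed average
MINUTES_PER_YEAR = 60 * 24 * 365


def years_to_type(total_bytes: int) -> float:
    """Return the years of continuous typing needed to produce total_bytes."""
    bytes_per_minute = WORDS_PER_MINUTE * KEYSTROKES_PER_WORD
    return total_bytes / bytes_per_minute / MINUTES_PER_YEAR


years = years_to_type(ARCHIVE_BYTES)
print(f"About {years:,.0f} years")
```

Under those assumptions the answer comes out at roughly 111,000 years, in line with the figure quoted above. A different guess at the average word length would shift the result.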

Built to last

If you're making an archive that's going to last, you've got to make sure it isn't going to degrade over time.

While it might seem intuitive to store the information on something like a Blu-ray disc or on hard drives, these are notorious for breaking down.

They're designed to be convenient, not to be heirlooms you pass down for generations.

"You might have seen this in the past ... years after you touched it last, you try to boot it up again and it wouldn't work," says GitHub's VP of strategic programs, Thomas Dohmke.

"The (information) bits have been lost."

Things that survive the ravages of time tend to be physical. Think papyrus scrolls, Egyptian carvings or Assyrian tablets.

In fact, there's a good chance that people of a distant future will know more about ancient people than they will about us.

When it comes to making physical copies, your office A4 wouldn't cut it, so they used a refined version of century-old darkroom photography technology to create the archival film reels.

Each film is made of polyester coated in very stable silver halide crystals that allow the information to be packed in tightly.

The film has a 500-year life span, but tests that simulate aging suggest it will last twice as long.

Storing it in the Arctic permafrost on Svalbard gives you a host of added benefits.

The cold prevents any degradation caused by heat; it's locked deep in a mountain, protected from damaging UV rays and safe from rising sea levels; and it's remote enough that it's not likely to be lost to looters from a dystopian future.

Despite global warming, and a previous event at the seed bank where some of the permafrost melted, it's believed the archive is buried deep enough that the permafrost should survive.

Just in case, they're not stopping there.

The GitHub Archive Program is working with partners to figure out a way to store all public repositories for a whopping 10,000 years.

The effort, called Project Silica, aims to write archives into the molecular structure of quartz glass platters with an incredibly precise laser that pulses a quadrillion times a second.

That's a 1 followed by 15 zeros: 1,000,000,000,000,000.

No clouds on the horizon

You might be wondering: doesn't the internet already save all of our information in the cloud?

Yes, but it's not as safe as you might think.

There are actually three levels of archiving, known as hot, warm and cold.

The hot layer is made up of online repositories like GitHub, which allows users to upload their code for anyone to use.

This is copied to servers around the world and is readily accessible to anyone with an internet connection.

While access is quick and easy, if someone removes their code from the hot layer, it is no longer available. That doesn't make for a very reliable archive.

The warm layer is maintained by the Internet Archive, which runs the Wayback Machine.

It crawls the web, regularly takes snapshots of sites and keeps them on its servers. Anyone can access them, but you have to do a bit of digging.

The Internet Archive isn't a perfect system - it takes regular snapshots, but anything that happened in between can be lost.

Both the hot and warm layers work well together to give a fair idea of what the internet might have held at any given time, but they both suffer from one critical weakness: they are made up of electronics.

The internet is essentially millions of interconnected computers and huge data storage banks that your device can access.

If there were an event that disrupted or destroyed those computers, the information they hold - and therefore the internet - could be destroyed forever.

The Arctic vault represents the cold layer of archiving.

It's an incomplete snapshot taken at regular intervals (the plan is to add to the archive every five years), but one that should survive the majority of foreseeable calamities.

Some of the potential disasters are academic, but some we've seen before.

Going out with a bang

In early September 1859, the sun belched, and the world's very rudimentary electronics were fried.

It's known as the Carrington Event, and as the matter ejected from the sun headed towards Earth, the lights of the auroras were seen as far north as Queensland and all the way down to the Caribbean.

When it hit, the largest geomagnetic storm ever recorded caused sparks to fly off telegraph wires, setting fire to their poles. Some operators reported being able to send messages even though their power supplies were disconnected.

If that were to happen today, most of our electronics - both here and in space - would be destroyed.

And it's not really a matter of if, but when.

It also doesn't have to be a huge astronomical event that causes us to lose many generations' worth of information.

If a pandemic or economic downturn were severe enough, we might be unable to maintain or power the computers that make up the internet.

If you consider how technology has changed in just the last few decades - the rise of the internet, the increased use of mobile phones - then it's easy to understand how people living a hundred or a thousand years from now are likely to have technology that's wildly different from ours.

The archive is part of our generation's legacy.

As Mr Dohmke says:

"We want to preserve that knowledge and enable future generations to learn about our time, in the same way you can learn (about the past) in a library or a museum."

Australian data has found a home in the archive, too, including the Atlas of Living Australia that details our country's plant and animal biodiversity, and machine learning models from Geoscience Australia that are used to understand bushfires and climate change.

A modern-day Rosetta Stone

There's no saying who might want to use the archive in the future, so archivists had to come up with a solution both for those who don't speak English and for those who might not understand our coding languages.

The Rosetta Stone. Photo: AFP

The films start with a guide to reading the archive, since there's a decent chance that anyone finding them in the future may not know how to interpret the QR codes.

Even more importantly, that's followed by a document called the Tech Tree, which details software development, programming languages and computer programming in general.

Crucially, it's all readable by eye.

Anyone wanting to read the archives might need at least a basic understanding of how to make a magnifying lens (something humans achieved about 1,000 years ago), but after that the archive could all be translated using a pen and paper.

The guides aren't just in English, either. Like a modern-day Rosetta Stone, they are also written in Arabic, Spanish, Chinese, and Hindi, so that future historians have the best chance of deciphering the code.

"It takes time, obviously ... but it doesn't need any special machinery," Mr Dohmke says.

"Even if in 1,000 years something dramatic has happened that has thrown us back to the Stone Age, or if extraterrestrials or aliens are coming to the archive, we hope they will all understand what's on those film reels."

- ABC