
How Hitman 3's devs shrank the entire trilogy install size by over 80GB

Hitman 2
(Image credit: IO Interactive)

Hitman 3 is going to be an enormous PC game—at least if you own Hitman 1 and 2, which Hitman 3 can import to bring the entire trilogy into a single package. That sounds like a recipe for an install size big enough to make an SSD cry, because Hitman 2, with the first game's levels imported, is currently 149 gigabytes. Hitman 2 is one of the mightiest storage hogs on PC, second only to Call of Duty. But last week, we reported that Hitman 3 will actually shrink instead of grow, retroactively optimizing the first two Hitman games into a dramatically smaller package. How did IO Interactive manage to cut the total install size in half?

I had to know more, so I asked IO Interactive to break it down.

"With all content installed (including the locations from H1+H2), we’re expecting Hitman 3 to clock in at approximately 60-70 GB and we're really happy with that," IO Interactive's chief technology officer Maurizio De Pascale told me over email.

Even without the older games bundled in, Hitman 3 is a leaner install than IO has managed to pull off with its last two games. As De Pascale explained to me, the answer is simple: More compression. But why Hitman 3's compression is so effective, and why they didn't use the same techniques last time, is where it gets more complicated (and more interesting). 

Hitman 3 uses LZ4, a lossless compression algorithm that's been around for about a decade. Almost all of the game's data runs through it, and it's especially quick to decompress. Here's how De Pascale explained it:

"Almost all lossless compression techniques exploit the fact that data often has repeating sequences. For example, 'HITMAN' or 'IO Interactive' will likely appear frequently in an article about IOI. Those duplicated sequences don't need to be stored multiple times and can be omitted, as long as you embed some information in the compressed stream about where they appeared originally, so that you can still perfectly reconstruct the initial data.

"The super simplified description of LZ4 is that it replaces those lengthy sequences with a reference to a sequence that has previously appeared in the decompressed stream. So, for example, instead of storing the word 'compression' as-is, the algorithm can store the equivalent of 'the word that appeared X words ago,' which can be very efficiently encoded with few bits. Of course that's not exactly how it works, but it's sufficiently close to convey the idea.

"This is actually a pretty common technique, which other compressors employ as well, but LZ4 has a very performant implementation that provides a good trade-off between reasonable disk compression and great decompression speed, which makes it a common choice for games."

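To make De Pascale's description a little more concrete, here is a toy sketch of the back-reference idea in Python. It is not the real LZ4 format or IO's code: the token layout, the `window` and `min_match` parameters, and the function names are all assumptions made for illustration, and the brute-force search is far slower than anything a real compressor would use.

```python
def compress(data: bytes, window: int = 4096, min_match: int = 4):
    """Toy LZ-style compressor: emits literals and (offset, length) copies.

    Illustrative only; this is not the LZ4 block or frame format.
    """
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_off = 0, 0
        # Brute-force search of the recent window for the longest match.
        for j in range(max(0, i - window), i):
            length = 0
            while i + length < len(data) and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        if best_len >= min_match:
            # "The bytes that appeared best_off bytes ago, best_len of them."
            tokens.append(("copy", best_off, best_len))
            i += best_len
        else:
            tokens.append(("lit", data[i]))
            i += 1
    return tokens


def decompress(tokens) -> bytes:
    """Rebuild the original bytes by replaying literals and back-references."""
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, off, length = tok
            for _ in range(length):
                out.append(out[-off])  # byte by byte, so overlapping copies work
    return bytes(out)


text = b"HITMAN by IO Interactive. HITMAN by IO Interactive."
tokens = compress(text)
restored = decompress(tokens)
```

The second occurrence of the phrase collapses into a single copy token, and `decompress` only has to replay copies from data it has already produced. That cheap replay step is the reason decompression speed, the property De Pascale singles out, can be so good.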
With Hitman 1 and 2, IO didn't apply compression as broadly "to avoid performance issues on low-spec hardware." The game only has so much CPU power to work with, so decompressing data has to be weighed against everything else it's doing, like running the AI and processing your inputs. The trade-off, then, is to skip compressing some files, resulting in a larger install but a better-performing game. By Hitman 3, engine improvements have lightened the load in other areas, freeing up more processing cycles to spend on decompression.

Another big improvement comes from how IO is importing the data from Hitman 1. Because that game was built episodically, every episode had to have all the code and assets needed to work standalone. "In Hitman 3, we're handling the way we give access to the legacy titles in a different way, which makes it easier for us to aggressively de-duplicate these shared resources," De Pascale said.
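One common way to de-duplicate shared resources is to store each unique blob once, keyed by a hash of its contents, and have every package point at it. The sketch below shows that general idea only; IO hasn't described its actual method, and the `deduplicate` function, the episode names used as keys, and the file names are hypothetical.

```python
import hashlib


def deduplicate(packages):
    """Store each unique resource once and map package entries to content hashes.

    `packages` maps a package name to a dict of {resource name: bytes}.
    """
    store = {}     # content hash -> bytes (one copy per unique resource)
    manifest = {}  # package name -> {resource name -> content hash}
    for pkg, resources in packages.items():
        manifest[pkg] = {}
        for name, blob in resources.items():
            digest = hashlib.sha256(blob).hexdigest()
            store.setdefault(digest, blob)  # keep only the first copy
            manifest[pkg][name] = digest
    return store, manifest


# Hypothetical standalone episodes that each ship the same shared mesh.
episodes = {
    "paris": {"agent47.mesh": b"A" * 100, "paris.map": b"P" * 50},
    "sapienza": {"agent47.mesh": b"A" * 100, "sapienza.map": b"S" * 50},
}
store, manifest = deduplicate(episodes)
bytes_before = sum(len(b) for r in episodes.values() for b in r.values())
bytes_after = sum(len(b) for b in store.values())
```

In this made-up example the shared mesh is stored once instead of twice, so the 300 bytes shipped across both episodes shrink to 200 in the combined store.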

Hard Drive

(Image credit: Pixabay via manseok)

The drawbacks of that DLC model actually mirror an older cause for bloated game install sizes: Hard drive seek times. Inside a hard drive, an arm with a tiny read/write head has to move to the physical location of the data on the magnetic disk to read it.

This means hard drives are way better suited to sequential reads than they are random reads—imagine how much harder it would be for you to read a book if a paragraph was split up between pages 1, 13, 46, and 253 and you were constantly flipping pages, for example.

To compensate, game developers "end up carefully storing meshes and textures in the order that you can predict they'll be loaded in memory," De Pascale said. "Sometimes you might even duplicate the same resource, just to avoid having to seek around and break a potentially longer sequential read."
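A back-of-the-envelope model shows why that trade can be worth the disk space. The 12 ms seek time and 100 MB/s throughput below are assumed round numbers for a slow laptop-class hard drive, not measurements from IO or from the consoles, and `read_time_ms` is a hypothetical helper that ignores caching and rotational effects.

```python
def read_time_ms(total_mb, fragments, seek_ms, throughput_mb_s):
    """Estimate read time: one seek per fragment plus the sequential transfer."""
    return fragments * seek_ms + total_mb / throughput_mb_s * 1000


# Assumed figures for illustration only.
SEEK_MS = 12.0
THROUGHPUT_MB_S = 100.0

# The same 100 MB of level data, laid out in one run or scattered in 500 pieces.
contiguous = read_time_ms(100, 1, SEEK_MS, THROUGHPUT_MB_S)
scattered = read_time_ms(100, 500, SEEK_MS, THROUGHPUT_MB_S)
```

Under these assumptions the scattered layout spends far longer seeking than it does transferring data, which is why developers were willing to duplicate a resource rather than break up a long sequential read.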

SSDs are also faster at sequential reads than random ones, but because they don't rely on moving parts, the performance hit is nowhere near as severe as it is on a hard drive. Games designed purely for SSDs today don't have to employ those tricks. But IO Interactive has developed all three Hitman games to run on consoles, too, and the PS4 and Xbox One use 5400 RPM hard drives made a full decade ago (with measly 8MB caches, to boot). De Pascale said that IO Interactive still has technology built for the days when its games loaded directly from DVDs, which are even slower than hard drives.

Now that the PS5 and Xbox Series X are here with SSDs, that technology is hopefully soon to be obsolete, and any game developers who still duplicate resources for faster load times can follow IO's lead in slimming down their games. It's going to be hard to upstage an 80 gigabyte diet, though. Agent 47's definitely going to need a new tux.

When he's not 50 hours into a JRPG or an opaque ASCII roguelike, Wes is probably playing the hottest games of three years ago. He oversees features, seeking out personal stories from PC gaming's niche communities. 50% pizza by volume.