Screwtape's Notepad

Delta Compression tests, 2019 edition¶

Nearly a decade ago, I spent some time learning about delta-compression tools and writing a test-harness to compare them. Recently, some of those tools were updated so I decided to update my tests and run them again.

Delta Compression tests, 2019 edition
What is delta-compression?
The contenders
The tests
The results
What about elapsed time or memory usage?
Conclusion

What is delta-compression?¶

Delta compression is a form of file compression that describes the differences (traditionally abbreviated as “d”, or the Greek equivalent, “delta”) between two files. If a person already has a copy of version 1 of a file, and you want to give them a copy of version 2, it can be much more efficient to just send them the differences from version 1 to version 2, rather than the whole thing.

The contenders¶

In my tests, I used the following delta-compression tools:

Tested delta-compression tools
Tool	Format	Algorithm	Notes
beat	BPS	Suffix arrays	The latest tool from byuu, the author of bsnes.
bsdiff	Custom	Suffix arrays	Used by Firefox to distribute updates.
bps	BPS	brute force	The previous tool from byuu, no longer supported.
Flips	BPS	Suffix arrays	Also supports the more primitive IPS format.
python-bps	BPS	hashing	My own attempt at writing a BPS comparison tool.
xdelta3	VCDIFF	Rolling hashes	The most popular tool for delta-compression.

I started writing python-bps to experiment with different delta-compression algorithms, and wrote the test-harness so I could easily compare my own code to others’. The new entries in the 2019 edition are beat and Flips; the others are the same as before.

The tests¶

I collected various pairs of slightly modified files, used each tool to compare the source and target, and recorded the size of the resulting patch file.

bsnes: compares the Windows executable for bsnes v071 with the executable for v072. This is a simple “recompiled binary” scenario.
bsnes-compression: compares a file containing a single newline with the executable for bsnes v072.
der-langrisser: compares Der Langrisser for the Super Famicom (revision 1) with the fan-translated version, which 50% larger. This is the “hand-modified binary” scenario.
dkc2: compares the original American version of Donkey Kong Country 2 with the first revision. This is the “recompiled binary” scenario again, but for a different platform.
mother3: compares the Japanese version of Mother 3 for the Game Boy Advance with the fan-translated version. This is the “hand-modified binary” scenario again, but for a much more invasive change (since the target the same size as the source).
mystic-ark: compares Mystic Ark for the Super Famicom with the fan-translated version. This is the “hand-modified binary” scenario again.
netscape-binary: compares the executable for Netscape Communicator version 4.74 for Linux with version 4.77. This is the “recompiled binary” scenario again, but for a different platform.
simple-extension: compares a file containing “A” with a file containing “A” followed by NUL. This is a basic sanity check.
synthetic-large: takes 8MB of random data from /dev/urandom as the source, and generates the target by randomly copy/pasting chunks of it.
synthetic-medium: is like synthetic-large, but with only 2MB of data.
synthetic-small: is like synthetic-large, but with only 512KB of data.
zdelta-benchmark1-emacs: compares tarballs of two versions of GNU Emacs. This test case was taken from the paper “Delta Algorithms: An Empirical Analysis” by Hunt, Vo, and Tichy (ACM Transactions on Software Engineering and Methodology, 1998). This is the “hand-modified source” scenario.
zdelta-benchmark1-gcc: compares tarballs of two versions of the GNU Compiler Collection. This test was taken from the same paper. This is the “hand-modified source” scenario again.
zdelta-benchmark2-tar625-1M: compares two randomly-generated ~1MB files with a similarity of 62.5%. This test was taken from the same paper.
zdelta-benchmark2-tar800-1M: compares two randomly-generated ~1MB files with a similarity of 80%. This test was taken from the same paper.
zdelta-benchmark2-tar915-1M: compares two randomly-generated ~1MB files with a similarity of 91.5%. This test was taken from the same paper.

In addition, many tests have a “rev” variant, which reverses the source and target files.

The results¶

In case you want to run tests for yourself, or even submit your own, I’ve published my test harness as a git repo. Note that for legal reasons, the repo does not contain every testcase listed above, just the ones that can be legally distributed.

The test harness runs every tool for every test, and produces a tab-separated values file listing the size of every patch, and the tool and test involved. This file can be loaded into your favourite spreadsheet or database tool so you can analyse it as you like.

Total patch sizes¶

Let’s start with the total size of all the patches generated by each tool:

Total uncompressed patch sizes
Size	Tool
7,934,971	bsdiff
9,787,718	xdelta3-9
10,695,964	flips
11,263,992	beat
11,667,963	byuu-bps
12,349,692	xdelta3-0
18,711,680	python-bps

Note that xdelta appears twice in the table. Unlike other formats, the VCDIFF format supports optional internal compression, and xdelta allows the amount of compression to be tweaked. “xdelta3 -0” disables the internal compression, while “xdelta3 -9” compresses as much as possible. Clearly this helps xdelta3 quite a bit, so let’s run the patches through gzip to see if that changes anything.

Total compressed patch sizes
Size	Tool
7,932,612	bsdiff.gz
9,588,976	flips.gz
9,790,203	xdelta3-9.gz
10,198,987	beat.gz
10,541,049	byuu-bps.gz
11,470,947	python-bps.gz
12,351,408	xdelta3-0.gz

“xdelta3 -9” doesn’t change much at all, as expected since it’s already compressed.

“bsdiff” also doesn’t change much, since it is also already compressed. Unlike xdelta3, compression isn’t optional for bsdiff, so there’s no compression-disabled variant.

It looks like all the BPS-based tools (flips, beat, byuu-bps and python-bps) aren’t as efficient as they could be, since gzip shaves about a megabyte off the total for each one.

Weirdly, the uncompressed “xdelta3 -0” gets bigger. I guess the delta-compression information is stored in a way that naive gzip can’t compress, which is why the format supports internal compression.

Reverse patch sizes¶

Some tests have “forwards” and “reverse” variants, where the “reverse” patch just undoes the effect of the “forwards” patch. Are some tools better at one than the other?

Total uncompressed "forward" patch sizes
Size	Tool
6,972,622	bsdiff
8,063,656	xdelta3-9
8,436,037	flips
8,793,056	beat
9,138,600	byuu-bps
10,028,155	xdelta3-0
15,214,791	python-bps

Total uncompressed "reverse" patch sizes
Size	Tool
962,349	bsdiff
1,724,062	xdelta3-9
2,259,927	flips
2,321,537	xdelta3-0
2,470,936	beat
2,529,363	byuu-bps
3,496,889	python-bps

The total size of reverse-patches is much smaller than forward patches, because there’s not as many reverse patches in the corpus. The relative performance of each tool is quite consistent, except that xdelta3 -0 is much better in reverse than forwards, while xdelta3 -9 remains the same. Perhaps that’s because forward patchse add data that needs to be inserted into the file, which can benefit from compression, while reverse patches tend to remove data, and therefore do not benefit as much from compression.

Patch sizes by use-case¶

Now that we’ve establish trends across our entire corpus, let’s look at the different use-cases for for delta compression. Do different tools behave differently for the “recompiled binary” use-case?

Total uncompressed "recompiled binary" patch sizes
Size	Tool
492,739	bsdiff
1,237,717	xdelta3-9
1,676,116	flips
1,821,815	xdelta3-0
1,858,891	beat
1,872,539	byuu-bps
2,547,556	python-bps

bsdiff is the undisputed champion of this category, which is unsurprising since it was specifically designed for such things.

How about hand-modified binaries?

Total uncompressed "hand modified binary" patch sizes
Size	Tool
1,857,705	xdelta3-0
1,862,345	xdelta3-9
1,869,546	bsdiff
1,954,570	flips
1,986,327	beat
2,023,865	byuu-bps
2,177,781	python-bps

bsdiff falls back into the middle of the pack for hand-modified binaries. Somewhat bizarrely, the uncompressed xdelta3 -0 is smaller than every other tool, including the compressed xdelta3 -9. However, all the tools are pretty closely-matched in this category, so it might not be a significant difference.

The last major category is “hand modified sources”:

Total uncompressed "hand modified source" patch sizes
Size	Tool
2,386,047	xdelta3-9
2,616,045	flips
2,671,313	beat
2,860,706	bsdiff
2,937,447	byuu-bps
3,350,040	xdelta3-0
7,721,845	python-bps

For mostly-text data like source-code, bsdiff is again mediocre, and Flips and beat make a good showing. xdelta3 -9 makes the best showing, possibly because the new data added into the target file is text, and therefore compresses well.

Bonus: File compression¶

If you think about it, traditional file-compression is just a special case of delta-compression where the source file is always empty, so any delta-compression tool should be usable as a regular file-compression tool. Just for kicks, let’s try using our delta-compression tools to compresst the bsnes v072 binary from the “bsnes” test:

Compressed bsnes v072 binary size
Size	Tool
356,530	xdelta3-0
397,620	xdelta3-9
401,225	bsdiff
546,702	flips
575,797	beat
603,691	byuu-bps
1,124,889	python-bps

For comparison, the uncompressed file is 1.1MiB, and gzip -9 compresses it to ~400KiB. Surprisingly, xdelta3 beats everything else, including gzip. Even more surprisingly, xdelta3 -0 beats xdelta3 -9. I could not begin to imagine how.

What about elapsed time or memory usage?¶

Total patch size is an important measurement for any kind of file compression, but not the only one. If a particular tool can generate super-tightly compressed files, but requires a gigabyte of RAM for each kilobyte of the source file, or takes days to run, it may not be worth the effort.

Unfortunately, my test harness does not record elapsed time or memory usage, so I can’t provide those numbers here. However, while running the tests I observed the following:

Every tool whose algorithm involves suffix arrays uses quite a bit of RAM to store the suffix array.
Because it uses rolling hashes, xdelta3 is quite RAM-efficient.
“byuu-bps” is quite slow, because of its brute-force algorithm, and “python-bps” is hilariously slow, because of Python.
Every other tool is acceptably fast.

Conclusion¶

From the numbers we’ve seen, we can draw a few conclusions:

If you’re developing software and want to distribute updates to your users, bsdiff is the only game in town.
If you’re hand-modifying binaries or source less than a few hundred megabytes in size, Flips produces great patches, and the BPS file-format is the simplest to support in your patcher.
If you’re hand-modifying binaries like CD or DVD images, you’ll probably want xdelta3.