Screwtape's Notepad

Delta Compression tests, 2019 edition

Nearly a decade ago, I spent some time learning about delta-compression tools and writing a test-harness to compare them. Recently, some of those tools were updated so I decided to update my tests and run them again.

What is delta-compression?

Delta compression is a form of file compression that describes the differences (traditionally abbreviated as “d”, or the Greek equivalent, “delta”) between two files. If a person already has a copy of version 1 of a file, and you want to give them a copy of version 2, it can be much more efficient to just send them the differences from version 1 to version 2, rather than the whole thing.
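The real formats (BPS, VCDIFF, bsdiff's custom format) each have their own encodings, but the underlying idea is the same. Here's a toy illustration in Python — not any real patch format — where a patch is just a list of instructions that rebuild the target by copying runs from the source and inserting the bytes that changed:

    # Toy illustration only -- not BPS, VCDIFF, or any other real format.
    source = b"The quick brown fox jumps over the lazy dog"
    target = b"The quick brown cat jumps over the lazy dog!"

    # A "patch" as a list of operations: copy unchanged runs from the source,
    # insert the bytes that are new in the target.
    patch = [
        ("copy", 0, 16),     # "The quick brown "
        ("insert", b"cat"),  # replaces "fox"
        ("copy", 19, 24),    # " jumps over the lazy dog"
        ("insert", b"!"),
    ]

    def apply_patch(source, patch):
        out = bytearray()
        for op in patch:
            if op[0] == "copy":
                _, offset, length = op
                out += source[offset:offset + length]
            else:
                out += op[1]
        return bytes(out)

    assert apply_patch(source, patch) == target

The patch only has to spell out the inserted bytes; everything else is a cheap reference back into version 1, which is where the space savings come from.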

The contenders

In my tests, I used the following delta-compression tools:

Tested delta-compression tools
Tool        Format  Algorithm       Notes
beat        BPS     Suffix arrays   The latest tool from byuu, the author of bsnes.
bsdiff      Custom  Suffix arrays   Used by Firefox to distribute updates.
bps         BPS     Brute force     The previous tool from byuu, no longer supported.
Flips       BPS     Suffix arrays   Also supports the more primitive IPS format.
python-bps  BPS     Hashing         My own attempt at writing a BPS comparison tool.
xdelta3     VCDIFF  Rolling hashes  The most popular tool for delta-compression.

I started writing python-bps to experiment with different delta-compression algorithms, and wrote the test-harness so I could easily compare my own code to others’. The new entries in the 2019 edition are beat and Flips; the others are the same as before.

The tests

I collected various pairs of slightly modified files, used each tool to compare the source and target, and recorded the size of the resulting patch file.

bsnes
compares the Windows executable for bsnes v071 with the executable for v072. This is a simple “recompiled binary” scenario.
bsnes-compression
compares a file containing a single newline with the executable for bsnes v072.
der-langrisser
compares Der Langrisser for the Super Famicom (revision 1) with the fan-translated version, which is 50% larger. This is the “hand-modified binary” scenario.
dkc2
compares the original American version of Donkey Kong Country 2 with the first revision. This is the “recompiled binary” scenario again, but for a different platform.
mother3
compares the Japanese version of Mother 3 for the Game Boy Advance with the fan-translated version. This is the “hand-modified binary” scenario again, but for a much more invasive change (since the target is the same size as the source).
mystic-ark
compares Mystic Ark for the Super Famicom with the fan-translated version. This is the “hand-modified binary” scenario again.
netscape-binary
compares the executable for Netscape Communicator version 4.74 for Linux with version 4.77. This is the “recompiled binary” scenario again, but for a different platform.
simple-extension
compares a file containing “A” with a file containing “A” followed by NUL. This is a basic sanity check.
synthetic-large
takes 8MB of random data from /dev/urandom as the source, and generates the target by randomly copy/pasting chunks of it (see the sketch below).
synthetic-medium
is like synthetic-large, but with only 2MB of data.
synthetic-small
is like synthetic-large, but with only 512KB of data.
zdelta-benchmark1-emacs
compares tarballs of two versions of GNU Emacs. This test case was taken from the paper “Delta Algorithms: An Empirical Analysis” by Hunt, Vo, and Tichy (ACM Transactions on Software Engineering and Methodology, 1998). This is the “hand-modified source” scenario.
zdelta-benchmark1-gcc
compares tarballs of two versions of the GNU Compiler Collection. This test was taken from the same paper. This is the “hand-modified source” scenario again.
zdelta-benchmark2-tar625-1M
compares two randomly-generated ~1MB files with a similarity of 62.5%. This test was taken from the same paper.
zdelta-benchmark2-tar800-1M
compares two randomly-generated ~1MB files with a similarity of 80%. This test was taken from the same paper.
zdelta-benchmark2-tar915-1M
compares two randomly-generated ~1MB files with a similarity of 91.5%. This test was taken from the same paper.

In addition, many tests have a “rev” variant, which reverses the source and target files.
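Since I mentioned the synthetic tests above, here's a rough sketch of how a pair like synthetic-large could be generated. The chunk sizes, the number of copy/paste operations, and the file names are illustrative guesses, not the exact parameters my harness uses:

    import os
    import random

    SOURCE_SIZE = 8 * 1024 * 1024  # 8MB for synthetic-large

    source = os.urandom(SOURCE_SIZE)  # same effect as reading /dev/urandom

    # Build the target by repeatedly copying a random chunk and pasting it
    # over another random location.
    target = bytearray(source)
    for _ in range(64):
        length = random.randint(4 * 1024, 256 * 1024)
        src_off = random.randrange(len(target) - length)
        chunk = bytes(target[src_off:src_off + length])
        dst_off = random.randrange(len(target) - length)
        target[dst_off:dst_off + length] = chunk

    with open("synthetic-large.source", "wb") as f:
        f.write(source)
    with open("synthetic-large.target", "wb") as f:
        f.write(bytes(target))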

The results

In case you want to run tests for yourself, or even submit your own, I’ve published my test harness as a git repo. Note that for legal reasons, the repo does not contain every testcase listed above, just the ones that can be legally distributed.

The test harness runs every tool for every test, and produces a tab-separated values file listing the size of every patch, and the tool and test involved. This file can be loaded into your favourite spreadsheet or database tool so you can analyse it as you like.
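For example, assuming the output file is called results.tsv and has a header row with columns named “test”, “tool” and “size” (the real column names may differ), a few lines of Python are enough to reproduce the per-tool totals below:

    import csv
    from collections import defaultdict

    totals = defaultdict(int)
    with open("results.tsv", newline="") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            totals[row["tool"]] += int(row["size"])

    for size, tool in sorted((s, t) for t, s in totals.items()):
        print(f"{size:>12,} {tool}")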

Total patch sizes

Let’s start with the total size of all the patches generated by each tool:

Total uncompressed patch sizes
Size Tool
7,934,971 bsdiff
9,787,718 xdelta3-9
10,695,964 flips
11,263,992 beat
11,667,963 byuu-bps
12,349,692 xdelta3-0
18,711,680 python-bps

Note that xdelta3 appears twice in the table. Unlike the other formats, VCDIFF supports optional internal compression, and xdelta3 allows the amount of compression to be tweaked. “xdelta3 -0” disables the internal compression, while “xdelta3 -9” compresses as much as possible. Clearly this helps xdelta3 quite a bit, so let’s run the patches through gzip to see if that changes anything.
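That “run the patches through gzip” step is easy to reproduce. Here's a small sketch, assuming the patch files sit in the current directory with a .patch extension (the real harness may lay them out differently):

    import glob
    import gzip
    import os

    for path in sorted(glob.glob("*.patch")):
        raw = os.path.getsize(path)
        with open(path, "rb") as f:
            gzipped = len(gzip.compress(f.read(), compresslevel=9))
        print(f"{path}: {raw:,} bytes raw, {gzipped:,} bytes gzipped")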

Total compressed patch sizes
Size Tool
7,932,612 bsdiff.gz
9,588,976 flips.gz
9,790,203 xdelta3-9.gz
10,198,987 beat.gz
10,541,049 byuu-bps.gz
11,470,947 python-bps.gz
12,351,408 xdelta3-0.gz

“xdelta3 -9” doesn’t change much at all, as expected since it’s already compressed.

“bsdiff” also doesn’t change much, since it is also already compressed. Unlike xdelta3, compression isn’t optional for bsdiff, so there’s no compression-disabled variant.

It looks like all the BPS-based tools (flips, beat, byuu-bps and python-bps) aren’t as efficient as they could be, since gzip shaves about a megabyte off the total for each one.

Weirdly, the uncompressed “xdelta3 -0” patches actually get bigger after gzipping. I guess the delta-compression information is stored in a way that naive gzip can’t compress, which is why the format supports internal compression.

Reverse patch sizes

Some tests have “forwards” and “reverse” variants, where the “reverse” patch just undoes the effect of the “forwards” patch. Are some tools better at one than the other?

Total uncompressed "forward" patch sizes
Size Tool
6,972,622 bsdiff
8,063,656 xdelta3-9
8,436,037 flips
8,793,056 beat
9,138,600 byuu-bps
10,028,155 xdelta3-0
15,214,791 python-bps
Total uncompressed "reverse" patch sizes
Size Tool
962,349 bsdiff
1,724,062 xdelta3-9
2,259,927 flips
2,321,537 xdelta3-0
2,470,936 beat
2,529,363 byuu-bps
3,496,889 python-bps

The total size of reverse patches is much smaller than forward patches, because there aren’t as many reverse patches in the corpus. The relative performance of each tool is quite consistent, except that xdelta3 -0 does much better in reverse than forwards, while xdelta3 -9 stays the same. Perhaps that’s because forward patches add data that needs to be inserted into the file, which can benefit from compression, while reverse patches tend to remove data, and therefore do not benefit as much from compression.

Patch sizes by use-case

Now that we’ve established trends across the entire corpus, let’s look at the different use-cases for delta compression. Do different tools behave differently for the “recompiled binary” use-case?

Total uncompressed "recompiled binary" patch sizes
Size Tool
492,739 bsdiff
1,237,717 xdelta3-9
1,676,116 flips
1,821,815 xdelta3-0
1,858,891 beat
1,872,539 byuu-bps
2,547,556 python-bps

bsdiff is the undisputed champion of this category, which is unsurprising since it was specifically designed for such things.

How about hand-modified binaries?

Total uncompressed "hand modified binary" patch sizes
Size Tool
1,857,705 xdelta3-0
1,862,345 xdelta3-9
1,869,546 bsdiff
1,954,570 flips
1,986,327 beat
2,023,865 byuu-bps
2,177,781 python-bps

bsdiff falls back into the middle of the pack for hand-modified binaries. Somewhat bizarrely, the uncompressed xdelta3 -0 is smaller than every other tool, including the compressed xdelta3 -9. However, all the tools are pretty closely-matched in this category, so it might not be a significant difference.

The last major category is “hand modified sources”:

Total uncompressed "hand modified source" patch sizes
Size Tool
2,386,047 xdelta3-9
2,616,045 flips
2,671,313 beat
2,860,706 bsdiff
2,937,447 byuu-bps
3,350,040 xdelta3-0
7,721,845 python-bps

For mostly-text data like source-code, bsdiff is again mediocre, and Flips and beat make a good showing. xdelta3 -9 makes the best showing, possibly because the new data added into the target file is text, and therefore compresses well.

Bonus: File compression

If you think about it, traditional file-compression is just a special case of delta-compression where the source file is always empty, so any delta-compression tool should be usable as a regular file-compression tool. Just for kicks, let’s try using our delta-compression tools to compress the bsnes v072 binary from the “bsnes” test:

Compressed bsnes v072 binary size
Size Tool
356,530 xdelta3-0
397,620 xdelta3-9
401,225 bsdiff
546,702 flips
575,797 beat
603,691 byuu-bps
1,124,889 python-bps

For comparison, the uncompressed file is 1.1MiB, and gzip -9 compresses it to ~400KiB. Surprisingly, xdelta3 beats everything else, including gzip. Even more surprisingly, xdelta3 -0 beats xdelta3 -9. I could not begin to imagine how.
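These results come from pointing each tool at a zero-byte source file. As a sketch of the setup, using xdelta3 as the example (the file names here are just placeholders):

    import os
    import subprocess

    # A zero-byte "version 1" to act as the source.
    open("empty.source", "wb").close()

    # Standard xdelta3 encode invocation: -e encode, -0 no internal
    # compression, -f overwrite, -s names the source file.
    subprocess.run(
        ["xdelta3", "-e", "-0", "-f",
         "-s", "empty.source", "bsnes_v072", "bsnes_v072.xdelta"],
        check=True,
    )
    print(os.path.getsize("bsnes_v072.xdelta"), "bytes")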

What about elapsed time or memory usage?

Total patch size is an important measurement for any kind of file compression, but not the only one. If a particular tool can generate super-tightly compressed files, but requires a gigabyte of RAM for each kilobyte of the source file, or takes days to run, it may not be worth the effort.
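Measuring these isn’t hard in principle. A sketch of recording wall-clock time and peak child-process memory for a single tool invocation on Linux might look like this (the command line and file names are placeholders):

    import resource
    import subprocess
    import time

    start = time.monotonic()
    subprocess.run(
        ["xdelta3", "-e", "-9", "-f",
         "-s", "source.bin", "target.bin", "out.patch"],
        check=True,
    )
    elapsed = time.monotonic() - start

    # Peak resident set size of finished child processes; on Linux,
    # ru_maxrss is reported in kilobytes.
    peak_kb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    print(f"{elapsed:.2f}s elapsed, {peak_kb} KiB peak RSS")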

Unfortunately, my test harness does not record elapsed time or memory usage, so I can’t provide those numbers here. However, while running the tests I observed the following:

Conclusion

From the numbers we’ve seen, we can draw a few conclusions: