Current version

v1.10.4 (stable)


Main page
Archived news
Plugin SDK
Knowledge base
Contact info
Other projects


Blog Archive

And I thought my implementation of Deflate was bad

When I added PNG support to 1.7.0, I wrote my own routines to handle the Deflate compression algorithm. The Deflate algorithm is also the underlying compression algorithm behind the well-known 'zip' format, and is surprisingly complex, with a sliding window matching algorithm combined with three layers of Huffman trees. It was quite an educational experience to write a fast sliding window compressor.

Someone posted an interesting bug in the Visual Studio public bug database about the new zip file support enlarging files, so I decided to try .NET Framework's Deflate compressor:

using System;
using System.IO;
using System.IO.Compression;
namespace VCZipTest {
    class Program {
        static void Main(string[] args) {
            using (FileStream src = new FileStream(args[0], FileMode.Open)) {
                using (FileStream dst = new FileStream(args[1], FileMode.Create)) {
                    using (GZipStream def = new GZipStream(dst, CompressionMode.Compress, true)) {
                        byte[] buf = new byte[4096];
                        for (; ; ) {
                            int act = src.Read(buf, 0, 4096);
                            if (act <= 0)
                            def.Write(buf, 0, act);
                    System.Console.WriteLine("{0} ({1} bytes) -> {2} ({3} bytes)", args[0],
src.Length, args[1], dst.Length); } } } } } D:projwinVCZipTestbinDebug>vcziptest f:shuffle.mp3 f:test.bin.gz f:shuffle.mp3 (1439439 bytes) -> f:test.bin.gz (2114778 bytes)

Yes, the .NET Framework implementation actually enlarged the file by 47%, and yes, gunzip was able to "decompress" the 2.1MB file correctly. I ran the file through VirtualDub's zip decompressor and found that the literal/length Huffman tree in the stream was pretty badly misbalanced, with most of the literal codes allocated 14 bits. Deflate should never enlarge a stream by more than a tiny fraction since it allows individual blocks to be stored.

In contrast, even "gzip -1" was able to compress the file from 1,439,439 bytes to 1,416,488 bytes, and gzip -9 got it down to 1,406,973 bytes.


This blog was originally open for comments when this entry was first posted, but was later closed and then removed due to spam and after a migration away from the original blog software. Unfortunately, it would have been a lot of work to reformat the comments to republish them. The author thanks everyone who posted comments and added to the discussion.