
Dithering, according to Wikipedia, is a way of randomizing quantization error... but in a nutshell, when used for images it's basically a way of simulating a higher depth display than you've got. This is done by bumping pixels up or down in brightness, so that partially lit up patterns of dots average out in areas to the desired color values:

8-bit dither sample

There are lots of ways to dither, with varying levels of quality. For real-time dithering, though, the trick is doing it fast while still doing it well. So how do you dither quickly?

It's not that hard, actually.

The main characteristics we're looking for in a dithering algorithm are:

- speed: a low, constant cost per pixel;
- locality: each pixel is dithered independently, without propagating error to its neighbors;
- stability: the same input always produces the same output, so animation doesn't shimmer.

Ordered dithering satisfies these, and is fairly easy to implement. The main idea is to start with a simple dot order in a 2x2 grid:

0 2
3 1

You can then apply this recursively to 4x4 and larger sizes:

 0  8  2 10
12  4 14  6
 3 11  1  9
15  7 13  5
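The recursion can be sketched directly; here's a hypothetical helper (not from the original post) that builds an n-by-n matrix by tiling the 2x2 base pattern over a scaled copy of the half-size matrix:

```c
#include <stdio.h>

/* Recursively builds an NxN ordered dither (Bayer) matrix, N a power of two.
   Each doubling step replaces the base pattern entries b with blocks of
   4*M + b, where M is the half-size matrix. */
static void make_bayer(int *m, int n) {
    static const int base[2][2] = { { 0, 2 }, { 3, 1 } };
    if (n == 1) { m[0] = 0; return; }
    int h = n / 2;
    int sub[64];                    /* scratch for the half-size matrix (n up to 16) */
    make_bayer(sub, h);
    for (int y = 0; y < n; ++y)
        for (int x = 0; x < n; ++x)
            m[y*n + x] = 4*sub[(y%h)*h + (x%h)] + base[y/h][x/h];
}
```

Running this with n=4 reproduces the 4x4 matrix above.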

We then add a multiple of this pattern to the source image in order to gradually bump the image pixels between levels. For instance, if a color value is 4/16ths of the way from level 5 to level 6, it will light up four out of the 16 pixels in the 4x4 dither matrix. So one way to convert an 8-bit grayscale image to 4-bit is to add the above matrix with saturation (clamping at 255), and then shift down by 4. Repeat two more times for green and blue if you have a color image. For a 15-bit (555) target image, shift the matrix down by one bit first, since there are 8 source levels for each target level instead of 16. A 2x2 matrix adds two bits (four levels) to the effective depth, and a 4x4 matrix adds four bits (sixteen levels). You're not likely to need an 8x8 matrix for extra levels, although you might need it for better quality (more on that later).
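As a concrete sketch of the 8-bit to 4-bit case just described (names are illustrative, and the saturating add is written out in plain C):

```c
#include <stdint.h>

/* Ordered dither from 8-bit to 4-bit grayscale: add the 4x4 matrix with
   saturation, then shift down by 4. Illustrative sketch of the scheme
   described in the text. */
static const uint8_t kDither4x4[4][4] = {
    {  0,  8,  2, 10 },
    { 12,  4, 14,  6 },
    {  3, 11,  1,  9 },
    { 15,  7, 13,  5 },
};

static uint8_t dither_8to4(uint8_t v, int x, int y) {
    int sum = v + kDither4x4[y & 3][x & 3];
    if (sum > 255)
        sum = 255;                 /* saturate, as PADDUSB would */
    return (uint8_t)(sum >> 4);    /* 4-bit result, 0..15 */
}
```

For an input of 84 (4/16ths of the way from level 5 to level 6), exactly four of the sixteen matrix positions push it up to level 6, matching the example above.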

If you're working with hardware that has saturation arithmetic, this may be the best way to go, as it's very fast. Heck, in MMX code, it's probably just one PADDUSB instruction. It has a couple of flaws, though. First, we're adding all non-negative values, so our 4x4 dither matrix has a bias of +7.5. We can fix this by rebiasing the dither matrix, although that has the disadvantage of requiring both saturating add and subtract. The other problem is that we're increasing the contrast of the image slightly -- when dithering, an input value of 240/255 maps to 15/15, for a 6% increase in contrast. This can be fixed by scaling down the source data by 240/255, mapping input white to "just barely all white" on the output.
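On x86, for instance, the saturating add really is one intrinsic away. A minimal SSE2 sketch (illustrative names; this shows only the raw saturating add, without the rebias or contrast correction just discussed):

```c
#include <emmintrin.h>
#include <stdint.h>

/* Dithers 16 grayscale pixels at once using PADDUSB via _mm_adds_epu8.
   One row of the 4x4 matrix is replicated four times across the vector. */
static void dither_row_sse2(uint8_t *dst, const uint8_t *src,
                            const uint8_t row[4]) {
    __m128i d = _mm_set_epi8(
        row[3], row[2], row[1], row[0], row[3], row[2], row[1], row[0],
        row[3], row[2], row[1], row[0], row[3], row[2], row[1], row[0]);
    __m128i v = _mm_loadu_si128((const __m128i *)src);
    _mm_storeu_si128((__m128i *)dst, _mm_adds_epu8(v, d));
}
```

A shift by 4 (or a table lookup) on the result then finishes the quantization.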

Eww, multiplies, you say? Not so fast.

If you're doing conversion from YCbCr to RGB, or some other sort of image processing operation, chances are you may be using a table, such as a clip table, to write the output pixels. An example of a clip table is one that contains 256 zeroes, followed by a linear 0-255 table, followed by 256 '255' values. This can be indexed by clip_table[x+256], thus clipping [-256,511] to [0,255]. It turns out that if you are doing this, you can get the dithering for cheap by folding the multiply into the clip table, and then offsetting the pointer used for indexing into it according to each pixel. Consider the following code:

switch(y & 3) {
    case 0:
        for(int x=0; x<width; x+=4) {
            dst[0] = dither_clip_table[src[0] + 256 + 0];
            dst[1] = dither_clip_table[src[1] + 256 + 8];
            dst[2] = dither_clip_table[src[2] + 256 + 2];
            dst[3] = dither_clip_table[src[3] + 256 + 10];
            dst += 4;
            src += 4;
        }
        break;
    case 1:
        for(int x=0; x<width; x+=4) {
            dst[0] = dither_clip_table[src[0] + 256 + 12];
            dst[1] = dither_clip_table[src[1] + 256 + 4];
            dst[2] = dither_clip_table[src[2] + 256 + 14];
            dst[3] = dither_clip_table[src[3] + 256 + 6];
            dst += 4;
            src += 4;
        }
        break;
    case 2:
        for(int x=0; x<width; x+=4) {
            dst[0] = dither_clip_table[src[0] + 256 + 3];
            dst[1] = dither_clip_table[src[1] + 256 + 11];
            dst[2] = dither_clip_table[src[2] + 256 + 1];
            dst[3] = dither_clip_table[src[3] + 256 + 9];
            dst += 4;
            src += 4;
        }
        break;
    case 3:
        for(int x=0; x<width; x+=4) {
            dst[0] = dither_clip_table[src[0] + 256 + 15];
            dst[1] = dither_clip_table[src[1] + 256 + 7];
            dst[2] = dither_clip_table[src[2] + 256 + 13];
            dst[3] = dither_clip_table[src[3] + 256 + 5];
            dst += 4;
            src += 4;
        }
        break;
}
Basically, all you have to do is unroll your pixel loop by four in both the horizontal and vertical directions. This turns the dither offset into a constant for each pixel access, which the compiler can then fold into the table addressing. It's not exactly correct, as this scales the dithering matrix by whatever factor was folded into the table, but for most uses it's close enough, and it's basically free. If you want perfect results, you can generate multiple tables; just be mindful of overloading Mr. L1 Cache. Well, that is, if you have one. I've done this in real time on a 68000 before, as I couldn't stand seeing that 8-bit displacement in d8(An, Dn.W) go to waste. :)
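For concreteness, here's one hypothetical way such a table might be built. This assumes a 4-bit target to match the 0-15 offsets above, with the 240/255 contrast correction folded in; the exact contents depend on what the original clip table was doing:

```c
#include <stdint.h>

/* Hypothetical construction of dither_clip_table for a 4-bit target.
   The 240/255 scale is folded into the table; since the dither offset is
   applied to the unscaled index, the dither gets scaled by the same
   factor -- the "not exactly correct" part mentioned in the text. */
static uint8_t dither_clip_table[768];

static void init_dither_clip_table(void) {
    for (int i = 0; i < 768; ++i) {
        double v = (i - 256) * 240.0 / 255.0;   /* fold in the multiply */
        if (v < 0.0)   v = 0.0;                  /* clip low */
        if (v > 255.0) v = 255.0;                /* clip high */
        dither_clip_table[i] = (uint8_t)v >> 4;  /* quantize to 4 bits */
    }
}
```

The highest index actually used is 256 + 255 + 15 = 526, comfortably within the 768 entries.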

Nowadays, you're more likely to have a vectorized loop that uses vector ALU operations instead of table lookups, but the same rule applies -- unroll everything by 4x4, and you can hardcode the dither constants. Even without tables, there are still ways you can get the dither addition for cheap. For instance, if you are using the classic fast-float-to-int trick in x86 code, you can combine the dither matrix values with one of the magic constants. Here's some prototype code that I have that does this (note that it's not unrolled yet):

#define X(v) ((v) - 0x49400000)
    static const sint32 kDitherMatrix[4][4]={
        { X( 0), X( 8), X( 2), X(10), },
        { X(12), X( 4), X(14), X( 6), },
        { X( 3), X(11), X( 1), X( 9), },
        { X(15), X( 7), X(13), X( 5), },
    };
#undef X

    const sint32 *pDitherRow = kDitherMatrix[y & 3];
    for(sint32 i=0; i<w; ++i) {
        float r = src[0];
        float g = src[1];
        float b = src[2];

        src += 4;
        sint32 addend = pDitherRow[i & 3];

        union {
            float f;
            sint32 i;
        } cr = {r * 255.0f + 786432.0f},
          cg = {g * 255.0f + 786432.0f},
          cb = {b * 255.0f + 786432.0f};

        sint32 vr = ((sint32)cr.i + addend) >> 4;
        sint32 vg = ((sint32)cg.i + addend) >> 4;
        sint32 vb = ((sint32)cb.i + addend) >> 4;

        // clamp to 0-255
        if ((uint32)vr >= 0x100)
            vr = (uint8)(~vr >> 31);
        if ((uint32)vg >= 0x100)
            vg = (uint8)(~vg >> 31);
        if ((uint32)vb >= 0x100)
            vb = (uint8)(~vb >> 31);

        // output 32-bit pixels
        dst[i] = (vr << 16) + (vg << 8) + vb;
    }
You might ask, who cares about dithering nowadays? True, 24-bit truecolor displays are ubiquitous. People are increasingly working with high enough quality images that even the quantization level of 8 bits per channel is visible, though, so dithering once again becomes useful, as it can remove banding and approximate a display depth higher than eight bits per channel. The above code is actually for dithering floating-point pixels to 24-bit color. For this reason, I was somewhat amused by Photoshop's 16-bit per channel support -- it doesn't seem to dither, which makes it hard to use as you can't see what you're doing beyond 8-bit, even zoomed in. (I might be wrong. After all, I don't use Photoshop much, and when I do, I try to make the art as ugly as possible so no one considers shipping it.) Similarly, if you're working in a 3D pixel shader, you might have to implement dithering manually, as the dithering hardware in the post-blender might not be able to dither floating point to 32-bit, as it does for 16-bit targets. One repeating texture works nicely for this.

There are lots of ways to improve upon the standard 4x4 dithering matrix, by the way. If you're dealing with a static image, adaptive algorithms such as Floyd-Steinberg will usually generate better results without too much more work -- the primary downside is that the dithering pattern shifts around whenever the image changes, so it's usually not a good idea for animations or video. Larger dither matrices with better randomization also produce much less repetition, such as the 16x16 matrices I used in the image above. You don't want a purely random pattern for this, as you'll still get clustering; search for "blue noise" if you're interested in this.
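For reference, here's a minimal Floyd-Steinberg sketch for 8-bit to 4-bit grayscale -- an illustrative version only, with plain raster scanning and simple edge handling:

```c
#include <stdint.h>

/* Minimal Floyd-Steinberg error diffusion, 8-bit to 4-bit grayscale,
   in-place on an int buffer for brevity. Quantization error is pushed
   to neighbors with the classic 7/16, 3/16, 5/16, 1/16 weights. */
static void floyd_steinberg_8to4(int *img, int w, int h) {
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            int old = img[y*w + x];
            if (old < 0)   old = 0;          /* accumulated error can overshoot */
            if (old > 255) old = 255;
            int q = (old * 15 + 127) / 255;  /* nearest of 16 output levels */
            int recon = q * 255 / 15;        /* that level on the 8-bit scale */
            img[y*w + x] = q;
            int err = old - recon;
            if (x + 1 < w)     img[y*w + x+1]     += err * 7 / 16;
            if (y + 1 < h) {
                if (x > 0)     img[(y+1)*w + x-1] += err * 3 / 16;
                               img[(y+1)*w + x]   += err * 5 / 16;
                if (x + 1 < w) img[(y+1)*w + x+1] += err * 1 / 16;
            }
        }
    }
}
```

Note how the neighbor updates are exactly the per-pixel data dependency that makes this a poor fit for the table and SIMD tricks above.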

Finally, if you've read to this point, you're truly bored... just kidding. You might be wondering if there's a good way to remove dither. Truth be told, I don't know, as I haven't actually tried it... but I have thought about it a little. If you can rely on a fixed dither matrix, particularly the classic 4x4, then you know that certain pixel positions will be biased compared to the original image, and you can subtract out the dither matrix to remove that bias and, in theory, do some sort of adaptive averaging to reduce the variance (one of the ubiquitous noise reduction filters may work well here). Where it gets really tricky is if an adaptive palette was used for the target -- in that case, the depth of the dithering may vary with the color, and the method used to do the color matching matters, as it isn't as straightforward as uniform quantization. I don't have a good answer to that, but if I find one, it may end up in a certain video program.


This blog was originally open for comments when this entry was first posted, but commenting was later closed, and the comments were eventually removed, due to spam and a migration away from the original blog software. Unfortunately, it would have been a lot of work to reformat the comments to republish them. The author thanks everyone who posted comments and added to the discussion.