## Dithering

Dithering, according to Wikipedia, is a way of randomizing quantization error... but in a nutshell, when used for images, it's basically a way of simulating a deeper display than you've actually got. This is done by bumping pixels up or down in brightness, so that partially lit patterns of dots average out, over an area, to the desired color values.

There are lots of ways to dither, with varying levels of quality. When it comes to real-time use, though, the trick is doing it fast while still doing it well. So how do you dither quickly?

It's not that hard, actually.

The main characteristics we're looking for in a dithering algorithm are:

• Black maps to black.
• White maps to white.
• The average value of a group of dithered pixels approximates the average of the original color values.
• Dots are spaced as far apart as possible.

Ordered dithering satisfies these, and is fairly easy to implement. The main idea is to start with a simple dot order in a 2x2 grid:

```
0 2
3 1
```

You can then apply this recursively to 4x4 and larger sizes:

```
 0  8  2 10
12  4 14  6
3 11  1  9
15  7 13  5
```

We then add a multiple of this pattern to the source image in order to gradually bump the image pixels between levels. For instance, if a color value is 4/16ths of the way from level 5 to level 6, it will light up four out of the 16 pixels in the 4x4 dither matrix. So one way to convert an 8-bit grayscale image to 4-bit is to add the above matrix with saturation (clamping at 255), and then shift down by 4. Repeat two more times for green and blue if you have a color image. For a 15-bit (555) target image, shift the matrix down by one bit first, since there are 8 source levels for each target level instead of 16. A 2x2 matrix adds two bits (four levels) to the effective depth, and a 4x4 matrix adds four bits (sixteen levels). You're not likely to need an 8x8 matrix for extra levels, although you might need it for better quality (more on that later).

If you're working with hardware that has saturation arithmetic, this may be the best way to go, as it's very fast. Heck, in MMX code, it's probably just one PADDUSB instruction. It has a couple of flaws, though. First, we're adding all non-negative values, so our 4x4 dither matrix has a bias of +7.5. We can fix this by rebiasing the dither matrix, although that has the disadvantage of requiring both saturating add and subtract. The other problem is that we're increasing the contrast of the image slightly -- when dithering, an input value of 240/255 maps to 15/15, for a 6% increase in contrast. This can be fixed by scaling down the source data by 240/255, mapping input white to "just barely all white" on the output.

Eww, multiplies, you say? Not so fast.

If you're doing conversion from YCbCr to RGB, or some other sort of image processing operation, chances are you may be using a table, such as a clip table, to write the output pixels. An example of a clip table is one that contains 256 zeroes, followed by a linear 0-255 table, followed by 256 '255' values. This can be indexed by clip_table[x+256], thus clipping [-256,511] to [0,255]. It turns out that if you are doing this, you can get the dithering for cheap by folding the multiply into the clip table, and then offsetting the pointer used for indexing into it according to each pixel. Consider the following code:

```
switch(y & 3) {
case 0:
    for(int x=0; x<width; x+=4) {
        dst[0] = dither_clip_table[src[0] + 256 +  0];
        dst[1] = dither_clip_table[src[1] + 256 +  8];
        dst[2] = dither_clip_table[src[2] + 256 +  2];
        dst[3] = dither_clip_table[src[3] + 256 + 10];
        dst += 4;
        src += 4;
    }
    break;
case 1:
    for(int x=0; x<width; x+=4) {
        dst[0] = dither_clip_table[src[0] + 256 + 12];
        dst[1] = dither_clip_table[src[1] + 256 +  4];
        dst[2] = dither_clip_table[src[2] + 256 + 14];
        dst[3] = dither_clip_table[src[3] + 256 +  6];
        dst += 4;
        src += 4;
    }
    break;
case 2:
    for(int x=0; x<width; x+=4) {
        dst[0] = dither_clip_table[src[0] + 256 +  3];
        dst[1] = dither_clip_table[src[1] + 256 + 11];
        dst[2] = dither_clip_table[src[2] + 256 +  1];
        dst[3] = dither_clip_table[src[3] + 256 +  9];
        dst += 4;
        src += 4;
    }
    break;
case 3:
    for(int x=0; x<width; x+=4) {
        dst[0] = dither_clip_table[src[0] + 256 + 15];
        dst[1] = dither_clip_table[src[1] + 256 +  7];
        dst[2] = dither_clip_table[src[2] + 256 + 13];
        dst[3] = dither_clip_table[src[3] + 256 +  5];
        dst += 4;
        src += 4;
    }
    break;
}
```

Basically, all you have to do is unroll your pixel loop by four in both the horizontal and vertical directions. This turns the offset into a constant for each pixel access, which the compiler can then fold into the table addressing. It's not exactly correct, as this scales the dithering matrix by whatever factor was folded into the table, but for most uses it's close enough, and it's basically free. If you want perfect results, you can generate multiple tables; just be mindful of overloading Mr. L1 Cache. Well, that is, if you have one. I've done this in real-time on a 68000 before, as I couldn't stand seeing that 8-bit displacement in d8(An, Dn.W) go to waste. :)

Nowadays, you're more likely to have a vectorized loop that uses vector ALU operations instead of table lookups, but the same rule applies -- unroll everything by 4x4, and you can hardcode the dither constants. Even without tables, there are still ways you can get the dither addition for cheap. For instance, if you are using the classic fast-float-to-int trick in x86 code, you can combine the dither matrix values with one of the magic constants. Here's some prototype code that I have that does this (note that it's not unrolled yet):

```
#define X(v) ((v) - 0x49400000)
static const sint32 kDitherMatrix[4][4]={
    { X( 0), X( 8), X( 2), X(10), },
    { X(12), X( 4), X(14), X( 6), },
    { X( 3), X(11), X( 1), X( 9), },
    { X(15), X( 7), X(13), X( 5), },
};
#undef X

const sint32 *pDitherRow = kDitherMatrix[y & 3];
for(sint32 i=0; i<w; ++i) {
    float r = src[0];
    float g = src[1];
    float b = src[2];

    src += 4;
    sint32 addend = pDitherRow[i & 3];

    union {
        float f;
        sint32 i;
    } cr = {r * 255.0f + 786432.0f},
      cg = {g * 255.0f + 786432.0f},
      cb = {b * 255.0f + 786432.0f};

    sint32 vr = ((sint32)cr.i + addend) >> 4;
    sint32 vg = ((sint32)cg.i + addend) >> 4;
    sint32 vb = ((sint32)cb.i + addend) >> 4;

    // clamp to 0-255
    if ((uint32)vr >= 0x100)
        vr = (uint8)(~vr >> 31);
    if ((uint32)vg >= 0x100)
        vg = (uint8)(~vg >> 31);
    if ((uint32)vb >= 0x100)
        vb = (uint8)(~vb >> 31);

    // output 32-bit pixels
    dst[i] = (vr << 16) + (vg << 8) + vb;
}
```

You might ask, who cares about dithering nowadays? True, 24-bit truecolor displays are ubiquitous. People are increasingly working with high enough quality images that even the quantization level of 8 bits per channel is visible, though, so dithering once again becomes useful, as it can remove banding and approximate a display depth higher than eight bits per channel. The above code is actually for dithering floating-point pixels to 24-bit color. For this reason, I was somewhat amused by Photoshop's 16-bit per channel support -- it doesn't seem to dither, which makes it hard to use as you can't see what you're doing beyond 8-bit, even zoomed in. (I might be wrong. After all, I don't use Photoshop much, and when I do, I try to make the art as ugly as possible so no one considers shipping it.) Similarly, if you're working in a 3D pixel shader, you might have to implement dithering manually, as the dithering hardware in the post-blender might not be able to dither floating point to 32-bit, as it does for 16-bit targets. One repeating texture works nicely for this.

There are lots of ways to improve upon the standard 4x4 dithering matrix, by the way. If you're dealing with a static image, adaptive algorithms such as Floyd-Steinberg will usually generate better results without too much more work -- the primary downside is that the dithering pattern shifts around whenever the image changes, so it's usually not a good idea for animations or video. Larger dither matrices with better randomization also produce much less repetition, such as the 16x16 matrices I used in the image above. You don't want a purely random pattern for this, as you'll still get clustering; search for "blue noise" if you're interested in this.

Finally, if you've read to this point, you're truly bored... just kidding. You might be wondering if there's a good way to remove dither. Truth be told, I don't know, as I haven't actually tried it... but I have thought about it a little. If you can rely on a fixed dither matrix, particularly the classic 4x4, then you know that certain pixel positions will be biased compared to the original image, and you can subtract out the dither matrix to remove that bias and, in theory, do some sort of adaptive averaging to reduce the variance (one of the ubiquitous noise reduction filters may work well here). Where it gets really tricky is if an adaptive palette was used for the target -- in that case, the depth of the dithering may vary with the color, and the method used to do the color matching is important, as it isn't as straightforward as uniform quantization. I don't have a good answer to that, but if I find one, it may end up in a certain video program.

> Black maps to black.
> White maps to black.

...shouldn't that be "white maps to white"?

Darkstar - 03 05 07 - 04:25

Whoops. Well, that'd be a very accurate dithering algorithm, I guess, although the contrast might be a bit low. Thanks, fixed.

Phaeron - 03 05 07 - 04:29

And where does all this fit inside VD?

Carter - 03 05 07 - 15:44

Carter: last paragraph: VD may get a dithering removal filter.

Tom - 03 05 07 - 17:41

Remember how he was writing an animated gif export feature? Dithering is necessary to get high quality output.

Simon E. - 03 05 07 - 20:25

>True, 24-bit truecolor displays are ubiquitous.

In desktop computing yes, but Windows CE, for example, uses 16bpp as its native depth. Tricks like this, your alpha blending, and fast average for packed pixels are VERY useful on that platform. Especially since the CPU is about 200MHz and without any SIMD support.

sh0dan (link) - 08 05 07 - 03:32

A few years ago, I dove into the relatively unknown Riemersma dithering algorithm. Quite a fancy algorithm, and not based on error diffusion for a change. All in all a simple algorithm based on a kind of pathfinding (can't remember the name). IMO, it produces better results than the popular Floyd-Steinberg variant of error diffusion.

What do you guys think? I have some examples, somewhere :)

Thany - 08 05 07 - 07:16

It's still error diffusion, but along an unusual path. The way that the algorithm works, it'd be hard to parallelize; I'm not certain if the response from a change to any pixel is finite. I can't say how it looks, though, since I haven't tried it. I don't doubt that it'd work better for animation than F-S or ordered.

Phaeron - 09 05 07 - 02:11

If you're using a static matrix, shouldn't removal of dithering be simple? Assuming you store your image as a column vector (which I do in my case):
A - dither matrix
x - input image
y - output

Ax = y (to dither)
x = (A^-1)y (to undither)

the trick, as you said, is getting the right A if you weren't the one doing the dithering, but for some common static matrices this should work well (and you don't even have to compute the inverse of A all the time, so the whole thing is pretty trivial).

Is the issue that very few apps use static matrices? Or am I missing some other complication?

keije - 17 05 07 - 12:34

It's not that simple because there is a quantization step involved as well: it's more like quant(Ax+B). If there weren't such a step there wouldn't be a reason to dither. You can decrease the mean error by subtracting off the dither matrix, but if you want to decrease the graininess of the image you need to do some intelligent averaging or some other sort of estimation too.

Many apps do indeed use adaptive algorithms for better quality, but there are still cases where a fixed dither pattern is used. One benefit is that pixels are still independent - partial updates are possible without disturbing the pattern. Presumably for this reason, OpenGL prohibits use of a dither algorithm that is dependent upon anything other than the input color values and the window position.

Phaeron - 18 05 07 - 03:12

Reading your blog is more educating and more fun than youtubing or anything close to that xD

Michael - 19 05 07 - 21:50

It does sound rather obvious that using adaptive methods such as F-S would be a bad idea for video. However, that didn't stop a large Japanese company trying to use them (custom algorithm, not exactly F-S) in their new FLCD (Ferroelectric Liquid Crystal Display) monitors about 10 years ago. This was needed because the FLCD's didn't have enough color depth. They designed chips to do the error diffusion and everything. Only once the prototypes were working did they discover that if you moved the mouse pointer in the upper left of the display, the whole picture shimmered as the error diffusion shifted around. They even coined a term to explain it - 'sparkle noise'. The displays were shown at a couple of trade shows, and basically laughed at :) These days all you can find about them online are the dozens of patents that were applied for - including several about how fantastic error diffusion is for displays :)

nsayn - 12 06 07 - 09:55

F-S dithering could be used for video or displays... but only if the display/video had:
1) high enough framerate (59fps+?), for LCD that means fast enough response time; think temporal antialiasing on ATI cards (60fps+ required? is it still available on newer models?);
2) display showing what it is expected to; I encountered a not-exactly-cheap LCD that after showing alternating black/white images (60fps) still flickered after returning to desktop;
3) correct and consistent gamma, not only 2*50%gray=black+white, but also for gray combinations;
4) for LCDs: no visible overdrive with response time compensation and consistent transition speed and gamma: if moving light-dark stripes seem darker or lighter than static ones, something's wrong; same for a moving darker object that has its front edge darker and rear edge lighter than its body.
To cut a long story short: F-S for video would still be useful if displays were perfect, but they are not.

Anna N. - 17 10 07 - 09:38

It's been a while, but Avery's blogs are always a good reference. I needed a 4x4 dither table and I remembered this. Cut and paste the numbers into the code, let it rip, first impression, pretty good. Looked a little closer and there was a distinct horizontal pattern on some images. And the pattern went away or appeared when I rotated the images.

Did some analysis and realised the columns all nicely added to 30, but the rows did not. The top row adds to 20 and the bottom adds to 40. When used repeatedly these are next to each other and cause the pattern. Damn!

After a little thought it occurred that with a little shuffling all the rows and all the columns could be made to add to 30. Plugged the new numbers into the code and voila, no more patterns. Anyway, I will park the numbers here so others may enjoy them in their dither code.

Original table

```
00 08 02 10 = 20
12 04 14 06 = 36
03 11 01 09 = 24
15 07 13 05 = 40
== == == == ==
30 30 30 30 =120
```

Improved table

```
00 11 06 13 = 30
12 07 09 02 = 30
03 08 05 14 = 30
15 04 10 01 = 30
== == == == ==
30 30 30 30 =120
```

IanB (link) - 15 01 11 - 16:30

I'm unclear as to why the new table is an improvement. The 4x4 pattern is repeated, so the new table can be rotated by one column:

```
13 00 11 06
02 12 07 09
14 03 08 05
01 15 04 10
```

The 00-03 values are all in the first two columns, so a 50% checkerboard will appear in adjacent pairs of columns before the other pairs begin lighting up. Thus, with continuous input you would now have a problem with vertical stripes. I'm guessing the issue you're actually hitting is that the pre-dither values you're encountering are not evenly distributed, such as for conversion from 24-bit RGB to 15-bit RGB, or perhaps 16-235 to 0-255.

Phaeron - 15 01 11 - 19:53
