Video operations without pixel shaders
Georgi Petrov asks:
Maybe this is not the right place to ask, but can I implement brightness/contrast/hue/etc. correction in Direct3D 9 without using pixel shaders? I mean, what's the way this kind of thing is done? Where should I start?
Well... yes, you can do some of those, but it's a pain.
If you don't have pixel shaders, then you don't have a lot of options. You basically have two choices for getting images on the screen, StretchRect() or drawing polygons. StretchRect() doesn't give you any control other than filtering, so that means drawing polygons, and without pixel shaders, that means the dreaded fixed function pipeline left over from Direct3D 7. The fixed function pixel pipeline is a cascade of parallel color and alpha stages, with diffuse and specular coming in at the head, a texture injected into each color/alpha stage, and the output going to the framebuffer.
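To make that concrete, here's roughly what wiring up a two-stage cascade looks like in D3D9: stage 0 modulates the bound texture by the interpolated diffuse color, and stage 1 adds a second texture to the running result. (A sketch only; dev is the IDirect3DDevice9, and texture/vertex setup and error handling are omitted.)

    // Stage 0: current = texture0 * diffuse
    dev->SetTextureStageState(0, D3DTSS_COLOROP,   D3DTOP_MODULATE);
    dev->SetTextureStageState(0, D3DTSS_COLORARG1, D3DTA_TEXTURE);
    dev->SetTextureStageState(0, D3DTSS_COLORARG2, D3DTA_DIFFUSE);

    // Stage 1: current = texture1 + current (the stage 0 result)
    dev->SetTextureStageState(1, D3DTSS_COLOROP,   D3DTOP_ADD);
    dev->SetTextureStageState(1, D3DTSS_COLORARG1, D3DTA_TEXTURE);
    dev->SetTextureStageState(1, D3DTSS_COLORARG2, D3DTA_CURRENT);

    // Terminate the cascade; the stage 1 result goes to the framebuffer.
    dev->SetTextureStageState(2, D3DTSS_COLOROP, D3DTOP_DISABLE);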
Now, if you're assuming lack of pixel shader support, that means you're looking at cards like a TNT/TNT2, GeForce 1/2, or Radeon. The NVIDIA cards have two textures and two blend stages, while the Radeon has three. Furthermore, none of these cards have dependent texture read support, so texture-based table lookups are out and you're going to need very simple algorithms. Brightness/contrast adjustments are doable in two stages as a multiply and an add or subtract, or in one if you use a multiply-add operation. You can squeeze in simple color balance too by using separate scale and bias values for each channel. Saturation can be done as a linear blend between luminance and the original color, i.e. lerp(dot(luma_basis, color), color, factor), but it can be tricky to set up with the weird DX7 signed dot product. I think it might be doable in one pass if you're not doing anything else and use the framebuffer blend hardware.
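For instance, here's a sketch of the one-stage multiply-add version, assuming the device actually reports D3DTEXOPCAPS_MULTIPLYADD (not a given on DX7-class parts). The per-channel contrast scale rides in the texture factor and the brightness bias in the vertex diffuse color, since there's only one texture factor constant per device; the scale values shown are just examples. Keep in mind each stage clamps to [0,1], so a negative bias means restructuring around D3DTOP_SUBTRACT instead.

    // current = diffuse + texture * tfactor  (D3DTOP_MULTIPLYADD is Arg0 + Arg1*Arg2)
    dev->SetRenderState(D3DRS_TEXTUREFACTOR,
                        D3DCOLOR_ARGB(255, 230, 200, 230));      // per-channel contrast scale
    dev->SetTextureStageState(0, D3DTSS_COLOROP,   D3DTOP_MULTIPLYADD);
    dev->SetTextureStageState(0, D3DTSS_COLORARG0, D3DTA_DIFFUSE);   // brightness bias (vertex color)
    dev->SetTextureStageState(0, D3DTSS_COLORARG1, D3DTA_TEXTURE);   // video frame
    dev->SetTextureStageState(0, D3DTSS_COLORARG2, D3DTA_TFACTOR);   // contrast scale
    dev->SetTextureStageState(1, D3DTSS_COLOROP,   D3DTOP_DISABLE);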
Hue shifts are annoying to do if you want the standard HSV/HSL way of rotating colors around a hexagonal prism -- I think it'd be tough to do in fixed function without a lot of passes. A simple rotation in chroma would be easier, but you need a minimum of two dot products to do that, and that practically means multipassing.
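For the record, the chroma rotation is just a 2x2 rotation of the color difference channels:

    Cb' = Cb*cos(theta) - Cr*sin(theta)
    Cr' = Cb*sin(theta) + Cr*cos(theta)

Each output channel is a dot product of the input against a row of the rotation matrix, which is where the two-dot-product minimum comes from.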
There are two other ways to pull off some of these operations. If YCbCr-to-RGB conversion is also being done through the rasterizer hardware, then some of these operations can be folded into that conversion via the color matrix coefficients, which is desirable for both performance and accuracy reasons. In full-screen mode, it is also possible to use gamma ramp tables, although you run into problems if you have any on-screen UI that you don't want whacked by the post-process transform.
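The gamma ramp route is at least easy to drive. A minimal sketch in D3D9, with an illustrative brightness scale:

    // Whole-screen brightness via the hardware gamma ramp. Full-screen
    // exclusive mode only, and it hits everything on screen, UI included.
    D3DGAMMARAMP ramp;
    for (int i = 0; i < 256; ++i) {
        float v = (i / 255.0f) * 1.2f;      // 1.2f = example brightness boost
        if (v > 1.0f) v = 1.0f;
        ramp.red[i] = ramp.green[i] = ramp.blue[i] = (WORD)(v * 65535.0f + 0.5f);
    }
    dev->SetGammaRamp(0, D3DSGR_NO_CALIBRATION, &ramp);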
Finally, this isn't going to help if you must target Direct3D, but it's worth noting that the NVIDIA cards of that era are considerably more powerful if driven from OpenGL with NV extensions instead of Direct3D. On a GeForce 2, you get to play with two full register combiners along with a lerp-capable final combiner instead of two pathetic texture blend stages.
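As a taste of that, here's a sketch of the saturation lerp from earlier as a single register combiner pass, assuming GL_NV_register_combiners: the general combiner computes luma as a dot product (with unsigned mappings, so none of the DX7 signed-dot weirdness), and the final combiner does the lerp as A*B + (1-A)*C + D. The luma weights and an example saturation factor are parked in the two constant colors; texture and geometry setup are assumed.

    static const GLfloat kSat[4]  = {0.7f, 0.7f, 0.7f, 0.0f};    // example saturation factor
    static const GLfloat kLuma[4] = {0.299f, 0.587f, 0.114f, 0.0f};

    glEnable(GL_REGISTER_COMBINERS_NV);
    glCombinerParameteriNV(GL_NUM_GENERAL_COMBINERS_NV, 1);
    glCombinerParameterfvNV(GL_CONSTANT_COLOR0_NV, kSat);
    glCombinerParameterfvNV(GL_CONSTANT_COLOR1_NV, kLuma);

    // General combiner 0: spare0 = dot(texture0, luma weights)
    glCombinerInputNV(GL_COMBINER0_NV, GL_RGB, GL_VARIABLE_A_NV,
                      GL_TEXTURE0_ARB, GL_UNSIGNED_IDENTITY_NV, GL_RGB);
    glCombinerInputNV(GL_COMBINER0_NV, GL_RGB, GL_VARIABLE_B_NV,
                      GL_CONSTANT_COLOR1_NV, GL_UNSIGNED_IDENTITY_NV, GL_RGB);
    glCombinerOutputNV(GL_COMBINER0_NV, GL_RGB,
                       GL_SPARE0_NV, GL_DISCARD_NV, GL_DISCARD_NV,
                       GL_NONE, GL_NONE, GL_TRUE, GL_FALSE, GL_FALSE);

    // Final combiner: out = A*B + (1-A)*C + D = sat*color + (1-sat)*luma
    glFinalCombinerInputNV(GL_VARIABLE_A_NV, GL_CONSTANT_COLOR0_NV,
                           GL_UNSIGNED_IDENTITY_NV, GL_RGB);
    glFinalCombinerInputNV(GL_VARIABLE_B_NV, GL_TEXTURE0_ARB,
                           GL_UNSIGNED_IDENTITY_NV, GL_RGB);
    glFinalCombinerInputNV(GL_VARIABLE_C_NV, GL_SPARE0_NV,
                           GL_UNSIGNED_IDENTITY_NV, GL_RGB);
    glFinalCombinerInputNV(GL_VARIABLE_D_NV, GL_ZERO,
                           GL_UNSIGNED_IDENTITY_NV, GL_RGB);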
Overall, using fixed function rasterizing hardware to process video is difficult and restrictive, and creativity is a requirement: even simple operations like subtraction can take some ingenuity to express. You can gain additional flexibility by multipassing in the framebuffer or off-screen surfaces using the post-blender, but you pay dearly in terms of fill rate and possibly accuracy. I did this in order to implement bicubic stretching in fixed function, and when you're doing somewhere between five and nine passes with blending on and all texture stages active, the GPU may not have enough fill rate to run at full video frame rate.
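For reference, the post-blender setup for those accumulation passes is just the standard additive blend; every pass after the first re-reads and re-writes the whole destination, which is exactly where the fill rate goes:

    // dest = dest + src on each accumulation pass
    dev->SetRenderState(D3DRS_ALPHABLENDENABLE, TRUE);
    dev->SetRenderState(D3DRS_SRCBLEND,  D3DBLEND_ONE);
    dev->SetRenderState(D3DRS_DESTBLEND, D3DBLEND_ONE);
    // ...draw the screen-aligned quad once per filter tap...
    // Negative filter taps need D3DRS_BLENDOP with D3DBLENDOP_REVSUBTRACT,
    // where the hardware supports it (D3DPMISCCAPS_BLENDOP).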