Beware of the CPU-specific optimizations
Every once in a while, I get a crash report whose diagnosis looks like this:
    10116789: 0ff6c1 psadbw mm0, mm1 <-- FAULT

    Crash context:
    An integer SSE (Pentium III/Athlon) instruction not supported by the CPU was executed in module '****'...
    ...while decompressing video frame 0 with "********* Codec" [biCompression=********] (VideoSource.cpp:1772).
I blocked out the codec ID information so as not to single out a video codec manufacturer.
A crash like this usually means that the video codec you were attempting to use was compiled with CPU-specific optimizations that your CPU doesn't support, which generally means that your CPU is below the codec's minimum requirements. Unlike a CPU that is merely too slow, a CPU that is missing instructions doesn't just run the codec slowly; the codec won't work at all. There's nothing I can do about this in VirtualDub. If you're seeing something like this and are indeed below the minimum spec, you need to either upgrade or beg the codec vendor to support your CPU (if it is actually fast enough to handle the video format).
Note that crashing here also means that the codec didn't properly check for the availability of special CPU instructions before attempting to use them. This is unwise from a customer support standpoint, and I would encourage adding CPU detection code and an error dialog instead. One trap that a lot of coders fall into is using the Pentium Pro conditional move instructions (CMOVcc) on the assumption that they are available, since no one will be using a CPU below 300MHz. Unfortunately, the AMD K6 series doesn't support these instructions and includes chips running at 400MHz and faster, so this is a bad assumption. Similarly, there are Athlons running at 1GHz that don't support SSE. Also, you would be surprised how slow a system people will try your code on; I recently got a crash report from someone who tried using modern video codecs on a Pentium without MMX!
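As a sketch of what such a check might look like, the C code below (for GCC or Clang, using their `<cpuid.h>` helper) reads CPUID leaf 1 and refuses to enable an optimized path when CMOVcc (EDX bit 15) or MMX (EDX bit 23) is missing, producing a message that a host could show in an error dialog instead of faulting. The function names and message text are hypothetical; on MSVC, `__cpuid` from `<intrin.h>` would take the place of `__get_cpuid`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>  /* GCC/Clang wrapper around the CPUID instruction */
#endif

/* CPUID leaf 1, EDX feature bits. */
#define CPUID1_EDX_CMOV (1u << 15)  /* CMOVcc (Pentium Pro and later) */
#define CPUID1_EDX_MMX  (1u << 23)  /* MMX */

/* Read CPUID(1).EDX; returns 0 ("no features") where CPUID is unavailable
   or this is not an x86 GCC/Clang build. */
static uint32_t read_cpuid1_edx(void) {
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))  /* fails cleanly if leaf 1 is absent */
        return edx;
#endif
    return 0;
}

/* Hypothetical codec start-up check: rather than crashing on the first
   CMOVcc or MMX instruction, explain what is missing so the host can show
   an error dialog. Returns true if the optimized code may be used. */
static bool codec_cpu_supported(uint32_t edx, char *msg, size_t msg_len) {
    assert(msg != NULL && msg_len > 0);
    if (!(edx & CPUID1_EDX_CMOV)) {
        snprintf(msg, msg_len,
                 "This codec requires a CPU with CMOVcc (Pentium Pro class or later).");
        return false;
    }
    if (!(edx & CPUID1_EDX_MMX)) {
        snprintf(msg, msg_len, "This codec requires a CPU with MMX.");
        return false;
    }
    msg[0] = '\0';  /* no error to report */
    return true;
}
```

A codec would call `codec_cpu_supported(read_cpuid1_edx(), buf, sizeof buf)` once when it is opened and fail the open with the message on `false`, so a K6 or a non-MMX Pentium gets an explanation instead of an illegal-instruction fault.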
Embrace the CPUID instruction. CPUID is your friend. If you are targeting the integer SSE instructions, such as pshufw, psadbw, pavgb, pavgw, and movntq, remember to check for either the SSE bit (for the Pentium III, Athlon XP, and later) or the 3DNow! extensions bit (for the original Athlon).
In VirtualDub, I generally write scalar C versions of processing routines and keep them around even after writing CPU-specific, assembly-optimized versions. I do this for two reasons: compatibility with all CPUs, and not having to write the AMD64 version immediately. (The AMD64 compiler doesn't support inline assembly, which was a pain, at least during the initial port.) The scalar code also serves as a reference for testing the optimized code. VirtualDub queries the CPU's capabilities at the start of an operation and automatically chooses the appropriate optimized routine.
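The pattern described above, a scalar C routine kept as both fallback and test reference with an optimized version selected once from the detected capabilities, might look like the following sketch. It uses a 16-byte sum of absolute differences (the operation psadbw performs) and, for brevity, the SSE2 intrinsic `_mm_sad_epu8` rather than inline or MMX assembly. The `CPUF_SSE2` flag and all function names are hypothetical stand-ins for whatever the CPU probe reports; this is not VirtualDub's actual code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

#define CPUF_SSE2 (1u << 0)  /* hypothetical capability flag from the CPU probe */

typedef unsigned (*sad16_fn)(const uint8_t *a, const uint8_t *b);

/* Scalar C reference: sum of absolute differences over 16 bytes. */
static unsigned sad16_scalar(const uint8_t *a, const uint8_t *b) {
    unsigned sum = 0;
    for (int i = 0; i < 16; ++i)
        sum += (unsigned)abs((int)a[i] - (int)b[i]);
    return sum;
}

#if defined(__SSE2__)
/* Same computation with psadbw (xmm form): one partial sum per 64-bit half,
   stored in the low 16 bits of each half. */
static unsigned sad16_sse2(const uint8_t *a, const uint8_t *b) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i s = _mm_sad_epu8(va, vb);
    return (unsigned)_mm_cvtsi128_si32(s) + (unsigned)_mm_extract_epi16(s, 4);
}
#endif

/* Choose an implementation once, at the start of an operation. */
static sad16_fn select_sad16(unsigned cpu_flags) {
#if defined(__SSE2__)
    if (cpu_flags & CPUF_SSE2)
        return sad16_sse2;
#endif
    (void)cpu_flags;
    return sad16_scalar;  /* works on every CPU */
}

/* Reference test: the selected routine must match the scalar one exactly. */
static void check_sad16_against_reference(unsigned cpu_flags, unsigned seed) {
    sad16_fn fast = select_sad16(cpu_flags);
    uint8_t a[16], b[16];
    for (int trial = 0; trial < 1000; ++trial) {
        for (int i = 0; i < 16; ++i) {
            seed = seed * 1103515245u + 12345u;  /* simple LCG for test data */
            a[i] = (uint8_t)(seed >> 16);
            seed = seed * 1103515245u + 12345u;
            b[i] = (uint8_t)(seed >> 16);
        }
        assert(fast(a, b) == sad16_scalar(a, b));
    }
}
```

Gating on `__SSE2__` only compiles the fast path when the compiler's baseline already includes SSE2 (always the case for x86-64). A 32-bit build that must still run on older CPUs would instead compile the SSE2 routine separately (for example with GCC's `__attribute__((target("sse2")))`) and rely solely on the runtime check.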