Debugging the DirectShow capture driver the hard way
Those of you who have been following my development history for a while know that at one point I predicted I would be releasing a 2.0 version of VirtualDub. Well, that never panned out due to my getting a full-time job, so I scaled my plans back and went down the 1.5, 1.6, etc. incremental plan. One of the modules that I wrote for 2.0 way back in college was a somewhat functioning DirectShow capture module. Unfortunately, I couldn't drop it into 1.4 when I canned 2.0 because the two versions were quite different; it wasn't until I brought over the system libraries in 1.5 and rewrote the capture layer in 1.6 that it was possible. But that wasn't the only problem.
DirectShow is a very different beast compared to Video for Windows, and there were (and still are) a ton of problems in the module. Part of the problem is the increased flexibility of the API, which lends itself to more complexity, and thus more bugs. Another problem, though, was what amounted to a random, full-screen UI lock during development. Frequently all applications on my development machine would cease to respond during testing; I could Ctrl+Alt+Del to the logon desktop, could Alt+Tab and see the window list, could even run console apps in a command prompt -- but all GUI apps were simply locked. Basically, I had to hard restart my system, which on my laptop meant a power-off. This annoyed me to no end and put a major crimp in development, which was one of the reasons I stopped working on it.
I finally figured out what was (is) going on today, when I hit the same stupid hang again... but I had to use a rather drastic trick to do so.
Over the months I had figured out a couple of strategies for dealing with the problem. At one point during 2.0 development I discovered that logging off would eventually unblock the system, but of course with the annoyance of closing all running apps. Killing VirtualDub.exe did the trick, but since the debugger was attached to it I had to kill the IDE instead. That was difficult, however, since Task Manager wouldn't appear in anything under two hours, so I started using a command-line kill utility from Sysinternals. Recently I found out that Windows XP has its own tools for doing this, which quite conveniently can do a remote kill:
tasklist /s <system>
taskkill /s <system> /f /pid <process-ID>
Very handy, except that it still didn't help me actually figure out what triggered the hang.
I hit another such situation today when trying to switch frame rates on a VFW device going through DirectShow, and had the UI lock up again. This time I was determined to figure out why the system was freezing -- I was convinced it was a bug somewhere in WIN32K -- so I set out to do a remote kernel debug. Unfortunately I soon discovered to my frustration that I had no Firewire cable and no 9-pin null-modem serial cable, one of which is required. (This will have to be rectified at the local Fry's tomorrow.) So instead I activated the CrashOnCtrlScroll Registry key in the keyboard driver, set the system to do a full memory dump on crash, purposely blue-screened the system when the freeze occurred, and then waited forever for all 1GB of RAM to be dumped to disk.
The next step was to load MEMORY.DMP into Microsoft WinDbg and figure out what was going on. The !locks command didn't show any deadlocks or major lock contention, nor did thread times show a high-priority thread eating up CPU -- but after enough fooling around I managed to successfully backtrace all kernel threads for VirtualDub.exe into user space, at which point I saw... UnhandledExceptionFilter(), being called from VDCaptureDriverDS::SetFramePeriod(). Uh oh. Basically VirtualDub had crashed, and for some reason the system really didn't like it when the main UI thread crashed while the DirectShow filter graph was running. This explains why the UI locks up, as DirectShow has hooks into kernel drivers and also uses DirectDraw or Direct3D, but it still reeks of a kernel bug because a user app shouldn't be able to block the system in this manner.
Suffice it to say, I am increasingly liking WinDbg and NT kernel debugging. The system has to be really trashed for the kernel debugger not to work. I still have bad memories of debugging video capture crashes on Windows 95, where I once had a bug that was causing the system to instantly reboot. I ended up peppering the code with debug print statements, each followed by a Sleep(2000), and watching the debug output window very closely to catch the last point of execution before the machine keeled over. That was not fun.
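For anyone who hasn't had to resort to the print-and-sleep technique, here is a minimal sketch of the idea. It is not the code used back then: the names FormatCheckpoint, Checkpoint, and CHECKPOINT are made up for illustration, and standard C++ streams and sleep_for stand in for the debug output window and Sleep(2000).

```cpp
#include <cassert>
#include <chrono>
#include <iostream>
#include <string>
#include <thread>

// Hypothetical helper: builds a "file(line): label" marker string.
inline std::string FormatCheckpoint(const char* file, int line, const char* label) {
    assert(file != nullptr && label != nullptr);
    return std::string(file) + "(" + std::to_string(line) + "): " + label;
}

// Hypothetical helper: emit the marker, flush it, then pause. If the machine
// resets during the pause or shortly after, the last marker that was visible
// brackets the code that took the system down.
inline void Checkpoint(const char* file, int line, const char* label,
                       std::chrono::milliseconds pause = std::chrono::milliseconds(2000)) {
    std::cerr << FormatCheckpoint(file, line, label) << std::endl;  // endl flushes
    std::this_thread::sleep_for(pause);  // time for a human to read the marker
}

// Convenience macro so call sites only supply the label.
#define CHECKPOINT(label) Checkpoint(__FILE__, __LINE__, (label))
```

Sprinkling CHECKPOINT("before starting capture") style calls between suspect statements and then halving the suspect region on each run is slow, but it needs nothing from the system beyond the ability to draw text before it dies.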
Anyway, going back to the hang -- it presents a bit of a problem because it means I can't stably present a crash dialog in capture mode with my current system. I tried disabling the crash handler, and the system still froze because of the default system error dialog. Crash dialogs are very important to me because they are frequently the only way I can figure out where my program has blown up so I can fix it, given that remote debugging is an impossibility with users on the Internet, and also because they can give the user clues as to possible workarounds. I think the only way I can safely display a crash dialog in this case is to immediately relaunch another copy of VirtualDub.exe, pass the crash context to it through a shared memory window, and then immediately TerminateProcess() to unlock the display so the second copy can display the crash UI. It's nasty, but it's safer.
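A rough sketch of what that relaunch scheme could look like follows. This is only an illustration of the idea, not VirtualDub's actual crash handler: the CrashContext layout, the kCrashMagic tag, the RelaunchingCrashFilter name, and the "/crashreport" switch are all hypothetical, and error handling is minimal. The Win32 part is guarded so the portable record-filling code can be compiled anywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#ifdef _WIN32
#include <windows.h>
#include <cstdio>
#endif

// Hypothetical fixed-layout record handed to the second process. Plain data
// only: pointers would be meaningless in another address space.
struct CrashContext {
    std::uint32_t magic;          // lets the reader sanity-check the block
    std::uint32_t exceptionCode;  // e.g. 0xC0000005 for an access violation
    std::uint64_t faultAddress;   // address of the faulting instruction
    char          where[64];      // short note, always NUL-terminated
};

constexpr std::uint32_t kCrashMagic = 0x56444348;  // arbitrary tag (assumption)

// Fills the record without touching the heap, which may be corrupt after a crash.
inline void FillCrashContext(CrashContext& ctx, std::uint32_t code,
                             std::uint64_t addr, const char* where) {
    std::memset(&ctx, 0, sizeof ctx);
    ctx.magic = kCrashMagic;
    ctx.exceptionCode = code;
    ctx.faultAddress = addr;
    for (std::size_t i = 0; where && where[i] && i + 1 < sizeof ctx.where; ++i)
        ctx.where[i] = where[i];
}

#ifdef _WIN32
// Unhandled-exception filter: publish the context in an inheritable shared
// memory section, start a second copy of the program, then terminate at once
// so this process never shows any UI of its own.
inline LONG WINAPI RelaunchingCrashFilter(EXCEPTION_POINTERS* ep) {
    SECURITY_ATTRIBUTES sa = { sizeof sa, nullptr, TRUE };  // handle is inheritable
    HANDLE mapping = CreateFileMappingA(INVALID_HANDLE_VALUE, &sa, PAGE_READWRITE,
                                        0, sizeof(CrashContext), nullptr);
    void* view = mapping ? MapViewOfFile(mapping, FILE_MAP_WRITE, 0, 0,
                                         sizeof(CrashContext)) : nullptr;
    if (view) {
        FillCrashContext(*static_cast<CrashContext*>(view),
            ep->ExceptionRecord->ExceptionCode,
            reinterpret_cast<std::uintptr_t>(ep->ExceptionRecord->ExceptionAddress),
            "capture mode");

        char exe[MAX_PATH] = {};
        GetModuleFileNameA(nullptr, exe, MAX_PATH - 1);

        // The child inherits the section handle under the same numeric value,
        // so passing that value on the command line is enough to find it.
        char cmd[MAX_PATH + 64];
        std::snprintf(cmd, sizeof cmd, "\"%s\" /crashreport %llu", exe,
            static_cast<unsigned long long>(reinterpret_cast<std::uintptr_t>(mapping)));

        STARTUPINFOA si = { sizeof si };
        PROCESS_INFORMATION pi;
        CreateProcessA(nullptr, cmd, nullptr, nullptr, TRUE /* inherit handles */,
                       0, nullptr, nullptr, &si, &pi);
    }
    TerminateProcess(GetCurrentProcess(), 1);  // does not return for the current process
    return EXCEPTION_EXECUTE_HANDLER;
}
#endif
```

The filter would be installed once at startup with SetUnhandledExceptionFilter(RelaunchingCrashFilter). The second copy would parse the handle value from its command line, map the section with MapViewOfFile, check the magic value, and only then put up the crash dialog. Because the child holds its own inherited handle, the section survives the first process being terminated.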
To end on a happy note, though, I did manage to squish the bug in the 1.6.3 DirectShow capture driver that was causing the problem. I was assuming that the capture filter had independently configurable capture and preview video settings, which the VFW Wrapper filter doesn't.