The infamous frame 9995 MP3 bug
For years, I've been receiving reports about a mysterious problem with VirtualDub hanging during a save operation. The program's UI was still responsive, so the application hadn't totally died, but the processing pipeline jammed up such that the render couldn't make any progress or be aborted (a "livelock"). I had initially assumed that this was a deadlock caused by thread synchronization issues, which wouldn't lock the UI because VirtualDub's UI runs in a separate thread, and this was reinforced by the livelock log messages in recent versions indicating that the audio thread was stuck in a system call. However, I could never find the culprit or reproduce the problem. This was very frustrating for me, because it was a long-standing issue that made my program unusable for some people, and which I couldn't fix.
Until now.
I recently found out which software triggers the problem and why: it's the Creative Labs MP3 codec (ctmp3.acm), and it's because of peculiar notifications being sent by that codec. The codec comes with the software that ships with certain SoundBlaster Live! sound cards; in particular, installing the PlayCenter application will also install the codec. Temporarily renaming the driver file, uninstalling it, or lowering its priority in the Sounds and Audio Devices control panel so that another MP3 codec takes precedence will work around the problem. I think I have a viable workaround that I can put into VirtualDub itself, but why this codec causes a lockup is an interesting question in itself.
A quick note
The Creative MP3 codec shows up as just "MP3" in the audio codec list, and it, of course, uses a tag of 0x0055 (WAVE_FORMAT_MPEGLAYER3) for its compressed format. One of the sources of confusion is that when the lockup problem occurs, you can get it whether you choose that codec or any other MP3 codec in VirtualDub's audio compression dialog. The reason for this is a deficiency in VirtualDub's settings handling. When an audio format is selected for compression, VirtualDub only records the compressed format that you have selected. The problem is that if you have multiple audio codecs installed that can compress to that format from the audio source, it is indeterminate which codec is used, regardless of which one you picked. The result is that picking the MP3 setting, the MPEG Layer-3 entry provided by the ubiquitous Fraunhofer codec, or even a Lame ACM entry can cause the Creative codec to be used instead.
I plan to solve this at some point by adding the codec name or ID in the internal settings as a hint, but haven't gotten around to modifying all the code paths yet. In the meantime, only having one codec installed per compressed format is the best way to avoid ambiguity.
The reason for the livelock
The test that finally succeeded in reproducing the problem was to do MP3 compression from 44kHz, 16-bit PCM to 160kbps with a long audio stream, nearly an hour long. Once it locked up — and after I had done the customary finally-a-repro-case dance of joy — attaching the debugger revealed something really strange. The thread that was processing the audio had locked up in the call to acmStreamConvert() inside the CTMP3.ACM codec, which had, in turn, made a system call at the time of the break. What was totally unexpected, though, was that the system call wasn't WaitForSingleObject(), but Sleep()! Well, after a little disassembly, here's the surrounding code that I found in the codec:
    do {
        Sleep(10);
    } while(!PostThreadMessage(GetCurrentThreadId(), MM_STREAM_DONE, hacmStream, 0));
Uh, right.
A little background: PostThreadMessage() is used to asynchronously post a message to a thread's message queue without associating that message with any window. The only way such a message can be handled is by a message hook or by direct handling in the thread's message loop, because DispatchMessage() just dumps window-less messages on the floor. You can use thread messages as a way to communicate between threads, as long as the receiving thread doesn't create a modal dialog box without a thread hook. One of the problems you can run into is that if the target thread hasn't yet called a function like PeekMessage() or GetMessage(), it has no message queue and PostThreadMessage() will fail. One of the solutions that the Platform SDK recommends is precisely the above: call Sleep() and then retry PostThreadMessage() until it succeeds. Yuuuuck!!
However, there are two problems with the above.
One is that I can't see why the audio codec would post MM_STREAM_DONE as a thread message. According to the ACM documentation, MM_STREAM_DONE is only used in two contexts: as a message posted to the window specified when the stream is opened with CALLBACK_WINDOW, and as an event ID passed to the callback function specified with CALLBACK_FUNCTION. VirtualDub doesn't set either of these flags, and moreover, I couldn't find any other audio codec on my system that even imports PostThreadMessage(). The other problem is more serious, though, which is that....
...It makes no sense to do a loop like this with the current thread as the target. The current thread obviously can't be running a message loop, since it's busy running this post loop instead, so if there is a problem with the message queue that causes PostThreadMessage() to fail, it will never be resolved. And in fact, because VirtualDub calls the audio codec on a non-UI thread that doesn't run a message pump, MM_STREAM_DONE messages pile up in the message queue until it is full, causing PostThreadMessage() to fail with ERROR_NOT_ENOUGH_QUOTA. And then the thread loops forever calling Sleep().
How big is the message queue in Windows XP? You guessed it, the default is 10,000 messages. Take into account a 1:1 audio interleave with a little skew for the audio preload, and that's where the frame 9995 part comes from.
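To make the failure mode concrete, here is a small portable toy model of it. This is not Win32 code and not the codec's code: ToyThreadQueue and blocksBeforeStall are hypothetical names, and the model assumes one message is posted per converted block and that nothing ever removes messages from the queue.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Toy stand-in for a per-thread message queue with a fixed quota.
// A model for illustration only, not the Windows implementation.
struct ToyThreadQueue {
    std::size_t quota;              // maximum number of queued messages
    std::deque<unsigned> messages;  // pending message IDs

    // Mimics a post that fails (returns false) once the quota is reached.
    bool post(unsigned msg) {
        if (messages.size() >= quota)
            return false;
        messages.push_back(msg);
        return true;
    }
};

// Simulates a thread that converts `blocks` audio blocks, posting one
// message to its own queue per block (an assumption) and never pumping.
// Returns how many blocks complete before a post fails. In the real codec,
// that failure is the point where the Sleep()/retry loop would spin forever.
std::size_t blocksBeforeStall(std::size_t quota, std::size_t blocks) {
    assert(quota > 0);
    ToyThreadQueue q{quota, {}};
    const unsigned kDoneMessage = 1;  // placeholder ID; the value is irrelevant here
    for (std::size_t done = 0; done < blocks; ++done) {
        if (!q.post(kDoneMessage))
            return done;  // queue is full and nobody will ever drain it
    }
    return blocks;
}
```

With a quota of 10,000, the model completes exactly 10,000 blocks and then stalls, while any job shorter than that finishes normally. That would explain why only long renders hit the problem.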
I can't definitively say that this is a bug, because the Platform SDK documentation in the Windows Multimedia area is rusty and I've definitely found a lot of it lacking, misleading, or even completely in error compared to the docs for the original 16-bit Windows systems that the modern Win32 multimedia modules are based on. The good news is that it is unlikely to affect most other programs, since many of them are single-threaded and will thus run a PeekMessage() loop in order to keep the UI responsive, or at least check for an abort request. The bad news is that it still isn't entirely safe, even in a thread that runs a message pump. Posting a thread message like this looks unnecessary; I see nothing in the docs that says you need to run a message pump to use ACM, and looping infinitely on an error without checking the error code is bad practice. I partly blame the MSDN technical writer who put such a lame solution in the documentation for PostThreadMessage().
What's sobering about the statement "not my bug" is that "not my problem" doesn't necessarily follow.
The way I am experimenting with solving this for 1.6.13 is to add a PeekMessage() loop to my audio codec class that only drains MM_STREAM_DONE thread messages, which in theory should work around the problem for non-UI threads while still not fouling normal window message dispatch if I call it on a UI thread. It seems to work so far, which makes me happy.
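A minimal sketch of what such a drain loop could look like follows. DrainStreamDoneMessages is a hypothetical name and this is not VirtualDub's actual code. It relies on two documented PeekMessage() behaviors: passing -1 as the window handle retrieves only messages whose hwnd is NULL (thread messages), and the min/max filter arguments restrict retrieval to a range of message IDs. The non-Windows branch is there only so the sketch compiles elsewhere.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <mmsystem.h>               // MM_STREAM_DONE
#pragma comment(lib, "user32.lib")  // MSVC-style link hint; PeekMessage lives in user32

// Hypothetical helper: removes pending MM_STREAM_DONE thread messages from
// the calling thread's queue and returns how many were removed.
int DrainStreamDoneMessages() {
    // hWnd == -1 limits PeekMessage to messages with a NULL hwnd, so
    // ordinary window messages are left in the queue for normal dispatch.
    const HWND threadMessagesOnly =
        reinterpret_cast<HWND>(static_cast<INT_PTR>(-1));

    MSG msg;
    int drained = 0;
    // Filter range [MM_STREAM_DONE, MM_STREAM_DONE] matches only that ID;
    // PM_REMOVE discards each match instead of leaving it queued.
    while (PeekMessage(&msg, threadMessagesOnly,
                       MM_STREAM_DONE, MM_STREAM_DONE, PM_REMOVE)) {
        ++drained;
    }
    return drained;
}
#else
// Stub for non-Windows builds, where there is no thread message queue.
int DrainStreamDoneMessages() { return 0; }
#endif
```

Calling something like this after each acmStreamConvert() call should keep the queue from ever approaching its limit, and because it only touches thread messages with that one ID, it should be harmless on a UI thread as well.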