Old news

5/15/2004 News: Bicubic resampling

Long, lengthy rant^H^H^H^Hdiscourse on 3D to follow.

One of the features I've been working on for 1.6.0 is the ability to do bicubic resampling in the video displays using hardware 3D support. We've been using simple bilinear filtering for too long, and it's time we had better quality zooms accelerated on the video card. Problem is, 3D pipelines aren't really set up for generic FIR filters, so the task is to convolute and mutate the traditional 4x4 kernel into something that a GPU understands.

To review, the 1D cubic interpolation filter used in VirtualDub is a 4-tap filter defined as follows:

tap 1 = Ax - 2Ax² + Ax³;
tap 2 = 1 - (A+3)x² + (A+2)x³;
tap 3 = -Ax + (2A+3)x² - (A+2)x³;
tap 4 = Ax² - Ax³;

where taps 2 and 3 straddle the desired point and x is the fractional distance from tap 2 to that point. Applying this both horizontally and vertically gives the bicubic filter. Because the 2D filter is separable, it can be calculated as two 1D passes; this reduces the number of effective taps for the 2D filter from 16 to 8. We can do this on a GPU by doing the horizontal pass into a render target texture, then using that as the source for a vertical pass. As we will see, this is rather important on the lower-end 3D cards.

Now, how many different problems did I encounter implementing this? Let's start with the most powerful cards and work down:

DX9, some DX8 class cards (Pixel Shader 1.4: NVIDIA GeForce FX, ATI RADEON 8500+)
Six texture stages, high-precision fixed point arithmetic or possibly even floating-point. There really isn't any challenge to this one whatsoever, as you simply bind the source texture to the first four texture stages, bind a filter LUT to the fifth texture stage, and multiply-add them all together in a simple PS1.4 shader. On top of that, you have fill rate that is obscene for this task so performance is essentially a non-issue. Total passes: two.

NVIDIA has some interesting shaders in their FXComposer tool for doing bicubic interpolation using Pixel Shader 2.0 in a single pass, without any need for temporaries. However, it chews up a ton of shader resources and burns a ton of clocks per pixel; I think the compiler said somewhere around 50 clocks. I'm not sure that's faster than a separable method. Did I mention it requires PS2.0?? It does compute a more precise filter, however. I might add a single-pass PS2.0 path because it offers the possibility of more advanced effects such as doing warpsharp in the pixel shader.

I have a GeForce FX 5600 now, but when I first wrote this path, I had no PS1.4 capable card, so I had to prototype on the D3D reference rasterizer. Refrast's awe-inspiring 0.2 fps performance gives new meaning to "slow." Unfortunately, I think refrast is still a procedural rasterizer, like old OpenGL implementations; just about all other current software rasterizers now use dynamic code generation and run orders of magnitude faster.

DX8 class card (Pixel Shader 1.1: NVIDIA GeForce 3/4)
Four texture stages: not quite enough for single-pass 4-tap, so we must do two passes per axis. Now we run into a problem: the framebuffer is limited to 8-bit unsigned values and, more importantly, can't hold negative values. The way we get around this is to compute the absolute value of the two negative taps into the framebuffer first, then combine that with the sum of the two positive taps using REVSUBTRACT as the framebuffer blending mode. Sadly, clamping to [0,1] occurs before blending and there is no way to do a 2X on the blend, so we must throw away 1 LSB of the image and burn a pass doubling the image, bringing the total to five passes. And no, I won't consider whacking the gamma ramp of the whole screen to avoid the last pass.

DX7 class card (Fixed function, two texture stages: NVIDIA GeForce 2)
This is where things get uglier. Only two texture stages means we can only compute one tap at a time, since we need one of the stages for the filter LUT. This means that nine passes are required: four for the horizontal filter, four for the vertical, and one to double the result. As you may have guessed, a GF2 or GF4Go doesn't have a whole lot of fill rate after dividing by nine, and I have trouble getting this mode working at 30 fps above about 800x600. That sucks, because my development platform is a GF4Go440.

I came up with an alternate way to heavily abuse the diffuse channel in order to do one tap per texture stage: draw one-pixel wide strips of constant filter (vertical for the horizontal pass, horizontal for the vertical pass) and put the filter coefficients in the diffuse color. This cuts the number of passes down to five as with the GF3/4 path. Unfortunately, this turns out to be slower than the nine pass method. I doubt it's T&L load, because 5600 triangles in 4 batches isn't exactly a complex scene; more likely the render target textures are tiled and I'm blowing the tiling pattern by drawing strips. Sigh.

I've been racking my brain trying to bring this one below nine passes, but I haven't come up with anything other than the method above that didn't work.

DX7 class card (Fixed function, three texture stages: ATI RADEON)
Three texture stages means we can easily do two taps at a time for a total of five passes, which should put the original ATI RADEON on par with the GeForce 3 for this operation. Yay for ATI and the third texture stage! Oh wait, this card doesn't support alternate framebuffer blending operations and thus can't subtract on blend. On top of that, D3D lets us complement on input to a blending stage but not output, and we can't do the multiply-add until the final stage. Never mind, the original RADEON sucks. So now what?

We first compute the two negative taps using the ugly but useful D3DTOP_MODULATEALPHA_ADDCOLOR. How do we handle the negation? By clearing the render target to 50% gray and then doing a blend into it with a source factor of zero and a destination factor of INVSRCCOLOR, basically computing 0.5(1-src). We then add the two positive taps with their filter scaled down by 50% and using a straight 1+1 additive blend. The result is the filtered pixel, shifted into the [0.5, 1] range. The vertical pass is computed similarly, but with input complement on both passes to flip the result inverted to [0, 0.5], after which we can take advantage of the already narrowed input range to compute the vertical filter with slightly higher precision. (The filtering operation is linear and can be commuted with the complement.) The final pass then doubles the result with input complementation again to produce the correct output. Rather fugly, but it does work. The precision isn't great, though, slightly worse than the GeForce 2 mode.

Interestingly, the RADEON doesn't really run any better than the GeForce 2 despite having half the passes.

DX0 class card (Intel Pentium 4-M 1.6GHz)
Here's the sad part: a highly optimized SSE2 bicubic routine can stretch a 320x240 image to 1280x960 at 30fps and still leave enough time left over to upload the result to the video card. That means systems with moderate GPUs and fast CPUs are better off just doing the bicubic stretch on the CPU. Argh!

You might be wondering why I'm using Direct3D instead of OpenGL. That is a valid question, given that I don't really like Direct3D (which I affectionately call "caps bit hell"). The reason is that I wrote a basic OpenGL display driver for 1.5.5 and found that it was unusable due to a bug in the NVIDIA drivers that caused a stall of up to ten seconds when switching between display contexts. The code has shipped and is in 1.5.10, but is hard-coded off in VideoDisplayDrivers.cpp. I might resurrect it again as NVIDIA reportedly exposes a number of features in their hardware in OpenGL that are not available in Direct3D, such as the full register combiners, and particularly the final combiner. However, I doubt that there's anything I can use, because the two critical features I need for improving the GF2 path are either doubling the result of the framebuffer blend or another texture stage, both of which are doubtful.

4/26/2004 News: YV12 is b0rked

My daily commute takes me across the San Mateo Bridge. Coming back from the Peninsula there is a sign that says: "Emergency parking: 1/4 mile."

Several people suggested __declspec(naked) for the intrinsics code generation problem. Sorry, not good enough. Not only does naked disable the frame pointer omission (FPO) optimization and prevent inlining, but it also doesn't stop the compiler from using spill space if it needs to — which means you basically have to set up a stack frame anyway.

I've been trying for some time to get YV12 support working perfectly, but at this point it looks like a wash. The problem is that different drivers and applications are inconsistent about how they treat or format odd-width and odd-height YV12 images. Some support it by truncating the chroma planes (dumb). Some do that and have unused space between the Cr and Cb planes (weird). Many simply crash (very dumb). And a few simply don't support it (lame but pragmatic). YVU9 tends to be even more broken. Arrrgh.

Now, if people had sense, they would have handled this the way that MPEG and JPEG do, and simply require that the bitmap always be padded to the nearest even boundaries and that the extra pixels be ignored on decoding. Unfortunately, no one seems to have bothered to ever define the YV12 format properly in this regard, and thus we have massive confusion.
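To make the "pad to even" rule concrete, here is a small sketch of the plane sizes it would imply. This is the layout wished for above, not what any particular driver or codec actually does; Yv12Layout and padded_yv12_layout are hypothetical names, and the planes are assumed tightly packed with no row padding.

```cpp
#include <cstddef>

struct Yv12Layout {
    std::size_t luma_w, luma_h;      // luma plane, padded to even dimensions
    std::size_t chroma_w, chroma_h;  // each of the two chroma planes
    std::size_t total_bytes;         // Y plane plus both chroma planes
};

// Round both dimensions up to the next even value, then derive the
// half-resolution chroma planes from the padded size. The extra row or
// column would simply be ignored on decoding.
Yv12Layout padded_yv12_layout(std::size_t width, std::size_t height) {
    Yv12Layout l;
    l.luma_w = (width + 1) & ~static_cast<std::size_t>(1);
    l.luma_h = (height + 1) & ~static_cast<std::size_t>(1);
    l.chroma_w = l.luma_w / 2;
    l.chroma_h = l.luma_h / 2;
    l.total_bytes = l.luma_w * l.luma_h + 2 * l.chroma_w * l.chroma_h;
    return l;
}
```

With this rule a 321x241 image is stored as 322x242 luma plus two 161x121 chroma planes, and nobody has to guess whether the chroma was truncated or rounded up.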

3/17/2004 News: Taking the 64-bit plunge

First, I finally fixed the FAQ link from the program, and also updated the knowledge base for known bugs in 1.5.10. I dropped the older KB entries, but they're basically redundant with the change log in VirtualDub.

I put together a new Athlon 64 based system a few days ago, installed the preview of Windows XP for 64-bit Extended Systems and the prerelease AMD64 compiler, and hacked up the VirtualDub source code a bit. The above is the result. It plays MPEG-1 files, but nearly all of the assembly optimizations are disabled and none of the video filters work, so it's still far behind the 32-bit version, but it's still neat to be able to experiment with 64-bit code.

Although I have had it installed for some time now, I've been avoiding using Visual Studio .NET 2003. The incremental improvements in the compiler simply aren't worth putting up with the braindead, butt-slow IDE. Thus, I've been continuing to use Visual C++ 6.0 SP5+PP. Well, after installing XP64, I was vindicated: none of the VS.NET IDEs will install on it, because they rely on the 32-bit .NET Framework, which currently doesn't work under WOW64. Which means I'm using... VC6, with the pre-release VC8 compiler from the Windows Server 2003 DDK. This is a bit clumsy since the VC6 debugger doesn't understand VC7+ debug info, and certainly can't debug a 64-bit app, so I have to use the beta AMD64 WinDbg instead, but at least I have the AMD64 build in the same project file as the 32-bit build. Having a configuration called "Win32 Release AMD64" is a bit weird, however.

There are two major bottlenecks to getting VirtualDub running smoothly on AMD64: the compiler doesn't support inline assembly, and the OS doesn't support MMX for 64-bit tasks. I know a few of you are going to yell out "use compiler intrinsics," but please look at this first:

#include <xmmintrin.h>

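unsigned premultiply_alpha(unsigned px) {
	// Unpack the four 8-bit channels into four 16-bit words.
	__m64 px8 = _m_from_int(px);
	__m64 px16 = _m_punpcklbw(px8, _mm_setzero_si64());
	__m64 alpha = px16;

	// Broadcast the top word (alpha) to all four words.
	alpha = _m_punpckhwd(alpha, alpha);
	alpha = _m_punpckhwd(alpha, alpha);

	// Multiply each channel by alpha, then shift right by 8.
	__m64 result16 = _m_psrlwi(_m_pmullw(px16, alpha), 8);

	// Repack to bytes and extract the low 32 bits.
	unsigned x = _m_to_int(_m_packuswb(result16, result16));

	// Clear the MMX state before returning to FPU-using code.
	_mm_empty();

	return x;
}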

Visual C++ 6.0

push        ebp
mov         ebp,esp
and         esp,0FFFFFFF8h
sub         esp,8
movd        mm1,dword ptr [ebp+8]
pxor        mm0,mm0
punpcklbw   mm1,mm0
movq        mm0,mm1
movq        mm2,mm0
punpckhwd   mm2,mm1
movq        mm1,mm2
punpckhwd   mm1,mm2
pmullw      mm0,mm1
psrlw       mm0,8
movq        mmword ptr [esp],mm0
emms
movq        mm0,mmword ptr [esp]
movq        mm1,mm0
packuswb    mm0,mm1
movd        eax,mm0
mov         esp,ebp
pop         ebp
ret

Visual Studio .NET 2003

push        ebp
mov         ebp,esp
and         esp,0FFFFFFF8h
sub         esp,8
movd        mm1,dword ptr [ebp+8]
pxor        mm0,mm0
punpcklbw   mm1,mm0
movq        mm0,mm1
punpckhwd   mm1,mm0
movq        mm2,mm1
punpckhwd   mm2,mm1
pmullw      mm0,mm2
psrlw       mm0,8
movq        mmword ptr [esp],mm0
emms
movq        mm0,mmword ptr [esp]
movq        mm1,mm0
packuswb    mm1,mm0
movd        eax,mm1
mov         esp,ebp
pop         ebp
ret

What I want

movd        mm0, [esp+4]
pxor        mm7, mm7
punpcklbw   mm0, mm7
movq        mm1, mm0
punpckhwd   mm0, mm0
punpckhwd   mm0, mm0
pmullw      mm0, mm1
psrlw       mm0, 8
packuswb    mm0, mm0
movd        eax, mm0
emms
ret

This, historically, is why I have not bothered to use MMX/SSE/SSE2 compiler intrinsics in VirtualDub — the code generation sucks. The VC6 processor pack was quite bad and tended to generate about two move instructions for every ALU op; this was improved in VS.NET 2003, but it still isn't able to resolve binary ops of the form A+A correctly, which I use a lot. But there is an even worse problem — note that the compiler moved MMX ops below the emms instruction. The generated code is wrong! The problem is that the global optimizer doesn't see emms instructions as a barrier and freely flows ops around it, leading to incorrect code. Believe it or not, using __asm emms doesn't work either — the only workaround I know of is to use volatile to hammer down the flow before the emms, and that's ridiculous.

Never mind that the intrinsics version is also quite unreadable.
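For comparison, a plain scalar sketch of what the MMX routine above computes, as far as it can be read off the intrinsics: every byte of the pixel, the alpha byte included, is multiplied by the top byte and shifted right by 8. premultiply_alpha_scalar is a made-up name, and treating the top byte as alpha is inferred from the unpack order rather than stated anywhere.

```cpp
#include <cstdint>

// Scalar equivalent of the intrinsics routine: each 8-bit channel
// (alpha included) becomes (channel * alpha) >> 8.
std::uint32_t premultiply_alpha_scalar(std::uint32_t px) {
    const std::uint32_t alpha = px >> 24;  // top byte, assumed to be alpha
    std::uint32_t result = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        const std::uint32_t channel = (px >> shift) & 0xFF;
        result |= ((channel * alpha) >> 8) << shift;
    }
    return result;
}
```

Note that the shift divides by 256 rather than 255, so opaque white (0xFFFFFFFF) comes out as 0xFEFEFEFE. That matches the vector code, which makes this useful as a reference when checking whatever the compiler generated.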

But wait — we're in the era of Pentium 4 and Athlon 64 CPUs. We don't need this emms junk, because we should be using SSE2! So let's use SSE2:

#include <emmintrin.h>

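unsigned premultiply_alpha(unsigned px) {
	// Unpack the four 8-bit channels into the low four 16-bit words.
	__m128i px8 = _mm_cvtsi32_si128(px);
	__m128i px16 = _mm_unpacklo_epi8(px8, _mm_setzero_si128());
	// Broadcast word 3 (alpha) across the low four words.
	__m128i alpha = _mm_shufflelo_epi16(px16, 0xff);

	// Multiply each channel by alpha, then shift right by 8.
	__m128i result16 = _mm_srli_epi16(_mm_mullo_epi16(px16, alpha), 8);

	// Repack to bytes and return the low 32 bits.
	return _mm_cvtsi128_si32(_mm_packus_epi16(result16, result16));
}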

Visual C++ 6.0

push        ebp
mov         ebp,esp
pxor        xmm0,xmm0
movdqa      xmm1,xmm0
movd        xmm0,dword ptr [ebp+8]
punpcklbw   xmm0,xmm1
pshuflw     xmm1,xmm0,0FFh
pmullw      xmm0,xmm1
psrlw       xmm0,8
movdqa      xmm1,xmm0
packuswb    xmm1,xmm0
and         esp,0FFFFFFF0h
movd        eax,xmm1
mov         esp,ebp
pop         ebp
ret

Visual Studio .NET 2003

push        ebp
mov         ebp,esp
pxor        xmm0,xmm0
movdqa      xmm1,xmm0
movd        xmm0,dword ptr [ebp+8]
punpcklbw   xmm0,xmm1
pshuflw     xmm1,xmm0,0FFh
pmullw      xmm0,xmm1
psrlw       xmm0,8
movdqa      xmm1,xmm0
packuswb    xmm1,xmm0
and         esp,0FFFFFFF0h
movd        eax,xmm1
mov         esp,ebp
pop         ebp
ret

What I want

pxor        xmm1,xmm1
movd        xmm0,dword ptr [esp+4]
punpcklbw   xmm0,xmm1
pshuflw     xmm1,xmm0,0FFh
pmullw      xmm0,xmm1
psrlw       xmm0,8
packuswb    xmm0,xmm0
movd        eax,xmm0
ret

The code is at least correct this time, but it is still full of unnecessary data movement, which consumes decode and execution bandwidth. Now for the real kicker: those extraneous moves hurt on a Pentium 4, because on a P4, a register-to-register MMX/SSE/SSE2 move has a latency of 6 clocks. If you have extra shifter or ALU bandwidth you can attack this by replacing movdqa with pshufd or pxor+por, but you can't do this when the compiler is generating code from intrinsics. And before you say that performance doesn't matter so much, remember that the purpose of those intrinsics is so that you can optimize hotspots using CPU-specific optimizations. 10-20% in a critical inner loop matters.

This all only pertains to the Microsoft Visual C++ compiler, and as it turns out, the Intel C/C++ Compiler generates much better MMX and SSE2 code. I suspect recent versions of gcc would beat MSVC too. As it stands right now, though, I still have to use Visual C++, and that means I'm still going to have to hand-roll a lot of assembly code for performance. And with AMD64, that means I'm going to have to duplicate and reflow a lot of it.

2/9/2004 News: Playing hooky

I finally caught that nasty head cold that seems to be travelling everywhere this month. It doesn't make you very ill, but just makes you stuffy, irritable, grumpy, and hoarse — sufficient for me to call in sick for the first time in quite a while. I probably could have gone in and gotten some work done, but hacking and coughing is not a great sound for your coworkers to hear.

Besides, we already recorded all the sound effects we need.

While sitting at home enduring a feeling that can only be described as "oogy," I did the only thing I can do at such a time: code. The Super Nintendo version of Konami's "Tokimeki Memorial" plays a tune during its main game start screen that I've been trying to find for a while — it's a fast version of Shiori's theme. I have several versions of it called yuunagi tayori, but they range from slow to annoyingly slow to even painfully slow, none of which I like nearly as much. Collecting Tokimemo CDs has given me a much nicer collection of music to listen to but none of them have the song I want. And I can play the .spc version of it through WinAmp, but it sounds... like an SNES.

So I decided to write my own SPC player, to learn how SNES music works and (maybe) extract the music into nicer formats.

SNES music involves two major components: an 8-channel DSP that generates sound, and an SPC700 microcontroller to control it. The SPC700 is an interesting beast. Its instruction set shares a lot in common with that of the venerable 6502, but unfortunately the instruction decoding template isn't quite as regular, as some oddball instructions decode two effective addresses. I've been doing x86 so long that I kept making stupid mistakes in the CPU emulation, such as goofing the carry bit on SBC (it's flipped on 6502/SPC700 vs. x86), and forgetting to commit the result of ADC back to the register file. For several hours I ended up staring at instruction execution traces, trying to figure out which of the 150 instructions that just executed didn't work correctly, in foreign code that I haven't dissected properly, on a foreign CPU that I've never coded for before.
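The carry flip is easy to get wrong, so here is a minimal sketch of the two conventions side by side. It only models the 8-bit result and the carry flag; the other flags and any decimal handling are left out, and sbc_6502 and sbb_x86 are just illustrative names, not code from the player.

```cpp
#include <cstdint>

struct SubResult {
    std::uint8_t value;
    bool carry;
};

// 6502/SPC700 convention: computes A - M - (1 - C). Carry is SET on
// input to mean "no borrow" and is SET on output when no borrow occurred.
SubResult sbc_6502(std::uint8_t a, std::uint8_t m, bool carry_in) {
    const unsigned diff = unsigned(a) - unsigned(m) - (carry_in ? 0u : 1u);
    return { static_cast<std::uint8_t>(diff), diff < 0x100u };
}

// x86 SBB convention: computes A - M - CF. Carry is SET on input to mean
// "borrow" and is SET on output when a borrow occurred.
SubResult sbb_x86(std::uint8_t a, std::uint8_t m, bool carry_in) {
    const unsigned diff = unsigned(a) - unsigned(m) - (carry_in ? 1u : 0u);
    return { static_cast<std::uint8_t>(diff), diff >= 0x100u };
}
```

Both compute the same difference when the incoming flag means "no borrow pending" in its own convention: 3 - 5 yields 0xFE either way, but the 6502-style routine leaves carry clear while the x86-style one leaves it set.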

Somehow, it sorta works right now. Compressed sample decoding isn't quite right and the ADSR envelopes are bogus, but the melody is there and the instruments are recognizable. What's weird is that the CPU emulation is still screwed up in a way that causes the various tracks to eventually desync. Apparently, Konami's sound player has independent data streams for each of the eight sound channels. The result is that the different bass and melody tracks eventually rotate away from each other in the looping portion of the song, and it sounds pretty cool, like a remix. I'm going to have to save off this version and figure out how to reproduce this bug later after I've gotten the emulation working correctly.

Either that, or it goes on Avery's Random Pile of Half-Baked and Vaguely Working Projects. But before that can happen, it has to be named after a totally unrelated anime character, like Atsuko (my fanfic reader), or Takuya (my x86 dynamic recompiler).

VirtualDub 1.6.0 is still in progress. I'm getting closer to declaring alpha and starting to clean up and stabilize the build, but I'm not there yet; I have a couple of half-baked features I have to decide yay/nay on and check that I haven't forgotten any major promised ones.

2/3/2004 News: Fix for AVI files being locked after seeing them in Windows XP Explorer

Thanks to an observant user on the forums, I found one reason for Windows XP Explorer locking AVI files after opening its parent folder:

If you have VirtualDub's frameserver installed in proxy mode (proxyon.reg) on a Windows XP system, turn it off with the proxyoff.reg file and then restart Windows or log out of your current session so Explorer restarts.

When proxying is enabled, VirtualDub installs its frameserver under the regular Windows AVI driver, tunneling AVI files through to AVIFile and AVS files through to Avisynth. The tunnel code leaks reference counts on the tunneled objects and this causes the corresponding files to stay locked in read-only mode until the process dies... which in this case, is Explorer. My bad — I will fix this for 1.6.0. If proxying is not enabled, this only affects .vdr files. I'm surprised it took this long to find out the cause. The good news is that this also explains why some applications don't work properly even with proxy mode.

1/27/2004 News: Inbox is offline

Hadn't planned to post a news entry today, but a note: thanks to the new f#$&*(#ing virus going around, my Inbox is for all intents and purposes offline. By "offline," I mean I am receiving what looks to be about 8-10 emails per minute right now. If you have something to email me about in the next couple of days you may want to wait until the storm has quieted down, especially since my email account is likely to become full at this rate while I am sleeping or at work. Those of you who still want to email me, please use a distinctive subject and don't title your email with something stupid like "problem" — they're hard to pick out amongst the viruses and spam and with the volume I'm seeing right now I'm liable to delete it.

12/14/2003 News: 3D solution

No one solved the 3D unprojection problem before I did, so here's the solution.

The original problem outlined last time, for those who didn't see it or don't remember, is to find a transform to unproject an image to a rectangle, only given its projection and four source points on the 2D projection image. So for those of you who thought Z values from the depth buffer were available, bzzzzzzzt. Depth buffers don't come attached with video frames.

The coordinates of normalized device coordinate (NDC) space are [-1, +1] for both the X and Y axes, after perspective division. Given four points (xn, yn) in original image space, we need to derive a 3x3 matrix transform that produces four homogeneous points (xn', yn', wn') that map to the NDC space corners. The matrix will be applied as follows:

x' = A*x + B*y + C
y' = D*x + E*y + F
w' = G*x + H*y + I

Requiring the four points to map to the corners of NDC space results in the following constraints:

x0'/w0' = -1
x1'/w1' = +1
x2'/w2' = -1
x3'/w3' = +1
y0'/w0' = -1
y1'/w1' = -1
y2'/w2' = +1
y3'/w3' = +1

Simple algebraic transformation transforms the divides into additions and subtractions:

x0' + w0' = 0
x1' - w1' = 0
x2' + w2' = 0
x3' - w3' = 0
y0' + w0' = 0
y1' + w1' = 0
y2' - w2' = 0
y3' - w3' = 0

The matrix transform can then be used to substitute the original points for the post-transform points:

A*x0 + B*y0 + C                   + G*x0 + H*y0 + I = 0
A*x1 + B*y1 + C                   - G*x1 - H*y1 - I = 0
A*x2 + B*y2 + C                   + G*x2 + H*y2 + I = 0
A*x3 + B*y3 + C                   - G*x3 - H*y3 - I = 0
                + D*x0 + E*y0 + F + G*x0 + H*y0 + I = 0
                + D*x1 + E*y1 + F + G*x1 + H*y1 + I = 0
                + D*x2 + E*y2 + F - G*x2 - H*y2 - I = 0
                + D*x3 + E*y3 + F - G*x3 - H*y3 - I = 0

This is an underconstrained linear system with eight equations and nine unknowns, which is no good. However, a homogeneous 3x3 transform can be arbitrarily scaled, so setting I=1 drops the ninth unknown:

A*x0 + B*y0 + C                   + G*x0 + H*y0 = -1
A*x1 + B*y1 + C                   - G*x1 - H*y1 = +1
A*x2 + B*y2 + C                   + G*x2 + H*y2 = -1
A*x3 + B*y3 + C                   - G*x3 - H*y3 = +1
                + D*x0 + E*y0 + F + G*x0 + H*y0 = -1
                + D*x1 + E*y1 + F + G*x1 + H*y1 = -1
                + D*x2 + E*y2 + F - G*x2 - H*y2 = +1
                + D*x3 + E*y3 + F - G*x3 - H*y3 = +1

The result is now solvable via simple Gaussian elimination, and the 3x3 matrix can be converted to standard 4x4 form by setting Z=0 on input and output. I am not entirely sure that I=1 is safe, but it appears so. The only case in which I=1 is impossible is if w must be zero at (x,y)=(0,0), which means that the unprojected form requires that point to be at infinity. That seems extremely rare even when (0,0) is not within the projected image.
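The eight equations above can be set up and solved in a few dozen lines. This is only a sketch under the same I=1 assumption: it uses Gauss-Jordan elimination with partial pivoting (a close cousin of plain Gaussian elimination) in double precision, and Point, Mat3, solve_unprojection, and project are all names invented for the example.

```cpp
#include <array>
#include <cmath>
#include <utility>

struct Point { double x, y; };
using Mat3 = std::array<double, 9>;  // row-major: A B C / D E F / G H I

// Builds the 8x8 system above (I fixed to 1) and solves it by Gauss-Jordan
// elimination with partial pivoting. Point order: 0 -> (-1,-1),
// 1 -> (+1,-1), 2 -> (-1,+1), 3 -> (+1,+1). Returns false if singular.
bool solve_unprojection(const Point pts[4], Mat3& out) {
    static const double sx[4] = {-1.0, +1.0, -1.0, +1.0};
    static const double sy[4] = {-1.0, -1.0, +1.0, +1.0};
    double m[8][9] = {};  // augmented matrix: unknowns A..H, then the RHS
    for (int i = 0; i < 4; ++i) {
        const double x = pts[i].x, y = pts[i].y;
        double* rx = m[i];      // x' - sx*w' = 0
        rx[0] = x; rx[1] = y; rx[2] = 1.0;
        rx[6] = -sx[i] * x; rx[7] = -sx[i] * y; rx[8] = sx[i];
        double* ry = m[i + 4];  // y' - sy*w' = 0
        ry[3] = x; ry[4] = y; ry[5] = 1.0;
        ry[6] = -sy[i] * x; ry[7] = -sy[i] * y; ry[8] = sy[i];
    }
    for (int col = 0; col < 8; ++col) {
        int pivot = col;
        for (int row = col + 1; row < 8; ++row)
            if (std::fabs(m[row][col]) > std::fabs(m[pivot][col])) pivot = row;
        if (std::fabs(m[pivot][col]) < 1e-12) return false;
        for (int k = 0; k < 9; ++k) std::swap(m[col][k], m[pivot][k]);
        for (int row = 0; row < 8; ++row) {
            if (row == col) continue;
            const double f = m[row][col] / m[col][col];
            for (int k = col; k < 9; ++k) m[row][k] -= f * m[col][k];
        }
    }
    for (int i = 0; i < 8; ++i) out[i] = m[i][8] / m[i][i];
    out[8] = 1.0;  // the I = 1 normalization
    return true;
}

// Applies the 3x3 transform and performs the perspective divide.
Point project(const Mat3& t, Point p) {
    const double w = t[6] * p.x + t[7] * p.y + t[8];
    return { (t[0] * p.x + t[1] * p.y + t[2]) / w,
             (t[3] * p.x + t[4] * p.y + t[5]) / w };
}
```

Feeding the four source points back through project should land them on the NDC corners, which makes for an easy sanity check. The false return covers the case where the system is singular, which includes the I=1 failure case described above.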

In any event, the transform works and is more stable than I'd thought it would be, so I'm sticking with it. The fact that a single 3x3 transform can encompass all possible required rotations, shears, translations, flips, and projections is not obvious, but it is true. The algorithm can also be used to transform any convex quad into any other convex quad, simply by applying it twice, once on the source points, and a second time with the target points and the resultant matrix inverted.

12/6/2003 News: 3D fun

I suppose by this point I should really label this page the "blog page" rather than the news page, but oh well.

I'm currently developing a 3D software rasterizer in the 1.6.x branch. Why? Because I can. Actually, one has been in VirtualDub for quite some time now: it's in the module for the About box. Transform and lighting, triangle setup, rasterization, and texturing are all done manually and run full speed even on a lowly Pentium. Of course, it's generally only drawing a few thousand pixels per frame. But I digress.

Why do I need a 3D rasterizer? I got the idea to write a deprojection filter, to correct for an off-center camera position. (If you always choreograph your shots perfectly at the correct angle at the right time, good for you.) The speed sucks right now at about 8 Mpixels/sec with mipmap generation, trilinear filtering, and per-pixel perspective correction enabled, but I can speed that up later. I had to make sure there were no dropouts, that subpixel addressing was working, that the lambda determination worked properly, etc. first, and at least the image quality is good.

The problem I'm having right now is not in the triangle rasterizer, but in the filter that uses it. I have the forward transform determination working nicely, where a flat image is pasted onto an oblique plane, but I can't figure out the reverse transform, to convert the projected image back to the view plane. I have no need for correct depth coordinates since I'm not doing Z- or W-buffering, and thus I think I can abuse homogeneous transforms to get the required mapping. I have a plain old 4x4 OpenGL-style matrix, so I can do just about any such transform. The problem is that I don't know how to derive the transform from a projected rectangle on a plane to the viewport, since the transformation ultimately isn't linear. Ideally, the user would only specify the corners of the projected rectangle, and appropriate depth values would be inferred to create the transform.

I have a nagging suspicion that the quad-to-quad transform isn't hard, but that I might need projective texturing to unwarp the source texture correctly. With projective texturing, not only is the destination coordinate interpolated homogeneously as [x/w y/w z/w 1/w], but so is the texture coordinate, as [s/q t/q 1/q]. The usual way to handle this is apparently to turn the 1/w divide into q/w, making q almost free. Unfortunately, my texture mapper is not straight divide or affine subdivision based — it uses Newton-Raphson iteration to compute 1/w per pixel and so I'd have to throw in an extra multiply. Thus it wouldn't be so cheap to add it, and I'd like to do the deprojection solely using 1/w if possible.

The bonus of getting this transform right is that if I were to enable the existing OpenGL path in the display code, or add a Direct3D version to it, I could trivially plunk the transform into the projection matrix and do the de-projection in real-time for free on previews. But I have to get the algorithm working first. Any 3D experts reading this that are bored and willing to explain the solution to me? My copies of Real-Time Rendering and Jim Blinn's Corner: A Trip Down the Graphics Pipeline aren't helping. :)

12/2/2003 News: VirtualDub 1.5.10 released

VirtualDub 1.5.10 is out -- it fixes a couple of critical crashes, one of them being the VideoCD crash, and the other being a stability issue on Windows 95/98 systems. Full props go to "fccHandler" for finding the bug in the source code that caused the latter problem. This version also fixes a few random problems I happened to identify on the way. Work is still progressing on the experimental version, which I hope to get to releasable — major-embarrassment-free — status in the near future.

1.5.10 contains a workaround for a rather sticky problem with certain filters, such as Deflicker. Basically, some filters that rely on separate analysis and render passes make a slightly invalid assumption — that once the analysis pass finishes, the next startProc call received will be the user starting the render pass. Well, this isn't actually guaranteed in the spec and recent 1.5.x versions break such filters when they refresh the output pane after the analysis. The result is that the filter dumps its analysis data and builds a one-frame trace before the render starts. Whoops. This problem actually exists in all versions of VirtualDub, but in earlier versions you have to explicitly step the current frame position in order for the filter chain to restart, whereas 1.5.9 will do it any time the output pane needs to be refreshed and the filter chain is idle. A possible workaround is to disable the output pane before doing the analysis pass, although I haven't actually tried this.

Video codecs that support multi-pass modes, such as DivX, are not affected by the problem as they are not fed frames except during an actual render to disk. They do receive extra start/end notifications in the video compression dialog, but anyone who has tried testing a multi-pass codec against VirtualDub has probably discovered this long ago. (It's a workaround for some early codecs that accept formats in ICCompressQuery(), but then reject them in ICCompressBegin().)

The change in 1.5.10 is that the FilterStateInfo structure contains an extra field indicating whether a preview is active. Unfortunately, filter authors will have to add a check for this structure field and recompile their filter against the filter.h header from the 1.5.10 source to take advantage of this. I have bumped the API version so that this can be done without breaking compatibility with earlier hosts.

Backwards compatibility, while desirable, is a huge pain.

I had an interesting encounter this weekend while playing Final Fantasy XI. While in Selbina I partied up with a few Japanese people who mostly didn't speak English -- and, of course, I don't speak Japanese. I can read some Hiragana and Katakana glyphs, so I could mostly figure out who they were addressing and simple questions like "are you ok?". Beyond that, though, the most I managed was to make an idiot of myself by typing token phrases in romaji that I learned from anime. (Perhaps the most embarrassing was that one of them asked in romaji if I understood romaji, and I answered "iie" without thinking.) Trying to communicate across the language barrier is kind of fun, especially since in this case the worst that happens is that you end up lost in Vana'diel or perhaps virtually die a couple of times, and because words aren't necessary to convey "I'm getting whaled on." I do feel a bit ashamed, though, that my Japanese party members did know some English, while the only other languages I'm fluent in are C/C++ and assembly. I've always wanted to learn Japanese, but it's very difficult to do so without (a) serious time and effort placed into learning it, (b) an immersive environment, and (c) a real dictionary.

P.S. The "automatic translation" ability in the game is rather useless, as you have to choose from an incomplete list of phrases, and the menus that display the phrases are far too small so they all ellipsize ("Are you...").

P.P.S. A game that leaves your character stranded in the world because you hit the Windows key or Alt-Tab, and then displays an error dialog saying "Final Fantasy XI quit because the app lost full screen mode" is lame. This has little to do with the language barrier but it's so stupid I had to mention it. I tried using my own custom WinKey blocker which uses the Windows NT low-level keyboard hook, but for some reason it failed when FFXI was running (DirectInput?).

Current build (1.5.10, stable):
   [features added]
   * Removed "accept partial streams" from MPEG-1 options
     and made it enabled by default; added warning.
   * Filters are now notified whether a render is for
     preview or output purposes.

   [bugs fixed]
   * Fixed a stall condition at end of render when advanced
     audio pipeline is active.
   * Fixed "frame not found" errors when processing
     truncated MPEG-1 streams.
   * BMP reader can now handle BITMAPCOREHEADER type
     headers (fixes incompatibility with ZSNES
     screenshots).
   * Filters were receiving garbage frame timings in
     capture mode.

   [regressions fixed]
   * Fixed instability in application when parsing VideoCD
     streams.
   * Fixed crash on exit on Windows 9x systems.
   * Fixed visual errors in input pane when decoding
     Microsoft Video 1 to a 565 16-bit display.

11/18/2003 News: VideoCDs and 64-bit computing

There is a bug in the VideoCD MPEG-1 parser in VirtualDub 1.5.8 and 1.5.9 that causes heap corruption, and thus application instability. Please avoid processing VideoCD MPEG-1 (.dat) files with those versions. I wish it had been reported on 1.5.8 so I could have fixed it before the next release. The parser is rather hacky to begin with, though -- you'll probably have better luck with third-party demuxing tools anyway as VideoCDs are rather prone to bit errors, since they are written without the second level of error correction that most CDs have. Regular MPEG-1 streams shouldn't trigger the bug.

Lots of people are apparently having trouble figuring out the changes I made to the "Save Processing Settings" command. Folks, it's really simple: there is a checkbox on the dialog where you can specify whether the edit list is saved. Check it if you are preserving settings for a particular file, uncheck it if you want to use the settings on other files.

Every once in a while I hear people saying that we don't need 64-bit on the desktop. Well, us developers will need it soon. If you have a large program, it may take a couple hundred megs of memory to compile and link without swapping. Another few hundred megs is required to keep object, library, and debugging database (.pdbs in Visual C++) resident in the disk cache. And if you're working on something that requires a large data set to start up, like say, a game, you can need another few hundred megs to keep that resident in the disk cache. So to do a full compile/link/test cycle you need a full gigabyte of RAM on the machine. Try it with half a gig and the disk cache ends up thrashing so both the compile and the program load hit disk, which is now three orders of magnitude slower than memory, doubling or tripling the cycle time. Doh. Look two or three years down the road, and it's not hard to envision average developers hitting 2GB soon, which is where the trouble starts.

Current x86 processors have 36-bit addressing and can address up to 64GB of RAM, so you might think this is a non-issue. Unfortunately, the extensions required to access that much memory are not always pleasant, and even if the whole OS can address more than 2GB you can't get that much to applications. Win32 applications only get 2GB by default, with 3GB being possible on 2000/XP if a Physical Address Extension (PAE) kernel is in use. You can only get that 3GB, though, if your drivers and your software are PAE-capable. Under Windows NT, the disk cache is itself a process, so it suffers from the same addressing limitations as a regular process.
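
The sizes quoted above follow directly from the address widths. A small compile-time check of the arithmetic (the helper name addressableBytes is made up for this example):

```cpp
#include <cstdint>

// Number of bytes addressable with the given number of address bits.
constexpr std::uint64_t addressableBytes(unsigned bits) {
    return std::uint64_t{1} << bits;
}

constexpr std::uint64_t kGiB = std::uint64_t{1} << 30;

// 36-bit physical addressing reaches 64GB of RAM.
static_assert(addressableBytes(36) / kGiB == 64, "36 bits -> 64GB");
// A 32-bit virtual address covers 4GB in total...
static_assert(addressableBytes(32) / kGiB == 4, "32 bits -> 4GB");
// ...of which a Win32 application gets half by default.
static_assert(addressableBytes(31) / kGiB == 2, "31 bits -> 2GB");
```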

As for speedups from 64-bit computing, don't expect too much. 64-bit brings some downsides over regular 32-bit, particularly in terms of higher memory usage and thus lower cache locality due to the larger pointers. Existing x86 CPUs already have at least one 64-bit ALU, for floating-point and MMX, so the increased width of general-purpose calculations itself isn't going to help for applications that already make heavy use of CPU extensions. In the specific case of AMD64 (x86-64), it appears that most of the gains come from the increased number of registers, compared to the IA-32 architecture.

11/9/2003 News: VirtualDub 1.5.9/stable released

It's Sunday again. I don't plan to keep this pace forever, but you might as well enjoy it while you can. :)

The major fix in this version-of-the-week is for a dumb oops in a fast display copy routine, but one fix that isn't mentioned in the release notes is that I fixed a lot of compile errors that only occur under Visual Studio .NET 2003. VirtualDub's main compiler is still Visual C++ 6.0 SP5+PP, and not .NET 2003, because I refuse to put up with the slow, half-broken IDE of the latter. (It's not that I'm unfamiliar with the 2003 IDE, because I use it at work. It's just that I hate it.) However, apparently some people are trying to compile VirtualDub under 2003 and are discovering that there are a lot of compile errors due to new overloads of C runtime functions that Microsoft added to the RTL for improved standards compliance. I've added the requisite casts to fix that problem; VS.NET 2003 users should also disable buffer security checks (/GS) and warnings 4018 and 4244 on the imported project.

There are two other major incompatibilities between VC6 and VC7.1 that you may encounter. One is that some of my VC6 resource scripts have "#include "afxres.h"" replaced with "#include "winres.h"" to allow the .rc to compile without MFC installed. VC7.1 has a newer Platform SDK header set so this must be substituted instead:

#ifndef IDC_STATIC
#define IDC_STATIC (-1)
#endif
#include "winresrc.h"

The other is a problem with the definition of the wide character functions in the RTL, such as iswspace. There are both library and inline versions of the functions, and unfortunately, depending on which of <wchar.h> and <ctype.h> you include, either can get used. If one module uses library wide-char functions and the other uses inline functions, you get a link error even if the same function isn't used. This wouldn't be much of a problem except apparently the two headers are referenced in interesting ways through other header files and I've found it's very easy to produce a project that links on VC6 and doesn't link on VC7.1, and vice versa. This is very annoying. I want to create a system where I can work on VirtualDub in VC6 and batch-create the corresponding .sln and .vcproj for VC7.1, but this won't be possible if I constantly have to frob code around to get _iswspace to link correctly.

Current build (1.5.9, stable):
   [features added]
   * Made 'autodetect additional segments by filename'
     option sticky.
   * Removed trackbar ticks when ticks are a solid bar
     to speed up edits on very long timelines.
   * Added option to disable use of DirectX in video
     displays under Preferences/Display.

   [bugs fixed]
   * Fixed odd lock-to-keyframe behavior with edit lists
     that have out of order segments.
   * Fixed move-to-next-keyframe command at end of time-
     line.
   * Fixed decompression of 1-bit and 4-bit uncompressed
     AVI files under Windows 95/98.
   * Audio compression dialog showed the wrong set of
     valid formats if a precision was selected under
     Audio Conversion.
   * The current edit list is no longer applied to batch
     jobs created from entire directories.

   [regressions fixed]
   * Fixed display crashes with odd-width images.
   * Segment loading wasn't hopping across drives to pick
     up segments from a multisegment capture, as directed
     by the AVI's segment hint block.

11/2/2003 News: VirtualDub 1.5.8/stable released

Random Helpful Tip: The heat dissipation of a 3GHz Pentium 4 CPU is not wasted if your room is freezing cold, as mine is right now. I'm tempted to overclock it in order to warm the room up some more.

1.5.8 is out on SourceForge and is once again a minor "stable" release. Those of you who actually read my change notes (all three of you) will notice that the "access denied SMP bug fix" has mysteriously jumped from 1.5.7 to 1.5.8. This fix was supposed to go into 1.5.7, but got omitted at the last minute due to a source code control error. (It was stuck in the client spec of the machine that had bad RAM.) The fix has been pushed out with 1.5.8 and those of you with SMP or HyperThreaded systems should no longer have to whack CPU affinities to bypass the random errors. Other goofs that have been fixed are an inability to run under Windows 95 (oops), and displays not coming up with 8-bit paletted video.

Work is still continuing on the experimental branch, which has now been pushed to 1.5.9, obviously. I have a cleaner and more versatile image library in progress that will support planar formats, and in particular, the 1.5.9/exp display code can now handle YV12. Prototyping of a new video filter system is also in progress, although it's still quite rudimentary and hasn't been hooked into the main system yet. The toughest part is figuring out what I do and don't want to support; if I had tried to support everything I could have thought of when I made the original filter API I never would have finished it. The good news is that the old filter API can easily be retrofitted, so whatever I come up with should be able to support existing filters.

Current build (1.5.8, stable):
   [features added]
   * DirectDraw support is disabled when Terminal Services
     or Remote Desktop clients are detected to work around
     a DirectX clipping bug.
   * Re-enabled places bar on open and save dialogs.
   * Disabled FPU state warning and made the fixup silent.
     WAY too many drivers are screwing up the FPU unit.
   * Edit lists can now be omitted from configuration
     files without needing to close the source file.

   [bugs fixed]
   * Main window is disabled during MPEG-1 scan to prevent
     crash if main window is closed.
   * Previous-key and next-key movement commands were not
     correct for B-frames in an MPEG-1 file.
   * Fixed decoding of MPEG-1 B-frames at the start of
     GOPs with broken_link set.
   * Palette change blocks no longer appear as garbage
     video streams.  In-stream palette changes are still
     not supported, however.
   * biSizeImage was incorrect when using fast recompress
     in YV12 mode.
   * Sequence appends failing on the first file now throw
     an error rather than a warning.

   [regressions fixed]
   * A race condition in the fast write code occasionally
     resulted in spurious write errors.
   * Fixed swapped 00db/00dc tags in AVI output.
   * Fixed display of 8-bit paletted video.
   * Program starts under Windows 95 again.

10/22/2003 News: Stupid shell tricks, revisited

Thanks to all who emailed me about the "echo.off" solution to the puzzle from the last news entry. You can stop now. :)

For those of you running VirtualDub through Windows Terminal Server (or Windows XP Remote Desktop), there is a bug in Terminal Server's DirectDraw support: blits partially outside of the primary surface appear at the top-left instead of their intended origin. The problem is reproducible with the DirectX 7 sample applications. Unfortunately, current versions of VirtualDub do not have an option for disabling DirectDraw support; I will need to add this in, as well as code to automatically fall back to GDI when a Terminal Services Client is connected. Arranging windows so that the video panes aren't partially off the screen should work around the problem.

I've recently been having stability issues with my desktop system and thought originally that it was due to bad drivers. (It's bad when you md5sum -b d2exp.mpq twice and get different checksums.) I later thought it might be caused by a bad patch from Windows Update, after WinDbg pointed out a kernel SendMessage call in some of the minidumps. Turns out it was neither; after running Memtest86 on a hunch, it turned out that one of the RAM sticks in the system had a subtle pattern-sensitive flaw in it at PC1066 speeds. What sucks is that this system is RDRAM-based so the memory is more expensive than DDR, which is dirt cheap. Bleah. Don't bother telling me to switch to DDR, because 256MB of RDRAM is still cheaper than a new motherboard.

10/20/2003 News: VirtualDub 1.5.7 (stable) released

Thanks to the miracle of source code control branching, VirtualDub 1.5.7 is out, and it contains none of the new features I'm currently working on. As a pure bug-fix release, though, it should bring nothing but more stability over the 1.5.6 release. Try it, report errors, wait for new release, lather-rinse-repeat. Well, you probably know the drill.

1.5.6 contains a bug in its MP3 rate correction code that causes MP3 audio streams to be written out to AVI files with dwLength=0 in the header, resulting in some players not playing audio properly. This has been fixed in 1.5.7. For those of you who have already pushed audio through 1.5.6, do not fret — the problem is easily fixed by running the file through VirtualDub again in direct/direct mode, which will cause the dwLength field to be recomputed properly.

Now for stupid shell tricks.

I got asked an interesting question today about Windows NT command scripts: someone needed to write an INI entry out to a file without extra spaces, but found that the obvious:

echo hello=1>foo.ini

doesn't work, since the shell interprets 1> to mean "redirect stdout." In a Un*x-like OS, this would be simple: backslashes are magic and solve everything. Well, Windows NT's shell isn't quite so simple, but using the shell's quote character to keep it from interpreting 1> as a single token still works:

echo hello=^1>foo.ini

But then I got asked another teaser: how to write the string "off" to a file. Again, the obvious doesn't work, due to a rather annoying special case in the echo command:

echo off

Quoting doesn't work here because it's the internal echo command that is the problem, not the shell's command parsing. In fact, echo is quite lame, because it doesn't support escapes, or any way to suppress the newline that it prints. The best that I was able to come up with off the top of my head was to abuse the filesystem:

echo x >off && dir off /b

I'm sure there are better ways, but I haven't thought of one yet.

Current build (1.5.7, stable):    [October 20, 2003]
   [bugs fixed]
   * Quick preview didn't work if the current position was
     past the number of source frames, even if the timeline
     was longer than that.
   * Added FPU guards that were missing in a couple of
     critical places (AVIFile open, codec negotiation).
   * The timeline wasn't properly extended if segments were
     auto-attached by filename after edits had been made.
   * Configuration scripts saved when no file is open
     no longer alter the edit list when loaded.
   * Jobs launched from the command line used the normal
     error modes rather than those set as default by the
     user.

   [regressions fixed]
   * Filter preview buttons weren't updating the frame.
   * Save Image Sequence command would randomly produce
     an unrequested job instead of initiating the render
     directly.
   * MP3 audio streams were getting written with
     dwLength=0 if correction was enabled.
   * MPEG decoder occasionally decoded garbage into the
     video frame, resulting in sporadic block errors
     during processing.
   * A race condition in the fast write code occasionally
     resulted in spurious write errors.
   * Append AVI function was incrementing the filename
     extension instead of the name component.
   * Avisynth scripts smaller than 60 bytes weren't being
     autodetected properly.

10/13/2003 News: Codec issues

A friend of mine takes examples of bad code snippets that he finds and emails them out to a select few under the subject "Lame Code of the Week." Typically, these are code snippets that have dumb errors, like this:

if (p->doSomething()) {
    ...
    assert(p);  // make sure p is not NULL

Why do I mention this? Well, I finally got a chance to debug against that blasted "Grand Tech Camera Codec" that's been crashing all over the place, trying to decompress formats that belong to other codecs. It turns out that the validation done by its ICDecompressQuery() function is to check... the width and height. Believe it or not, it doesn't check the FOURCC. Even worse, if you call ICDecompressQuery() or ICLocate() with "any format" as the target format, the codec essentially checks... nothing. Does it accept DivX? Yes. Does it accept Indeo? Yes. Does it accept 43-bit RGB? Yes. That means it claims to be able to decompress ALL formats! As such, it is only fitting that I award GTCODEC.DLL the Lame Codec of the Week award for absolutely breaking the Windows video codec system.
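
For contrast, a rough sketch of the minimum a decompress query ought to verify. This is not the real Video for Windows interface: DemoFormat stands in for the few BITMAPINFOHEADER fields that matter, and the codec and its 'DEMO' FOURCC are hypothetical.

```cpp
#include <cstdint>

// Stand-in for the relevant fields of a BITMAPINFOHEADER.
struct DemoFormat {
    std::int32_t  width;
    std::int32_t  height;
    std::uint16_t bitCount;
    std::uint32_t compression;   // FOURCC, packed low byte first
};

// Pack four characters into a FOURCC value.
constexpr std::uint32_t fourcc(char a, char b, char c, char d) {
    return  std::uint32_t(static_cast<unsigned char>(a))
         | (std::uint32_t(static_cast<unsigned char>(b)) << 8)
         | (std::uint32_t(static_cast<unsigned char>(c)) << 16)
         | (std::uint32_t(static_cast<unsigned char>(d)) << 24);
}

// Hypothetical codec that owns exactly one compressed format.
bool demoDecompressQuery(const DemoFormat& in) {
    // Check the FOURCC first: a codec should only claim its own formats.
    if (in.compression != fourcc('D', 'E', 'M', 'O'))
        return false;

    // Only then look at the frame size.
    return in.width > 0 && in.height > 0;
}
```

A query written this way rejects DivX, Indeo, and anything else it does not own, no matter what the frame size is.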

This doesn't normally affect VirtualDub too badly as it first attempts to search for a codec with the same FOURCC as the compression format. Where the codec screws over applications is for formats that either (a) aren't accepted by any currently installed codec, or (b) are secondary formats that a codec handles besides its primary format, such as YUY2. In these cases if the codec search ever gets to the Grand Tech codec the codec grabs the format and then immediately crashes trying to decompress it. I'm not sure if this is something I can work around. I can rewrite VideoSource.cpp to do a manual codec walk and avoid it, but the DrawDibDraw() call implicit in the Windows video capture system is a bit more difficult. And I can't work around the problem for Avisynth or an embedded DirectShow graph. Wonderful.

Needless to say, if you have this codec installed I recommend you uninstall it.

The other codec worthy of mention is the VFAPI Reader Codec. This codec will trip the FPU warning in 1.5.5/1.5.6, apparently because it was built with Borland C/C++, which for some strange reason likes to flip the FPU to 80-bit precision with exceptions enabled in the initialization code of the DLLs it builds. This then causes problems in other floating-point code that expects to be able to use invalid or indeterminate number values without crashing, which is ordinarily the case with the Win32 standard 64-bit/all-masked mode. MP3 codecs tend to have this problem and Direct3D, if it ever gets initialized, may also trip in its transform pipeline. This is a rather obscure problem and I'm not surprised that the author didn't catch it; I've notified the author but haven't gotten a response back yet. 1.5.6 will correct the FPU control word back to standard 027F, so odds are you won't see ill effects in VirtualDub besides a warning. For older versions or other applications, the exception that occurs is FP Invalid Operation (C0000090); the failure condition is thankfully rare so even if you do have this problem you're not guaranteed to crash.
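
To make the 027F value concrete, here is a small decoder for the x87 control word layout: the exception mask bits are bits 0-5 and the precision control field is bits 8-9. The struct and function names are invented for this example, and 0x0332 is a made-up value for comparison, not a control word measured from any particular runtime.

```cpp
#include <cstdint>

struct X87ControlWord {
    unsigned exceptionMasks;   // bits 0-5: invalid, denormal, zero-divide,
                               //           overflow, underflow, precision
    bool     allMasked;        // true when no FP exception can trap
    unsigned significandBits;  // 24 (single), 53 (double), 64 (extended)
};

X87ControlWord decodeX87(std::uint16_t cw) {
    X87ControlWord r{};
    r.exceptionMasks = cw & 0x3F;
    r.allMasked = (r.exceptionMasks == 0x3F);

    switch ((cw >> 8) & 3) {            // precision control field
        case 0:  r.significandBits = 24; break;
        case 2:  r.significandBits = 53; break;
        case 3:  r.significandBits = 64; break;
        default: r.significandBits = 0;  break;   // reserved encoding
    }
    return r;
}
```

decodeX87(0x027F) reports a 53-bit significand (64-bit doubles) with every exception masked, which is the "64-bit/all-masked" mode described above. The made-up 0x0332 decodes to a 64-bit significand (the 80-bit extended format) with the invalid-operation, zero-divide, and overflow exceptions unmasked, which is the kind of state that produces an FP Invalid Operation fault.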

There is a rather stupid bug in the AVI append command of 1.5.6: it increments the filename extension rather than the core name itself, so it tries to open foo1.avi1 and foo1.avi2 instead of foo2.avi and foo3.avi. This was actually in 1.5.5 as well, but it doesn't happen if the segments are implicitly attached during the first open, so nobody caught it during the experimental phase. Sigh. If I come up with some good workarounds for the above problems in 1.5.6 I might get a 1.5.7/stable out soon with a fix for the attach bug. We'll see.
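
A sketch of the intended behavior, incrementing the number at the end of the name component and leaving the extension alone. This is not VirtualDub's code: nextSegmentName is a hypothetical helper, it splits at the last dot without any directory handling, and appending "1" when the name has no trailing digits is an assumption made for the example.

```cpp
#include <cctype>
#include <string>

// "foo1.avi" -> "foo2.avi", "foo9.avi" -> "foo10.avi"
std::string nextSegmentName(const std::string& path) {
    // Split into name and extension at the last dot.
    const std::string::size_type dot = path.rfind('.');
    const std::string name = (dot == std::string::npos) ? path : path.substr(0, dot);
    const std::string ext  = (dot == std::string::npos) ? std::string() : path.substr(dot);

    // Locate the run of digits at the end of the name component.
    std::string::size_type begin = name.size();
    while (begin > 0 && std::isdigit(static_cast<unsigned char>(name[begin - 1])))
        --begin;

    if (begin == name.size())
        return name + "1" + ext;        // assumption: no number yet, start at 1

    // Increment the decimal string in place, carrying from the right.
    std::string digits = name.substr(begin);
    int i = static_cast<int>(digits.size()) - 1;
    while (i >= 0) {
        if (digits[i] == '9') {
            digits[i] = '0';
            --i;
        } else {
            ++digits[i];
            break;
        }
    }
    if (i < 0)
        digits.insert(digits.begin(), '1');   // carried out of the top digit

    return name.substr(0, begin) + digits + ext;
}
```

Working on the digit string rather than converting to an integer preserves zero padding, so a name like cap.00.avi steps to cap.01.avi.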

10/10/2003 News: VirtualDub 1.5.6 (stable) released

Thanks to all that reported issues with the experimental 1.5.5 release -- nearly all the bugs have been fixed in 1.5.6. I'm beginning to like branched development; while waiting for bug reports to come in on the 1.5.5 release, I was able to work on the dev branch for 1.5.7 without screwing up the stable release. So unless there is a goof-up that needs to be addressed, expect 1.5.7 to be the next unstable release.

1.5.5 had a bug in that it forgot to disable the displays when "fast recompress" mode was enabled, and so it displayed garbage during rendering, because it blitted YCbCr data to the screen as RGB formatted data. So I "fixed" it in 1.5.6 by setting the format correctly so that the display code either uses a YCbCr hardware overlay or software conversion fallback, thus turning it into a feature. So now you can see the input video during a fast recompress operation, if the format is UYVY or YUY2.

I forgot to mention with 1.5.5 that many of the filename strings in script files have changed from ANSI (8-bit) encoding to UTF-8 encoding, and thus configuration files that have been saved with high-bit characters in them aren't portable between 1.5.4- and 1.5.5+. This was required because 1.5.5 can read and write files with filenames that are not ANSI-safe. For those of you maintaining front ends, information on UTF-8 encoding is available from the Unicode website. Windows 98+, 2000+ can convert directly to and from UTF-8, and for 95/NT4, the conversion between UTF-16 and UTF-8 is straightforward. All of the UTF-8 characters are escaped using \x notation and therefore the script file itself remains ANSI-safe.
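
For front-end authors on platforms without a system conversion, a minimal sketch of the UTF-16 to UTF-8 step followed by \x escaping of the high-bit bytes. The function names are made up for this example, and the exact escape format the script parser expects (two lowercase hex digits per byte here) is an assumption; compare against a script saved by 1.5.5+ before relying on it. Backslashes and quotes in the filename would need their own escaping, which is not shown.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Append one Unicode code point to a UTF-8 string.
void appendUtf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Convert UTF-16 to UTF-8, combining surrogate pairs.
// Unpaired surrogates are passed through as-is rather than rejected.
std::string utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            const std::uint32_t lo = in[i + 1];
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        appendUtf8(out, cp);
    }
    return out;
}

// Replace every byte >= 0x80 with a \xNN escape so the result is 7-bit clean.
std::string escapeHighBytes(const std::string& utf8) {
    std::string out;
    char buf[8];
    for (unsigned char c : utf8) {
        if (c >= 0x80) {
            std::snprintf(buf, sizeof buf, "\\x%02x", static_cast<unsigned>(c));
            out += buf;
        } else {
            out += static_cast<char>(c);
        }
    }
    return out;
}
```

For example, a filename containing U+00E9 becomes the two bytes C3 A9 in UTF-8, which are then written to the script as \xc3\xa9.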

I've been playing Final Fantasy Tactics Advance for a while now and have come to a conclusion: it's nicer, but easier, than the original FFT. The AI isn't as good in FFTA and it's a lot easier to do huge amounts of damage. Also, unlike FFT, in this game you get awarded XP even for worthless actions, such as curing someone who's already at max HP. I haven't decided whether this is good or not. On one hand, I'm walking into battles with screwed up parties and basically stomping all the enemies effortlessly; on the other hand, I'm not spending hours at a time throwing rocks between party members for JP. I've only lost one battle -- the main character accidentally dinged the last enemy with a sword when swords weren't allowed, got sent to jail, and thus ended the game. Baka.

Current build (1.5.6, stable):      [October 10, 2003]
   [features added]
   * Added support for YV12 during fast recompress.
   * Input video is now displayed during fast recompress
     for UYVY and YUY2 modes.
   * Video display updates are now suppressed for panes
     that are totally hidden.

   [bugs fixed]
   * MP3 rate correction was correcting dwRate but not
     interleaving, causing some problems for embedded
     decoders.  The interleaving rate is now adjusted
     on the fly.  Note that MP3 correction is still not
     enabled when segmentation is active.
   * Time base for the position control was improperly
     affected by the "convert to fps" option.
   * "Box blur" filter was broken on CPUs without MMX.
     (Dumb compiler bugs....)

   [regressions fixed]
   * Menu cleanup: removed synchronous blit and histogram,
     fixed vertical layout and pane swap.
   * Fixed crash when loading some job configurations from
     1.5.4 and below.
   * Fixed audio displacement not working in simple audio
     pipeline with forward offset.
   * Adjusted Z-order of status bar relative to panes.
   * MPEG-1 decoding was broken on platforms with MMX
     but without SSE2.
   * Plugin code could crash if no plugins were installed,
     particularly under Windows NT 4.0.
   * Video codec code occasionally named the wrong codec
     when reporting video format corruption during codec
     arbitration.
   * Corrected DCT coefficient pruning in MJPEG decoder.
   * Fixed crash when WAV open fails.
   * Delete was producing invalid subsets in some cases.

9/30/2003 News: VirtualDub 1.5.5 (experimental) released

VirtualDub 1.5.5 is out on SourceForge and is the first version that I've explicitly tagged as "experimental." The primary reason is the new display code — 1.5.5 is the first version to use DirectDraw by default, the result of which is a significant increase in rendering speed as well as a usable stretch. (Right-click the panes for the new options.) 1.5.4 is pretty stable at this point and as such it's a good idea to split versions into stable and development releases. So please try 1.5.5 and report the problems, and if you have problems, use 1.5.4. 1.5.5 does add a few more workarounds for various problems as well as some optimizations for direct stream copy mode, so if all goes well it should work better than 1.5.4.

1.5.5 allows audio filters to be plugins, but I haven't completed the audio filter SDK yet and I'm not sure I like the current API. For those of you that want to experiment with it, there is a new "samplefilter" project in the VirtualDub source code. (Contact me if you want the preliminary SDK.) Keep in mind that the API is still fluid and I'll probably nuke this API version in the future.

There is no P4 version of 1.5.5, and there may not be P4 versions of subsequent releases. The reason is that Intel C/C++ 6.0 started miscompiling parts of the code base starting with 1.5.5 in ways that can cause crashes and/or heap corruption — specifically, in some exception handling contexts the generated code double-destroys objects. This then causes string objects to trash memory. As I'm increasingly moving toward dynamic strings instead of fixed-size buffers for text handling I've decided that I cannot afford possible instability in my P4 releases in exchange for the minor performance gains provided by the Intel compiler. The code base still compiles under Intel C/C++ however, for those of you that want to try.

A couple of weeks ago, I posted a longish essay on the various display methods I was considering. The only options that are enabled in 1.5.5 are GDI and DirectDraw, because I ran into some unexpected problems with NVIDIA OpenGL drivers and switching between rendering contexts -- sometimes nvoglnt.dll would spin for up to 10 seconds at 100% CPU when creating the second context. This is somewhat disappointing as I can control filtering with OpenGL and not with DirectDraw. As a matter of curiosity, I tried Microsoft's new GDI+ API to check out its image interpolation. The GDI+ people did a great job with the scaler: it is subpixel accurate and filters smoothly at all sizes when decimation filtering is requested. Pity it's somewhere between one-fifth and one-half the speed of VirtualDub's scaler, which makes it useless. Apparently, GDI+ doesn't have a hardware DDI of its own and makes use of the regular GDI DDI, so the vast majority of its options are emulated in software. You know, all I want is a simple API for a hardware accelerated stretch blit without BS like lost surfaces and having to do basic pixel conversions myself. I'm still waiting.

I've also released version 2.4 of my subtitler filter on the filters page, which fixes a minor error in shadow address calculation that could cause a crash. It's been more than a year since I worked on it and I don't know if anyone still uses it, but I pushed 2.4 out in case anyone was. I've heard of an expanded SSA format that someone coined Advanced SubStation or a similar name. Personally, I would have picked a name with a slightly different filename extension.

Rewatching the anime series "Martian Successor Nadesico" was a new experience for me. In particular, I'm convinced that Vandread should be renamed "Nadesico: The Next Generation." I also thought the ending sucked, which is why I sought out fanfics to close the gap, most of which didn't help. Fortunately, I found "Magical Girl Pretty Ruri," which is over 500K of sarcastic Ruri baka-goodness... but now I have to wait for episode 23 of it. -_-;

Current build (Version 1.5.5):
   [features added]
   * Partial Unicode support -- you can now open and save
     files using Unicode filenames.
   * "Chroma smoother" video filter refilters point-sampled
     chroma with linear interpolation.
   * Single-stream cut & paste.  (Be patient....)
   * Improved performance of AVI parser, particularly for
     Direct mode streaming.
   * Improved performance of bicubic upsampler.
   * Audio filter graph now shows intermediate audio
     formats on connections.
   * Audio filters can now be plugins.
   * New MPEG-1 video core (Meia) -- full vertical
     clipping.  Horizontal clipping is still by macroblock.
   * Rewrote display code -- DirectDraw support is now
     automatic.
   * Log windows now have a context menu for clearing,
     copying, and saving the log text.
   * Modified AVI2 indexing to relax indexing restrictions
     somewhat, although it's still not user configurable
     yet.

   [bugs fixed]
   * Hex editor occasionally displayed the wrong data after
     a find or save command.
   * "Attach extension" option didn't work for signpost
     save dialog.
   * Fixed crash when I/O errors occur during a processing
     operation, and then occur again when attempting to
     gracefully finalize the partial output file.
   * "Clear" didn't work in audio filter graph.
   * Fixed I/O errors when attempting to push audio forward
     with advanced audio filtering enabled.
   * "Go to" command didn't handle timestamps with frac-
     tional seconds that only had 1 or 2 decimal digits.
   * "General convolution" generated bad code for factors
     of 2, 4, and 8 when dynamic compilation was enabled.
   * Interleave periods of zero are no longer allowed.
   * Added workaround for crash or hang when compressing
     with the "3ivx D4 4.0.4" video codec.
   * Fixed non-interleaved save mode and made it cooperate
     with segmentation.
   * Added workaround for heap corruption when processing
     audio in advanced mode sourced from some versions of
     Avisynth.
   * Clarified DivX warning to note that it doesn't apply
     to the DivX 4+ codecs.
   * Fixed filter cropping not working properly when
     "motion blur" was the first filter in the chain.

9/25/2003 News: DivX 5.1, 3ivx, and plugins

My announcement for the week is that I will not be supporting the use of DivX 5.1 in any way, shape or form in VirtualDub. That doesn't mean you can't use it or that VirtualDub will prevent you from using it, but merely that I won't bother answering questions about issues with the use of the codec, and any email about the use of DivX 5.1 with VirtualDub will be dumped into the trash. The reason is the following:

I have never been a big fan of so-called "protection" wrappers, primarily due to technical reasons. However, putting one into a userspace driver is one of the dumbest and rudest ideas I can think of. Here are the problems:

I can't debug VirtualDub using the DivX 5.1 codec. Remember that B-frame glitch that was causing VirtualDub 1.5.3 to loop infinitely at the end of processing operations? Can't use the debugger if the DivX 5.1 Pro codec prevents it.
I can't debug VirtualDub AT ALL while the codec is installed, because the DivX codec's "protection" triggers on load, even in the Free driver. Both the video codec search and video compression dialogs trigger it even if I'm not trying to use the DivX codec.
I can't debug my other programs either, because if I hover over an AVI file in an Open Dialog, Explorer loads video codecs to try to display a tooltip about it, and the DivX codec terminates my program.
The "protection" dialog has no indication that it comes from the DivX codec, has the client application's icon on the taskbar, and when you click OK, the codec calls ExitProcess(0) and terminates the app.

I can live with games and applications that won't let you launch them under a debugger, but when it comes to a driver that keeps me from using Visual Studio in general, the driver gets uninstalled. Immediately. It is a waste of time for me to verify any compatibility issues with DivX 5.1 if I have to deal with this crap and I refuse to do so. It looks like DivXNetworks is considering adjusting or removing the "protection" in their next release; please encourage them to do so and resume working on the codec itself, which is what people actually care about.

Now that I've said that....

I finally looked into the strange divide-by-zero crashes with the 3ivx D4 4.0.4 codec. The problem is that the codec isn't clearing MMX state properly before returning from ICCompressBegin(), causing VirtualDub's floating-point calculations to screw up. It occurs more often in newer versions because I recently rewrote the interleaver to use FP rather than integer math; however, it can still cause filters to malfunction in older versions, particularly the subtitler. As such, I do not recommend that you try using 3ivx with VirtualDub at this time. I have contacted 3ivx about the matter and they say it will be fixed in the next version of their codec; in the meantime, I have a change in my development tree that will work around the problem in 1.5.5 if the updated codec isn't out by that time. The same problem and workaround also apply to the Windows Media Video 9 prerelease beta codec, but anyone using that should upgrade to the final release, in which Microsoft has already fixed the problem.

A few days ago I looked at the filter SDK I currently have posted and concluded that even though I am a native English speaker, no one could tell that by reading the SDK. As such I have decided to rewrite it, as well as rework the API headers to a somewhat more sane form that doesn't require massive pointer hacking to push pixels. Writing documentation takes an awful long time, and it's no surprise that many programmers don't bother with it at all, especially when you consider that incorrect documentation is in some ways worse than no documentation. This will get worse when I export the audio plugin API, which introduces multithreading into the mix. I design my APIs for longevity — filters dating back to VirtualDub 1.2 are still valid — so I'm hoping that my new APIs remain relatively simple and easy to program for. We'll see, I guess.

9/14/2003 News: Back to normal (long)

Everything's calmed down a bit here... I'm on vacation now, SoBig.F expired so I have my Inbox back again, and it's not as blisteringly hot as it was a month ago. That gives me some time to attack The Legend of Dragoon and Final Fantasy Tactics Advance. Oh, and I could work on VirtualDub too. (grin)

[Warning: Long technical brain-dump ahead.]

VirtualDub's display code is a bit dated and I've been working on rewriting it to support resizing and bilinear filtering, as well as more speed. Playback mode already had some acceleration features, but the main edit mode had only the lamest support for stretching, and the filter preview window couldn't be stretched at all. There are currently five ways I could implement blitting of stretched images, and none of them are particularly complete. Frankly, it amazes me that stretching an image onto the display still requires this much effort. Timing statistics below are done on a P4 1.6GHz laptop with an NVIDIA GeForce4 Go 440, with preliminary 1.5.5 display code.

Win32 Graphics Device Interface (GDI).
Ubiquitous and the most reliable; this is VirtualDub's fallback in all cases. Vendor drivers for GDI are mostly very stable and functional at this point, which is good; unfortunately, there are two problems. One is that GDI provides extensive software fallbacks which are not very fast. In fact, they're quite slow, and how much they're used varies widely between drivers. Many drivers implement hardware color conversion, but only a few implement hardware stretchblts. (A WinHEC presentation from a few years back indicated that a number of vendors tried accelerating StretchBlt() and ended up failing WHQL tests because they got texel alignment wrong.) ATI's GDI drivers are particularly good in this department and a BitBlt() on a RADEON will often beat a DirectDraw rendering path; Matrox may be good here as well but I've never had one.

The Windows NT implementation of GDI is significantly more powerful than the 9x implementation -- it can do a filtered stretch if you call SetStretchBltMode(hdc, HALFTONE). The subpixel precision is poor, as it appears to perform a point-sampled stretch and then do a low-pass on top, and the speed is even worse than a normal DIB StretchBlt(). However, it's not bad considering that you only have to add one line to enable it, and in the time of NT 3.1 I imagine it was very high quality compared to anything hardware could do.

The upcoming "Longhorn" release of Windows is supposed to have a next-generation GDI that is based on top of Direct3D and will provide much better performance and capability than the existing GDI. I just hope the API isn't a mess like DirectX and that it isn't going in the same direction as the rest of Longhorn. "Managed Explorer" was not what I wanted to hear, after my experiences with Visual Studio .NET.

On my dev machine, a 1:1 GDI blit of a 320x240 image takes about 1.2ms, and a 1.01:1 stretchblt takes 11ms. That 10:1 ratio on stretches is a killer -- taking 1/3rd of the frame time on a P4 1.6GHz is a bit excessive!

DirectDraw offscreen surface blit
DirectDraw is very much a self-serve API in that you have to handle a lot of device abstraction yourself, but one of the few features the hardware emulation layer (HEL) will emulate is stretchblt, and it does so at much faster rates than GDI. DirectDraw does not support color conversion in any way, however, which means that in some cases it can be beaten by a well-optimized GDI driver for 1:1 blits. The DirectDraw API is also much less simple to use than GDI because you have to create about a dozen objects, check pixel formats, check for lost surfaces, check for failed lock calls, etc. DirectDraw doesn't give you control over whether a blit is filtered; generally only 3D chipsets will do so and it looks like DirectDraw uses integer coordinates internally so the blit warps when it is clipped and sliced into subrects. Finally, DirectDraw doesn't cooperate very well with multiple threads; my current implementation simply thunks all DirectDraw blits down to the UI thread because doing anything else is likely to break.

On low-end systems whose blit probably consists of rep movsd anyway, it's probably even faster to write directly to the primary surface than to blit through an offscreen one. But this involves considerable complexity given that to obtain best write performance you need to write 64 bits at a time to VRAM, which is a pain with 16-bit or 24-bit pixels, and you cannot write unaligned to VRAM (doing so can lock the system if VFLATD is active). Clipping also has to be done manually. Handling the fixups at the beginning and end of scanlines while doing color conversion and a stretchblt is not much fun.
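
To make the scanline fixup problem concrete, here is a minimal sketch of how a span of pixels splits into an unaligned head, a run of aligned 64-bit writes, and a tail. The names SpanSplit and SplitScanline are made up for illustration and this is not VirtualDub's code; it only does the address arithmetic and touches no memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper type: how one scanline span divides around 8-byte boundaries.
struct SpanSplit {
    std::size_t headBytes;   // bytes before the first 8-byte boundary
    std::size_t bodyQwords;  // number of aligned 64-bit writes
    std::size_t tailBytes;   // leftover bytes after the last aligned write
};

// rowBase is the address of the first pixel of the row; the span starts
// xPixels into the row and is widthPixels wide.
inline SpanSplit SplitScanline(std::uintptr_t rowBase, std::size_t xPixels,
                               std::size_t widthPixels, std::size_t bytesPerPixel) {
    assert(bytesPerPixel >= 1);
    const std::uintptr_t start = rowBase + xPixels * bytesPerPixel;
    const std::size_t bytes = widthPixels * bytesPerPixel;

    // Distance to the next multiple of 8, or 0 if already aligned.
    std::size_t head = static_cast<std::size_t>((8 - (start & 7)) & 7);
    if (head > bytes)
        head = bytes;  // the span ends before reaching a boundary

    const std::size_t rest = bytes - head;
    return { head, rest / 8, rest % 8 };
}
```

With 16-bit pixels, SplitScanline(0x1000, 1, 10, 2) gives a 6-byte head, one aligned write and a 6-byte tail. With 24-bit pixels, SplitScanline(0x1000, 1, 5, 3) gives a 5-byte head, which cuts a pixel in half; that is where the pain comes from.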

DirectDraw blitting is slower than GDI for 1:1 at around 2.4ms -- but it stays 2.4ms even when stretched to 1600x1200. Hardware acceleration is good!

DirectDraw overlay surface
DirectDraw overlay surfaces are essentially giant hardware sprites and have two big advantages over offscreen surfaces. Overlays are generally done via special scanout hardware rather than a generalized blit engine, so you can get bilinear filtering and fast, large stretching even on low-end hardware. (Bandwidth requirements are reversed for an overlay stretch compared to a blit stretch because the higher the stretch ratio, the less often source pixels have to be fetched relative to scanout rate, and the result is never written back to the framebuffer.) The second advantage is that the overlay generally accepts YCbCr data instead of RGB, and most modern codecs work in YCbCr, allowing software color conversion to be skipped.

Overlays do cut some corners compared to blit engines, though. The filtering is sometimes not as good as the 2D or 3D engine; NVIDIA TNT/TNT2s cannot filter vertically with some driver revisions and many video chips don't fully upsample chroma. Sometimes the luma will have excessive contrast as well, which is annoying. Overlay hardware almost always supports the two primary YCbCr formats, YUY2 and UYVY, but rarely supports RGB. And finally, on most hardware you only get a single overlay, which means they can't be relied on for general image display. One notable exception was the Tseng Labs ET6000, which was probably one of the last PC video cards to use a display list and could support as many as three overlays at once.

Overlays beat GDI BitBlt() slightly at 0.9ms instead of 1.2ms. That's cheating, however, because UYVY is 16 bits per pixel instead of 32 bpp, and thus only half as much data is being uploaded to the video card.
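
The "half as much data" point can be checked with a few lines. This is a rough sketch with made-up helper names (FrameBytes, PackUYVY); it relies only on the standard UYVY layout, where two horizontally adjacent pixels share one U and one V sample and are stored as the bytes U, Y0, V, Y1.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bytes needed for one frame at the given bit depth (no row padding assumed).
inline std::size_t FrameBytes(int width, int height, int bitsPerPixel) {
    assert(width > 0 && height > 0 && bitsPerPixel % 8 == 0);
    return static_cast<std::size_t>(width) * static_cast<std::size_t>(height)
         * static_cast<std::size_t>(bitsPerPixel / 8);
}

// Pack two adjacent pixels into one 4-byte UYVY group: U, Y0, V, Y1.
inline std::array<std::uint8_t, 4> PackUYVY(std::uint8_t y0, std::uint8_t y1,
                                            std::uint8_t u, std::uint8_t v) {
    return {{ u, y0, v, y1 }};
}
```

For the 320x240 test image, FrameBytes(320, 240, 16) is 153600 bytes for UYVY against 307200 bytes for 32-bit RGB.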

OpenGL™
I like OpenGL -- it's a well-designed API with a well-written specification. Seeing as image stretching is a subset of texture mapping where U and V are constrained to X and Y, respectively, it seems perfect for this task. A bit of groundwork has to be done here in that the image has to be broken down into overlapping textures, but any respectable 3D card (read: ATI or NVIDIA) is going to support OpenGL 1.2 packed pixels and hardware mipmap generation. With hardware mipmap generation, we get trilinear filtering, which means no more aliasing when shrinking. And unlike DirectDraw, filtering can be forced off when it's not wanted. Using the 3D pipeline gives you other niceties, such as free brightness/contrast adjustment (modulate2x + add specular) and free dithering.
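
As a rough idea of the groundwork involved in breaking the image into overlapping textures, here is a sketch of the tiling step only. Tile and MakeTiles are hypothetical names, and the overlap amount is a parameter I'm assuming rather than anything the display code is known to use; no OpenGL calls are made.

```cpp
#include <cassert>
#include <vector>

// One source rectangle to upload as its own texture.
struct Tile { int x, y, w, h; };

// Cover a width x height image with tiles no larger than maxSize on a side.
// Neighbouring tiles share `overlap` pixels so that filtering near a seam
// still sees the pixels from the other side.
inline std::vector<Tile> MakeTiles(int width, int height, int maxSize, int overlap) {
    assert(width > 0 && height > 0);
    assert(overlap >= 0 && maxSize > overlap);
    const int step = maxSize - overlap;  // distance between tile origins
    std::vector<Tile> tiles;
    for (int y = 0; ; y += step) {
        const int h = (height - y < maxSize) ? height - y : maxSize;
        for (int x = 0; ; x += step) {
            const int w = (width - x < maxSize) ? width - x : maxSize;
            tiles.push_back({ x, y, w, h });
            if (x + w >= width) break;   // this tile reached the right edge
        }
        if (y + h >= height) break;      // this row reached the bottom edge
    }
    return tiles;
}
```

A 640x480 image with 256-pixel textures and a one-pixel overlap comes out as six tiles, the last being 130x225 at (510, 255). The sketch does not pad edge tiles up to power-of-two sizes, which a real texture upload on hardware of this era would also have to deal with.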

Did I mention I like OpenGL? I like OpenGL. Go learn OpenGL! :)

One downside to OpenGL is that it doesn't support YCbCr textures, so color space conversion can't be performed in hardware. Another is that there are some really bad consumer-level OpenGL implementations out there from the early days of 3D on PCs (so-called "QuakeGL" implementations). A couple of years ago, most of the OpenGL drivers still hadn't gotten texture conversion correct and were frequently swapping red/blue or thresholding alpha incorrectly; NVIDIA was noticeably ahead of the game in this department although it got a few wrong too. One vendor's implementation was so bad that it set depth mask off on init, fouling up depth clears unless you called glDepthMask(GL_TRUE), and would blue-screen the machine if you did an empty glBegin()/glEnd() pair with no verts. These problems have been mostly cleared up now but occasionally you still hear about someone's glTexGen() goofing up. And glDrawPixels() still sucks on consumer drivers.

OpenGL is the same speed as DirectDraw at 2.4ms. However, it doesn't clip funny when other windows are on top, and the coding is a lot more enjoyable.

Direct3D®
Direct3D, or rather, DirectX 9 Graphics, doesn't have much of an advantage over OpenGL for an operation as simple as image stretching. I thought briefly about whether it'd be possible to abuse pixel shaders to do bicubic resampling, but it'd require something like 32 texture fetches and I don't have any ps2.0 capable hardware anyway. The one major advantage DX9 should have over OpenGL for image blitting is StretchRect(), since it could do hardware color conversion during the stretch. (NVIDIA's OpenGL drivers do color conversion in software on texture upload.) Unfortunately, I don't seem to have any hardware that supports this, and even if I did, I'm not sure I would want to put up with the stupidity of CheckDeviceFormatConversion(). The same goes for YCbCr textures -- I can only use them if I use the reference rasterizer. Yay.

This rendering path is the one path I haven't coded yet and probably won't code at all. The API is not generalized; there are large parts of the API that can be readily identified as "the NVIDIA part" and "the ATI part," and other parts that have no formal spec other than "probably like OpenGL." You're supposed to check caps bits for available functionality, but some of the caps bits are so basic that the 3D device is useless if they aren't set, others have never been set by any device other than the reference rasterizer, and of the rest you can't easily tell which ones are usable because no vendors publish their caps bits. The best part of all is that when your display is switched out, either due to Ctrl-Alt-Del or a full-screen exclusive app starting, DirectX goes nuts. All of your textures are instantly dumped into oblivion, your calls fail and you can't do anything, and you start hitting driver and kernel bugs en masse, such as a vertex buffer lock succeeding but giving you an invalid pointer. And finally, requiring that the global application FPU precision be kicked down to single precision (24 bits) for performance is ridiculous. This is enough hassle for a game, and more than I want to deal with for video displays in VirtualDub.

I don't have any numbers for Direct3D because I don't have a D3D path written. I don't expect they would be significantly different from OpenGL, however.

On a passing note, I've been watching the latest developments in the NVIDIA<->ATI war with some amusement. All I'll say is that the current GPU situation bears an awful lot of similarities to the Pentium 4 vs. Athlon race; hopefully it'll continue in the same fashion, where both companies periodically kick each other in the rear and consumers get ridiculously fast and cheap hardware out of it.

8/20/2003 News: Temporarily absent

Life outside VirtualDub has been very busy for me lately, so I haven't really done much hobby coding. That isn't to say that I've neglected Vdub completely.... well, you've probably heard this excuse before. What's new this time is that my email has practically been made useless by the new strains of worms that have been making the rounds. Thanks to the miracle of Microsoft Outlook's address book, I've been getting nailed nonstop by 100K viruses and bounces from idiotic mail servers that don't know about From: spoofing viruses, to the tune of about 10-20x the volume I was previously getting. I've temporarily lowered my maximum email size to 50K and raised my Inbox limit to 50MB in an attempt to keep the crap from blocking legitimate email, but this is becoming difficult with 100 million styles of mail delivery failures and you may have severe problems getting through to me. For VirtualDub questions, the forums are currently a better way to get help. Please try to search the forums first and DO NOT send me a PM with your question as well. I hate people who don't do due diligence before spamming their question across eight forums and my email address.

A recap of some current issues, since I'm too tired to update the KB right now:

The Creative MP3 codec is responsible for the frame 9995 hang problem -- due to the way that audio codecs are enumerated in Windows it can activate even though you think you're choosing a different MP3 codec. Disable it or lower its priority in Control Panel, Multimedia.
"Grand Tech Camera Codec" may conflict with the DivX codec and cause crashes attempting to decode DivX material. Disable it if you have problems and it shows up as the offender in VirtualDub's crash context dump.
The "Lame ACM 0.9.1 (stable)" codec distributed in the Nimo codec pack has a habit of crashing during Windows audio codec searches. Guess what the solution is!
Non-interleaved audio is currently broken. Oops. Will fix later.

6/19/2003 News: Optimization

I've been told that VirtualDub 1.5.4 doesn't process files in direct mode as fast as 1.4.13 does, because it drops to non-streaming AVI read mode. This isn't surprising as the 1.5.4 video and audio pipelines are decoupled and thus tend to drift farther apart during operation, leading to the AVI layer disabling streaming due to a high cache miss rate. After adding pipeline balancing code as an attempted fix, I profiled the app under Intel's VTune Analyzer and discovered a different problem: when processing a highly compressed file in direct mode, the highest CPU hogging function in the app is _alldiv! _alldiv is the Visual C++ 64-bit divide function. Apparently, VC6 can't convert signed 64-bit divides by constant powers of two into adjusted shifts, like it can for 32-bit ints. (For that matter, neither can VC7 or VC7.1.) This just goes to show that when optimizing an app, the correct route is always profile, profile, profile! Expect Direct mode throughput for highly compressed video files to be significantly better in 1.5.5.
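
For reference, the "adjusted shift" for a signed divide by a power of two looks roughly like this. It is a sketch of the general technique, not the fix that went into 1.5.5; DivPow2 is a made-up name, and the code assumes that >> on a negative 64-bit value is an arithmetic shift (guaranteed from C++20 on, and what mainstream compilers have long done).

```cpp
#include <cassert>
#include <cstdint>

// Divide x by 2^k, rounding toward zero like the built-in / operator.
// A plain x >> k rounds toward negative infinity, so negative inputs get
// a bias of (2^k - 1) added first; non-negative inputs get a bias of 0.
inline std::int64_t DivPow2(std::int64_t x, unsigned k) {
    assert(k > 0 && k < 63);
    const std::int64_t signMask = x >> 63;  // all ones if x < 0, else 0
    const std::int64_t bias = signMask & ((std::int64_t(1) << k) - 1);
    return (x + bias) >> k;
}
```

DivPow2(-7, 1) gives -3, the same as -7 / 2, where a bare -7 >> 1 would give -4. That is a handful of cheap instructions in place of a call to a general-purpose 64-bit divide routine.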

Someone asked me recently if full-scene antialiasing (FSAA) on a 3D card could be used to improve deinterlacing quality. Sorry, no. The reason why FSAA improves visual quality is that polygon space has infinite resolution due to precise triangle primitives and texture interpolation, and you can always improve the quality of triangle edges by sampling more. That is not the case with deinterlacing where your input and output sample resolutions are the same (and finite). I should know, since I've written software 3D rasterizers and know intimately how triangle rasterization and supersampling work. If you don't believe me take a look at VirtualDub's About dialog. :)

The problem with 3D programming is that you spend half your time getting anything to draw on screen at all and the other half figuring out who left alpha test enabled.

Many of you have had trouble compiling VirtualDub 1.5.x due to errors in <vd2/system/zip.h> and the OpenRaw function. The problem is this gem in some versions of Microsoft's winbase.h:

#define OpenRaw  OpenRawA

To work around the compilation errors, rename VDZipArchive::OpenRaw() to VDZipArchive::OpenRawStream(). Note that unless you are using Visual Studio .NET, you will still need to update your Platform SDK headers as the ones that come with VC6 are quite old.
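
If it isn't obvious why a one-line #define can break a class, this self-contained sketch imitates the effect without any Windows headers. ZipArchiveSketch and ZipArchiveFixed are stand-ins I made up, not the real VDZipArchive, and the exact failure you see in a real build depends on which files include winbase.h before the class declaration.

```cpp
#include <cassert>
#include <string>

// Stand-in for the line from winbase.h quoted above.
#define OpenRaw OpenRawA

// Written with a method called OpenRaw, but the preprocessor rewrites the
// name, so the compiler records a method called OpenRawA instead.
struct ZipArchiveSketch {
    std::string OpenRaw() { return "raw stream"; }
};

// Code compiled without the macro in scope sees a different name.
#undef OpenRaw

inline std::string CallRenamedMethod() {
    ZipArchiveSketch z;
    // z.OpenRaw() would not compile here; only the rewritten name exists.
    const std::string s = z.OpenRawA();
    assert(!s.empty());
    return s;
}

// The workaround from the text: use a name the macro does not match.
struct ZipArchiveFixed {
    std::string OpenRawStream() { return "raw stream"; }
};
```

The preprocessor knows nothing about classes or scopes, so any identifier spelled OpenRaw gets replaced, member function or not.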

5/28/2003 News: VirtualDub 1.5.4 released

First, thanks to all of you who sent various Windows ports of ls in response to my last post, but it really wasn't necessary.

1.5.4 is out, and is another quick bugfix release -- so grab it and pound on it. Actually, as I type this, I haven't uploaded the SourceForge download page yet, so if you're too fast, go play a game for an hour or something until I get it up. The main changes are a fix for a thread race condition and a workaround for the hang at the end of a 2-pass DivX 5 operation. Also, I threw in bitrate calculation for AVIs under File | File Information. If it still doesn't work... well, I guess we'll just try again.

"Regression testing"? What's that? If it compiles, it is good, if it boots up it is perfect.
—Linus Torvalds, right before release of Linux 2.1.94

I just got my Visual Studio .NET 2003 upgrade today, and although I was expecting to get a CD in an MSDN-Library-style package, I got a heavy box that was essentially the same as the full .NET 2003 package, except the box was uglier and the CDs were upgrade only. Fortunately, the .NET 2003 upgrade allows you to install 2003 without 2002 installed first, as long as you provide the 2002 CD for a moment. (I was amused by the .NET 2002 installer, which asked me to register after it had finished uninstalling the product.) The IDE looks largely the same, with the same ugly flat style and a tendency to overrefresh the solution tree, but the build dependency check appears to run much faster. I don't know if they've fixed the butt slow output window yet. I haven't dug much into the compiler yet either, but at the very least it generates much better code for MMX/SSE/SSE2 intrinsics, finally making them useful.

One more note: if anyone has purchased a copy of video software called "Luxuriousity Video," please drop me an email at phaeron (aht) virtualdub (daht) net. I have a question to ask.

Build 16296 (Version 1.5.4):
   [features added]
   * Added workaround for infinite B-frame delay
     interaction with DivX 5.0.5 Pro.
   * File information for AVI files now shows estimated
     bitrate.

   [bugs fixed]
   * Fixed race condition in processing pipeline shutdown
     that was more likely to occur in Windows 95/98.
   * Key frame markers were getting written on some drop
     frames when upsampling the video stream.
   * MPEG code was issuing warnings whenever decode time-
     stamps were more than 0.62s apart; this has been
     fixed to use the actual spec limit of 0.7s.
   * Operation couldn't be aborted while B-frame lag
     frames were being flushed at the end.
   * Fixed a bad error message that displayed a bogus
     filename.

5/20/2003 News: VirtualDub 1.5.3 hopefully not buggy

Releasing VirtualDub is an interesting process. SourceForge and my web account on pair are Unix-based, while my development environments are Windows-based. Given that I have to juggle four machines during the release process, it's guaranteed that this happens at least once per release:

C:\p4root\dev>ls -l
'ls' is not recognized as an internal or external command,
operable program or batch file.

&#*($^)@*(&%*.

Thanks to some detective work by some forum members, it's fairly apparent now that the culprit of the famous frame 9995 hang bug is the Creative MP3 codec. If you are experiencing this problem, drop the priority of the Creative MP3 codec in Control Panel so that the Fraunhofer codec activates instead. (In Windows 95/98, this is done under Multimedia; in XP, go to Sounds and Audio Devices, Hardware, Audio Codecs, Properties. I don't remember where it is under 2000.) I don't have a Creative card that does MP3 assist, so I can't verify the problem myself, but I've seen enough reports now that I'm fairly certain of the diagnosis. The underlying problem, though, is that although you can have several MP3 codecs installed, you may end up using a codec other than the one you chose because the audio compression dialog only remembers what format you picked and not which codec that format came from. So even though you picked MPEG Layer-3, you can actually end up with a format from plain old MP3. That's true of both the standard Windows codec dialog (acmFormatChoose) and VirtualDub's custom dialog. I can fix the latter so that it records the ID or name of the codec as well, but I figured I'd warn you since the mixup can hit other programs too. You're better off toggling drivers so that only one MP3 codec is active at a time.

And for the last time, MP3 stands for MPEG audio layer III, not MPEG-3. There are both MPEG-1 and MPEG-2 variants of audio layer III.

1.5.3 is out and is mostly only bug fixes. The known regressions in 1.5.2 have been fixed and I even found some bugs from the 1.4 branch that no one noticed. I had to rip out and redo some algorithms that didn't work out in 1.5.2; in particular, that build tried to dynamically guess which frames were going to be pushed in the future, and although 1.4.13's algorithm wasn't totally correct, it was better than 1.5.2's. So 1.5.3 simply computes a static reverse frame map at the start of the operation. This consumes 16 bytes/frame, but if you're processing a 100K frame file I assume you can spare 1.6MB for the frame map. (It swaps well anyway.) The reason for the complexity is that VirtualDub tries hard to allow frame skipping in Direct video mode even though technically it can't be done exactly, and this has to happen in the middle of a multithreaded pipeline with audio interleaving active. What happens is that frames get pulled sequentially from a key frame until the next key frame is available. So resampling from 30fps to 25fps on an MPEG-4 stream that has key frames every 1000 frames is likely to produce badly desynced output, but on an Indeo 5 stream with key frames every 15 frames it'll almost be perfect, and on a Huffyuv stream it'll pull exact frames. You can also now upsample a stream to a higher frame rate in Direct mode, which is always exact. Since the upsampling works by inserting drop frames, it's virtually free (~24 bytes/frame) space-wise and even allows the player to drop the duplicates since it knows the frames are dupes.
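
Here is a small sketch of the point-sampled rate conversion described above, with drop frames standing in for repeats when upsampling. It is not the reverse frame map itself and not VirtualDub's actual code; BuildFrameMap and the -1 drop-frame marker are my own conventions for illustration, and key frame handling is left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// For each output frame, record which source frame it shows, or -1 for a
// drop frame (a repeat of the previous output frame).
inline std::vector<std::int64_t> BuildFrameMap(std::int64_t srcFrames,
                                               std::int64_t srcRate,
                                               std::int64_t dstRate) {
    assert(srcFrames >= 0 && srcRate > 0 && dstRate > 0);
    // Output length, rounded up so the last source frame is still covered.
    const std::int64_t dstFrames = (srcFrames * dstRate + srcRate - 1) / srcRate;

    std::vector<std::int64_t> map;
    map.reserve(static_cast<std::size_t>(dstFrames));
    std::int64_t prev = -1;
    for (std::int64_t out = 0; out < dstFrames; ++out) {
        const std::int64_t src = out * srcRate / dstRate;  // point sampling
        map.push_back(src == prev ? -1 : src);             // repeat -> drop frame
        prev = src;
    }
    return map;
}
```

Going from 25 to 50 fps, three source frames become 0, drop, 1, drop, 2, drop. Going from 30 to 25 fps, output frame 5 maps to source frame 6, so source frame 5 is skipped; in Direct mode that skip is only exact if the frames involved can be decoded independently, which is why the key frame interval matters so much.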

As it turns out, one of the "features" in this release is actually a bug. People have asked me to hook the spacebar to playback and stop. Well, hooking it to playback is easy, but hooking it to stop is a problem because the processing mode message loop is actually a separate modal loop that doesn't have access to keyboard accelerators. (I could just shove it in via a global, but that'd be so 1.2.) So I only implemented the playback command, thinking I'd solve the stop for later, and I discovered that space already worked for stop! The reason is that inline playback spawns a visible status dialog first and immediately hides it, so it takes the focus and its default button is Abort... which responds to both space and Enter. This works fine unless the window focus changes. Needless to say this is lame and I need to implement stop properly for a later release, but it's amusing that the bug worked out this way.

Build 16249 (Version 1.5.3):          [May 19, 2003]
   [features added]
   * Added preview input/output commands to menu and
     accelerator tables.

   [changes]
   * Tweaked job control behavior for jobs that complete
     with warnings, to be a bit more intuitive.

   [bugs fixed]
   * Frame marking didn't always mark the correct range and
     could cause "Scan for bad frames" to fail.
   * Fixed invalid batch scripts produced when video codec
     has a config struct larger than ~6K (Windows Media 9
     VCM).
   * Status markers were being logged as warnings in jobs.
   * Conversion to a higher frame rate produced amusing
     results in Direct video mode.  It now produces source
     frames interspersed with drop frames for nearly zero-
     cost point upsampling of video.
   * Dubber pretended there was an input-to-output lag if
     such filters existed in the video chain (temporal
     smoother), even if the filters weren't active.  This
     resulted in duplicated frames (fast/normal) or
     erroneous zero-byte keyframes (direct) at the end of
     the output.
   * Arbitrary framerate conversion option wasn't disabled
     in the UI when IVTC was enabled (the two are mutually
     exclusive).
   * Edit point seeks (<, >) didn't update the frame
     windows.

   [regression bugs fixed]
   * Fixed pipeline not getting flushed at end of
     operation, resulting in some frames getting lost.
   * Fixed subset code pulling in wrong frames in direct
     mode.
   * Interleaving values were inverted and thus always
     forced one-per-frame.
   * Position slider wasn't updating properly around cuts.

5/10/2003 News: VirtualDub 1.5.2 buggy

I'm sure this is not news to many of you, but I figured I should note it now since I'm in the middle of crunch time at work and don't really have the time to address this immediately.

Essentially, the 1.5.2 processing pipeline has two major bugs in it: it doesn't always flush all the frames out of the pipeline before it finishes, thus cutting some frames off at the end, and the mapping from output frame to source frame is incorrect when Direct video mode is used and frame segments have been deleted. This basically means that 1.5.2 is not stable for production use and you should only use it for testing or experimentation until I release 1.5.3. If you do not have 1.5.2, it is available in the "previous versions" section at the bottom of the download page on SourceForge. However, please do test 1.5.3 for bugs that I haven't heard of, as I want to squish as many bugs as possible! This is the current changelist for 1.5.3, which has not yet been released:

Current build (Version 1.5.3):
   [changes]
   * Tweaked job control behavior for jobs that complete
     with warnings, to be a bit more intuitive.

   [bugs fixed]
   * Frame marking didn't always mark the correct range and
     could cause "Scan for bad frames" to fail.
   * Fixed invalid batch scripts produced when video codec
     has a config struct larger than ~6K (Windows Media 9
     VCM).
   * Conversion to a higher frame rate produced amusing
     results in Direct video mode.  It now produces source
     frames interspersed with drop frames for nearly zero-
     cost point upsampling of video.

   [regression bugs fixed]
   * Fixed pipeline not getting flushed at end of
     operation, resulting in some frames getting lost.
   * Fixed subset code pulling in wrong frames in direct
     mode.
   * Interleaving values were inverted and thus always
     forced one-per-frame.
   * Position slider wasn't updating properly around cuts.

I know the Knowledge Base is very much out of date, but I figure given a hard choice between my documenting bugs and actually fixing them, you'd prefer the latter. Also, apparently some of you haven't heard of the term "regression"; it means reverting to an earlier, lesser state. In quality assurance (QA) testing, it refers to the recurrence of a bug or undesired behavior that was previously fixed/improved in an earlier version. So regressions are new to recent versions, whereas other bugs could go as far back as 1.0. Regressions are most likely to happen while fixing other bugs, and thus it's very important in large scale projects to track bug history and make sure the code actually goes, well, forward. As it turns out, I don't have much of a regression test plan, and the test file I frequently used for 1.5.2 (Vandread 1st season OP) has a lot of repeated frames at the end. Oh well.

In other news....

Interestingly, the Windows Media group at Microsoft has released a beta Video Compression Manager (VCM) codec for Windows Media Video 9 that interoperates with Video for Windows based applications. It works with VirtualDub, with the exception of batch mode due to a bug on my part (see above). The interface is a bit (ahem) familiar, but it contains all the options you would expect from a modern codec, including CBR/VBR and 1-pass vs. 2-pass. (And they spelled my program's name correctly on the web page!!!) Looks like another toy to tinker around with, at the very least.

4/30/2003 News: VirtualDub 1.5.2 released

Adding more features adds more code and thus adds new bugs.
--Andrew S. Tanenbaum, Modern Operating Systems

Feature-wise, 1.5.2 is a minor release, with fixes for a couple of regressions in the 1.5.x series, as well as some fixes for AVI format incompatibilities with other programs. One new feature is logging, which means VirtualDub now notifies you of issues that it used to silently correct. Another is error control -- you can now tell VirtualDub to attempt to work around decode errors rather than bombing the whole operation. Of course, if a codec crashes, the operation will definitely stop anyway. Finally, you can now convert to any other frame rate, the so-called "fractional decimation" people have been asking for. This feature can actually target any exact AVI rational frame rate, but the UI only allows you to enter the frame rate in ten-thousandths of fps. Edit a script directly if you want to hit a specific value. The temporal resampling is point-sampling, so expect some jerkiness if you attempt, say, an NTSC-to-PAL conversion.
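
To show what the ten-thousandths limit means in practice, this sketch turns a UI-style value into a reduced fraction. FrameRate and FromTenThousandths are hypothetical names, not part of VirtualDub or its script interface, and std::gcd needs C++17.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>

// A frame rate as a fraction, frames per second = num / den.
struct FrameRate { std::uint32_t num, den; };

// Convert a rate given in units of 1/10000 fps into a reduced fraction.
inline FrameRate FromTenThousandths(std::uint32_t fpsTimes10000) {
    assert(fpsTimes10000 > 0);
    const std::uint32_t g = std::gcd(fpsTimes10000, std::uint32_t(10000));
    return { fpsTimes10000 / g, 10000 / g };
}
```

Entering 29.9700 gives 2997/100, which is close to but not the same as the exact NTSC rate of 30000/1001 (about 29.97003 fps). That difference is the kind of thing that needs a hand-edited script.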

Internally, the code has been changed significantly, and for this reason 1.5.2 is more of an experimental release than usual. Specifically, 1.5.2 is the first version to use separate audio/video pipes and a pull architecture rather than a push architecture. This removed a lot of the cruft in the code related to interleaving and spilling, and in particular should make split segment output more reliable (once the bugs have been shaken out). When the old spill code failed, the result was that nice bug where everything ground to a halt and the program sat forever at 0 fps. (This is not the same as the 9995 frame bug, for which the current theory is a specific faulty MP3 codec -- but reports are still all over the place on this one.) The new code is simpler and also doesn't produce funny interleaving when a delayed-frame codec is active (DivX or XviD in B-frame mode). It's also absolutely guaranteed to piss somebody off by breaking something that used to work, but that's the price of progress.

Additional note: the 1.5.2 source archive contains the HTML compiler that I used to build this website, Lina. Its documentation is very terse and its usage somewhat cryptic, but there has been some interest expressed in the past and I note its release here for anyone wanting to mess around with it.

As it turns out, VC7.1 beat both VC6 and VC7 by a longshot -- it managed to ICE with C1001 before I even received my update disc. I discovered that the .NET Framework SDK 1.1 comes with the Standard edition of the VC7.1 compiler; this version is useless for actual development as it is missing the C++ libraries as well as the optimizer (castrated code generator), but it has the full parser. The very first code fragment I tried was the one I posted last time and... ICE in ehexcept.c. Sigh. Well, at least I know Microsoft still does their builds on their f: drive. I once frightened several coworkers by deliberately crashing the compiler on the build machine and reading the resultant C1001 message to determine if its copy of VC6 had been patched to Service Pack 5. I've always wanted to write a "Service Pack detector" for Visual C++ by deliberately using pieces of code that crash the various builds of the compiler and using #line directives to print out the detected service pack level, but I've never gotten around to it.

Build 16188 (Version 1.5.2):          [April 30, 2003]
   [features added]
   * Converted help from WinHelp to HTML and updated dialog
     help to current feature set.
   * Added frame rate conversion to arbitrary frame rates.
   * Added logging to report non-fatal warnings during
     operations.
   * MPEG parser detects and reports timestamp
     discontinuities.
   * Added limited error concealment capabilities to input
     handlers.
   * Optimized audio filters a bit and added tap count
     control for lowpass, highpass, and resampling filters.
   * Added "new rate" audio filter to relabel an audio
     stream with a new sampling rate without resampling.
   * Incomplete audio format headers that are rejected by
     ACM MP3 codecs are automatically fixed with the
     required fields (the infamous "tag 0055" problem).
   * Added workaround for AVI1 files with MP3 audio being
     detected as MP3 files by Windows Media Player 8.

   [features removed]
   * Removed coach dialogs.  Not helpful enough and too
     outdated to maintain.

   [bug fixes]
   * Fixed capture free space indicator being limited to 4GB
     under Windows 98 (regression in 1.5 series).
   * Fixed crash when job queue could not be flushed to
     disk.
   * VDFs that contained multiple filters were only showing
     the last filter in the library (regression in 1.5
     series).
   * Fixed crash when attempting to direct copy a video
     stream with an abnormally large BITMAPINFOHEADER (>16K).
     Added code to detect and correct such mistakes.
   * Fixed hang in audio filter graph editor when placing
     output filter with autoconnect on and no place for it
     to attach.
   * Fixed livelock at end of operation when lowpass/
     highpass audio filters were in use.
   * Fixed internal error when attempting to start an
     incomplete audio filter graph (unconnected pins).
   * Fixed garbage wLanguage/wPriority values being written
     to audio AVI track headers when converting an MPEG-1
     file.
   * Fixed crash when attempting to load an AVI stream with
     an invalid sample rate (zero or infinite).  Added code
     to guess and substitute a reasonable value.
   * Fixed small memory leak in "smoother" video filter.

4/27/2003 News: "LAME MP3 Codec v0.9.0 - 3.93 (stable)"

I've been receiving a lot of crash reports of the following form:

An out-of-bounds memory access (access violation) occurred in module 'lameACM'...
...while enumerating formats for audio codec "LAME MP3 Codec v0.9.0 - 3.93 (stable)" (acompchoose.cpp:188)...
...while enumerating audio codec ID 00148028 (acompchoose.cpp:183).

(LAME is the name of the MPEG audio encoding library, not a quality statement.)

It appears that the version of the LAME ACM codec that is distributed in the Nimo codec pack is broken in some way and crashes during codec enumeration, at least under Windows XP. I can reproduce this with Microsoft AVIEdit (a Platform SDK sample application) as well as Windows Sound Recorder (sndrec32.exe). In other words, this codec appears to destabilize the Windows XP audio codec system, and should not be installed. I do not know whether the problem lies in the codec itself, or in the particular build that was compiled -- LAME is distributed in source code form only and as such the ACM codec may not be exactly the same when compiled by two different sources. I wasn't successful in compiling it myself since it needs some headers from the Windows DDK, which unfortunately isn't distributed online anymore.

I ordered my Visual Studio .NET 2003 upgrade CD on Wednesday, and the next day I received a notice from Microsoft that it was backordered until mid-May. Arrgh. I'm looking forward to seeing how improved MMX intrinsic code generation is, as well as how long the new compiler can last before I can get it to emit C1001 INTERNAL COMPILER ERROR (grin). Visual Studio .NET 2002 only lasted about two minutes:

#include <windows.h>

struct Autolock {
    Autolock(CRITICAL_SECTION& cs) : mcs(cs)
                { EnterCriticalSection(&mcs); }
    ~Autolock() { LeaveCriticalSection(&mcs); }
    operator int() const { return 0; }
    CRITICAL_SECTION& mcs;
};

#define synchronized(cs) switch(struct Autolock lock = cs) default:

void InterlockedXOR(CRITICAL_SECTION& cs, int& x, int y) {
    synchronized(cs)
        x ^= y;
}

c:\test\autolock.cpp(16) : fatal error C1001: INTERNAL COMPILER ERROR
                (compiler file 'f:\vs70builds\9466\vc\Compiler\Utc\src\P2\ehexcept.c', line 904)

Another reason I'm looking forward to VC7.1 is that reportedly it fixes a nasty bug in the VC7 global optimizer, namely that it aggressively prunes computation of unused formal parameters in inline functions. Unfortunately, this occurs in functions that have inline assembly that accesses formal parameters by stack offset instead of by name, which is virtually required if you write assembly routines without frame pointers like I do.

4/22/2003 News: When an AVI file is not an AVI file

While trying to test some new code in VirtualDub 1.5.2 I happened to make an interesting discovery about Windows Media Player that could explain some of the bug reports I've been receiving. In particular, I now know why some AVI files refuse to play under Windows Media Player, instead showing a visualizer, even though you have the video and audio codecs you need to play the file. The answer will elicit great amounts of shock and awe.

Okay, maybe it won't.

The version of Windows Media Player that ships with Windows XP, and possibly newer versions, appears to have a bug in its media type detection code: specifically, any file that contains two or more consecutive MP3 frames in the first 8K of the file is considered an MP3 file. Unfortunately, that means an AVI file written with an MP3 audio track and without an OpenDML hierarchical index has a high likelihood of being mistaken for an audio file. When Windows Media Player does this, it displays a nice flashy visualizer, announces the audio track's bitrate with some ridiculous duration, and then refuses to play the file properly. DirectShow itself doesn't have this problem, as neither Windows Media Player 6.4 (mplayer2.exe) nor the old Media Player (mplay32.exe) with the MCI DirectShow driver goofs up in this fashion. I do not know whether this bug affects WMP9, as I don't have that installed and don't plan to anytime soon -- 6.4 works just fine. (I still use WinAmp 2.77 too. Why change what works?)

Files written out by DirectShow won't trigger this bug, as they have AVI headers more than 16K long. VirtualDub normally won't trigger this bug either, for similar reasons -- it has to reserve space for the OpenDML indices whether or not they're actually needed. The problem occurs when VirtualDub writes AVI files in compatibility mode (old format AVIs), or segmented files, which automatically turn off the OpenDML index support. In these cases, VirtualDub writes a smaller 2K header instead, and this is what triggers the Windows Media Player bug. Annoying as it is, I'll probably modify VirtualDub to write 8K headers instead, as there is practically no need for tiny AVI files, and I don't feel like doing more experiments to figure out exactly what Windows Media Player 8 considers valid MP3 frames.
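
For the curious, a check of this general kind can be sketched as follows. This is emphatically not Windows Media Player's actual logic, which is unknown; it is only an illustration of "two consecutive frames within the first N bytes," restricted to MPEG-1 Layer III headers, and the function names Mp3FrameLength() and LooksLikeMp3() are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns the byte length of an MPEG-1 Layer III frame whose 4-byte header
// starts at p, or 0 if the bytes do not look like such a header. Other MPEG
// versions and layers are deliberately ignored in this sketch.
std::size_t Mp3FrameLength(const std::uint8_t* p) {
    assert(p != nullptr);
    static const int kBitrateKbps[16] = {
        0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 0
    };
    static const int kSampleRate[4] = { 44100, 48000, 32000, 0 };

    // 11 sync bits, version = MPEG-1, layer = III; the CRC bit may be 0 or 1.
    if (p[0] != 0xFF || (p[1] & 0xFE) != 0xFA)
        return 0;

    const int bitrate    = kBitrateKbps[p[2] >> 4];
    const int sampleRate = kSampleRate[(p[2] >> 2) & 3];
    const int padding    = (p[2] >> 1) & 1;
    if (bitrate == 0 || sampleRate == 0)   // free-format or reserved index
        return 0;

    return std::size_t(144 * bitrate * 1000 / sampleRate + padding);
}

// True if a frame header is found within the first `limit` bytes and a
// second header sits exactly one frame length after it.
bool LooksLikeMp3(const std::uint8_t* data, std::size_t size,
                  std::size_t limit = 8192) {
    if (limit > size)
        limit = size;
    for (std::size_t i = 0; i + 4 <= limit; ++i) {
        const std::size_t len = Mp3FrameLength(data + i);
        if (len == 0)
            continue;
        const std::size_t next = i + len;
        if (next + 4 <= size && Mp3FrameLength(data + next) != 0)
            return true;
    }
    return false;
}
```

In this sketch, a file whose first audio chunk starts 2K in gets flagged, while the same data pushed past the scan window does not. That matches the observed pattern of small headers triggering the bug and large ones avoiding it.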

Current files that have this problem can be "fixed" by running them through VirtualDub in Direct mode with the normal Save AVI option. For best results, disable the "trim" option in Video > Frame Rate, so VirtualDub copies all data from both streams even if they're not the same duration.

4/12/2003 News: VirtualDub.NET

I got bored yesterday and compiled VirtualDub in Visual Studio .NET to the .NET common language runtime (/clr). After fixing a couple of violations of the One Definition Rule (different global definitions of MyFilterData in different files) and sidestepping a link bug with VDswprintf(), I had about 90% of VirtualDub 1.5.2 running in bytecode. Now, the program was still i386 bound (due to inline assembly), still glued to Windows (due to many API calls), and about 5-10% slower -- but it was cool to see my MPEG decoder running in portable bytecode after flipping a switch. It's obvious now where the real work went into Visual C++ .NET, since it only took a few minutes to convert my program to the Common Execution Environment, including seamless integration with existing assembly functions. Definitely a lot easier than moving to the JVM (Java Virtual Machine).

The downside is the file size.

VirtualDub 1.5.2 alpha is about 1MB built Release, before getting packed by UPX in the build script. The .NET version is 1.9MB, of which ~640K is .NET metadata and the other 250K or so appears to be IL. It's strange to me that IL would be bigger than the equivalent native code; perhaps VC++'s IL optimization is not yet to the level of its native optimization. The 640K of metadata, however, is unacceptable. Having symbol information is not a problem since VirtualDub ships with source anyway, but having an executable that's twice as big is -- and VirtualDub doesn't use C++ objects that extensively save for a little STL usage. The information would be useful for a crash handler, but it's way too big for that, the VirtualDub.vdi file being about one-tenth the size.

This is a bit of a disappointment, really. One of the advantages of .NET is that garbage collection and the execution engine are not tied together -- you can still use unmanaged memory as in native C++ when targeting IL bytecode. Another is that there is significant effort being put into .NET environments for Linux (Mono and DotGNU). Just-in-time compilation (JIT) has advanced to the point that most traditional optimizations are included and the speed hit is acceptable for UI and framework code, and with seamless integration into native code for inner loops OS portability without recompilation is feasible. Garbage collection is still unwanted by me, however. The technology is at the point that the speed and memory usage are much better than before, but it seems that every time a language is converted to garbage collection the first thing the language designers do is kill destructors. Sorry, my scoped lock class can't release a critical section in a finalizer called with random delay between zero and infinity.

Managed C++ (/clr) allows you to switch between managed and unmanaged memory on a per-declarator basis with __gc and __nogc, but those look too much like __near and __far. Yuck!!

I think I'm about to declare "feature freeze" for 1.5.2, and begin cleaning up the bad parts of code to prepare for release. There isn't going to be any major new feature in 1.5.2, just minor improvements and more internal tectonic upheavals to prepare for major new features later (thus saving me the effort of writing said features now). 1.5.2 will still not support external audio filters yet, but the internal API has been improved in preparation; in particular, filters can now request that the host convert upstream data into a specific PCM format. I'm a big proponent of pushing as much work into the host as possible. In my opinion, this makes for easier plugin development, meaning more plugins, and more importantly, more reliable plugins.

2/22/2003 News: VirtualDub 1.5.1 released

When you said you wanted free software, you should have specified you wanted bug-free software.

I wouldn't have released 1.5.1 nearly this quickly if it weren't for the glaring bugs in the 1.5.0 release. The biggest one, the random crash in the menu bar, is system-specific. I actually put 1.5.0 through a short beta test period since I knew the heavy refactoring I had done broke a lot of code; surprisingly, even though the testers did a great job and found a number of bad glitches that I fixed before release, none of them noticed the menu bug. I can't even reproduce it on the general version, but it did appear on the P4 version when I actually used the mouse to control the program. Stupid stack-sensitive bugs. The glitches in the capture module were due to some in-progress conversion of filename handling to Unicode -- a couple of subsystems in 1.5.0 are capable of accepting Unicode filenames under Windows NT/2000/XP, and this will grow with time. The simple truth is that QA is labor intensive, nondeterministic, and boring. Which is basically why most projects, commercial or not, don't do enough of it before shipping 1.0. Asymptotic version numbering schemes are a popular remedy.

On the good side, I received ~100 crash traces for 1.5.0. Which means that, in spite of the crash context additions I made to the source code, I escaped the cardinal embarrassment of shipping a crash handler that crashes.

I recently discovered a few interesting additions to the Visual C++ .NET code generator that didn't exist in Visual C++ 6.0 SP5+PP. One of them is constant evaluation of static initializers, which means you can, for instance, declare a global 3D vector constant and have VC7 optimize it down to pure initialized data. The second is that VC7 can generate the bswap instruction intrinsically (_byteswap_ulong()), which is great for bitmap processing and MPEG bitstream parsing. Third, the undocumented compiler switch /QWMIemu causes VS.NET to emit SSE2 instructions with a lock prefix instead of a size override prefix for software emulation purposes. This borders on absolutely useless, but it's interesting.
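
As an illustration of why a byte swap matters for bitstream parsing, here is a minimal portable sketch of the operation that bswap performs, plus a big-endian 32-bit read of the kind an MPEG parser does constantly. ByteSwap32() and ReadBigEndian32() are names invented for this example; whether a given compiler collapses the shifts into a single bswap instruction is not guaranteed.

```cpp
#include <cassert>
#include <cstdint>

// Portable 32-bit byte swap: reverses the order of the four bytes.
// On x86 this is the job the bswap instruction does in one step.
constexpr std::uint32_t ByteSwap32(std::uint32_t v) {
    return (v >> 24)
         | ((v >> 8) & 0x0000FF00u)
         | ((v << 8) & 0x00FF0000u)
         | (v << 24);
}

// Reads a 32-bit big-endian word from a byte buffer, independent of the
// host CPU's byte order. MPEG start codes such as 0x000001B3 (sequence
// header) are stored this way in the stream.
std::uint32_t ReadBigEndian32(const std::uint8_t* p) {
    assert(p != nullptr);
    return (std::uint32_t(p[0]) << 24)
         | (std::uint32_t(p[1]) << 16)
         | (std::uint32_t(p[2]) << 8)
         |  std::uint32_t(p[3]);
}
```

On a little-endian machine, loading the four bytes as a native 32-bit word and passing the result through a byte swap gives the same value as ReadBigEndian32(), which is why an intrinsic bswap is attractive in inner loops.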

A great plugin for the Visual Studio .NET IDE:

http://www.workspacewhiz.com/OtherAddins.html

Fast Solution Build fixes my #1 hair-pulling pet peeve of Visual Studio .NET: the pathetically slow, moronic dependency checker that adds 30 seconds to the build cycle because it has to print "up to date" over and over and link executables whose dependencies didn't compile. Highly recommended. We recently switched from VC6 to VS.NET at work, and I swear I would have broken my keyboard in half if it weren't for this plugin because the VS.NET IDE is so braindead. I still might due to the 300 baud output window and the dumb solution tree that opens itself up and sorts filenames with case sensitivity on a case-insensitive filesystem. Hopefully VS.NET 2003 will solve a lot of the problems, but I'm not holding my breath.

Direct3D is another Microsoft creation that makes me want to throw equipment, but at least I don't have to pay for it. (Directly.) Ask me about it if you're really bored and haven't heard of a bad API yet.

Build 15654 (Version 1.5.1):          [February 22, 2003]
   [features added]
   * Improved audio filter dialog.
   * Added "split" and "mix" audio filters.
   * Capture mode: Added menu item to launch Windows Volume
     Control in Recording mode.
   * Save Segmented AVI now attempts to cut before keyframes
     when the video mode is set to "direct copy."

   [bug fixes]
   * "Change so durations match" frame rate option was using
     microsecond periods as frame rates.
   * Fixed random crash when selecting menu option with no
     video file loaded.
   * Fixed broken free space gauge in capture mode.
   * Fixed trashed filenames when attempting to set capture
     file.
   * Save Segmented AVI was using one digit instead of two
     for the segment number.
   * Video capture without an audio device now works.
   * Audio compression dialog no longer shows incompatible
     compression formats when it first appears.
   * Added workaround for "shutdown when finished" job
     control option to work under Windows 98.
   * Assignments to string variables now work in scripts.

2/16/2003 News: VirtualDub 1.5.0 released

VirtualDub 1.5.0 is out. From the user's perspective, 1.5.0 consists mostly of bug fixes, not the revolutionary change you would expect from a major version bump. The reason for the major bump is that the program has been internally restructured, breaking out some code into libraries and cleaning up the build process significantly. Sources are no longer split across four archives and sylia.dll is no longer statically linked, which should simplify project management a bit.

That is not to say that there are no new features in this release, however. 1.5.0 is the first release to have audio filtering, which has been a requested feature for some time. Now, the audio filtering system is quite rough at this point -- it's not optimized, and the filter selection is a bit sparse. You also cannot write external audio filters yet, although that is definitely going to change. The current selection of audio filters consists mainly of miscellaneous algorithms that I have been playing with for a while within a sandbox WinAmp 2 plugin:

Center cut. The classic "vocal cut" filter, except that the output is stereo instead of mono. This is accomplished through FFT phase analysis; the output will have some warbling in it, but stereo separation is preserved. Also known as the "make your own karaoke to embarrass yourself with" filter.
Ratty pitch shift. A time-domain, sawtooth-swept delay line, with rake-like correlation to smooth out the jumps. Good for about +/-20% variation in pitch.

Now, you might ask what a pitch shifter is good for with video. Well, 1.5.0 also contains a stretch filter that allows you to slow down or speed up audio, like a tweaked tape recorder. Combine a stretch and a matched pitch shift, and you get a time stretcher. If you set the pitch shifter and stretcher to the same ratio and tweak the video frame rate to match, you can speed up or slow down video. (Yes, you can now make "Sakura Saku" even faster!) I'm still looking for a better pitch shift algorithm -- the current version has problems with clicking when multiple dominant tones are present. I tried a frequency-domain version once, but it didn't work out too well: frequency-based algorithms don't like sharp attacks and tend to "smear" them, making percussives sound mushy.

Now, for the obligatory off-topic paragraph: (warning, plot spoilers)

I saw a bit of the anime Onegai Teacher recently. (I just got the Nuku Nuku DVD -- guess what my next DVD purchase will likely be? :)) I have to say, the series is totally original. A young woman played by Kikuko Inoue, ~~Belldandy~~ Mizuho, ends up living with a student called ~~Keiichi~~ Kei. They have to keep their relationship secret since she is not from this world. They are then visited by two people. ~~Her older sister, Urd~~ Her mother, Hazuho, is a bit lewd and tries to get them closer together, whereas her younger sister, ~~Skuld~~ Maho, doesn't like ~~Kei-chan~~ Kei-kun and wants to break them apart. The couple is broken apart later in the series, causing them anguish, but ~~Belldandy~~ Mizuho returns and they live happily ever after. Not that I'm complaining, really -- Belldandy without the tranquilizers, yay! -- but geez, this is a bit too familiar.

Oh well, voice actor reuse is always fun, I guess. Take Inuyasha, for example. This is another series for which I thought, "haven't I seen this before?".

Build 15584 (Version 1.5.0):          [February 16, 2003]
   [features added]
   * Can add a single job to the batch list with syntax:
        /p[input_file],[output_file]
   * Filters are now loaded and unloaded on the fly to
     circumvent TLS (thread local storage) selector limits.
   * Cropping bounds can now be dragged via the mouse.
   * Improved, friendlier crash diagnostics.
   * Basic audio filter support (no plugin support yet,
     though).  The pitch shifter sucks.
   * Increased accuracy of audio/video timing by switching
     from microsecond to rational calculations.
   * Changed font on dialogs to enable ClearType on XP.

   [features removed]
   * Deleted outdated 3x3 average filter -- it has been
     superseded by "blur."

   [bug fixes]
   * New MPEG audio core (Priss) -- fixes decoding errors
     in layer I and layer III audio and adds SSE polyphase
     support.
   * Fixed motion JPEG decoding bugs when padding is
     present before markers.
   * Fixed crash in SSE2 code when decoding MPEG-1 file with
     odd width in macroblocks.
   * Fixed crash in SSE2 resize routine when doing 4-tap
     vertical resample with odd width.
   * Fixed swapped UVs in About dialog box. ^^;
   * Fixed sync errors in MPEG-1 playback when decoding
     an audio stream which flips the copyright bit between
     frames or switches layer III bitrates (VBR).
   * Fixed "Frame not found" errors in MPEG-1 decoder when
     GOP is longer than 128 frames.
   * Rewrote resampler clip determination code again.
     Hopefully this one will be Bug Free (tm).
   * Fixed spurious errors at end of operation when saving
     WAV file.
   * Outputted configuration files and job scripts now
     include the correct audio filename when a .wav file is
     selected through a script.
   * Fixed AVI segmented output creating short files when
     working from MPEG-1 source or when IVTC is active.
   * Image import filter wasn't caching frames.
   * Fixed a couple of Get*() script calls that were declared
     incorrectly internally and didn't work (thanks to
     Cyrius).
   * Blur filter now handles cropping properly.

1/16/2003 News: Quick fix^2

Correction to the Antigua fix below: the DWORD value should be 0x00000000, not 0x00000001, because you want to disable preview. Sorry about that.

1/14/2003 News: Philips SAA713x (Antigua) capture fix, and random other stuff

One of my friends commented that my web page wasn't interesting because I didn't update it enough.

I have been in contact with Philips regarding a problem with VirtualDub's capture module and the drivers for the Philips SAA713x (Antigua) chip -- essentially, you can get video capture to work one or two times, then the thing dies. Overlay works, but preview doesn't, and you get zero frames trying to capture. For you coders out there, from the Video for Windows side this very simply means you get zero samples on both the video and preview callbacks. As it turns out, Philips' staff discovered that the problem is the VFWWDM driver connecting to both the Preview and the Capture pins on the Antigua capture filter. You can disable the Preview pin with the following Registry entry:

HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Control\Class\{4D36E96C-E325-11CE-BFC1-08002BE10318}\nnnn\Parameters\CapPreviewEnabled = DWORD:00000001

nnnn is a four-digit number that varies by system -- the best way to figure it out is to search for a key that is fairly unique to the driver, perhaps ADC Phase Clock Delay or VideoTunerEnabled. There should already be a bunch of other Enabled type entries in the registry key. (Note that CurrentControlSet is an alias and usually points to ControlSet001.) When you are done, reboot the system (I mean it) and then restart VirtualDub in capture mode. If the change took, the Overlay option should be grayed out. After making this change on my Windows 2000 system, capture works reliably. Full credit goes to the Antigua team for figuring this out -- my role for the most part was trying a capture and responding back, "uh, yup, it doesn't work."

Please note, however, that the registry change disables functionality in the capture driver and may interfere with normal operation of DirectShow-based capture applications. It may also void your OEM technical support. It is suggested that you keep note of the change for future reference, and perhaps bookmark the key if you are running the Windows 2000/XP Registry Editor, in order to delete it later if you experience problems in other applications.

As usual, if your system fails to boot, you must have screwed up somewhere.

I recently discovered that a lot of the crashes I have been receiving are the result of people aborting VirtualDub after the "deadlock detected" dialog appears. Folks, when you hit OK on that dialog, VirtualDub does an abnormal process abort, and all bets are off. The most common result is the DivX 5 codec crashing on a line that looks like this:

mov al, [ecx+630]    <-- FAULT

Please don't send me these reports, as they are not useful. This crash can also occur as a secondary crash after an initial crash is intercepted and reported. Do not save the subsequent crashes. The first one is the only one that is useful. I know sending these crash reports is a pain in the butt, so I am working on an improved crash handler for the next release that will report more user-friendly analysis.

On yet another note....

I am hearing more and more of VirtualDub being distributed in "rippacks" along with front-ends, Avisynth, and lots of plugins. GPL issues aside, there is an increasingly occurring problem with such composite systems under Windows 95/98 -- codecs, filters, and/or plugins are failing to load at random. The problem is that a lot of codecs and plugins are statically linked to the C run-time library (CRT). Each instance of the CRT consumes one Thread Local Storage (TLS) slot, of which there are only 64 under Windows 95/NT4, and ~80 under Windows 98. Once all TLS slots are consumed, DLLs fail to load. Codecs have a tendency to stick around in memory, and VirtualDub filters are always loaded, so you can hit this limit really fast.

VirtualDub will be switching to dynamic filter loading in order to help this problem somewhat, but codec authors can help out by using the shared CRT -- switch C++ Code Generation to Multithreaded DLL. Don't do this if you are using Visual Studio .NET, because it will require MSVCR70.DLL instead of the MSVCRT.DLL that ships with the OS. End users can work around the problem by temporarily removing filters, codecs, and plugins that aren't immediately needed. Better yet, upgrade to Windows 2000/XP, which raises the TLS limit to ~2000.

11/30/2002 News: VirtualDub 1.4.13 released

1.4.13 is out way ahead of schedule, because I botched some changes to the resize filter and RGB color conversion routines very badly in 1.4.12. It also fixes a nasty audio desynchronization bug that has been in the codebase for some time. Please upgrade to 1.4.13 ASAP. Also, the P4 version now has its own .vdi file for debugging purposes.

Now if you'll excuse me, I have to wear the brown paper bag on my head again.

Build 14328 (Version 1.4.13):          [November 29, 2002]
   [features added]
   * Added frequently-requested Lanczos3 kernel to resize
     filter. Can't see any difference whatsoever.

   [bug fixes]
   * Fixed nasty resize filter bug and RGB24<->RGB32
     conversion errors on odd bitmap widths.
     (Regressions in 1.4.12)
   * Fixed audio desynchronization when processing
     compressed audio in direct stream copy mode with both
     a start offset and deleted segments.  Thanks to Cyrius
     for the bug report and fix.
   * Audio compression system now drops a final partial
     block from an audio codec -- Microsoft ADPCM was
     producing these, resulting in a runt AVI stream
     sample. The new behavior matches that of Sound
     Recorder.
   * A partial final block no longer triggers the VBR audio
     adjustment routine.
   * "Previous keyframe" from beyond the end no longer
     seeks to start if frames have been deleted or masked.
     Also thanks to Cyrius.
   * Fixed crash in MPEG-1 decoder when playing or saving a
     video stream with D-frames or invalid frame types.
   * B-frame audio skew support is now also enabled for the
     'XVID' video format.

11/23/2002 News: VirtualDub 1.4.12 released

1.4.12 is out, and it breaks a rather old VirtualDub tradition: it has a separate version optimized for the Pentium 4, instead of all optimizations in one codebase. Intel Corporation has graciously given me a 3.06GHz Pentium 4 with HyperThreading Technology, along with copies of Intel C/C++ and VTune, and I spent some time optimizing the MPEG-1 decoder, resize filter, and color conversion routines for the P4. The reason for the separate executables is that the P4 version is compiled with Intel C/C++ with the /QxW flag, and won't run at all on CPUs without SSE2. However, don't fret, because 1.4.12 still has a standard executable with auto-CPU-specific dispatch, and it even has some of the optimizations of the P4 version. To run the P4 version, just drop it in the folder where you unzipped the regular version, and launch VeedubP4.exe instead; it otherwise should function exactly as the usual version. And yes, I do still have a Pentium III, so I do know the normal version does not require a P4.

This wouldn't be a new release, of course, without a little something for everyone else too. As it turns out, the HyperThreaded CPU exposed non-atomic synchronization code in the playback routine, and so this version fixes random lockups during playback on any SMP or HT-capable system. (A rather neat feature of HyperThreading is that you find all the mistakes in your threading code without having a second CPU do nothing all the time other than run WinAmp.) The VTune 6.0 profiler also spotted an unaligned row buffer in the resize routine, which should execute a little faster now. I fixed a bug that made the copy construction support in the filter API unusable, and fixed the directory bug that everyone's been telling me about in the Save Image Sequence command. I'm sorry I wasn't able to squish some of the other bugs or missing features that still exist, but I wanted to get the P4 version and the above critical fixes out first.

My philosophy is that one executable should contain optimizations for all CPUs and users should not have to switch executables to do so, but I have to rethink my strategy for doing so. Intel C++ has a much better code generator than Visual C++, even VC7 -- the output of the Intel compiler makes me say "hey, that's pretty good," whereas my reaction to Visual C++'s output is usually "hey, that doesn't suck." The main problems are that CPU-specific dispatch is tougher when you have a large amount of C++ code involved, and that the Intel compiler generally produces executables about 30% bigger than the Microsoft compiler. A third downside is that the IC++ inline assembler... well, miscompiles some of my assembly code. For that reason, one module in the P4 version, mpeg_idct.cpp, is compiled with Visual Studio .NET rather than Intel C++. My plan for current releases, however, is for the codebase to be buildable on Visual C++ 6.0 SP5+PP, Visual Studio .NET, and Intel C++ 6.0. Also, at least for the short term, VirtualDub will continue to run on all 80486-compatible CPUs; I haven't decided to require MMX yet.
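
The one-executable approach boils down to picking a routine at run time based on what the CPU supports. A minimal sketch of that idea follows, assuming a hypothetical feature bitmask that a real program would fill in from CPUID. None of these names come from VirtualDub, and the "SSE2" routine here is only a scalar stand-in so that the example runs anywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical feature flags; a real program would detect these via CPUID.
enum CpuFeature : unsigned {
    kCpuFeatureMMX  = 1u << 0,
    kCpuFeatureSSE2 = 1u << 1
};

using RowFn = void (*)(std::uint8_t* dst, const std::uint8_t* src,
                       std::size_t count);

// Baseline routine that runs on any CPU: rounded average of two rows.
void AverageRow_Scalar(std::uint8_t* dst, const std::uint8_t* src,
                       std::size_t count) {
    for (std::size_t i = 0; i < count; ++i)
        dst[i] = static_cast<std::uint8_t>((dst[i] + src[i] + 1) >> 1);
}

// Stand-in for a hand-tuned SSE2 version. It must produce exactly the same
// output as the scalar routine; only the speed would differ.
void AverageRow_Sse2Stub(std::uint8_t* dst, const std::uint8_t* src,
                         std::size_t count) {
    for (std::size_t i = 0; i < count; ++i)
        dst[i] = static_cast<std::uint8_t>((dst[i] + src[i] + 1) >> 1);
}

// Chooses the best routine the CPU can run. Done once, e.g. at startup.
RowFn SelectAverageRow(unsigned features) {
    return (features & kCpuFeatureSSE2) ? AverageRow_Sse2Stub
                                        : AverageRow_Scalar;
}

// Callers go through the selected pointer and never name a specific version.
void AverageRow(unsigned features, std::uint8_t* dst,
                const std::uint8_t* src, std::size_t count) {
    assert(dst != nullptr && src != nullptr);
    const RowFn fn = SelectAverageRow(features);
    fn(dst, src, count);
}
```

This works well for a handful of leaf routines written in assembly. It gets awkward when the code you want specialized is a large body of ordinary C++ that the compiler would have to build several times over, which is the difficulty described above.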

On a random note, I've been distracted by two new games. I just started playing Final Fantasy X (beaten FF4-FF9/FFMQ/FFT/SD2/SD3/RS3, can't stop now), and although I like the new battle system, I hate Blitzball. It's actually been out for a long time, but I just got it. For some reason, I have a strong urge to rename my main character "Selphie." The other game is Need for Speed: Hot Pursuit 2, which has much improved car physics and gameplay -- the cops no longer have the giant electromagnet at their disposal and resetting gives you a running start -- but what I find annoying is the game insulting my driving. I play NFS:HP2 with the keyboard, and the cops keep radioing "he's all over the road!" to each other.

The changelist for 1.4.12:

Build 14303 (Version 1.4.12):          [November 23, 2002]
   [features added]
   * Parts of the MPEG-1 decoder, some color conversion
     functions, and parts of the resize filter have been
     optimized for SSE2 (Pentium 4).
     
   [bug fixes]
   * Fixed intermittent deadlock during playback caused
     by non-atomic thread synchronization. This affects
     SMP systems as well as CPUs with HyperThreading
     Technology.
   * Fixed Save Image Sequence regression in 1.4.11 that
     caused the directory portion of the dialog to be
     ignored.
   * Fixed broken copyProc support.

11/1/2002 News: Knowledge base, filter SDK, and scripting document updated

The Knowledge Base has been updated rather lamely -- I just copied over the entries from the changelog that were likely to have affected users. The filter SDK has also been updated to V1.05 in order to cover the copy construction of filter structures, a feature new to 1.4.11 that allows you to write filters that use regular C++ class objects for the filter_data structure. I have a filter framework in progress that makes this easier, but it is not ready yet. Also, the scripting document now covers the configuration commands for the logo and HSV filters, and the new SaveImageSequence() command.

Finally, here's the changelist for 1.4.11:

Build 14279 (Version 1.4.11):          [October 31, 2002]
   [features added]
   * Added support for reading and writing TARGA (.tga)
     sequences, with optional RLE compression.
   * Added simple logo filter.
   * Added (not-quite-optimized) HSV filter.
   * The "Save Image Sequence" command is now batchable
     and scriptable.

   [bug fixes]
   * Fixed OpenDML files having bad duration values in
     their index if video frames weren't all the same
     size.
   * Fixed some subset-related position slider glitches.
   * RLE AVI files weren't being decompressed correctly
     (GDI's RLE isn't the same as AVI's RLE).
   * Fixed crash disassembler not disassembling some
     instructions properly.
   * Fixed glitches in the first three frames of the
     temporal smoother's output.
   * Fix for some MPEG rounding errors (arrgh).
   * AVI parser now accepts and reindexes LIST/movi chunks
     with 0 sizes.
   * AVI parser no longer drops stream 0 samples at the
     start of a file when reindexing.
   * AVI video reader detects and flips inverted RGB DIBs.
   * Fixed 1/16th darkened line on left side of "blur more"
     filter.
   * Added support for properly cloned filter data
     structures.
   * Allowed position control font to enlarge slightly
     according to screen dpi and fixed font leak.
   * Fixed MIME BASE64 encoding errors at end of codec
     configuration blocks that probably caused some codec
     crashes or configuration funniness.
   * Calls to video codecs now eat MMX errors rather than
     reporting them, to work around a bug in the
     MSMPEG4V3 codec that will probably never be fixed.
   * WAV writer now writes out the required 'fact' chunk
     for compressed WAVs.
   * Filter preview dialog now sanely reports errors in a
     non-ugly font.

10/31/2002 News: VirtualDub 1.4.11 released

VirtualDub 1.4.11 is out there -- I finally released it because there are two critical bug fixes in it that I've been sitting on waaay too long. One is the OpenDML bug, which went largely unnoticed in my testing because it only occurs with variable size video blocks. The other is a MIME64 encoding bug that was probably crashing some codecs in batch mode by trashing the last byte of the codec configuration structure. My apologies -- I value third-party software compatibility very much, and these were two very big booboos. Accompanying 1.4.11 is a new release of buildtools, 1.2, an upgrade over the 1.1 release that almost no one had because I forgot to update the link. Anyway, if you are fooling around with the source and want to rebuild the debug resource:

Download source code for VirtualDub build tools from virtualdub.org (HTTP).
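On the BASE64 bug mentioned above: the exact defect isn't described here, but the usual danger zone in a Base64 encoder is the tail, when the input length is not a multiple of three. The sketch below is a generic encoder with standard padding, not VirtualDub's actual routine; the function name is hypothetical and `std::string` is used as a plain byte container.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Encodes arbitrary bytes (held in a std::string) as padded Base64.
std::string base64Encode(const std::string& bytes) {
    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    const std::size_t len = bytes.size();
    auto byteAt = [&](std::size_t i) {
        return static_cast<std::uint32_t>(static_cast<unsigned char>(bytes[i]));
    };

    std::string out;
    out.reserve(((len + 2) / 3) * 4);

    // Each full 3-byte group becomes four 6-bit symbols.
    std::size_t i = 0;
    for (; i + 3 <= len; i += 3) {
        std::uint32_t v = (byteAt(i) << 16) | (byteAt(i + 1) << 8) | byteAt(i + 2);
        out += kAlphabet[(v >> 18) & 63];
        out += kAlphabet[(v >> 12) & 63];
        out += kAlphabet[(v >> 6) & 63];
        out += kAlphabet[v & 63];
    }

    // Tail: 1 leftover byte -> 2 symbols + "==", 2 leftover bytes -> 3 symbols + "=".
    // Skipping or mis-sizing this step is what loses the final byte(s).
    const std::size_t rem = len - i;
    if (rem == 1) {
        std::uint32_t v = byteAt(i) << 16;
        out += kAlphabet[(v >> 18) & 63];
        out += kAlphabet[(v >> 12) & 63];
        out += "==";
    } else if (rem == 2) {
        std::uint32_t v = (byteAt(i) << 16) | (byteAt(i + 1) << 8);
        out += kAlphabet[(v >> 18) & 63];
        out += kAlphabet[(v >> 12) & 63];
        out += kAlphabet[(v >> 6) & 63];
        out += '=';
    }
    return out;
}
```

A structure whose size happens to be a multiple of three never exercises the tail code at all, which is one way a bug like this can hide for a long time.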

1.4.11 has a new icon as well, thanks to Spire. Actually, I touched it up a bit and removed the 256-color versions (although they looked good, I couldn't bring myself to ship a 10K icon), so don't blame him if you think it looks bad. To tell you the truth, I rarely pay attention to the aesthetics of my program: many people appreciate open-source software, but very few appreciate open-source art, because it generally looks ugly. Honing the "bad art skill" is very important for a professional career in software development because it lessens the chance of your company shipping a product with your programmer art, which is generally embarrassing.

1.4.11 also includes a new scripting command for saving image sequences, and supports a few new features in the filter API, most notably a copyProc procedure -- this allows you to implement a copy constructor for your filter in order to escape the dumb "must be POD" restriction in 1.4.10 and earlier. Doing this properly requires some slightly obscure C++ (placement new), so I will probably need to polish a framework for this. I'm too lazy to update the filter SDK, the scripting document, and the knowledge base tonight, so I'll probably do that stuff tomorrow.
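For readers who haven't met placement new, here is a minimal sketch of the idea. It assumes the host hands the filter raw, suitably sized and aligned storage. The struct and the three hook functions are hypothetical stand-ins and do not reproduce the real filter API signatures.

```cpp
#include <cassert>
#include <new>
#include <string>

// Hypothetical filter state with a non-POD member. A raw byte copy of this
// would leave two objects sharing one string buffer, and both would try to
// free it.
struct LogoFilterData {
    std::string logoPath;
    int opacity = 255;
};

// Construct the object in storage the host already allocated.
LogoFilterData* initFilterData(void* storage) {
    return new (storage) LogoFilterData();
}

// Copy hook: run the real copy constructor into the destination storage
// instead of letting the host memcpy the bytes.
LogoFilterData* copyFilterData(const void* src, void* dstStorage) {
    return new (dstStorage) LogoFilterData(*static_cast<const LogoFilterData*>(src));
}

// Placement new has no matching delete; the destructor is called explicitly
// and the host frees the raw storage itself.
void destroyFilterData(LogoFilterData* data) {
    data->~LogoFilterData();
}
```

The key point is that allocation and construction are separated: the host owns the memory, and the filter only constructs, copies, and destroys the object living in it.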

Uh, I guess that's all I have to say. [1]

[1] Blinn, Jim. Jim Blinn's Corner: A Trip Down the Graphics Pipeline. Read it!

8/18/2002 News: More HTML fun

As you can see, I've spent the weekend doing the exciting task of redoing my website. The biggest part of this was rewriting my HTML compiler from scratch -- it now sanely uses STL strings and generates a parse tree of pseudo-XML that it then reprocesses to output the final pages. This means I can do all sorts of transformations that I couldn't do before. About two-thirds of the way through I realized that I was basically rewriting XSLT, but writing my own utility freed me to add file generation and output-compaction features that XSLT can't do. The other half of the rework is the introduction of CSS into the mix so I can discard all of the stupid <FONT> tags I used to have everywhere. I'll probably end up crashing a lot of Netscape 4 browsers out there, but I figure I'm doing everyone a favor by doing so.

Christian HJ Wiesner has graciously started a VirtualDub users forum at [Edit: Old link no longer valid] -- this looks to be a great place to collect all of the common questions and drop my mail load. I encourage you to try it out. He's actually been bugging me for weeks to put up this link, and I kept putting it off because I was in the middle of reworking the site -- for that, I apologize.

Donald Graft's website was down for a while, but it's now up under a slightly new URL: [Edit: Old link no longer valid]. Sorry, but our favorite VirtualDub and Avisynth filter guy doesn't disappear that easily.

Finally, on a completely random note, if you are a Ranma fan you should really read these two fanfics from Jeffrey "OneShot" Wong's site:

Ranma's girls
The least one can do (Almost but not quite done as of this writing -- check the home page)

They're very good, although so frighteningly long that they could probably be bound into small books. Oh, be wary of the stories titled "____'s usual morning." You may regret opening them. :)
