Appending streams and mismatch errors
I see that one of the append dialogs from VirtualDub has been featured in The Daily WTF. Personally, I consider the presentation of this dialog as a WTF to be a grave disservice, since there are so many better WTFs in the program that could have been used. I feel insulted. I suppose I should explain the reasoning behind this dialog, though, and why it pops up annoyingly when it does.
To clear up a misconception: one of the comments noted that the error was caused by comparison in floating-point. This actually isn't the case, and if it were, it would actually make the dialog appear less frequently due to roundoff. (Remember, floating-point can be inaccurate, but consistently so.) Frame rates in AVI are not stored as floating-point or fixed-point, but in rational form as the ratio of two 32-bit unsigned numbers. This means that the sample rates of two streams can differ as far out as the 10th significant digit, and more importantly, two streams can have different values in the header that correspond to the exact same frame rate. The submitter didn't indicate what version he was using, but versions prior to 1.4 have a bug in that they don't do a proper fraction check (they check numerator and denominator independently); this can cause error messages like the one submitted, where the two frame rates are the same. One way to cause this is to process one of the segments with a version of VirtualDub that reduces the frame rate fraction to lowest terms (the starting version of which I forget), so make sure you're using the newer versions of VirtualDub across the board when possible, to use the more liberal check.
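To make the difference between the two checks concrete, here is a minimal sketch. It is not VirtualDub's actual code; the function names and the header values are made up for illustration. It compares two rate fractions the buggy way (numerator and denominator independently) and the proper way (by cross-multiplication).

```python
from math import gcd

def same_rate_naive(a, b):
    """Compare numerator and denominator independently (the buggy style of check)."""
    return a[0] == b[0] and a[1] == b[1]

def same_rate_fraction(a, b):
    """Compare the fractions themselves: a_num/a_den == b_num/b_den."""
    # Cross-multiplying avoids division entirely, so there is no roundoff.
    return a[0] * b[1] == b[0] * a[1]

def reduce_rate(rate):
    """Reduce a (numerator, denominator) pair to lowest terms."""
    num, den = rate
    g = gcd(num, den)
    return (num // g, den // g)

# Hypothetical header values: the same ~29.97 fps rate written two ways.
original = (30000000, 1001000)
rewritten = reduce_rate(original)   # what a normalizing writer would store

naive_result = same_rate_naive(original, rewritten)
fraction_result = same_rate_fraction(original, rewritten)
```

Here `naive_result` is false even though the two segments run at exactly the same rate, while `fraction_result` is true. Python integers don't overflow; in a language with fixed-width integers the products of two 32-bit values need 64-bit arithmetic.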
And, yes, admittedly the error is not very informative, and I should look into clarifying it. However, it exists because of a sticky limitation with the way append is implemented, and I won't apologize for making an overly technical error message instead of one that says "I can't append."
What the append command is actually for
VirtualDub isn't a non-linear editor, and has never been one. In fact, neither the rendering engine nor the UI can handle more than one input video and audio stream. This isn't to say I don't want it to be one, but to get that far takes a lot of work, and there's a lot that I do or want to do with the program that doesn't require NLE support and is a lot easier to implement. I think I could buy software acceptable to my future NLE needs more easily than I could obtain one for my peculiar video capture requirements.
What is the append function for, then? It's to re-splice a movie that has been split across multiple files to circumvent file size limits, mainly from video capture. Due to the granularity of audio and video streams, particularly when audio compression is in use, the files can't be cut exactly cleanly, and thus the video stream will be slightly longer than the audio stream, or vice versa. For this reason the Append function does an unaligned splice. This means that the audio and video streams are independently glued end-to-end without regard for sync to each other, so any difference in the lengths of the streams in the current file will correspondingly shift sync in the appended segment. This behavior is actually desired for a split movie, because it undoes the desync that was introduced when the movie was split. It's a little less desirable when you're trying to glue independent videos together.
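The size of that sync shift is easy to work out from the stream lengths of the first file. The following is only a back-of-the-envelope sketch with made-up numbers and hypothetical function names, not anything taken from the program:

```python
from fractions import Fraction

def stream_duration(sample_count, rate):
    """Duration in seconds of a stream with sample_count samples at rate samples/sec."""
    return Fraction(sample_count) / Fraction(rate)

def appended_sync_shift(video_frames, frame_rate, audio_samples, sample_rate):
    """Offset, in seconds, of the appended audio relative to the appended video.

    In an unaligned splice each appended stream starts where the matching
    stream of the first file ends. A negative result means the appended audio
    starts earlier than the appended video; positive means it starts later.
    """
    video_end = stream_duration(video_frames, frame_rate)
    audio_end = stream_duration(audio_samples, sample_rate)
    return audio_end - video_end

# Made-up first segment: 1000 frames at 25 fps (40 s of video),
# with audio that came out 2205 samples (50 ms) short of 40 s at 44100 Hz.
shift = appended_sync_shift(1000, 25, 1761795, 44100)
shift_ms = float(shift * 1000)
```

With these numbers the appended audio starts 50 ms before the appended video. For a split capture that is exactly the offset needed to restore sync; for two unrelated clips it is 50 ms of desync in the second clip.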
So using this function to splice together a bunch of different clips to make a montage isn't exactly the original intended use, although admittedly it's the closest you'll find in the program.
What this means is that if you're trying to use the feature to join two different video files, you have to be a bit careful about how the first file is trimmed. You want the durations of the audio and video to match as closely as possible to minimize the desync on the second segment. One way to do this is to purposely cut a few frames off the end of the first segment and reprocess it in direct/direct mode; unless the "cut off" setting has been disabled in Video > Select Range, VirtualDub will trim the audio as closely as possible to the length of the video, assuming you cut back far enough. The finer the granularity of the audio, the better, so uncompressed audio will give you a more precise cut here.
Where the restrictions come from
I mentioned earlier that the main subsystems can't handle more than one A/V stream each. So where is the join handled? Well, it's actually done in the AVI parser by merging the indices and virtually concatenating the raw files, which is the reason why you can't append anything but AVIs, and is also the reason why you can't append Avisynth scripts — because the AVI layer isn't parsing a file in that path. The only time you can append is between two real AVI files that VirtualDub is itself parsing. I forget why I put it in the parser, but I think it's because it was a lot easier than trying to manage a lot of individual parsers and trying to consolidate the buffer caches between them (memory was a bit tighter then).
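As a rough picture of what "merging the indices" means, here is a toy model. The class, the file names, and the index layout are all invented for illustration, and the real parser is considerably more involved. One stream's index simply grows as segments are appended, and a frame number is mapped back to a file and byte range on lookup.

```python
from bisect import bisect_right

class MergedIndex:
    """Toy model of one stream whose index spans several segment files."""

    def __init__(self):
        self._starts = []    # global number of the first frame in each segment
        self._segments = []  # (file name, [(byte offset, size), ...]) per segment
        self._total = 0

    def append(self, filename, index):
        """Glue another segment's index onto the end of the stream."""
        self._starts.append(self._total)
        self._segments.append((filename, index))
        self._total += len(index)

    def __len__(self):
        return self._total

    def locate(self, frame):
        """Map a global frame number to (file name, byte offset, size)."""
        if not 0 <= frame < self._total:
            raise IndexError(frame)
        seg = bisect_right(self._starts, frame) - 1
        filename, index = self._segments[seg]
        offset, size = index[frame - self._starts[seg]]
        return filename, offset, size

# Hypothetical two-segment capture.
merged = MergedIndex()
merged.append("capture.00.avi", [(2048, 100), (2148, 120)])
merged.append("capture.01.avi", [(2048, 90)])
last_frame = merged.locate(2)
```

Everything above the index sees a single three-frame stream, and `last_frame` resolves to the first chunk of the second file. Nothing in this layer knows how to decode anything, which is why it can only work on files it has parsed itself.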
As for the other restrictions....
The exact restrictions in the current version, 1.6.11, are:
- The streams must occur in the same order and there must not be extra or missing streams.
- The corresponding streams must be of the same type.
- The frame rate fractions must be the same, numerically. 10/2 and 5/1 are considered equivalent.
- The sample sizes must be the same, except for VBR video streams (the usual case for video).
- The data formats must be the same.
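The checks listed above can be sketched roughly as follows. This is an illustration, not the program's code: `StreamInfo` and `append_mismatch` are hypothetical names, the fields are only loosely modeled on the AVI stream header, and treating a sample size of 0 as "variable-size video samples" is my assumption about how the VBR exception could be expressed.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StreamInfo:
    kind: str         # stream type, e.g. "vids" or "auds"
    rate: int         # rate fraction numerator
    scale: int        # rate fraction denominator
    sample_size: int  # bytes per sample; 0 assumed to mean variable-size
    format: bytes     # opaque, codec-specific format block

def append_mismatch(first, second):
    """Return a reason the second file can't be appended, or None if it can."""
    # Same number of streams, in the same order.
    if len(first) != len(second):
        return "extra or missing streams"
    for i, (a, b) in enumerate(zip(first, second)):
        if a.kind != b.kind:
            return f"stream {i}: type differs"
        # Compare as fractions, so 10/2 and 5/1 are equivalent.
        if a.rate * b.scale != b.rate * a.scale:
            return f"stream {i}: rate differs"
        # Assumed form of the exception for variable-size video samples.
        variable_video = a.kind == "vids" and 0 in (a.sample_size, b.sample_size)
        if not variable_video and a.sample_size != b.sample_size:
            return f"stream {i}: sample size differs"
        # The format block is opaque, so only a byte-exact match is safe.
        if a.format != b.format:
            return f"stream {i}: data format differs"
    return None

video_a = StreamInfo("vids", 10, 2, 0, b"\x01\x02\x00\x00")
video_b = StreamInfo("vids", 5, 1, 0, b"\x01\x02\x00\x00")
video_c = StreamInfo("vids", 5, 1, 0, b"\x01\x02\xcd\xcd")  # differs only in trailing bytes

ok = append_mismatch([video_a], [video_b])
bad = append_mismatch([video_a], [video_c])
```

`ok` is `None` because the rates match as fractions, while `bad` reports a data format mismatch even though only the trailing bytes of the format block differ.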
The data format check trips up some people when they attempt to recompress one video to match the other. The data formats in the two streams must match exactly. VirtualDub enforces this because it has no idea what the data format block contains, since most of it is opaque and specific to each video/audio codec, and decoding a stream with the wrong format can easily cause a crash. For this reason, I do not intend to make this particular check optional. Note that there is a bug in Huffyuv 2.1.1 that can trip this check unnecessarily, since it fails to initialize a few bytes at the end of its format structure. VirtualDub pre-clears the format block memory to avoid this problem on write, but this was not done in capture mode until 1.6.9, so if you capture a Huffyuv file with 1.6.8 or earlier and attempt to merge it with another Huffyuv file, it may fail for this reason.
The frame rate check is, of course, the one featured in the WTF dialog. The first issue is that people attempt to look at the number in the error and type it into the frame rate adjustment box to coerce to that rate, but that doesn't work, because the single fractional number you can enter for the frame rate isn't enough to specify the fraction needed to match, and there isn't UI for entering the fraction directly. (If anyone has a good algorithm to compute the closest 32-bit/32-bit fraction given a single real number, please contact me.) The second issue is that any attempt to coerce the second stream to the frame rate of the first will change the speed of the second stream. This is the real reason behind the frame rate error. Changing the length of the video stream has two bad effects: it causes a gradual resync relative to the audio stream, and it also splits the stream ends apart in time, which then causes more desync on any subsequent appends. Neither of these effects is readily apparent in the UI. Also, very small errors become magnified after 3600+ seconds of video.
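For what it's worth, one known approach to the closest-fraction problem is continued-fraction approximation, which Python's standard library exposes as `Fraction.limit_denominator`. The sketch below leans on that; it is not what VirtualDub does, the function names are hypothetical, and it bounds only the denominator, so it simply refuses when the numerator won't fit in 32 bits. It also shows why typing in a displayed number doesn't recover the original fraction, and how a tiny rate difference adds up over an hour.

```python
from fractions import Fraction

UINT32_MAX = 0xFFFFFFFF

def closest_rate_fraction(fps, max_den=UINT32_MAX):
    """Closest fraction to fps (a decimal string or float) with denominator <= max_den."""
    frac = Fraction(fps).limit_denominator(max_den)
    if frac.numerator > UINT32_MAX:
        raise ValueError("numerator does not fit in 32 bits")
    return frac.numerator, frac.denominator

def drift_seconds(frames, rate_a, rate_b):
    """Difference in playing time of the same frames at two (num, den) rates."""
    return Fraction(frames) / Fraction(*rate_a) - Fraction(frames) / Fraction(*rate_b)

# Typing in a displayed value gives the fraction for that decimal,
# not necessarily the fraction stored in the other file's header.
typed = closest_rate_fraction("29.97002997")

# Only with a tighter denominator limit does the "expected" fraction fall out.
guessed = closest_rate_fraction("29.97002997", max_den=1001)

# One hour of frames at 2997/100 versus 30000/1001.
drift = drift_seconds(107892, (30000, 1001), (2997, 100))
```

Here `typed` is 2997002997/100000000 and `guessed` is 30000/1001; they are not equal as fractions even though they print the same to ten digits. The drift between 30000/1001 and 2997/100 comes to 3.6 ms over roughly an hour of video, which is the kind of error that is invisible in a rounded number but accumulates with length and with every further append.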
Now, the question I'm sure you're asking is: why doesn't the parser just ask whether the error is acceptable or not? Well, the AVI parser code is too low-level to display UI, and while it can queue up warnings for display later, that doesn't work for user queries. One reason for this restriction is to avoid reentrancy problems caused by running message loops in non-UI code. (This is, by the way, why it is a bad idea to spawn message boxes in the main thread from a video codec message that isn't UI related.) So, doing this is a bit inadvisable. Also, I simply don't like this workaround.
MP3 audio also can cause sample rate match errors on the audio stream, because of a quirk in the Fraunhofer-IIS MP3 codec — namely that it writes an inaccurate sample rate for 44KHz MP3 streams. Or rather, it writes the correct rate, but doesn't target it exactly when encoding, often off by ~0.5%, which is enough to cause perceptible desynchronization. VirtualDub recomputes the closest value when using an audio codec that encodes to MP3, but the correct value is frequently fractional and thus can differ by +/-1, and furthermore, its calculation might not match exactly with another program that does the same correction. This correction is also disabled during capture and when writing a segmented AVI. Therefore, it is recommended that this particular configuration be avoided when using the append feature.
What people actually want when they run into problems with Append is an aligned splice, which can edit the streams until they can be spliced without holes in either stream. To do this the append needs to happen at a higher layer than the parser, probably in the decodable stream layer. This is probably not too bad to do for streams that do match in frame rate, but doing it for those that don't is harder because I actually need to resample one of the streams, probably the video stream. And that's complicated by the fact that in the general case, I can insert frames, but not delete them. So there's a bit of legwork to be done here to figure out how to do this smoothly, and I haven't had time to do this yet.