Redesigning the VirtualDub video filtering system

The video filter system is one of the oldest remaining parts of the VirtualDub source code, now that I've rewritten the capture and project modules, and large portions of the render module. While it contributes a rather important part of VirtualDub's feature set, at this point it is also a major limiting factor in the program's development. As such, it has gotten my full attention for the next major experimental release, 1.7.0 (in case you were wondering what I've been doing for the past couple of months).

Designing a next-generation video filter system isn't easy. I've sunk a lot of time into it so far, and there are some parts of it that are nice, and other parts that are simply hairy to implement and design. Below, I'll talk about some of these and some of the implications for both users and filter authors. For brevity, I will refer to the current-generation video filter system as VF1 and the next-generation system as VF2. Some of the features I talk about here might not make it into the next version, though, depending on how well they work out, as honestly the current design is rather ambitious. I don't normally like to talk about designs-in-progress because it seems too much like a promise that I may not be able to keep, but I also don't like to work alone in the dark for too long.

Color conversion

One of the biggest limitations of the current filter API is that it only supports the 32-bit X8R8G8B8 pixel format. While this is very easy to program, and is advantageous for some types of rendering, it is inefficient with other types of image processing algorithms that can be performed directly in the YCbCr color model, which is preferred for many video compression formats. It would thus be desirable to support multiple types of formats in the chain.

The only real limitation in the API itself is that it has a single depth field that corresponds to the RGB bit depth; the remainder of the limitations were in VirtualDub's support code around that API, most notably the bitmap library. I already completed the prerequisites for this a while back with the new pixmap library in 1.6, which can convert between the common RGB and YCbCr formats, and the new display code, which can handle the formats directly. At this point all that remains with regard to API work is allowing the filters themselves to indicate their format compatibility and to see the new formats. I'm not terribly fond of simply throwing errors when direct compatibility isn't possible between filters, however, so there's some UI work to be done here (the conversion itself is just a call to VDPixmapBlt()).
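To make the idea of filters "indicating their format compatibility" concrete, here is a rough host-side sketch. The PixFormat enum, the Link struct, and NegotiateInput() are all hypothetical names invented for illustration, not part of the actual API; the only thing taken from the text above is that a mismatch is resolved by inserting a conversion (in VirtualDub, a VDPixmapBlt() call) rather than by throwing an error.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical format tags; the real pixmap library defines its own set.
enum class PixFormat { XRGB8888, YUV422_YUYV, YUV420_Planar };

struct Link {
    PixFormat format;       // format the downstream filter will actually see
    bool needsConversion;   // true if the host must insert a blit on this link
};

// Pick the format for the link into a filter, given what the upstream produces
// and what the filter says it accepts (listed in order of preference).
inline Link NegotiateInput(PixFormat upstream, const std::vector<PixFormat>& accepted) {
    assert(!accepted.empty());
    if (std::find(accepted.begin(), accepted.end(), upstream) != accepted.end())
        return { upstream, false };         // direct hand-off, no conversion
    return { accepted.front(), true };      // convert to the filter's first choice
}
```

A real negotiation would probably also weigh conversion cost and what the next filter downstream wants; this sketch only looks at one link at a time.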

As a side note, it's possible for some filter algorithms to ignore aspects of the video frame format. For instance, a vertical blur often doesn't care what colorspace or even how many channels are being used, as long as the data is arranged in byte-per-channel format. This means the exact same code path could be used for RGB24, RGB32, YUY2, UYVY, Y41P, Y8... well, you get the idea. Some of the capture filters work this way. Ideally, the video format information would have color format and channel format separate. I didn't do it this way in 1.6, though, and I'm not sure it's worth the additional rework.
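As an illustration of a format-agnostic filter, the sketch below applies a vertical [1 2 1]/4 blur to rows of raw bytes. The function name, the kernel, and the edge handling (clamping) are choices made for this example, not code from VirtualDub. The routine never looks at channel layout, only at the number of bytes per row, which is why one code path can serve any byte-per-channel format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Vertical [1 2 1]/4 blur over rows of raw bytes. Only the byte count per row
// matters, so the same code handles RGB32, YUY2, Y8, and similar layouts.
// Rows past the top and bottom edges are handled by clamping the row index.
inline void VerticalBlurBytes(const uint8_t* src, uint8_t* dst,
                              size_t rowBytes, size_t height, ptrdiff_t pitch) {
    assert(src != dst);     // not safe in place: rows above would already be blurred
    for (size_t y = 0; y < height; ++y) {
        const uint8_t* above = src + pitch * (ptrdiff_t)(y > 0 ? y - 1 : y);
        const uint8_t* cur   = src + pitch * (ptrdiff_t)y;
        const uint8_t* below = src + pitch * (ptrdiff_t)(y + 1 < height ? y + 1 : y);
        uint8_t* out = dst + pitch * (ptrdiff_t)y;
        for (size_t x = 0; x < rowBytes; ++x)
            out[x] = (uint8_t)((above[x] + 2 * cur[x] + below[x] + 2) >> 2);
    }
}
```

For a 640-pixel-wide frame the caller would pass rowBytes = 2560 for RGB32 or 1280 for YUY2; nothing else changes.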

I don't have this done yet, but all of the support code is in place. Note that current versions already have a bypass shunt in place for the simple case: if there are no video filters in the chain, the conversions to and from RGB32 are bypassed, even in Full Processing Mode, and a direct blt is done from the source to output format. YCbCr-to-YCbCr conversions do not go through RGB color space.

Asynchronous, out-of-order operation

Another problem with the VF1 API is that it enforces in-order processing of video frames, because it is a strict one-frame-in, one-frame-out API. This is the simplest API to support from a filter author's point of view, because it basically means your filter is a Process() function. It doesn't work well for filters that need to reference more than one source frame, however, or those which want to reorder/add/remove frames. In VF2, the filter's main Run() function is being restructured as a general frame function:

y = filter(x[0], x[1], ...);

The source frames can come from any number of input source pins and can vary on a per-frame basis. This also supersedes the frame lag feature that made it easier, but not entirely possible, for VirtualDub to identify the windowing behavior of a multi-frame-input filter.

Another issue is synchronous operation. The VF1 model can actually support multithreaded operation, because the filters expect to receive the source frame on entry, not explicitly request it. It is thus possible for the host to predict the source frame trivially and prefetch it in parallel. In the source fetch model, though, the video filter can request any number of arbitrary frames, which makes prefetching difficult. The idea in VF2 is to split away the frame fetch into a separate function:

Prefetch(output_frame) { /* request each source frame needed to produce output_frame */ }

This function is essentially the inverse of the Run() function with frame numbers only: [x_frame[0], x_frame[1], ...] = Prefetch(y_frame). This allows the host to request frames from the upstream filters ahead of time, so that the upstream filters can work in parallel with the dependent filter on data for different output frames. For windowed filters this routine merely needs to request a few sequential frames in a loop. Many filters don't need this level of complexity, however, in which case the host can simply provide a default implementation that passes through the output frame number as the sole source frame.

The Run() function then receives an array of source frames, each tagged with the source frame number, along with a slot for the output that contains the output frame number.
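The split between Prefetch() and Run() might look roughly like the following for a three-frame temporal averaging filter. The Frame struct, the function signatures, and the choice of filter are assumptions made for this sketch; the actual VF2 structures are not shown in this post. DefaultPrefetch() stands in for the pass-through default the host could supply.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a host frame: a buffer tagged with its frame number.
struct Frame {
    int64_t number;
    std::vector<uint8_t> pixels;
};

// Prefetch: map an output frame number to the source frame numbers it needs.
// Here, a three-frame window clamped to the bounds of the stream.
inline std::vector<int64_t> Prefetch(int64_t outputFrame, int64_t streamLength) {
    assert(streamLength > 0);
    std::vector<int64_t> wanted;
    for (int64_t f = outputFrame - 1; f <= outputFrame + 1; ++f)
        wanted.push_back(std::min<int64_t>(std::max<int64_t>(f, 0), streamLength - 1));
    return wanted;
}

// Default for filters without their own Prefetch: one source frame, same number.
inline std::vector<int64_t> DefaultPrefetch(int64_t outputFrame) {
    return { outputFrame };
}

// Run: y = filter(x[0], x[1], ...). Averages the source frames that arrived.
// Assumes every source frame has the same size.
inline void Run(const std::vector<Frame>& sources, Frame& output) {
    assert(!sources.empty());
    output.pixels.assign(sources[0].pixels.size(), 0);
    for (size_t i = 0; i < output.pixels.size(); ++i) {
        unsigned sum = 0;
        for (const Frame& s : sources)
            sum += s.pixels[i];
        output.pixels[i] = (uint8_t)(sum / sources.size());
    }
}
```

Because Prefetch() here depends only on its arguments, the host could call it from any thread without the filter having to be thread-safe.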

While this permits multithreaded operation, it isn't my intention to force all filters to be thread-safe. That's a nasty task and could lead to lots and lots of very hard to diagnose bugs. The running threading model I am proposing instead is the following:

For most cases the Prefetch() function will only be a pure function of the output frame number, making threading issues moot.

Support for both 32-bit and 64-bit filters is also something I want to accommodate, but haven't done yet. This will most definitely require asynchronous operation, and has the additional issue of transferring video frames efficiently between processes.

The current 1.7.0 tree has multithreaded prefetch and filtering working. This has turned out to be the hairiest part of the system by far, as the race conditions are rather challenging to resolve. For instance, when I first implemented it, dragging the frame slider across the timeline caused about 400 frame requests to be queued and made the display panes massively lag. I then added support for aborting requests that were no longer wanted, but that opened a huge can of worms with regard to the frame caching and request queuing mechanisms — what happens if someone requests a frame at the same time you're trying to abort it? I think I have these worked out now, but won't be able to tell for sure until I pound on it on the HT system.

Frame streams and frame handling

VF2 filters can change the length and speed of a stream. This makes several types of filters possible which were not before, such as weave deinterlace and IVTC.

All filters with output pins have an LRU frame cache on their output. This is necessary in order to prevent excess redundant computation when downstream filters request multiple adjacent frames. The size of the cache per filter is currently fixed, although it will have to be dynamic. I don't like the idea of just allocating a huge cache that's sized at 25% or 50% of physical memory and dumping all caches in there; that chews a lot of memory and it should be possible to size the cache much more intelligently. In particular, LRU caches have the nice property that, given a preset sequence of frame fetches, increasing the size of the cache will never hurt its performance, i.e. it will never increase the miss rate. Therefore, it should be possible to cheaply estimate the performance of each cache with fewer or more frame buffers, and adjust each accordingly.
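The LRU property mentioned above is easy to check by simulation. The sketch below replays a fixed request sequence against an LRU cache of a given capacity and counts misses; CountLruMisses() is a hypothetical helper written for this example, and a linear-search list is used for clarity rather than speed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <vector>

// Count misses for a fixed request sequence against an LRU cache of the given size.
inline size_t CountLruMisses(const std::vector<int64_t>& requests, size_t capacity) {
    assert(capacity > 0);
    std::list<int64_t> lru;     // front = most recently used
    size_t misses = 0;
    for (int64_t frame : requests) {
        std::list<int64_t>::iterator it = std::find(lru.begin(), lru.end(), frame);
        if (it != lru.end()) {
            lru.erase(it);                  // hit: re-inserted at the front below
        } else {
            ++misses;
            if (lru.size() == capacity)
                lru.pop_back();             // evict the least recently used frame
        }
        lru.push_front(frame);
    }
    return misses;
}
```

For a downstream filter with a three-frame window, the request stream looks like 0,1,2, 1,2,3, 2,3,4, and so on. With one buffer every request misses; with two or more, only the first touch of each frame misses. Running the same trace at a few capacities is one way to estimate where extra buffers stop paying off.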

Frames are reference counted and can be aliased in format. This means, for example, a bob deinterlace filter can execute with nearly zero overhead simply by aliasing the input frame buffer with a different pitch and height. I had a lot of problems getting this to work until I tracked down a nasty bug with regard to aliasing in the caches, specifically that both filters involved would try to reuse the now-shared frame buffer. In some ways this actually makes filter development easier, because if a filter only rarely changes its output, most of the time it can simply call CopyFrame() to reflect the input frame to its output without having to do any pixel pushing.
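Aliasing a field out of an interlaced frame can be sketched as below. FrameView and AliasField() are hypothetical, with std::shared_ptr standing in for whatever reference counting the host actually uses; the point is only that the field view shares storage with the full frame and differs in offset, pitch, and height.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical frame view: shared, reference-counted storage plus a layout.
struct FrameView {
    std::shared_ptr<std::vector<uint8_t>> storage;  // refcounted pixel buffer
    size_t offset;      // byte offset of the first scanline
    size_t pitch;       // bytes between successive scanlines
    size_t height;      // number of scanlines

    const uint8_t* Row(size_t y) const {
        assert(y < height);
        return storage->data() + offset + pitch * y;
    }
};

// Alias one field of an interlaced frame: same storage, doubled pitch, half height.
// field = 0 selects even scanlines, field = 1 selects odd scanlines.
inline FrameView AliasField(const FrameView& frame, int field) {
    assert(field == 0 || field == 1);
    return FrameView{ frame.storage,
                      frame.offset + frame.pitch * (size_t)field,
                      frame.pitch * 2,
                      (frame.height + 1 - (size_t)field) / 2 };
}
```

No pixels are copied; the only cost is one more reference on the buffer. That shared ownership is also exactly what makes the cache reuse bug described above possible, since both views refer to the same memory.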

Copy-on-write has turned out to be more of a headache than I had planned. In VF1, a filter that only needs to make minor modifications to its input can run as an "in-place" filter, meaning that it works directly on the frame fed by the upstream. However, in VF2 it is possible that the upstream filter's output is going to more than one downstream, in which case writing directly into the frame isn't possible. The problem then becomes deciding whether to evict the frame out of the upstream cache (which could result in the frame having to be regenerated later), or deep copying the frame (which requires extra memory bandwidth). I think this can be decided properly by tracking evict/clone statistics over time, but I haven't gotten that working yet and currently the decision is always to deep copy. This will have to be fixed.
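A minimal sketch of the copy-on-write check, assuming frames are held through std::shared_ptr (an assumption for this example; the host's own refcounting differs). MakeWritable() is a hypothetical helper. It only models the "deep copy when shared" branch; the alternative of evicting the frame from the upstream cache is not modeled here.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

using FrameBuffer = std::vector<uint8_t>;

// Return a buffer the filter may write into. If the caller passed in the only
// reference, hand the same buffer back (in-place operation); otherwise make a
// deep copy so other holders of the frame are unaffected.
inline std::shared_ptr<FrameBuffer> MakeWritable(std::shared_ptr<FrameBuffer> frame) {
    assert(frame);
    if (frame.use_count() == 1)
        return frame;                                   // sole owner: write in place
    return std::make_shared<FrameBuffer>(*frame);       // shared: clone before writing
}
```

The caller has to std::move() its reference in for the in-place path to trigger. Note also that use_count() is only a reliable test when no other thread can be taking a reference at the same time, so a multithreaded host would need its own synchronization around this decision.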

While working on this, I also ran into another unexpected problem: with arbitrary frame reordering and general frame functions, it is no longer possible to tell what the "source" frame should be for a given output frame. The solution I came up with was to add a special filter function that gives the best estimated preview source frame for a particular output frame. For the weave filter, this outputs the frame containing the top field. When the function is omitted, VirtualDub simply calls the Prefetch() function instead and uses the first prefetch request, if any. A similar function could be used for smart rendering as well, although it would have to be more conservative about the output frames for which it returns a valid result.
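The fallback described above could be sketched as follows. FilterHooks, kNoFrame, and the weave frame numbering (output frame n built from source frames 2n and 2n+1, top field first) are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Sentinel for "no sensible source frame"; a hypothetical convention.
const int64_t kNoFrame = -1;

// Hypothetical filter description: an optional preview-source hook plus Prefetch.
struct FilterHooks {
    int64_t (*previewSource)(int64_t outputFrame);          // may be null
    std::vector<int64_t> (*prefetch)(int64_t outputFrame);  // required
};

// Host-side lookup: prefer the dedicated hook, else use the first prefetch request.
inline int64_t PreviewSourceFrame(const FilterHooks& f, int64_t outputFrame) {
    assert(f.prefetch);
    if (f.previewSource)
        return f.previewSource(outputFrame);
    std::vector<int64_t> wanted = f.prefetch(outputFrame);
    return wanted.empty() ? kNoFrame : wanted.front();
}

// Weave-style example, assuming output frame n uses source frames 2n and 2n+1.
inline std::vector<int64_t> WeavePrefetch(int64_t n) { return { 2 * n, 2 * n + 1 }; }
inline int64_t WeavePreview(int64_t n) { return 2 * n; }    // frame holding the top field
```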

API friendliness

I firmly believe that a well-designed API leads to fewer errors. To that end, I have a few rules for how filters should have to be written:

A VF2-model filter thus only has two entry points, Create() and Main(). The Main() function is like a window procedure in that it receives all non-creation commands; dispatch to individual functions is handled by the filter. This makes it easier to object-ify filters by making the number of binding points constant and eliminating the pain of VF1's optional function callbacks. Thus, when I talk about Prefetch(), I actually mean a call to Main() with a prefetch message. I plan to ship a base class that handles dispatching to methods (I already use one internally).
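A dispatching base class along these lines might look like the sketch below. The message codes, result codes, and method names are invented for this example, since the post does not list the real command set. The virtual methods here live entirely inside the filter, so they do not cross the host boundary.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical message and result codes; the real VF2 command set is not given here.
enum FilterMessage { kMsgPrefetch = 1, kMsgRun = 2 };
enum FilterResult  { kResultOK = 0, kResultUnhandled = -1 };

// Base class that turns the single Main() entry point into method calls.
class FilterBase {
public:
    virtual ~FilterBase() {}

    // The one non-creation entry point, in the spirit of a window procedure.
    int Main(int message, int64_t frame) {
        switch (message) {
            case kMsgPrefetch: return OnPrefetch(frame);
            case kMsgRun:      return OnRun(frame);
            default:           return kResultUnhandled;    // unknown messages are not fatal
        }
    }

protected:
    virtual int OnPrefetch(int64_t) { return kResultOK; }   // default: nothing extra to do
    virtual int OnRun(int64_t) = 0;
};

// Example filter that only overrides Run and counts how often it is called.
class CountingFilter : public FilterBase {
public:
    int runs = 0;
protected:
    int OnRun(int64_t) override { ++runs; return kResultOK; }
};
```

Adding a new message later means adding a case and a defaulted method to the base class; filters built against the old base class simply report the new message as unhandled.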

Strings are going to be wide-char (Unicode) this time.

All inter-filter communication is bounced off of the host. When you request a frame from upstream, your Prefetch() queues the request, and Run() is called back with the frame. This means you can issue a VF2-style request to a VF1 filter, and not know or care that a wrapper is intervening in between.

I made the mistake last time of providing some callbacks through C++ virtual methods, which turned out to be difficult to call from GCC-compiled code since that compiler uses a different method calling convention than Visual C++. This time, all callbacks are straight __cdecl functions; they are made to appear as virtual methods through inline functions. For similar reasons, none of the functions throw exceptions anymore, including error functions. Instead, they all set an error code/string that is processed on exit from the filter.
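The pattern of plain function callbacks dressed up as methods, combined with error codes instead of exceptions, can be sketched like this. HostContext, HostCallbacks, and every function name are hypothetical; on Windows the function pointers would carry __cdecl, which is left out here so the sketch compiles anywhere.

```cpp
#include <cassert>
#include <cwchar>

// Hypothetical host context: a plain struct of C function pointers, so the
// binary layout does not depend on any one compiler's C++ ABI.
struct HostContext;
struct HostCallbacks {
    void (*setError)(HostContext* ctx, const wchar_t* message);
};

struct HostContext {
    const HostCallbacks* callbacks;
    wchar_t error[128];         // last error set by the filter; empty if none

    // Inline wrappers: read like method calls, compile to plain function calls.
    void SetError(const wchar_t* message) { callbacks->setError(this, message); }
    bool HasError() const { return error[0] != 0; }
};

// Host-side implementation: record the error instead of throwing.
inline void HostSetError(HostContext* ctx, const wchar_t* message) {
    const size_t last = sizeof ctx->error / sizeof ctx->error[0] - 1;
    std::wcsncpy(ctx->error, message, last);
    ctx->error[last] = 0;
}

// Filter-side code reports failure and returns; the host checks on exit.
inline void FilterRun(HostContext& ctx, int width) {
    if (width <= 0) {
        ctx.SetError(L"invalid frame width");
        return;
    }
    // ... pixel processing would go here ...
}
```

Since no exception ever crosses the host/filter boundary, the two sides can be built with different compilers or exception models.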

Note that while the use of other languages might be feasible, I don't intend to officially support the authoring of filters in a language other than native C++. I have enough trouble just managing one API and I don't need the huge mess that is Microsoft COM.

Reference counting is a concern. It is currently possible to hold onto a frame across calls to Run() by adding a reference to it; the frame will then be kept alive in the upstream cache. The reference counting is handled via emulated virtual methods (virtual virtual methods?) so that VirtualDub can ensure thread safety. The problem is that manual reference counting makes it really easy to create memory leaks that are hard to diagnose, because you can't directly tell which reference was leaked. One idea I had here was to track the aggregate reference count from each filter; VirtualDub wouldn't be able to tell which frame was leaked or over-released, but could tell which filter(s) did it.
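The aggregate tracking idea could be sketched as a small ledger on the host side. RefLedger and its methods are hypothetical names; identifying filters by string is a simplification for the example, and a real host would need to guard the ledger against concurrent access.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical host-side ledger: net AddRef/Release balance per filter.
class RefLedger {
public:
    void OnAddRef(const std::string& filter)  { ++mBalance[filter]; }
    void OnRelease(const std::string& filter) { --mBalance[filter]; }

    // Net balance for one filter: positive = leaked, negative = over-released.
    int Balance(const std::string& filter) const {
        std::map<std::string, int>::const_iterator it = mBalance.find(filter);
        return it == mBalance.end() ? 0 : it->second;
    }

    // After the filter graph is torn down, every balance should be zero.
    std::vector<std::string> Offenders() const {
        std::vector<std::string> names;
        for (const auto& entry : mBalance)
            if (entry.second != 0)
                names.push_back(entry.first);
        return names;
    }

private:
    std::map<std::string, int> mBalance;
};
```

This cannot say which frame was leaked, only which filter's books don't balance, which matches the limitation noted above.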

Avisynth support

(Where did the "AviSynth" spelling come from? Ben Rudiak-Gould's original versions used the name "Avisynth," with a lowercase S. "AviSynth" looks like mangled output from a tool like FxCop.)

I use Avisynth a lot, mainly because it can do field-based splitting operations that VirtualDub can't do... yet. At the same time, I would like to eventually run some of the Avisynth filters directly in VirtualDub.

The big problem here is that Avisynth filters fetch source frames synchronously from their upstream by calling the upstream filter's GetFrame() function; spoofing the IClip interface is simple enough, but I can't allow the filter to block and wait for the upstream frame because that would force the entire upstream to run single-threaded. The solution is to use fibers to turn the Avisynth filter's GetFrame() function into a coroutine: when the Avisynth filter attempts to block, the adapter calls SwitchToFiber() to suspend the filter, issues a prefetch for the source frame, and returns back to the video filter system with a special code that tells it to requeue the request. This does require that the video frame requests be serialized. Ordinarily, source video frames can arrive out of requested order due to caching, thus causing Run() to be called out of order relative to the original requests, but that can't be done here as it would require a reentrant filter and a fiber per running request.

How to abort a frame that is in-progress? Switch to the Avisynth filter's fiber, throw an AVSError exception through the filter's GetFrame() into a try/catch immediately outside, and switch back.

There are some problems with frame management here. Avisynth filters directly refcount VideoFrame objects through InterlockedIncrement() and InterlockedDecrement(), and more importantly, will attempt to delete them directly if the reference count reaches zero. That obviously can't happen given that VirtualDub would be emulating the real Avisynth structures and uses a private heap (statically linked CRT). The solution is to never let the VideoFrame reference count drop to zero; instead, VirtualDub would always hold an emulated reference and use garbage collection to detect when the VideoFrame should be deleted, at which point it would drop its reference on the real frame. This also avoids other differences between VirtualDub's refcounting and Avisynth's refcounting, namely that some of VirtualDub's frames use split refcounting to support both strong and weak references for caching purposes.

Some features of Avisynth filters won't be possible to emulate. Audio filters can't be supported because that's an entirely different system; source filters are likely also problematic. And any filter that attempts to create and manipulate upstream filters, particularly meta-filters like Animate(), definitely won't work. Also, VirtualDub doesn't support interlaced YV12, nor does it require that scanlines be aligned to 16 byte boundaries. Finally, Avisynth video frames only specify a Y pitch and a shared Cb/Cr pitch for memory layout (two pitches), whereas VirtualDub simply has three independent planes (three pitches). I might be convinced to go two-pitch myself, though, as I haven't encountered a circumstance yet where different Cb/Cr pitches are necessary.

(I've also heard rumors of a YV24 format being proposed for straight 4:4:4 planar YCbCr in Avisynth 2.6. It just so happens I already have an internal format called nsVDPixmap::kPixFormat_YUV444_Planar, even in the already-released 1.6.10....)

I did manage to get an Avisynth-to-VF2 filter wrapper working as a proof-of-concept exercise using Donald Graft's Logo filter for Avisynth, but only hackily and with hardcoded clip parameters. It's being pushed to the back-burner for now until I get my own stuff working.

Current status

I'm currently ironing out the kinks on the integration between VF2 and the rendering engine. The ability to change the length and rate of the video stream in the filter graph has messed up a lot of other code that expects the input and output streams to be related; the biggest issue is that now the input and output timelines have to be separated, with some way to switch between them. This will mess up the UI a bit. On the other hand, it makes a lot more interesting video manipulations possible too, like ease-in/out.


This blog was originally open for comments when this entry was first posted, but was later closed and then removed due to spam and after a migration away from the original blog software. Unfortunately, it would have been a lot of work to reformat the comments to republish them. The author thanks everyone who posted comments and added to the discussion.