Scheduling bottlenecks in 3D filter acceleration

As I noted last time, there are reasons why VirtualDub's 3D filter acceleration has problems if a display mode switch is triggered. There are also some performance bottlenecks in the 1.9.5 implementation that I'm working on resolving. Here's an example:

This is a screenshot from VirtualDub's real-time profiler, showing CPU usage during a video analysis pass with a mix of CPU and GPU filters (warp sharp on the GPU + rotate2 on the CPU). The main things to notice are the long V-Filter section on the Processor thread and the idle time on the Filter 3D Accel thread. The V-Filter section is the time during which the video filter system runs. The basic problem is that the video filter system is single-threaded, and all calls into the accelerator are blocking calls that synchronize the two threads. As a result, the processing thread is blocked during the readback (the long Poll and Readback blocks), and then the accelerator thread goes idle while the processing thread is busy with other work. This limits concurrency between the CPU and the GPU.
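To make the bottleneck concrete, here is a minimal Python sketch of the blocking model. It assumes a single accelerator worker thread and a filter system that calls into it synchronously. `Accelerator`, `call_sync`, and `run_blocking_chain` are hypothetical names used only for illustration. VirtualDub itself is written in C++, and this models only the threading pattern, not its actual interfaces.

```python
from concurrent.futures import ThreadPoolExecutor


class Accelerator:
    """Hypothetical stand-in for the 3D accelerator thread: a single worker
    thread that runs submitted jobs one at a time, in order."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="accel")

    def call_sync(self, fn, *args):
        # Blocking call: submit the job, then wait for it to finish, so the
        # caller and the worker are never busy at the same time.
        return self._pool.submit(fn, *args).result()

    def shutdown(self):
        self._pool.shutdown(wait=True)


def run_blocking_chain(frames, gpu_filter, cpu_filter):
    """Run a GPU filter followed by a CPU filter on each frame, using only
    blocking calls into the accelerator. Returns (results, trace), where
    trace lists (stage, frame) in the order the work actually ran."""
    accel = Accelerator()
    trace = []
    results = []

    def gpu_job(frame):
        trace.append(("gpu", frame))  # runs on the accelerator thread
        return gpu_filter(frame)      # stands in for render + readback

    try:
        for frame in frames:
            # The processing thread waits here for the whole GPU stage.
            pixels = accel.call_sync(gpu_job, frame)
            # Nothing else has been queued, so the accelerator sits idle
            # while the CPU filter runs.
            trace.append(("cpu", frame))
            results.append(cpu_filter(pixels))
    finally:
        accel.shutdown()
    return results, trace
```

Running three frames through this produces a strictly alternating trace (gpu 0, cpu 0, gpu 1, cpu 1, ...). That alternation is the serialization visible in the profiler.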

In my current dev branch, the situation has changed a bit.

The first thing to notice... is the lack of color. That's because I'm currently redoing the profiling architecture to be lighter weight and to capture a performance log instead of just per-second snapshots. I didn't think it would make a lot of difference, but now it definitely seems harder to read without the color coding.
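For illustration, here is a minimal sketch of the difference between a log and a snapshot. It assumes a profiler that appends one timestamped record per timed region. `EventLog`, `region`, and `busy_time` are hypothetical names, not VirtualDub's profiler code. Because every region is kept, a per-second busy figure like the old snapshot view can still be derived from the log, but a snapshot can't be turned back into a timeline.

```python
import threading
import time
from contextlib import contextmanager


class EventLog:
    """Hypothetical log-based profiler: one (thread, label, start, end)
    record per timed region instead of aggregated per-second totals."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._lock = threading.Lock()
        self.events = []

    @contextmanager
    def region(self, label):
        start = self._clock()
        try:
            yield
        finally:
            end = self._clock()
            with self._lock:  # regions may close on several threads at once
                self.events.append(
                    (threading.current_thread().name, label, start, end))

    def busy_time(self, thread_name, t0, t1):
        """Snapshot-style figure derived from the log: time the given thread
        spent inside recorded regions during [t0, t1). Assumes regions on
        one thread don't nest (nested regions would be double-counted)."""
        total = 0.0
        for name, _label, start, end in self.events:
            if name == thread_name:
                total += max(0.0, min(end, t1) - max(start, t0))
        return total
```

The clock is injectable so the sketch can be exercised with fixed timestamps. In real use it would default to `time.perf_counter`.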

That aside, you can see that the accelerator thread (the top thread) is much busier in this version. There are two reasons for this. The first is that in this build, the filter system can "hand off" a particular invocation of a filter instance. For various reasons, the filter system itself cannot currently run multithreaded, but the filter instances can. So the filter system sets up a frame, hands it off for asynchronous execution, and later closes the frame and collects the output once the filter is done.

The second reason is that the render pipeline can now queue more than one frame request in the filter system. It still isn't possible for a single filter instance to process multiple frames in parallel, since filters generally have mutable state that prohibits this. Queuing does, however, let different filters line up behind each other, so the accelerator can work on the warpsharp instance for one frame and then download another frame without intervention from the main processing thread. The net result is a modest frame rate increase for this chain, from about 17 fps to 20 fps, or roughly 18%.
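Here is a rough Python sketch of the two changes together. It assumes a two-stage chain: a GPU filter followed by a CPU filter. `run_pipelined_chain` and `max_in_flight` are hypothetical names, not VirtualDub's API. Each GPU-stage call is handed off to a single worker thread and returns a future immediately, and up to `max_in_flight` frames can be outstanding at once. The worker has exactly one thread, so a given filter instance still sees only one frame at a time, which respects the mutable-state restriction. The overlap comes from different stages working on different frames.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def run_pipelined_chain(frames, gpu_filter, cpu_filter, max_in_flight=2):
    """Hypothetical sketch of hand-off plus frame queueing.

    The GPU stage of each frame is handed off to a one-thread accelerator
    worker without waiting. While the processing thread runs the CPU filter
    on frame n, the accelerator can already be working on frame n+1.
    Output order matches input order because futures are collected FIFO.
    """
    if max_in_flight < 1:
        raise ValueError("max_in_flight must be at least 1")
    results = []
    pending = deque()  # futures for frames handed off but not yet collected
    with ThreadPoolExecutor(max_workers=1, thread_name_prefix="accel") as accel:
        for frame in frames:
            # Set up the frame and hand it off; this returns immediately.
            pending.append(accel.submit(gpu_filter, frame))
            if len(pending) >= max_in_flight:
                # Queue is full: collect the oldest frame, finish it on the CPU.
                results.append(cpu_filter(pending.popleft().result()))
        # Drain the frames still in flight.
        while pending:
            results.append(cpu_filter(pending.popleft().result()))
    return results
```

With `max_in_flight=1`, this degenerates to the blocking behavior: each frame is collected immediately after it is handed off, so the GPU and CPU stages never overlap.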


This blog was originally open for comments when this entry was first posted, but was later closed and then removed due to spam and after a migration away from the original blog software. Unfortunately, it would have been a lot of work to reformat the comments to republish them. The author thanks everyone who posted comments and added to the discussion.