
Does Hyperthreading Technology speed up VirtualDub?

"VirtualDub currently isn't multithreaded and only takes 50% of my hyperthreaded CPU. Would it run twice as fast if it were?"

No, and the premise is incorrect anyway.

VirtualDub is multithreaded; if it weren't, the UI would lock up every time it rendered a frame. Rendering operations use three threads: UI, I/O, and processing. Preview operations also create a fourth thread for timing blits. The reason for low CPU utilization on the second and subsequent logical CPUs during a render is that all video operations are serialized in the processing thread. Audio operations, however, take place on the I/O thread, so any audio filtering or compression executes in parallel with video processing. To take full advantage of dual CPUs, you have to balance work across multiple threads and make sure you're not just wasting time ping-ponging data between the CPUs. VirtualDub currently isn't written to do this.
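Here is a minimal C++ sketch of the thread layout described above. It is not VirtualDub's actual code: the names `BoundedQueue`, `RenderResult`, and `runRender`, the queue depth of 4, and the arithmetic "filter" are all hypothetical. An I/O thread hands frames to a single processing thread through a small queue and does the audio work itself, while the calling thread stands in for the UI thread. Because every video frame passes through one consumer in order, only one CPU does video work no matter how many are available.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>

// Minimal bounded FIFO for handing work from one thread to another.
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {}

    void push(T value) {
        std::unique_lock<std::mutex> lock(m_);
        notFull_.wait(lock, [&] { return q_.size() < capacity_; });
        q_.push(std::move(value));
        notEmpty_.notify_one();
    }

    // Blocks until an item is available; returns std::nullopt once the
    // queue has been closed and drained.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(m_);
        notEmpty_.wait(lock, [&] { return !q_.empty() || closed_; });
        if (q_.empty()) return std::nullopt;
        T value = std::move(q_.front());
        q_.pop();
        notFull_.notify_one();
        return value;
    }

    void close() {
        std::lock_guard<std::mutex> lock(m_);
        closed_ = true;
        notEmpty_.notify_all();
    }

private:
    std::mutex m_;
    std::condition_variable notEmpty_, notFull_;
    std::queue<T> q_;
    std::size_t capacity_;
    bool closed_ = false;
};

struct RenderResult {
    std::uint64_t videoChecksum;  // depends on the order frames were processed
    int audioBlocks;
};

// Hypothetical render loop. The calling thread plays the role of the UI thread.
RenderResult runRender(int frameCount) {
    assert(frameCount >= 0);
    BoundedQueue<std::uint32_t> toProcessing(4);
    std::uint64_t checksum = 0;
    int audioBlocks = 0;

    // Processing thread: the entire video filter chain runs here, one frame
    // at a time, so video work never spreads to a second CPU.
    std::thread processing([&] {
        while (auto frame = toProcessing.pop())
            checksum = checksum * 31 + (*frame * 2u + 1u);  // stand-in "filter"
    });

    // I/O thread: "reads" frames and does the audio work itself, so audio
    // filtering/compression overlaps with video processing.
    std::thread io([&] {
        for (int i = 0; i < frameCount; ++i) {
            toProcessing.push(static_cast<std::uint32_t>(i));
            ++audioBlocks;  // stand-in for audio filtering/compression
        }
        toProcessing.close();
    });

    io.join();
    processing.join();
    return {checksum, audioBlocks};
}
```

Making video scale across CPUs would mean splitting the work inside the processing thread, not just adding more stages around it.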

But what about Pentium 4 CPUs with Hyperthreading Technology? Everyone has one, so VirtualDub should be tuned for dual CPUs soon, right?

Well, not quite.

In a traditional symmetric multiprocessing (SMP) system, you have two separate CPUs. Each has its own set of caches, execution resources, and buses. As long as you don't have to transfer too much data between the CPUs, the CPUs don't fight too much for the bus, and you can keep both CPUs busy, you can get up to twice the performance of a single-CPU system. This is easiest when the threads are chewing through workloads that require a lot of processing over little data; it's harder in the opposite case, where the CPUs use memory so heavily that they begin to contend for memory bandwidth.
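As a hedged illustration of the SMP case, this sketch (the function `bandedSumSquares` is hypothetical) splits a buffer into one contiguous band per thread and uses a sum of squares as a stand-in for a real filter. Each thread keeps its running total in a local variable and writes it out once, which keeps the threads from passing a shared cache line back and forth while they work. Whether this approaches a 2x speedup depends on the point made above: a light per-pixel operation like this one tends to be limited by memory bandwidth rather than by computation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical data-parallel "filter": sum of squared pixel values, computed
// by splitting the buffer into one contiguous band per thread.
std::uint64_t bandedSumSquares(const std::vector<std::uint8_t>& pixels, unsigned threads) {
    assert(threads > 0);
    const std::size_t n = pixels.size();
    std::vector<std::uint64_t> partial(threads, 0);
    std::vector<std::thread> pool;
    pool.reserve(threads);

    for (unsigned t = 0; t < threads; ++t) {
        // Contiguous band [begin, end): each thread touches only its own data.
        const std::size_t begin = n * t / threads;
        const std::size_t end = n * (t + 1) / threads;
        pool.emplace_back([&pixels, &partial, t, begin, end] {
            std::uint64_t local = 0;  // accumulate privately, not in shared memory
            for (std::size_t i = begin; i < end; ++i)
                local += std::uint64_t(pixels[i]) * pixels[i];
            partial[t] = local;       // one write per thread at the end
        });
    }
    for (std::thread& th : pool) th.join();

    std::uint64_t total = 0;
    for (std::uint64_t p : partial) total += p;
    return total;
}
```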

In a hyperthreaded CPU, however, the situation is different. Here you have only one set of caches, one set of execution resources, and one bus, but two logical CPUs contending for them, each running its own thread. Unlike the SMP case, in an HT CPU one thread can take most or all of the resources that the other thread doesn't use, including all of them if there is only one thread to run. While Windows Task Manager might report that only 50% of your CPU is active, that could (and often does) mean that one of the two logical CPUs is taking 90% or more of the execution resources. That leaves only about 10% more execution power for a second thread, and beyond that the second thread starts slowing down the first as they fight for resources. Hyperthreading is mainly useful for filling in the holes in one thread with another: while one thread is executing sparsely and not using many CPU resources, another thread can sneak in, keep the execution units busy, and get another 5-15% of performance out of the CPU.

A major issue with hyperthreading is a design flaw in Northwood-core Pentium 4s, called 64K aliasing, that can cause the two logical CPUs to seriously interfere with each other. Basically, incomplete tag bit encoding in the L1 cache means that two data blocks that are a multiple of 64K apart can boot each other out of the cache, making the cache's associativity useless and greatly reducing its effectiveness. Even worse, two prefetched streams that alias on top of each other can boot each other's prefetches out of the L1 cache, wasting a lot of bandwidth. And because Windows allocates virtual memory on 64K boundaries, 64K aliasing is likely to happen with thread stacks. This means that Hyperthreading Technology can actually cause a net loss in performance if threads execute in non-HT-friendly ways, even if they execute well on an SMP system. The 64K aliasing flaw was supposedly fixed in the new Prescott core, but I haven't heard whether that fix improves hyperthreading performance.
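To make the 64K rule concrete, here is a simplified C++ model. It assumes 64-byte L1 cache lines and treats two lines as conflicting exactly when their addresses are a multiple of 64K apart; the real Northwood tag logic is not modeled in detail. `staggeredBase` shows one hypothetical workaround: offsetting each thread's buffer by a different amount so that buffers which would otherwise start on 64K boundaries (as Windows allocations and thread stacks do) no longer line up when the threads walk them in step. The 4K step is illustrative, not a tuned value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified model, assuming 64-byte lines: two lines conflict when their
// addresses are a multiple of 64K apart, i.e. address bits 6..15 match.
constexpr std::uintptr_t kLineSize  = 64;          // assumption
constexpr std::uintptr_t kAliasSpan = 64 * 1024;

bool aliases64K(std::uintptr_t a, std::uintptr_t b) {
    const std::uintptr_t mask = (kAliasSpan - 1) & ~(kLineSize - 1);  // 0xFFC0
    return ((a ^ b) & mask) == 0;
}

// Hypothetical mitigation: shift thread N's 64K-aligned buffer by N * 4K
// (wrapping within 64K) so that the same offset in two threads' buffers
// no longer lands on the same cache slot.
std::uintptr_t staggeredBase(std::uintptr_t base64K, unsigned threadIndex) {
    constexpr std::uintptr_t kStep = 4 * 1024;     // illustrative only
    assert(base64K % kAliasSpan == 0);
    return base64K + (threadIndex * kStep) % kAliasSpan;
}
```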

So what is Hyperthreading Technology really good for?

First of all, it improves system responsiveness in much the same way that SMP does: the kernel can respond faster to interrupts, and a program's UI can react immediately while a processing thread keeps cranking away. If a program is trying to consume 100% of the CPU in the background, your web browser will respond much more snappily on an HT system than on a single-CPU system. Second, HT systems expose the same kinds of multithreading bugs as SMP systems. As more programmers get HT-capable systems, expect more threading bugs in programs to be fixed and overall program stability to rise, especially on SMP systems, where traditionally a lot of drivers and programs have simply crashed.
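A hedged sketch of the responsiveness point, using hypothetical names (`BackgroundJob`, `runWithUiPolling`): a worker thread grinds through frames while the calling thread, standing in for the UI, polls progress and can cancel at any time. The shared `progress` and `cancel` fields are `std::atomic`; declaring them as a plain `int` and `bool` would be exactly the kind of data race that a single-CPU machine tends to hide and an SMP or HT machine tends to expose.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical background job shared between a worker and the UI thread.
// Both fields are accessed by two threads at once, so they must be atomic;
// a plain int/bool here is a data race that SMP and HT machines expose readily.
struct BackgroundJob {
    std::atomic<int> progress{0};
    std::atomic<bool> cancel{false};

    void run(int frames) {
        for (int f = 0; f < frames; ++f) {
            if (cancel.load()) return;  // honor cancellation between frames
            volatile std::uint64_t sink = 0;
            for (int k = 0; k < 10000; ++k) sink = sink + k;  // stand-in for real work
            progress.store(f + 1);
        }
    }
};

// Runs the job on a worker thread while the calling ("UI") thread stays free
// to poll progress; requests cancellation once cancelAfter frames are done.
int runWithUiPolling(int frames, int cancelAfter) {
    assert(frames > 0);
    BackgroundJob job;
    std::thread worker(&BackgroundJob::run, &job, frames);
    while (job.progress.load() < frames) {
        if (job.progress.load() >= cancelAfter) {
            job.cancel.store(true);
            break;
        }
        // A real UI thread would pump window messages here instead of sleeping.
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    worker.join();
    return job.progress.load();
}
```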

When it comes to VirtualDub, I've already had enough trouble with execution bandwidth in performance-critical code running on a single thread. The problem is that the Pentium 4 can issue MMX operations on only one execution port, so it tends to get very badly bottlenecked when executing optimized MMX code. The situation improves considerably with SSE2: two-cycle, 128-bit operations can be issued in a single clock, which makes it possible to keep both the multiplier and the add/shift units running in parallel, although this can be difficult with only eight registers and the P4's long latencies.
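For a sense of what such code looks like, here is a hypothetical SSE2 row operation, `scaleRow`, computing `min(((x * gain) >> 8) + bias, 255)` for each 8-bit pixel; it is not taken from VirtualDub. Each iteration widens 16 pixels into two 128-bit halves of eight 16-bit lanes, and each half goes through a multiply followed by a shift and a saturating add. The two halves are independent, which gives the CPU a chance to overlap multiplier work with shift/add work. The sketch makes no claim about actual Pentium 4 cycle counts, assumes `gain <= 256` and `bias <= 255`, and falls back to the scalar loop on compilers without SSE2.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define SCALEROW_USE_SSE2 1
#endif

// Scalar reference for one pixel: min(((x * gain) >> 8) + bias, 255).
inline std::uint8_t scalePixel(std::uint8_t x, unsigned gain, unsigned bias) {
    unsigned v = ((x * gain) >> 8) + bias;
    return static_cast<std::uint8_t>(std::min(v, 255u));
}

// Hypothetical row operation. The preconditions keep every 16-bit
// intermediate in range: gain <= 256 so x * gain fits in 16 bits, and
// bias <= 255 so the sum stays well below 32768 before the final pack.
void scaleRow(const std::uint8_t* src, std::uint8_t* dst, std::size_t n,
              unsigned gain, unsigned bias) {
    assert(gain <= 256 && bias <= 255);
    std::size_t i = 0;
#ifdef SCALEROW_USE_SSE2
    const __m128i zero = _mm_setzero_si128();
    const __m128i vgain = _mm_set1_epi16(static_cast<short>(gain));
    const __m128i vbias = _mm_set1_epi16(static_cast<short>(bias));
    for (; i + 16 <= n; i += 16) {
        __m128i px = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i));
        // Widen 16 bytes into two independent halves of eight 16-bit lanes.
        __m128i lo = _mm_unpacklo_epi8(px, zero);
        __m128i hi = _mm_unpackhi_epi8(px, zero);
        // Multiply, then shift and add; the lo and hi chains don't depend on
        // each other, so their multiplies and shift/adds can overlap.
        lo = _mm_adds_epu16(_mm_srli_epi16(_mm_mullo_epi16(lo, vgain), 8), vbias);
        hi = _mm_adds_epu16(_mm_srli_epi16(_mm_mullo_epi16(hi, vgain), 8), vbias);
        // Pack back to bytes with unsigned saturation (clamps to 255).
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i), _mm_packus_epi16(lo, hi));
    }
#endif
    for (; i < n; ++i) dst[i] = scalePixel(src[i], gain, bias);  // tail / fallback
}
```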

Overall, if you have a choice between running VirtualDub on a system with hyperthreading and on the same system with HT disabled, I would say to leave HT enabled, because if nothing else the system will run more smoothly.

Comments

This blog was open for comments when this entry was first posted, but comments were later closed and then removed due to spam and a migration away from the original blog software. Unfortunately, it would have been a lot of work to reformat the comments to republish them. The author thanks everyone who posted comments and added to the discussion.