
VSync under Windows, revisited

Something I've been experimenting with a lot lately is field-based display of video on a computer screen. I've written about this before, but to review, regular analog video isn't composed of a series of sequential frames, but of alternating half-frames called fields at twice the rate. That is, instead of updating the whole screen at 30Hz, the even and odd scanlines are updated alternately at 60Hz. This is called interlacing, and is an attempt to increase the quality of video by getting some of the benefits of both higher resolution and smoother motion.

It's also a gigantic pain.

Frequently, such video is displayed on a computer screen by simply displaying pairs of fields as frames at 30Hz. The most common objectionable result of this is intermittent combing caused by pairing fields that don't match, which can be resolved by deinterlacing. If you're dealing with film-rate material that has been upsampled from 24 fps, this may look fine. However, for material that's actually been recorded at 60 fields/second, the difference can still be significant and the 30Hz output will be considerably less fluid. The right thing to do is to upsample the video from 60 fields/second to 60 frames/second. This isn't easy, nor is there one "right" way to do it, but the result is worth the effort... and it takes a lot of effort, at least on Windows.
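To make the field/frame distinction concrete, here is a minimal C++ sketch of the two simplest ways to turn fields into frames: weaving a pair of fields into one frame (the 30Hz approach described above) and line-doubling a single field (a crude "bob" that yields one frame per field). The Scanline and Image types and the function names are made up for illustration and are not VirtualDub code; the bob shown ignores the one-line vertical offset between even and odd fields that a real implementation would have to handle.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One scanline of pixels; one byte per pixel keeps the sketch small.
using Scanline = std::vector<unsigned char>;
using Image = std::vector<Scanline>;

// "Weave": interleave the scanlines of an even field (lines 0, 2, 4, ...)
// and an odd field (lines 1, 3, 5, ...) into one full-height frame.
Image WeaveFields(const Image& evenField, const Image& oddField) {
    // The even field has as many lines as the odd field, or one more.
    assert(evenField.size() == oddField.size() ||
           evenField.size() == oddField.size() + 1);
    Image frame;
    frame.reserve(evenField.size() + oddField.size());
    for (std::size_t i = 0; i < evenField.size(); ++i) {
        frame.push_back(evenField[i]);
        if (i < oddField.size()) frame.push_back(oddField[i]);
    }
    return frame;
}

// Crude "bob": stretch one field to full height by repeating each scanline.
// Applied to every field in turn, this gives 60 frames/second from
// 60 fields/second, at the cost of vertical resolution.
Image BobField(const Image& field) {
    Image frame;
    frame.reserve(field.size() * 2);
    for (const Scanline& line : field) {
        frame.push_back(line);
        frame.push_back(line);
    }
    return frame;
}
```

Weaving two fields that were captured at different instants is what produces the combing mentioned above; bobbing avoids that but throws away half the vertical detail, which is why adaptive methods exist.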

Here's why....

I'll ignore all of the discussions about how to upsample field rate to frame rate, such as bob/weave/adaptive, because it's irrelevant for this discussion: just displaying video reliably at 60Hz is tricky under Windows. At that rate, you're at or very near the refresh rate of the monitor, and that makes the timing a lot tighter. At 30Hz, you have approximately two refreshes or 33ms between frames, which gives you a lot of room for jitter -- it doesn't matter if you're a little bit ahead or behind on a frame here or there. At 60Hz, however, you have only 16ms per frame and have to hit every refresh exactly. Miss one, and the result is a fairly noticeable glitch. Now, you might say that 16ms is a lot of time on modern CPUs, until you realize that the default thread scheduling granularity on Windows NT based platforms is 10ms and any disk access can easily take more than 100ms. There's not a whole lot of margin for error here. You can solve these two problems by calling timeBeginPeriod(1) and, well, not doing disk access in your rendering thread. The third problem isn't so simple.
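The timing budget is simple arithmetic, and a few lines of C++ make the numbers easy to check. This is only a sketch: the function names are invented here, and the 10ms and 100ms figures are the ones quoted in the text, not measurements.

```cpp
#include <cassert>

// Time available per displayed frame, in milliseconds.
double FramePeriodMs(double framesPerSecond) {
    assert(framesPerSecond > 0.0);
    return 1000.0 / framesPerSecond;
}

// Fraction of one frame period consumed by a delay of `delayMs`.
// A value of 1.0 or more means at least one whole frame period is lost.
double BudgetFraction(double delayMs, double framesPerSecond) {
    assert(delayMs >= 0.0 && framesPerSecond > 0.0);
    return delayMs * framesPerSecond / 1000.0;
}
```

With these, BudgetFraction(10.0, 60.0) is 0.6, so a single 10ms scheduling delay eats 60% of a 60Hz frame, while BudgetFraction(100.0, 60.0) is 6.0, meaning a 100ms disk access spans six refreshes. At 30 frames/second the same 10ms delay is only 30% of the frame period.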

Vertical sync (VSync) is the third problem, and the one I've had the most issues with. In order to have a smooth 60Hz display, you also need to make sure that updates are synchronized such that they never occur in the middle of the screen, which would cause objectionable tearing and thus jerky motion. The problem is that there doesn't seem to be any good way to do this on Windows. If you're in windowed mode, the NOTEARING flag on DirectDraw blits doesn't do anything, and both overlays in DirectDraw and INTERVAL_ONE in Direct3D block, which leads to problems with the main thread getting blocked and not being able to hit 60Hz with multiple windows (they alternate). I had resorted to polling the beam in VirtualDub to get this working, and in the end it wasn't that reliable -- there were too many ways in which other processing in the main thread would cause the display code to miss the blit window and skip a frame. In 1.7.2, I moved the display code to a separate high-priority thread to try to resolve this. Unfortunately, Direct3D being the stupid non-thread-savvy API that it is, I had to move the display window to another thread as well. That introduced more problems because the Win32 UI hates multithreaded window hierarchies, leading to problems such as: DestroyWindow() not being usable, mouse input being blocked, and slow updates in the cropping dialog. In 1.7.3, I'm moving the display code back to fix these issues, which unfortunately makes the timing and blocking problems worse again.

Recently, I started experimenting with Direct3D full screen mode, which is a drastic but effective way to solve the problem. In full-screen mode, you can directly "flip" the screen by exchanging screen buffers instead of copying, which is more reliable because the display adapter usually has support for doing so asynchronously from drawing. After fixing a bunch of bugs like falling back to GDI after a display mode change (a problem when you've just changed the display mode to go full screen), I got it working, and the result is smoooooooth. You have to use the FLIP present mode instead of COPY to get good vsync lock, but after doing so you can present one frame per refresh reliably. Dialog boxes don't work correctly in this mode, which is not a big problem, since the real problem is that... the application doesn't accept input for seconds at a time now, because the message loop is blocked.

I mentioned that using overlays and INTERVAL_ONE with Direct3D in windowed mode had problems with blocking the message pump. Well, the same problem happens in full-screen mode. A message pump is something that every UI program in Windows must implement, and is a section of code where the program asks Windows if any UI events have arrived and processes them. It's a very common way of delivering events in GUI systems and ensures that events are processed in an orderly fashion. The downside is that if the application stops running its message pump for a while, its UI stops responding during that time. What all of the mentioned vsync techniques have in common is that they all block until vertical sync, which means that they sit in a polling loop until vertical sync arrives, during which time the message pump doesn't process messages. And since Windows prioritizes delivery of messages based on message type, it's possible that the application gets in a nasty cycle of spending most of its time waiting for vsync, the remaining time processing some internal events, and no time actually processing input. The result is an application that is happily displaying video and not responding to any mouse or keyboard events whatsoever -- which is what happens to VirtualDub since it receives video frames asynchronously from DirectShow.
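The starvation cycle can be reproduced in a toy model. The C++ sketch below is not Win32 code and is not how VirtualDub is written; it assumes that frame notifications arrive as posted messages, that the pump always retrieves a pending posted message before pending input (which matches the retrieval order documented for GetMessage() and PeekMessage()), and that presenting a frame blocks for exactly one refresh. All names are invented for the example.

```cpp
#include <cassert>

struct PumpStats {
    int framesPresented = 0;
    int inputHandled = 0;
};

// Toy model of a message pump; time is counted in display refreshes.
// - A frame notification is posted every `refreshesPerFrame` refreshes.
// - One input event arrives on every refresh.
// - Pending posted messages are always taken before pending input.
// - Presenting a frame blocks until the next vsync, i.e. one full refresh.
PumpStats SimulatePump(int totalRefreshes, int refreshesPerFrame) {
    assert(totalRefreshes > 0 && refreshesPerFrame > 0);
    PumpStats stats;
    int postedPending = 0;
    int inputPending = 0;
    int lastArrival = -1;  // last refresh whose arrivals have been queued
    int now = 0;
    while (now < totalRefreshes) {
        // Queue everything that arrived while the pump was busy.
        for (int t = lastArrival + 1; t <= now; ++t) {
            if (t % refreshesPerFrame == 0) ++postedPending;
            ++inputPending;
        }
        lastArrival = now;

        if (postedPending > 0) {
            --postedPending;
            ++stats.framesPresented;
            ++now;  // blocked in the present call until the next vsync
        } else if (inputPending > 0) {
            --inputPending;
            ++stats.inputHandled;  // input handling is treated as instantaneous
        } else {
            ++now;  // nothing to do; idle until the next refresh
        }
    }
    return stats;
}
```

In this model, SimulatePump(60, 1) presents all 60 frames and handles no input at all, because a new frame notification is always waiting by the time the blocking present returns. SimulatePump(60, 2), the 30Hz case, presents 30 frames and still gets through all 60 input events.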

There is a sneaky way to get around this problem, which is a flag bit called D3DPRESENT_DONOTWAIT. What this flag does, when passed to IDirect3DSwapChain9::Present(), is tell Direct3D not to block for vertical sync. Instead, it returns immediately with a status code. Upon discovering this problem again, I had constructed a framework that uses WM_TIMER messages to periodically poll until the flip can be queued again, which fixes the problem because WM_TIMER has a lower priority than input messages. The only problem is that this doesn't work, because Present() blocks anyway. I searched around a bit, and people have reported that it works on ATI cards, but NVIDIA and Intel happily block. Furthermore, the rumor is that this behavior may have been deliberately introduced to work around dumb behavior in the Direct3D runtime, which apparently does a polling loop at 100% CPU if you don't specify the DONOTWAIT flag, which is the common case. Sigh.
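For reference, the intended shape of that timer-driven polling looks roughly like the following C++ sketch. It is a model, not the actual framework: FakeSwapChain is a hypothetical stand-in that behaves the way D3DPRESENT_DONOTWAIT is documented to behave (return at once, reporting the equivalent of D3DERR_WASSTILLDRAWING while the previous flip is still pending), and OnTimerTick() stands for whatever the WM_TIMER handler would call.

```cpp
#include <cassert>

enum class PresentResult { Queued, StillDrawing };

// Hypothetical stand-in for a swap chain whose present call honors DONOTWAIT:
// it never blocks, and reports StillDrawing while the previous flip is pending.
class FakeSwapChain {
public:
    explicit FakeSwapChain(int busyPolls) : busyPolls_(busyPolls) {
        assert(busyPolls >= 0);
    }

    PresentResult TryPresent() {
        if (busyPolls_ > 0) {
            --busyPolls_;
            return PresentResult::StillDrawing;
        }
        return PresentResult::Queued;
    }

private:
    int busyPolls_;
};

// Retries the present from a low-priority timer tick instead of blocking.
class TimerDrivenPresenter {
public:
    explicit TimerDrivenPresenter(FakeSwapChain& chain) : chain_(chain) {}

    // A new frame is ready to be shown.
    void SubmitFrame() { framePending_ = true; }

    // Called once per timer message. Returns true while the timer should
    // stay armed, false once there is nothing left to present.
    bool OnTimerTick() {
        if (!framePending_) return false;
        ++polls_;
        if (chain_.TryPresent() == PresentResult::Queued) {
            framePending_ = false;
            ++presented_;
            return false;
        }
        return true;  // still drawing; try again on the next tick
    }

    int polls() const { return polls_; }
    int presented() const { return presented_; }

private:
    FakeSwapChain& chain_;
    bool framePending_ = false;
    int polls_ = 0;
    int presented_ = 0;
};
```

Because each tick returns immediately, input messages get a chance to run between polls. The whole scheme depends on the present call really returning without waiting, which is exactly the assumption that fails on drivers that block anyway.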

(And before you say this problem is fixed on Windows Vista with the DWM, I seriously doubt the DWM can render the desktop at 60 fps when I'm taking more than half the fill rate for my shader processing.)

I actually first discovered this behavior while working on another program that reuses VirtualDub's display code, and "fixed" it in that application by splitting the message loop in half, processing input messages at higher priority. That sort of worked fine for that application, although it has the severe disadvantage of still eating up valuable frame time doing unnecessary polling. I can't use it in VirtualDub, however, because modal dialogs use their own message pump and could lock up anyway. I've been searching for a solution to this without success -- one idea I had was to use Direct3D event queries to watch the length of the frame queue, but that doesn't seem to work -- even waiting for the entire pipeline to flush still causes up to a 16ms stall on Present(). If anyone has ideas about how to work around this problem, I'm all ears.
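The split message loop idea can be sketched in portable C++ as follows. This is a simplified model with invented names, not the code from either application: the queue holds message kinds rather than real MSG structures, and TakeFirst() plays the role that a filtered retrieval (for example PeekMessage() with a message-range filter, which is an assumption about how such a split might be done on Win32) would play in a real program.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <vector>

enum class Kind { Input, Other };

// Remove the first queued message of the wanted kind, if there is one.
// Returns true if a message was removed.
bool TakeFirst(std::deque<Kind>& queue, Kind wanted) {
    auto it = std::find(queue.begin(), queue.end(), wanted);
    if (it == queue.end()) return false;
    queue.erase(it);
    return true;
}

// One pass of the split loop: drain every pending input message first,
// then handle at most one of the remaining messages (such as a frame
// notification whose handler may block on vsync).
// Returns the kinds of the messages handled, in handling order.
std::vector<Kind> SplitLoopPass(std::deque<Kind>& queue) {
    std::vector<Kind> handled;
    while (TakeFirst(queue, Kind::Input)) handled.push_back(Kind::Input);
    if (TakeFirst(queue, Kind::Other)) handled.push_back(Kind::Other);
    return handled;
}
```

Given a queue holding Other, Other, Input, Other, Input, one pass handles both input messages and then a single other message, leaving two other messages for later passes. As noted above, this only helps while the application's own loop is the one running; a modal dialog's message pump knows nothing about the split.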


This blog was originally open for comments when this entry was first posted, but was later closed and then removed due to spam and after a migration away from the original blog software. Unfortunately, it would have been a lot of work to reformat the comments to republish them. The author thanks everyone who posted comments and added to the discussion.