§ Live audio playback

My latency woe research continues, this time on the playback side. I've managed to write a new basic audio renderer and hook it into the DirectShow graph, but the next issue is how to schedule the output. When playing stored audio/video data, you can read the source at a variable rate as necessary, so playback timing is simple: minimize the difference between the audio and video timing. In other words, the user can't really tell that you're buffering two full seconds of audio if the video is also delayed by the same amount. Where this does matter is in seek latency, because then the decoder needs to re-fill that delay before you can restart playback, but even then doing a soft start or just having a fast decoder can do the trick.

However, live input -- or specifically in this case, interactive live input -- is a tougher problem. Here the clock starts ticking the moment the user hits a button, so you're already late by the time you receive audio. This means that minimizing latency in the entire pipeline is absolutely critical. Unfortunately, as I mentioned before, the recording device I'm using sends 40ms packets, so by the time I receive the first byte I'm already that much behind, and I don't really need another 100ms of latency on the output. This means that in the playback code, the problems are:

The strategy I traditionally took for the first one, and the one which the DirectSound Renderer uses, is to slightly resample the input -- play it slightly faster to reduce buffered data, and slightly slower to increase it. The problem is that while this generally works fine for compensating for a difference in rate, it doesn't work so well for adjusting the current latency. For instance, let's say that through a glitch we suddenly have half a second of audio buffered, and thus a lot of extra latency. In order to drain this amount of audio in 30 seconds -- which is quite a long time -- we'd need to play 30.5 seconds of audio in 30 seconds, which means raising a 48kHz sampling rate to 48.8kHz. Problem is, in pitch terms that's more than a quarter of a semitone (log2(48.8 / 48)*12 = 0.29), which is noticeable. As a result I'm now leaning back toward time domain methods, i.e. chopping or duplicating audio segments. This is a crude form of time stretching, and provided that the adjustments are rare, it works better than I had expected.
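The arithmetic in the resampling example can be checked with a few lines of Python. This is only a sketch of the calculation described above; the function names are made up for illustration and are not part of any renderer API.

```python
import math

def drain_rate_hz(nominal_hz, excess_s, drain_s):
    """Playback rate needed to consume `excess_s` extra seconds of
    buffered audio within `drain_s` seconds of wall-clock time."""
    return nominal_hz * (drain_s + excess_s) / drain_s

def pitch_shift_semitones(new_hz, nominal_hz):
    """Pitch change caused by playing nominal_hz material at new_hz."""
    return 12.0 * math.log2(new_hz / nominal_hz)

# Half a second of excess audio, drained over 30 seconds at 48kHz.
rate = drain_rate_hz(48000.0, 0.5, 30.0)
shift = pitch_shift_semitones(rate, 48000.0)
print(f"{rate:.0f} Hz, {shift:+.2f} semitones")  # 48800 Hz, +0.29 semitones
```

Doubling the drain time to 60 seconds halves the rate offset, but the shift is then still about 0.14 semitones and the extra latency hangs around for a full minute.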

That leaves the problem of determining what the minimum latency should be. There always needs to be a minimal amount of data in the sound buffer to cover delays in the output path, such as the CPU time to process the system calls and copy data into the hardware buffer, and jitter in thread scheduling. I hate just putting it in as an option and letting the user tune the audio buffer, especially when changes in app configuration and system load change the required latency. Ideally, the application should be able to monitor the audio buffer status and adaptively adjust the buffer level. I tried doing this with a waveOut-based routine, and while it worked pretty well in XP, it gave crackling in Vista. Dumping out a log of buffering stats revealed the problem (timestamps in milliseconds):

Finished 19 at 11948090
Finished 20 at 11948090
Checking at 11948090
Checking at 11948106
Checking at 11948106
Checking at 11948123
Checking at 11948140
Finished 21 at 11948140
Finished 22 at 11948140
Finished 23 at 11948140
Checking at 11948140
Checking at 11948156
Checking at 11948156
Checking at 11948173
Checking at 11948173
Checking at 11948190
Finished 24 at 11948190
Finished 25 at 11948190

In this test, I'm actually delivering 16.7ms buffers (60 frames/second), but the buffers are being marked as done by the OS in batches. More suspiciously, the notifications are occurring every 50ms. That amount of latency isn't a dealbreaker by itself. What is a dealbreaker is that Vista appears not to mark buffers as completed until after they have already been copied to the hardware buffer and played. If you think about it, this is necessary for an application to wait for a sound to complete playing and avoid cutting it off. However, it also has the annoying side effect of making it impossible to tell if an underrun has occurred by buffer status alone, and it seems that the Vista user-space mixer is more likely to cause problems in this regard than the XP kernel mixer. The waveOut API doesn't have any other way to report underruns or the amount of internal delay, so as far as I can tell, the only way to deal with this is to fudge up the buffering amount. Suck.
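The batching is easy to confirm mechanically. The following Python sketch groups the "Finished" lines from the log above by timestamp and prints the gap between batches; the parsing helper is hypothetical and assumes exactly the line format shown in the log.

```python
LOG = """\
Finished 19 at 11948090
Finished 20 at 11948090
Checking at 11948090
Checking at 11948106
Checking at 11948106
Checking at 11948123
Checking at 11948140
Finished 21 at 11948140
Finished 22 at 11948140
Finished 23 at 11948140
Checking at 11948140
Checking at 11948156
Checking at 11948156
Checking at 11948173
Checking at 11948173
Checking at 11948190
Finished 24 at 11948190
Finished 25 at 11948190
"""

def completion_batches(log_text):
    """Group 'Finished N at T' lines that share a timestamp.
    Returns a list of (timestamp_ms, [buffer numbers])."""
    batches = []
    for line in log_text.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[0] == "Finished":
            buf, t = int(parts[1]), int(parts[3])
            if batches and batches[-1][0] == t:
                batches[-1][1].append(buf)
            else:
                batches.append((t, [buf]))
    return batches

batches = completion_batches(LOG)
gaps = [b[0] - a[0] for a, b in zip(batches, batches[1:])]
print(batches)
print(gaps)  # [50, 50]
```

The one batch fully contained in the excerpt holds three buffers, which matches three 16.7ms buffers per 50ms notification period; the first and last batches may be cut off by the edges of the excerpt.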

You might be wondering why I'm still using waveOut, even though it's quite old. Well, up through Windows XP, the only other alternative is DirectSound. Unlike waveOut, which uses a straightforward streaming API and simply stops audio playback if the application buffer underruns, DirectSound uses a hardware DMA model and has the application write into a looping buffer, even if software mixing is actually used. This has the undesirable behavior that if the application blocks in some operation for too long, the mixer wraps around and keeps playing the same stuttering sound over and over like a broken record. Furthermore, it doesn't report that this has happened, so unless you have extra logic to prevent or detect this, it also screws up your buffering calculations and suddenly your output routine thinks it has a nearly full buffer. (There is a notification API that could help with this, but as usual, no one uses it because it is broken with some drivers.) The DirectSound Renderer handles these issues by queuing buffers to a separate thread that sits on the output buffer and clears or pauses it once an underflow is detected. Expecting an application to deal with all of this in order to just stream some audio is unreasonable, and since most of my audio playback isn't latency sensitive I've just stuck with waveOut.

It looks like I'll have to change my mind for this case, because DirectSound appears to work much better in Vista. I can get playback positions with much better accuracy and precision than with waveOut, and buffer underruns are detected more reliably. In addition, DirectSound reports an additional write cursor to the application, which says how far the application should write ahead of the current playback position (play cursor) to avoid underflowing. The main problem that's left is dealing with the wraparound problem, which I haven't solved yet. I think I can detect it and avoid screwing up buffering calcs by using a big buffer and checking if the system clock has advanced far enough for a wraparound to have occurred, and I can reduce the artifacts if the buffer does underflow a little bit by pre-clearing sections of the buffer. The remaining question is whether I want to use a separate thread to manage the buffer so I can stop playback on a delay. I'd like to avoid the broken record, but I don't know if I can spare the additional latency from queuing the audio to another thread.
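A minimal sketch of the clock-based wraparound check, in Python for brevity. It is an illustration under assumptions rather than working DirectSound code: the function name and parameters are hypothetical, the play cursor would in practice come from something like IDirectSoundBuffer::GetCurrentPosition, and the elapsed time from whatever system timer is trusted. The cursor alone only gives progress modulo the buffer size, so the elapsed clock time is used to estimate how many whole laps were missed.

```python
def bytes_played(prev_cursor, cursor, elapsed_s, buffer_bytes, bytes_per_s):
    """Estimate total bytes consumed from a looping buffer since the last poll.

    prev_cursor, cursor: play cursor offsets in bytes at the two polls
    elapsed_s: system clock time between the polls
    """
    # Progress visible from the cursor alone (ambiguous by whole laps).
    delta = (cursor - prev_cursor) % buffer_bytes
    # Progress the clock says should have happened.
    expected = elapsed_s * bytes_per_s
    # Whole buffer laps needed to reconcile the two, never negative.
    laps = max(0, round((expected - delta) / buffer_bytes))
    return delta + laps * buffer_bytes

# 48kHz 16-bit stereo is 192000 bytes/s; use a one-second buffer.
RATE = 192000
SIZE = 192000

# Normal poll after 100ms: no wraparound missed.
print(bytes_played(1000, 20200, 0.1, SIZE, RATE))  # 19200

# Same cursor positions, but the thread stalled for 1.1s: one full lap missed.
print(bytes_played(1000, 20200, 1.1, SIZE, RATE))  # 211200
```

If the estimate exceeds the amount of audio that was actually queued, the mixer has run into stale data and the buffering calculation needs to be reset. The estimate is only as good as the clock and depends on the buffer being large compared to the timer error.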

Somehow, things seemed simpler when I just had to enable auto-init DMA and handle some interrupts.

Finally, since I'm still mostly XP-based, I haven't bothered to look much at the new Vista API called WASAPI, which is actually what waveOut and DirectSound now map to. Originally, besides the XP issue, I had avoided looking at this much because I didn't need the new functionality. WASAPI appears to actually be easier to use than DirectSound, though, so I might have to look at writing a WASAPI-specific output path.


Comments posted:

Have you tried XAudio2? It works great on XP/Vista/7, and there are no silly looping buffers to deal with.

Kev - 18 09 10 - 08:35

I've used this code in production for quick speed changing. It works reasonably well, even for continuous speed changing with music.

I've always found WaveOut to be nearly useless, often giving a half-second or more of latency. I only use it as a last-ditch fallback if DirectSound fails, since it almost always works and supporting it is simple enough--and even that's probably a waste of time.

I used to use timer tricks like that for detecting wraparound, eg. trying to figure out if the read cursor of the buffer has wrapped by comparing it to the system clock. I've long since removed it all; there are too many gotchas, like the system clock moving around (even with supposedly monotonic timers). Just use a big buffer, and don't fill it all the way so it doesn't increase latency. I always use a separate mixing thread: as long as it's detached from any blocking I/O, missing a whole wraparound with a 250ms buffer or so is very unlikely.

Glenn Maynard - 18 09 10 - 13:56

What about ASIO? I think most guys who do music/video use a soundcard with ASIO drivers ...

Dstruct - 19 09 10 - 05:03

The ASIO SDK license is unfortunately not compatible with the work that I do, and the API has other undesirable characteristics like not allowing shared audio. I do want low latency, but not at the cost of exclusively controlling the device. I cheered the day that kernel mixing was added to Win9x.

From what I can gather, XAudio2 is a software mixer on top of DirectSound (XP) or WASAPI (Vista/Win7). Latency-wise it's probably pretty good -- it is intended for games -- but I'm thinking that it probably incurs extra CPU overhead due to the mixer, which I don't need or want. The deal-killer is that it requires a separate redist DLL. :(

I've actually had really good luck with waveOut on both XP and Windows 7, where I've done less than 80ms playback latency. It's just on Vista that it seems to have problems, although that might be due to the poor windowed-mode performance in general on that OS (Windows 7 runs MUCH better on this machine).

Phaeron - 19 09 10 - 07:18

Ok, didn't know about the license issue. But AFAIK the device doesn't need to be opened "exclusively". You can open just 2 channels for example (leaving the other channels free for other applications).

There's also PortAudio which has an ASIO implementation:

But I don't know how good it is. In VLC media player it didn't work properly in the last versions I've tried ...

Dstruct - 20 09 10 - 02:18

Vista/7 does have an alternative to ASIO built into the OS; it's called WaveRT. Like ASIO, it's low level and low latency. DirectSound is no longer hardware accelerated in Vista/7, so testing in XP and Vista/7 is critical if you decide on DirectSound output. What I really want to know is how the hardware vendors do it. My AVerMedia HD DVR PCIe card can preview audio/video and capture perfectly fine with its included capture app. Generic DirectShow captures in any other application all suffer the audio timing problems.

Chris - 20 09 10 - 13:30

Opening a subset of channels in exclusive mode is still exclusive usage... and a bit odd. To be clear, I'm not writing pro sound software here, so I don't expect it to run on hardware with an octopus out the back (although my laptop does start to look like that when I put in the Audigy).

AFAIK, WaveRT is only a driver model, and doesn't have a user-space API like ASIO. The main user-space APIs are waveOut, DirectSound, and WASAPI. I hadn't heard of anyone using the kernel streaming API directly, but it makes perfect sense. The downside is that its support status is a bit shaky.

By the way, I just found out that WASAPI doesn't support sample rate conversion, so you have to do your own if your app can't always handle varying mixer formats. So much for a fresh API. :-/

Video capture drivers on Windows are largely garbage. I've bombed development machines more often working on video capture than doing anything else, with video capture drivers from multiple vendors, and by that I mean either complete system lockup, hard reboot, blue screen, or my favorite, app thread stuck in kernel mode in the driver and I have to restart anyway. One device I have was crashing in its property page UI because for some bizarre reason it exposed an audio page from its video pin, which I had to work around in VirtualDub. It doesn't surprise me at all that a capture card might work fine with the included app and break everywhere else.

Phaeron - 20 09 10 - 16:18

Speak of the devil. AVerMedia released new drivers for the card today. Gotta see if they are better behaved. Overall, I never had luck with vidcap drivers. My last card, a Matrox Marvel G400TV, never received proper Windows 2000/XP drivers. The most it got was unstable hacked VfW (no DirectShow) beta drivers that only supported YUV/RGB capture. The built-in Zoran MJPEG hardware compression was only usable under Win9x.

Chris - 23 09 10 - 00:15

Also looks like AVerMedia released an SDK for their card. Perhaps it might give some hints towards solving the sound problems?

Chris - 23 09 10 - 01:59

BTW: The solution I developed for the overrun/underrun problem used DirectX. (Also, I just looked at my website and was horrified to discover that the main page is showing some data corruption. I need to contact my ISP and see what the heck is going on asap.)

TropicalCoder (link) - 04 10 10 - 02:23

I just posted a response, but now returning to this site I see it's not there. I must have done something wrong...

I was saying that I ran into a problem like this when I was developing a component for a commercial application for audiologists. There was a device and a Bluetooth headset that needed to communicate with the PC, which provided control and a full duplex voice channel between the audiologist and the patient in the soundproof booth. The problem was that independent clocks at both ends meant the devices ran at different speeds, and hence buffer overrun/underrun problems. I developed an elegant algorithm for a solution under DirectX and spent days verifying the correctness of all paths through the code. This algorithm occasionally dropped or doubled tiny buffers of audio as needed. In this application, the effect was inaudible. Since voice data is full of tiny silences, the buffers skipped or doubled were often just silence anyway.

Obviously, Windows audio solutions are simply not up to dealing with these situations, and any Windows audio solution is really a hack. The proper way to do these things is with ASIO, but I can understand the convenience of offering a Windows audio solution to your users who do not have an ASIO device and driver. Ideally, ASIO should be offered as an alternative to your better equipped users. If it is offered as a separate download or plug-in there is no concern about compatibility with the GPL. I have an ASIO engine that may be ideal for this, and the solution could be encapsulated along with my engine as a separate download. I am hereby prepared to offer this to VirtualDub free, with maybe a Creative Commons license or whatever is suitable to prevent commercial exploitation. Unfortunately I cannot open source the code, but I am offering this to you as either a DLL or lib. Just say the word and it's yours. Check it out at

TropicalCoder (link) - 04 10 10 - 04:13
