¶Brute force works a lot better than it used to
I've been playing with pitch shifting again. It's not that I've gotten bored with video, but I do like to write little DSP kernels to affect music that I listen to while coding. I could do the same with video, but I don't think I could get any work done with video playing on the side. Especially if I'm reading subtitles.
As I've said before, pitch shifters (err, pitch scalers) can work either in frequency domain or time domain. Well, after I got the frequency-domain shifter working, I got a better idea for doing a time-domain shifter. The main problem in creating a time-domain shifter is the cross-correlation step, where you find a well-matching segment within a window to overlap with the end of the last segment you used. One way to do this is to try to do some sort of hierarchical approximation to zero in on a good match. Since I had just written a radix-4 FFT/IFFT, I got the idea instead of using FFT convolution to efficiently generate cross-correlation results for all positions without approximation. FFT convolution allows you to compute all possible circular convolutions of one signal with another, and by sufficiently zero-padding the two samples and reversing one of them -- or equivalently, using the conjugate of its frequency representation -- you can do a full cross-correlation search in O(N log N) time, and then scan the resultant array for the best match. And it sort of worked.
I was still hearing some artifacts that made the convolution results suspect, so I decided to double-check the results using a brute force routine. This is trying to search for a match for a block of 512 paired stereo samples in a window of 1024 samples, every 2048 samples, at 44KHz. Amusingly enough, it still ran in real-time... using regular x87 scalar floating-point code, in Debug with optimizations disabled, and with the laptop CPU in low-speed mode. So I just listened to some music for a while, confirmed the difference, dumped the results, fixed the minor bug that was causing the problem, and was back to speedy FFT goodness.
The wonderful thing about audio signal processing is that, even at 44KHz, the data rate is so low compared to video that you can get away with writing a truly awful and unoptimized version of an algorithm, and it still has a fairly decent chance of running full-speed on a modern CPU. A few years ago I might have had to wait for a WAV file to be written out, but now CPUs are so much faster that I can just launch the reference algorithm and compare. Cool.