¶Intrinsics code generation in VC11 preview compiler
I now have the Visual Studio 11 Developer Preview installed on Windows 7. That makes stress-testing the new compiler much easier than running the Windows 8 Developer Preview in VirtualBox, where the VM freezes for minutes at a time. Fortunately, the compiler version is the same in both: 17.00.40825.2. I happened to have a copy of Altirra that had already been converted to VC10, and it built without problems after I switched to the v110 toolset. VirtualDub required a VC8-to-VC11 conversion, after which I had to strip some quotes from the converted psa.props and fix a runtime library setting mismatch. Both programs ran fine, so there are no big codegen problems.
A few things I've discovered about the new compiler:
1) SSE2 code generation is now the default.
This is confusing because neither the docs nor the project system UI has been updated. However, if you don't specify an /arch switch, or if you leave Enable Enhanced Instruction Set at Not Set in your project, the compiler behaves as if /arch:SSE2 were specified. You need /arch:IA32 to disable enhanced instruction set usage. (See Microsoft's response to bug 688736.)
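One way to confirm which mode a given build actually uses is MSVC's predefined _M_IX86_FP macro. On 32-bit x86 targets it reports the /arch setting: 0 for /arch:IA32, 1 for /arch:SSE, and 2 for /arch:SSE2 or higher. The sketch below assumes that documented behavior, so under the new default it should report 2. The helper names msvc_arch_fp_level and print_arch_fp_level are hypothetical. The helper returns -1 when the macro is absent, which happens on x64 targets, where it is not defined, and under non-MSVC compilers.

```c
#include <stdio.h>

/* Hypothetical helper: reports the floating-point code generation level that
   MSVC's /arch option selected for this translation unit.
      0 -> /arch:IA32 (x87 only)
      1 -> /arch:SSE
      2 -> /arch:SSE2 or higher
     -1 -> _M_IX86_FP not defined (x64 target or non-MSVC compiler) */
int msvc_arch_fp_level(void)
{
#if defined(_M_IX86_FP)
    return _M_IX86_FP;
#else
    return -1;
#endif
}

/* Prints a short description of the detected level. */
void print_arch_fp_level(void)
{
    switch (msvc_arch_fp_level()) {
    case 0:  puts("x87 code generation (/arch:IA32)"); break;
    case 1:  puts("SSE code generation (/arch:SSE)"); break;
    case 2:  puts("SSE2 or higher code generation"); break;
    default: puts("_M_IX86_FP not defined"); break;
    }
}
```

If you compile the same file for x86 once with no /arch switch and once with /arch:IA32, the reported level should drop from 2 to 0.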
2) Commutativity-based optimizations are now applied.
I wrote a while back that the compiler generates intrinsics exactly as you write them. As a result, you can sometimes get extraneous moves unless you swap the operands of commutative operations yourself. This appears to be fixed, and fold1() and fold2() now both generate the shorter output.
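The fold1()/fold2() code isn't reproduced here, but the underlying issue can be sketched with a hypothetical pair of functions, scale_sum_v1 and scale_sum_v2. Non-VEX SSE arithmetic instructions use a destructive two-operand form: ADDPS xmm1, xmm2 overwrites xmm1. If the first operand of an intrinsic is still needed afterward, a compiler that preserves the written operand order has to copy that operand with MOVAPS first. Swapping the operands of a commutative operation such as _mm_add_ps lets the result overwrite the operand that is no longer needed. Both functions compute the same values. The exact instructions emitted depend on the compiler, target, and calling convention.

```c
#include <xmmintrin.h>

/* Version 1: `a` is the first operand of the add and is still needed for the
   multiply, so a literal translation must copy `a` before ADDPS overwrites it. */
__m128 scale_sum_v1(__m128 a, __m128 b)
{
    __m128 s = _mm_add_ps(a, b);
    return _mm_mul_ps(s, a);
}

/* Version 2: same math with the commutative add's operands swapped, so the
   add can overwrite `b`, which is dead afterward, and no copy is needed. */
__m128 scale_sum_v2(__m128 a, __m128 b)
{
    __m128 s = _mm_add_ps(b, a);
    return _mm_mul_ps(s, a);
}
```

With an optimizer that applies commutativity itself, both versions should compile to the same short sequence. That is the behavior described above for VC11.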
3) Intrinsics register allocation has improved.
The VC11 compiler does a better job on the SSE FIR routine example I posted earlier. It no longer generates the MOVSS orgy through temporaries at the top of the loop, and it also recognizes that zero is easily regenerated. As a result, it is able to keep two of the four kernel vectors permanently in registers.
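The earlier FIR routine isn't reproduced here. As a rough stand-in, the sketch below is a hypothetical 4-tap filter, fir4_sse, with the same general shape. Its four broadcast kernel vectors (k0 to k3) are loop-invariant values that a good register allocator would try to keep in XMM registers across the loop. The accumulator starts from _mm_setzero_ps(), which compilers typically emit as a single XORPS, so zero never has to be stored to or reloaded from memory. The function name, tap count, and buffer layout are assumptions for illustration.

```c
#include <stddef.h>
#include <xmmintrin.h>

/* Hypothetical 4-tap FIR (not the routine from the earlier post):
   out[i] = kernel[0]*in[i] + kernel[1]*in[i+1] + kernel[2]*in[i+2] + kernel[3]*in[i+3]
   for i in [0, n). The input buffer must hold n + 3 floats. */
void fir4_sse(float *out, const float *in, const float kernel[4], size_t n)
{
    /* Loop-invariant kernel vectors, each coefficient broadcast to all lanes.
       These are the candidates for being kept in registers for the whole loop. */
    const __m128 k0 = _mm_set1_ps(kernel[0]);
    const __m128 k1 = _mm_set1_ps(kernel[1]);
    const __m128 k2 = _mm_set1_ps(kernel[2]);
    const __m128 k3 = _mm_set1_ps(kernel[3]);
    size_t i = 0;

    /* Main loop: four outputs per iteration from shifted unaligned windows. */
    for (; i + 4 <= n; i += 4) {
        __m128 acc = _mm_setzero_ps();  /* cheap to regenerate each iteration */
        acc = _mm_add_ps(acc, _mm_mul_ps(k0, _mm_loadu_ps(in + i)));
        acc = _mm_add_ps(acc, _mm_mul_ps(k1, _mm_loadu_ps(in + i + 1)));
        acc = _mm_add_ps(acc, _mm_mul_ps(k2, _mm_loadu_ps(in + i + 2)));
        acc = _mm_add_ps(acc, _mm_mul_ps(k3, _mm_loadu_ps(in + i + 3)));
        _mm_storeu_ps(out + i, acc);
    }

    /* Scalar tail for the remaining 0-3 outputs, same summation order. */
    for (; i < n; ++i) {
        float acc = 0.0f;
        acc += kernel[0] * in[i];
        acc += kernel[1] * in[i + 1];
        acc += kernel[2] * in[i + 2];
        acc += kernel[3] * in[i + 3];
        out[i] = acc;
    }
}
```

On 32-bit x86 there are only eight XMM registers. The loop's loads, products, and accumulator compete with the kernel vectors for those registers, so keeping only some of the kernel vectors resident is a plausible outcome.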
I browsed through the intrinsics list, and unfortunately there don't appear to be any new intrinsics for the existing instruction sets (still no min/max or round-to-int). At least intrinsics code will generally run a bit faster with VC11.