
Auto-vectorization in the Visual Studio 11 Express preview

Okay, it's actually the Microsoft Visual Studio 11 Express for Windows Developer Preview, but that's a ridiculously long name. I hope they call it something like vs11ew internally.

One thing I didn't expect to see in the VC11 compiler is auto-vectorization.

This attempts to produce vectorized code by analyzing your scalar loops. Now, this isn't going to do miracles -- particularly with C/C++'s poor support for alignment -- and you'll still have to go to intrinsics or assembly for the fastest code. However, the advantage of auto-vectorization is that the compiler can still do it when you're lazy -- which is great when you're prototyping, and can help in code you can't afford to focus on. As I've said before, I don't consider intrinsics to be very readable, and it's been a long time since I considered manual register allocation fun, so even though I wouldn't want to have to rely on auto-vectorization, I'm still in favor of it.
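To sketch what the vectorizer is looking for (the function name and shape here are my own illustration, not from any official sample): a plain, unit-stride scalar loop like this is the kind of code it can rewrite to process four floats per iteration with SSE (addps), with no change to your source:

```cpp
#include <cstddef>

// A simple scalar loop over floats: no branches, unit stride, no tricky
// dependencies. This is the shape of loop an auto-vectorizer can rewrite
// to process four elements per iteration with SSE.
void add_arrays(float *dst, const float *a, const float *b, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

The point is that the scalar semantics stay as written; the vectorizer's job is proving the transformation is safe and then doing it behind your back.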

After doing some testing with the x86 compiler (17.00.40825.2), the first thing I can say is that, at least with this early implementation, you probably won't be relying on auto-vectorization for video or image processing code. I was not able to get the compiler to vectorize any code processing 8-bit or 16-bit integers. The only types I could get to vectorize were 32-bit integers, 64-bit integers, floats, and doubles, and that excludes a huge amount of decoding/encoding/filtering code. The target CPU needs to support SSE for floats and SSE2 for ints or doubles; however, the developer preview compiler is pretty broken, and I was often able to get it to generate SSE or SSE4.1 instructions inappropriately. For now we'll overlook that and just look at the operations that it can vectorize. For ints, only a handful of operations vectorized at all.
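To illustrate the type limitation (these are my own test shapes, not code from the preview): a 32-bit integer loop like the first one below would vectorize, while the structurally identical byte loop stayed scalar, even though SSE2 has byte-wide adds (paddb):

```cpp
#include <cstddef>
#include <cstdint>

// Vectorizes in the preview compiler: 32-bit integer elements.
void add_k_i32(int32_t *v, size_t n) {
    for (size_t i = 0; i < n; ++i)
        v[i] = v[i] + 7;
}

// Same loop shape over 8-bit elements -- reportedly left as scalar code,
// even though SSE2 could do this sixteen bytes at a time with paddb.
void add_k_u8(uint8_t *v, size_t n) {
    for (size_t i = 0; i < n; ++i)
        v[i] = (uint8_t)(v[i] + 7);
}
```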

64-bit ints don't work very well -- x+y vectorizes while x+1 doesn't. Inversion (~) didn't work, and surprisingly, neither did negation (unary minus), so 0-x runs better than -x. Probably the most disappointing result is that neither conditionals nor relationals vectorize, so writing branchless, mask-based code isn't possible. I couldn't get min/max or masked writes out of it, either.
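To make the mask-based style concrete (my own scalar formulation, not code from the preview): each line of the max loop below maps directly onto an SSE2 instruction sequence (pcmpgtd, pand/pandn, por), yet the relational stops the preview vectorizer cold; and the negation loop only has a chance of vectorizing when written as a subtraction from zero:

```cpp
#include <cstddef>
#include <cstdint>

// Branchless max via a comparison mask. On SSE2 this is a natural
// pcmpgtd + pand/pandn + por sequence, but the preview compiler would
// not vectorize the relational (a[i] > b[i]).
void max_arrays(int32_t *dst, const int32_t *a, const int32_t *b, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        int32_t mask = -(int32_t)(a[i] > b[i]);   // all ones if a[i] > b[i]
        dst[i] = (a[i] & mask) | (b[i] & ~mask);
    }
}

// Negation workaround: unary minus didn't vectorize, but 0 - x did.
void negate_i32(int32_t *v, size_t n) {
    for (size_t i = 0; i < n; ++i)
        v[i] = 0 - v[i];
}
```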

For floats, more operations are supported.

Unary minus, fmodf(), fabsf(), transcendentals, min/max, and relational ops all failed. I did get float-to-unsigned casts to vectorize, but the generated code was bad (it truncated all numbers above 2^31). Auto-vectorization is thus more powerful with floats, but there are still noticeable holes in operation support.
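For reference, SSE2 only has a signed truncating convert (cvttps2dq), which is presumably where the 2^31 breakage comes from; a correct vectorized float-to-unsigned conversion has to apply a bias fixup. Here's a scalar sketch of the standard trick (my own illustration, not what the compiler emits):

```cpp
#include <cstdint>

// Scalar sketch of float -> uint32 conversion done the way a vectorized
// version must be: the signed convert can't represent results at or above
// 2^31, so those values are biased down into signed range, converted, and
// then biased back up.
uint32_t f2u_fixup(float x) {
    if (x >= 2147483648.0f)                        // x >= 2^31
        return (uint32_t)(x - 2147483648.0f) + 0x80000000u;
    return (uint32_t)x;
}
```

A naive vectorized cast that skips the fixup produces garbage for exactly the inputs the branch above handles, which matches the breakage described.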

Another issue with the current auto-vectorization implementation is that it universally emits unaligned loads and stores (movups/movdqu). I tried copying to a local array with forced alignment, but even that wasn't enough to get movaps. That's an easy gain for intrinsics/asm over the auto-vectorizer, unfortunately. It does, however, emit aliasing-tolerant code: it checks whether the destination and source arrays overlap and branches to either vectorized or unrolled code depending on the result. __restrict wasn't effective in removing the check.
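The runtime aliasing check amounts to something like the following (my own simplified rendering of the dispatch, not the actual generated code):

```cpp
#include <cstddef>
#include <cstdint>

// Simplified model of the overlap test the vectorizer emits before a loop:
// if the byte ranges [dst, dst+bytes) and [src, src+bytes) intersect, the
// generated code falls back to the scalar/unrolled path; otherwise it takes
// the vectorized path. (Addresses are compared as integers, as the machine
// code would.)
bool ranges_overlap(const void *dst, const void *src, size_t bytes) {
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;
    return d < s + bytes && s < d + bytes;
}
```

This is exactly the check that a __restrict qualifier ought to make unnecessary, which is why its having no effect is disappointing.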

The third problem with the auto-vectorizer is that currently you can't turn it off by itself, only by reducing the global optimization level. This means a significant amount of code bloat at full optimization even if the vectorized code will never run (cases of guaranteed partial overlap). It also makes the developer preview a bit fragile, since it means you can't easily escape the code generation bugs in the vectorizer. Hopefully there will be ways to control the auto-vectorizer like the inliner (a command-line switch plus pragmas).

Anyway, it'll be interesting to see how this evolves. After Visual Studio .NET 2002, my general rule is that you should assume everything in a public Visual Studio beta is as it will ship unless it's already known to be changing, enough people complain about it, or it's clearly a showstopper. The level of codegen bugs in this compiler version is a lot higher than usual, though, so I have to assume this is earlier in the development cycle (or else the compiler team is in trouble!).


This blog was originally open for comments when this entry was first posted, but commenting was later closed and the existing comments were removed, due to spam and a migration away from the original blog software. Unfortunately, it would have been a lot of work to reformat the comments to republish them. The author thanks everyone who posted comments and added to the discussion.