
Blog Archive

You can't just throw /arch:AVX to speed up your program

While searching around for some AVX docs, I happened to find a blog post on Intel's website describing how to optimize an image processing routine. The gist of the article was that you could get big gains just by throwing some VC++ compiler switches such as /arch:SSE2 or /arch:AVX to tell the compiler to use vector instructions. Presto, your code magically gets faster with less than an hour of work and without having to modify the algorithm!

Of course, my next thought was: "Yeah, until QA gives you an A-class bug the next day saying that the code now crashes on an Athlon XP or Core i7."

The documentation for the Visual C++ /arch compiler switch is labeled "Minimum CPU Architecture," but it should probably emphasize the ramifications of this switch: if you use it, your code will crash on any CPU that doesn't support the required instruction set. Unlike the Intel compiler, which has options to auto-dispatch to different code paths depending on the available instruction set, the VC++ compiler simply generates code for the target CPU blindly. Therefore, you can also reinterpret the switches as follows: /arch:SSE means "crashes on any CPU without SSE," /arch:SSE2 means "crashes on any CPU without SSE2," and /arch:AVX means "crashes on any CPU without AVX."

This is not to say that the /arch switch is bad, as the compiler does actually generate faster code when it can use vector instructions. The problem is that unless you can absolutely guarantee that your EXE or DLL will never run on a CPU below the specified tier, you can't use those switches. Okay, so /arch:SSE is probably pretty safe at this point, and you may be able to justify /arch:SSE2. You'd be insane to throw /arch:AVX on your whole app unless you really want to require a Sandy Bridge or Bulldozer CPU (only one of which has shipped as of today).
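Since VC++ won't auto-dispatch for you, any fallback path has to start with hand-rolled feature detection. A minimal sketch, using the GCC/Clang `<cpuid.h>` intrinsic (MSVC has an equivalent `__cpuid` in `<intrin.h>`); note that a complete AVX check must also confirm OS support for the YMM register state via XGETBV, which this sketch omits:

```cpp
#include <cstdio>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  // GCC/Clang; MSVC uses __cpuid from <intrin.h>
#endif

struct VectorSupport {
    bool sse, sse2, avx;
};

// Query CPUID leaf 1 and report which vector tiers this CPU advertises.
// Caveat: a full AVX check must also verify the OS saves YMM state
// (OSXSAVE, CPUID.1:ECX bit 27, plus an XGETBV check); this sketch
// tests the CPU-side feature bits only.
VectorSupport detect() {
    VectorSupport v{false, false, false};
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        v.sse  = (edx >> 25) & 1;  // CPUID.1:EDX bit 25 = SSE
        v.sse2 = (edx >> 26) & 1;  // CPUID.1:EDX bit 26 = SSE2
        v.avx  = (ecx >> 28) & 1;  // CPUID.1:ECX bit 28 = AVX (CPU side)
    }
#endif
    return v;
}
```

Calling detect() once at startup and routing into the appropriate code path is exactly the kind of dispatch the Intel compiler does automatically and VC++ does not.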

What about compiling only some of your code that way? You can pull this off if you build multiple DLLs or EXEs and switch them based on the architecture, at the cost of additional deployment and testing hassle. Compiling different modules within the same DLL or EXE with different /arch settings, though, is dangerous. Take this function:

#include <algorithm>

float foo(float x, float y) {
    return std::min(x, y);
}

Do a little #define foo magic and #include this from a few .cpps with different /arch settings, and you can extrude out x87/SSE/SSE2/AVX versions from the same file. There's only one small problem: the call to the std::min() function. std::min is a template and in the VC++ compilation model it is compiled with each .cpp file that instantiates it, meaning that each of the platform modules compiles its own version of the std::min template specialized for x87/SSE/SSE2/AVX. Where this goes wrong is when the linker collapses all of the COMDAT records and discards all but one instantiation of std::min<float>(). You don't know or control which one it picks because they're supposed to be the same. When I tested this locally, it picked the AVX version and the program crashed on my Core i7 laptop. Oops.
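The "#define foo magic" can be approximated in a single file with a macro; in the real setup each expansion would instead live in its own .cpp compiled with a different /arch flag (the names below are illustrative):

```cpp
#include <algorithm>

// In the actual build, the function body would sit in a shared .inl
// file, and each arch-specific .cpp would do "#define foo foo_sse2"
// (etc.) before including it. A macro stamps out the same idea in one
// file for illustration.
#define DEFINE_FOO(name) \
    float name(float x, float y) { return std::min(x, y); }

DEFINE_FOO(foo_x87)   // would be compiled with no /arch switch
DEFINE_FOO(foo_sse2)  // would be compiled with /arch:SSE2
DEFINE_FOO(foo_avx)   // would be compiled with /arch:AVX
```

Each expansion drags in its own instantiation of std::min<float>, which is precisely where the COMDAT collapse described above bites.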

What this means is that linking in modules with mixed /arch settings is broken unless you take special care not to use any inline or template functions within the arch-dependent modules, which excludes a substantial portion of the C++ standard library.
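One workable pattern, sketched below with hypothetical names: give each arch-specific translation unit a plain, non-inline entry point (extern "C" sidesteps templates and inline machinery entirely), and select an implementation once at startup through a function pointer:

```cpp
// Each of these would live in its own .cpp compiled with a different
// /arch setting. Non-inline extern "C" functions emit no COMDATs for
// the linker to merge across arch boundaries, so no inline or template
// code leaks between tiers.
extern "C" float foo_scalar(float x, float y) { return x < y ? x : y; }
extern "C" float foo_avx_impl(float x, float y) { return x < y ? x : y; }

// Dispatch pointer, defaulting to the lowest tier.
float (*foo)(float, float) = foo_scalar;

// Called once at startup, after a CPUID check.
void select_impl(bool cpu_has_avx) {
    foo = cpu_has_avx ? foo_avx_impl : foo_scalar;
}
```

The one-time indirection costs a pointer call per invocation, which is why real implementations dispatch at the granularity of whole loops or routines rather than tiny leaf functions.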

In conclusion, enabling enhanced instruction sets isn't something you can just do in an hour even if it's just a drop-down option in your project settings. You need to understand the full ramifications of the change and determine whether it also involves changes to your program's minimum required system specifications or the way you need to organize and build the affected code.


This blog was originally open for comments when this entry was first posted, but comments were later closed due to spam and then removed during a migration away from the original blog software. Unfortunately, it would have been too much work to reformat the comments for republishing. The author thanks everyone who posted comments and added to the discussion.