GCC x86 intrinsics and runtime detection

I decided on a whim to take another shot at building some of my code with MinGW (GCC). It took only 10 minutes until I hit a showstopper.

At first, it seemed to be going pretty well, since I had already done a pass with the Clang static analyzer and had already cleaned up some of the C++ transgressions that VS2010 had allowed through. A sticky point was the definition of CRITICAL_SECTION in the MinGW Win32 headers, since for some reason the MinGW headers define separate _CRITICAL_SECTION and _RTL_CRITICAL_SECTION types instead of typedef'ing one to another like the official headers do. This breaks code that manually forward declares InitializeCriticalSection() so as to avoid bringing windows.h into portable code.

The unexpected showstopper turned out to be x86 intrinsics, specifically this nice gem at the top of MinGW's version of emmintrin.h, which is the standard header for accessing SSE2 intrinsics on x86 compilers:

#ifndef __SSE2__
# error "SSE2 instruction set not enabled"
#else

This blows the compile if SSE2 instructions aren't enabled in the build (-msse2). The problem with enabling that flag is that it apparently also gives the compiler license to use SSE2 instructions in any code, not just code using SSE2 intrinsics. For instance, if you have this simple function in one of your modules:

int foo(float f) { return (int)f; }

...the version of GCC I tried, 4.6.2, produces a CVTTSS2SI instruction in code that has no explicit SSE usage. (Strictly speaking CVTTSS2SI is an SSE instruction rather than an SSE2 one, but -msse2 enables SSE as well.) This is great if you're trying to build an entire executable that requires SSE2. It's not so great if you are trying to build a module that does dynamic dispatch to multiple paths based on CPU runtime detection.

Apparently, the recommendation is to split your source code into multiple files and compile each of them with different settings, which is lame. First, I hit this in a CPU detection routine, so I'd rather not take a one-page function and split it across three files. Second, at least with Visual C++, doing that is a recipe for getting nasty bugs silently introduced into your program. The problem is that inline functions and template methods can end up compiled with different settings in different modules, and when the linker merges the duplicate copies it can pick a version that uses instructions not valid on all calling paths. This would be fine if the compiler and linker worked together to segregate the code paths, but as far as I know only Intel C++ does that, and only for non-intrinsics-based code.
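For reference, here is a minimal sketch of the kind of single-file dynamic dispatch described above. It rests on assumptions that do not hold for the GCC 4.6.2 setup in this post: as far as I know it needs a newer GCC (around 4.9 or later) or Clang, where a function marked with the target("sse2") attribute can use SSE2 intrinsics without -msse2 on the whole file, and where __builtin_cpu_supports() is available for the runtime check. The names add_i32, add_scalar, add_sse2 and select_add are hypothetical and only for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: GCC (roughly 4.9+) or Clang targeting x86. On anything else
// only the scalar path is compiled.
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define HAS_SSE2_PATH 1
#include <emmintrin.h>
#endif

typedef void (*AddFn)(int32_t *dst, const int32_t *a, const int32_t *b, std::size_t n);

// Baseline path: built with the file's default settings, no SSE2 required.
static void add_scalar(int32_t *dst, const int32_t *a, const int32_t *b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}

#ifdef HAS_SSE2_PATH
// SSE2 path: the target attribute enables SSE2 for this function only.
__attribute__((target("sse2")))
static void add_sse2(int32_t *dst, const int32_t *a, const int32_t *b, std::size_t n) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i *>(a + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i *>(b + i));
        _mm_storeu_si128(reinterpret_cast<__m128i *>(dst + i), _mm_add_epi32(va, vb));
    }
    for (; i < n; ++i)  // leftover elements
        dst[i] = a[i] + b[i];
}
#endif

// Picks an implementation based on what the running CPU reports.
static AddFn select_add() {
#ifdef HAS_SSE2_PATH
    if (__builtin_cpu_supports("sse2"))
        return add_sse2;
#endif
    return add_scalar;
}

// Public entry point: detection runs once, on the first call.
void add_i32(int32_t *dst, const int32_t *a, const int32_t *b, std::size_t n) {
    static const AddFn fn = select_add();
    fn(dst, a, b, n);
}
```

Callers just use add_i32(dst, a, b, n) and never see which path was chosen. This does nothing about the inline function and template merging problem, so shared inline helpers still need to be kept out of the SSE2-only function.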

I'd be interested in hearing if people have good solutions for this. Runtime CPU detection is mandatory for any vectorized code I write since I target a pretty wide range of x86 CPUs, and this would require some unacceptable contortions in source code organization. I'd like to raise the portability of my source code even though I don't plan on using anything except Visual C++ in the near future, but issues like this are a bit more than I'd like to take on. Without a solution for issues like this I can only hope that Clang will turn out more reasonable than GCC has historically been.

19 comments | Dec 28, 2011 at 17:14 | default
