GCC x86 intrinsics and runtime detection

I decided on a whim to take another shot at building some of my code with MinGW (GCC). It took only 10 minutes to hit a showstopper.

At first it seemed to be going pretty well, since I had already done a pass with the Clang static analyzer and cleaned up some of the C++ transgressions that VS2010 had let through. One sticking point was the definition of CRITICAL_SECTION in the MinGW Win32 headers: for some reason they define separate _CRITICAL_SECTION and _RTL_CRITICAL_SECTION types instead of typedef'ing one to the other like the official headers do. This breaks code that manually forward declares InitializeCriticalSection() to avoid bringing windows.h into portable code.

The unexpected showstopper turned out to be x86 intrinsics, specifically this nice gem at the top of MinGW's version of emmintrin.h, which is the standard header for accessing SSE2 intrinsics on x86 compilers:

#ifndef __SSE2__
# error "SSE2 instruction set not enabled"
#else

This blows up the compile unless SSE2 instructions are enabled in the build (-msse2). The problem with enabling that flag is that it apparently also gives the compiler license to use SSE and SSE2 instructions in any code, not just code using SSE2 intrinsics. For instance, if you have this simple function in one of your modules:

int foo(float f) {
    return (int)f;
}

...the version of GCC I tried, 4.6.2, produces a CVTTSS2SI instruction in code that has no explicit SSE usage. (CVTTSS2SI is strictly an SSE instruction rather than an SSE2 one, but -msse2 enables SSE as well.) This is great if you're trying to build an entire executable that requires SSE2. It's not so great if you are trying to build a module that dispatches dynamically to multiple code paths based on runtime CPU detection.

Apparently, the recommendation is to split your source code into multiple files and compile each of them with different settings, which is lame. First, I hit this in a CPU detection routine, so I'd rather not take a one-page function and split it across three files. Second, at least with Visual C++, doing that is a recipe for getting nasty bugs silently introduced into your program. The problem is that inline functions and template methods can be compiled with different settings in different modules, and when the linker merges the duplicates it can choose a version that uses instructions not valid on all calling paths. This would be fine if the compiler and linker worked together to segregate the code paths, but as far as I know only Intel C++ does that, and only for non-intrinsics-based code.
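To make the detection side concrete, here is a minimal sketch of a CPUID-based check that needs no SSE2 intrinsics and so can be built with baseline flags. It assumes the <cpuid.h> helper __get_cpuid shipped with GCC and Clang, which is not something the post itself uses, and the function name cpu_has_sse2 is made up for illustration. It only reports what the CPU advertises; it does not check that the OS saves SSE state.

```cpp
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>  // GCC/Clang helper header (assumed available)

// Hypothetical helper: true if CPUID leaf 1 reports SSE2 (EDX bit 26).
// Note: this checks CPU support only, not OS support for SSE state.
static bool cpu_has_sse2() {
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;  // CPUID leaf 1 is not available
    return (edx & (1u << 26)) != 0;
}
#else
// Non-x86 targets: there is no SSE2 to detect.
static bool cpu_has_sse2() {
    return false;
}
#endif
```

Because nothing here touches emmintrin.h, the #error shown earlier never comes into play for this part of the code.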

I'd be interested in hearing if people have good solutions for this. Runtime CPU detection is mandatory for any vectorized code I write, since I target a pretty wide range of x86 CPUs, and splitting files by instruction set would require some unacceptable contortions in source code organization. I'd like to raise the portability of my source code even though I don't plan on using anything except Visual C++ in the near future, but issues like this are a bit more than I'd like to take on. Without a solution, I can only hope that Clang will turn out more reasonable than GCC has historically been.
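One possible answer, sketched here under assumptions rather than tested against the GCC 4.6.2 setup described above: later GCC releases (4.9 onward, to the best of my knowledge) and Clang allow SSE2 intrinsics inside a function marked with the target("sse2") attribute even when the file is built without -msse2, and __builtin_cpu_supports can pick the path at run time. The names add_u32, add_u32_sse2, and add_u32_scalar are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define HAVE_SSE2_PATH 1
#include <emmintrin.h>

// SSE2 path: the target attribute enables SSE2 for this function only,
// so the rest of the file can stay at the baseline instruction set.
__attribute__((target("sse2")))
static void add_u32_sse2(std::uint32_t *dst, const std::uint32_t *a,
                         const std::uint32_t *b, std::size_t n) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i *>(a + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i *>(b + i));
        _mm_storeu_si128(reinterpret_cast<__m128i *>(dst + i),
                         _mm_add_epi32(va, vb));
    }
    for (; i < n; ++i)  // leftover elements
        dst[i] = a[i] + b[i];
}
#endif

// Baseline path: valid on every CPU the build targets.
static void add_u32_scalar(std::uint32_t *dst, const std::uint32_t *a,
                           const std::uint32_t *b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}

// Dispatcher: chooses a path based on what the running CPU reports.
void add_u32(std::uint32_t *dst, const std::uint32_t *a,
             const std::uint32_t *b, std::size_t n) {
#if defined(HAVE_SSE2_PATH)
    if (__builtin_cpu_supports("sse2")) {
        add_u32_sse2(dst, a, b, n);
        return;
    }
#endif
    add_u32_scalar(dst, a, b, n);
}
```

This keeps both paths and the dispatcher in one source file. It does not address the separate concern about inline functions and templates being merged across modules built with different flags.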

Comments

Comments were open when this entry was first posted, but they were later closed and then removed, due to spam and a migration away from the original blog software. Unfortunately, it would have been a lot of work to reformat the comments for republishing. The author thanks everyone who posted comments and added to the discussion.