You can't just throw /arch:AVX to speed up your program

While searching around for some AVX docs, I happened to find a blog post on Intel's website describing how to optimize an image processing routine. The gist of the article was that you could get big gains just by throwing some VC++ compiler switches such as /arch:SSE2 or /arch:AVX to tell the compiler to use vector instructions. Presto, your code magically gets faster with less than an hour of work and without having to modify the algorithm!

Of course, my next thought was: "Yeah, until QA gives you an A-class bug the next day saying that the code now crashes on an Athlon XP or Core i7."

The documentation for the Visual C++ /arch compiler switch is labeled "Minimum CPU Architecture," but it should probably emphasize the ramifications of this switch. If you use it, your code will crash on any CPU that doesn't support the required instruction set. Unlike the Intel compiler, which has options to auto-dispatch to different code paths depending on the available instruction set, the VC++ compiler will blindly generate code for the target CPU. You can therefore read each /arch setting as "crash on any CPU that lacks this instruction set."

This is not to say that the /arch switch is bad, as the compiler does actually generate faster code when it can use vector instructions. The problem is that unless you can absolutely guarantee that your EXE or DLL will never run on a CPU lower than the specified tier, you can't use those switches. Okay, so /arch:SSE is probably pretty safe at this point, and you may be able to justify /arch:SSE2. You'd be insane to throw /arch:AVX on your whole app unless you really want to require a Sandy Bridge or Bulldozer CPU (and as of today, only one of the two has shipped).
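If a single binary has to run on older CPUs, the usual alternative is to check the CPU at run time and only then enter the vector code. The following is a minimal sketch of that idea, not something the compiler does for you. The names cpu_has_avx, scale_generic, scale_avx and pick_scale are made up for illustration, and scale_avx is only a stand-in so the sketch builds without any /arch switch.

```cpp
#include <cassert>
#include <cstddef>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>     // __cpuid
#include <immintrin.h>  // _xgetbv
#endif

// True only when both the CPU and the OS support AVX.
bool cpu_has_avx() {
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    int regs[4] = {0, 0, 0, 0};
    __cpuid(regs, 1);
    const bool osxsave = (regs[2] & (1 << 27)) != 0;  // OS has enabled XSAVE/XGETBV
    const bool avx     = (regs[2] & (1 << 28)) != 0;  // CPU implements AVX
    if (!osxsave || !avx) return false;
    return (_xgetbv(0) & 6) == 6;  // OS preserves XMM and YMM state
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("avx") != 0;  // GCC/Clang builtin
#else
    return false;  // unknown compiler or non-x86 target: stay on the safe path
#endif
}

typedef void (*ScaleFn)(float* data, std::size_t n, float k);

// Baseline version, built with settings every supported CPU can run.
void scale_generic(float* data, std::size_t n, float k) {
    assert(data != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) data[i] *= k;
}

// Stand-in (assumption): the real fast version would live in a separately
// built module compiled with /arch:AVX. Here it just forwards.
void scale_avx(float* data, std::size_t n, float k) {
    scale_generic(data, n, k);
}

// Pick the implementation once, based on what the machine can actually run.
ScaleFn pick_scale() {
    return cpu_has_avx() ? scale_avx : scale_generic;
}
```

Calling pick_scale() once at startup and keeping the returned pointer means the check is not repeated on every call.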

What about compiling only some of your code that way? You can pull this off if you build multiple DLLs or EXEs and switch them based on the architecture, at the cost of additional deployment and testing hassle. Compiling different modules within the same DLL or EXE with different /arch settings, though, is dangerous. Take this function:

#include <algorithm>

float foo(float x, float y) {
    return std::min(x, y);
}

Do a little #define foo magic and #include this from a few .cpps with different /arch settings, and you can extrude out x87/SSE/SSE2/AVX versions from the same file. There's only one small problem: the call to the std::min() function. std::min is a template and in the VC++ compilation model it is compiled with each .cpp file that instantiates it, meaning that each of the platform modules compiles its own version of the std::min template specialized for x87/SSE/SSE2/AVX. Where this goes wrong is when the linker collapses all of the COMDAT records and discards all but one instantiation of std::min<float>(). You don't know or control which one it picks because they're supposed to be the same. When I tested this locally, it picked the AVX version and the program crashed on my Core i7 laptop. Oops.
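For readers who haven't seen the #define/#include trick, here is a rough single-file approximation. In the real setup the body would sit in its own file and each .cpp would define the name and include it under a different /arch setting. The FOO_BODY macro and the foo_x87/foo_sse2/foo_avx names are hypothetical, and because everything here is one translation unit, all three variants are built with the same settings.

```cpp
#include <algorithm>
#include <cassert>

// Real build (assumed layout): each .cpp does
//     #define FOO_NAME foo_sse2
//     #include "foo_body.inl"
// and is compiled with its own /arch switch. A macro stands in for that here.
#define FOO_BODY(FOO_NAME)                    \
    float FOO_NAME(float x, float y) {        \
        return std::min(x, y);                \
    }

FOO_BODY(foo_x87)
FOO_BODY(foo_sse2)
FOO_BODY(foo_avx)

#undef FOO_BODY
```

The three functions get distinct names, but every one of them calls the same std::min<float>, and that shared name is the one the linker is free to collapse.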

What this means is that linking in modules with mixed /arch settings is broken unless you take special care not to use any inline or template functions within the arch-dependent modules, which excludes a substantial portion of the C++ standard library.
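As a sketch of what that "special care" could look like, an arch-dependent module can carry its own helper with internal linkage instead of calling std::min. A function in an unnamed namespace is private to its translation unit, so the linker should have no same-named copy from another module to substitute. The names local_min and array_min_avx are made up for illustration, and this is an assumption about one workable approach rather than something tested in the post.

```cpp
#include <cassert>
#include <cstddef>

namespace {

// Internal linkage: this copy belongs to this .cpp only, so it is compiled
// with this file's /arch setting and is not merged by name with other copies.
float local_min(float a, float b) {
    return b < a ? b : a;
}

}  // namespace

// Hypothetical kernel that would sit in the .cpp built with /arch:AVX.
// It deliberately avoids standard library templates and inline functions.
float array_min_avx(const float* data, std::size_t n) {
    assert(data != nullptr && n > 0);
    float m = data[0];
    for (std::size_t i = 1; i < n; ++i) {
        m = local_min(m, data[i]);
    }
    return m;
}
```

The cost is that every shared utility used by such a module has to be duplicated or rewritten this way.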

In conclusion, enabling enhanced instruction sets isn't something you can just do in an hour even if it's just a drop-down option in your project settings. You need to understand the full ramifications of the change and determine whether it also involves changes to your program's minimum required system specifications or the way you need to organize and build the affected code.

Comments

Interesting. As for COMDAT folding causing problems with per-file optimisation, that can be solved by turning it off with NOICF.

BTW: ARCH behaves same way as in GCC, so at least some people are not confused...

Klimax - 19 08 11 - 19:18


I wonder how good the automatic AVX/SSE optimization is. It surely cannot beat manual optimizations.

tobi - 19 08 11 - 22:33


Just build everything statically - ta-daaaa!


(runs for the hills)

ggn - 20 08 11 - 00:02


I wonder how good the Java jitter is at using SSE and the like? In theory because it's compiling at runtime it can automatically pick the best set of instructions for the user's CPU without having to pre-generate a whole load of different codepaths.

Torkell - 20 08 11 - 04:50


Looks like hand-optimized routines bound late based on the results of cpuid are still a better approach all around. A lot easier to unit test too.

gordy - 20 08 11 - 10:38


The SSE option was basically there to replace FPU instructions with SSE code (it even does the tricky uint-float conversion, check it out), other optimizations are negligible. Similar major improvements with the AVX option? Not a chance. JIT > intrinsics > /arch.

Gabest - 22 08 11 - 04:52


Phaeron, please add AVX2 support to VirtualDub.

http://software.intel.com/en-us/blogs/20..

AVX2 - 28 08 11 - 13:52


Believe it or not, in the Intel Compiler /arch: also means crash, like in GCC and MSVC. The dispatcher is used only if you specify /Qx[n] or /Qax[n], where [n] are the instruction sets you want to dispatch for.

Btw, I think that we developers should stop catering to lowest common denominator and let everything below SSE2 die in peace.

Igor Levicki - 07 01 12 - 14:10
