Problem with SSE4.1 support
I have the day off from work today, so I decided to sit down and try some SSE4.1 experiments... only to discover the following:
- Intel VTune 6.1 doesn't work at all on this laptop. (Yeah, it's ancient, but it's reliable, fast, and worked great with a Pentium M, SSE2, and VC8.)
- AMD CodeAnalyst 2.76 works in timer mode, but can't disassemble past SSE4.1 instructions.
- VC8 (VS2005 SP1) can't assemble SSE4.1 instructions in inline assembly.
- MASM 8 can't assemble SSSE3 or SSE4.1 instructions.
- The VS2005 SP1 toolchain can't disassemble SSSE3 or SSE4.1 instructions.
- VC9 (VS2008) handles SSE4.1... but I only have the Express version, which doesn't have MASM, and I can't switch to VS2008 anyway for VirtualDub 1.8.x due to system requirements issues.
- Agner Fog's pentopt tome hasn't been updated for Penryn, and I can already tell that a number of SSE2 instructions are quite a bit faster on it than on the original Core 2 Duo.
- Intel's x86 Optimization Guide has been updated for Penryn, but the formatting is almost unusable and they don't include µop breakdowns for memory ops -- which means that I can't tell, for instance, whether LDDQU and MOVDQU are improved.
Sigh.
It's not a dealbreaker, since emitting the new instructions through MASM macros isn't bad and I'm used to RDTSC profiling, but sheesh... talk about missing prerequisites.
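For anyone wondering what the macro workaround amounts to: when the assembler doesn't know an instruction, you emit its bytes yourself. Here is a minimal sketch of that idea in C rather than MASM, assuming the register-to-register form of PMULLD (encoded as 66 0F 38 40 /r) and only xmm0 through xmm7. The helper names `modrm_reg_reg` and `encode_pmulld` are made up for illustration and aren't from any toolchain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build a ModRM byte for the register-to-register form (mod = 11b).
   'reg' is the destination XMM register and 'rm' the source, both 0-7. */
uint8_t modrm_reg_reg(unsigned reg, unsigned rm)
{
    assert(reg < 8 && rm < 8);  /* xmm8-xmm15 would need a REX prefix */
    return (uint8_t)(0xC0u | (reg << 3) | rm);
}

/* Hypothetical helper: write the 5-byte encoding of
   PMULLD xmm<dst>, xmm<src> into out[] and return the byte count. */
size_t encode_pmulld(uint8_t out[5], unsigned dst, unsigned src)
{
    out[0] = 0x66;  /* mandatory operand-size prefix */
    out[1] = 0x0F;  /* two escape bytes select the 0F 38 opcode map */
    out[2] = 0x38;
    out[3] = 0x40;  /* PMULLD opcode within that map */
    out[4] = modrm_reg_reg(dst, src);
    return 5;
}
```

Calling `encode_pmulld(buf, 0, 1)` fills `buf` with 66 0F 38 40 C1, which is what a MASM macro would emit with a `db 66h, 0Fh, 38h, 40h, 0C1h` line for `pmulld xmm0, xmm1`. Memory operands need the full ModRM/SIB/displacement treatment, which this sketch deliberately skips.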