Minor breaking change in MASM 9

I downloaded and installed Visual Studio 2008 Beta 2 "Orcas" a couple of days ago, and as expected, there isn't a whole lot new for C++ programmers. In fact, it looks the same as VS2005. The main new feature is file-level multiprocessor builds instead of project-level ones, which I didn't try out (my Virtual PC VM runs on a single-core CPU). Oh, and you can no longer build executables for any Win9x platform. I don't know if I'd recommend upgrading, but on the other hand, less is likely to break than in earlier upgrades.

VirtualDub 1.7.2 mostly compiles without issues on VS2008b2 after converting the solution to VS9 format. There was one line that broke. Pointer sizing was tightened slightly in MASM 9, and if you happen to have an explicit qword size on a memory operand in an MMX unpack low instruction:

punpcklbw mm0, qword ptr [eax]

it will now fail to assemble, whereas this was fine in MASM 8 (FDBK294468). The solution is simply to use dword ptr instead, which assembles fine on both MASM 8 and MASM 9. The inline assembler in the x86 compiler still accepts either.

Strictly speaking, the new behavior is correct -- the MMX forms of punpcklbw/wd/dq are unusual in that they take a 32-bit memory argument like MOVD. I think some Intel publications occasionally got this wrong and said these instructions took m64 instead of m32, although the current manuals are right. I first saw this difference called out in an AMD optimization guide, and it's significant with regard to misalignment penalties and page faults. You can thus safely pick up a misaligned quadword as follows:

movd mm0, dword ptr [eax]
punpckldq mm0, dword ptr [eax+4]
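As a rough illustration of what the two instructions above compute, here is a portable C model. It is only a sketch: the function names are made up for this example, and it models the resulting register value, not the alignment or page-fault behavior of real hardware.

```c
#include <stdint.h>

/* Hypothetical helper: read a little-endian 32-bit value one byte at a time,
   standing in for a dword memory access on x86. */
static uint32_t model_load_dword(const unsigned char *p) {
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Hypothetical model of:
       movd      mm0, dword ptr [p]
       punpckldq mm0, dword ptr [p+4]
   Only two 32-bit accesses are made; no single 64-bit access occurs. */
static uint64_t model_misaligned_qword_load(const unsigned char *p) {
    uint64_t mm0 = model_load_dword(p);      /* movd: low dword, upper half zeroed */
    uint64_t src = model_load_dword(p + 4);  /* the 32-bit memory operand */
    /* punpckldq: keep the low dword of mm0, place the low dword of src above it */
    return (mm0 & 0xFFFFFFFFu) | (src << 32);
}
```

Calling model_misaligned_qword_load on a pointer one byte past the start of a buffer returns the same eight bytes that a single misaligned 64-bit little-endian read from that address would.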

Note that the movd/punpckldq pair isn't necessarily faster than a single unaligned 64-bit read, because modern x86 CPUs generally only impose an alignment penalty if the access crosses an L1 cache line. Also, the MMX high unpack versions (punpckhbw/wd/dq) and the SSE2 integer unpack instructions still do full 64-bit and 128-bit reads, respectively.
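To make the low/high distinction concrete, the following C sketch models the byte interleave done by the MMX punpcklbw and punpckhbw instructions on 64-bit values. The function names are invented for this example. It describes only the data flow, which is why the low form needs just the bottom 32 bits of its source while the high form needs the top 32 bits.

```c
#include <stdint.h>

/* Extract byte i (0 = least significant) of a 64-bit value. */
static uint64_t byte_of(uint64_t v, int i) {
    return (v >> (8 * i)) & 0xFF;
}

/* Interleave four bytes of dst and src, starting at byte index `base`.
   The dst byte lands in the even position, the src byte in the odd position. */
static uint64_t interleave_bytes(uint64_t dst, uint64_t src, int base) {
    uint64_t r = 0;
    for (int i = 0; i < 4; i++) {
        r |= byte_of(dst, base + i) << (16 * i);
        r |= byte_of(src, base + i) << (16 * i + 8);
    }
    return r;
}

/* Hypothetical model of punpcklbw: consumes bytes 0..3 of each operand. */
static uint64_t model_punpcklbw(uint64_t dst, uint64_t src) {
    return interleave_bytes(dst, src, 0);
}

/* Hypothetical model of punpckhbw: consumes bytes 4..7 of each operand. */
static uint64_t model_punpckhbw(uint64_t dst, uint64_t src) {
    return interleave_bytes(dst, src, 4);
}
```

In this model, changing the upper four bytes of src never changes the result of model_punpcklbw, which mirrors why a 32-bit memory operand is sufficient for the low unpack forms.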

Incidentally, the new assembler also appears to accept the new SSE4.1 opcodes, such as PMOVZXBW, if you're so inclined.


This blog was originally open for comments when this entry was first posted, but was later closed and then removed due to spam and after a migration away from the original blog software. Unfortunately, it would have been a lot of work to reformat the comments to republish them. The author thanks everyone who posted comments and added to the discussion.