It's good to know I'm not crazy
Going through my weekly backlog of email, I found a crash report on this assembly code:
004e19f3: 0f73d430 psrlq mm4, 30h
004e19f7: 0f7ee0 movd eax, mm4
004e19fa: 0fe504c5d89f5c pmulhw mm0, [eax*8+005c9fd8] <-- FAULT
This is the division approximation code for the temporal smoother filter in VirtualDub. It essentially computes:
result.rgb = color.rgb * div_table[sum >> 48];
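As a rough illustration of the idea, here is a minimal Python sketch of table-driven division approximation: multiply by a precomputed fixed-point reciprocal and keep the high bits. Everything in it is an assumption made for illustration. The post does not show how `div_table` is scaled, and the names `SHIFT`, `MAX_INDEX`, and `approx_divide` are hypothetical. The 128-entry bound comes from the post's remark that the index never exceeds 128.

```python
# Hypothetical sketch; the table scaling and all names are assumptions,
# not taken from the VirtualDub source.
SHIFT = 16        # fixed-point precision of each reciprocal
MAX_INDEX = 128   # the post says the table index never exceeds 128

# div_table[n] holds ceil(2**16 / n); entry 0 is a placeholder and never used.
# Note: a signed 16-bit lane, as used by pmulhw, could not hold the n = 1 and
# n = 2 entries, so the real table must be scaled differently.
div_table = [0] + [-(-(1 << SHIFT) // n) for n in range(1, MAX_INDEX + 1)]


def approx_divide(color_sum, packed_sum):
    """Approximate color_sum / n, where n is the top 16 bits of packed_sum."""
    index = (packed_sum >> 48) & 0xFFFF          # sum >> 48
    return (color_sum * div_table[index]) >> SHIFT  # keep the high part


# Example: a channel sum of 300 accumulated over 3 samples.
print(approx_divide(300, 3 << 48))
```

With this scaling the result is either the exact integer quotient or one above it for channel sums up to 255 * 128, which is the usual trade-off when a divide is replaced by a multiply and a shift.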
The crash was an access violation, indicating a bad pointer. Problem number one is that, the way the code's structured, the table index never exceeds 128. Problem number two:
EAX = 00800000
Extracting the top 16 bits of a 64-bit unsigned quantity gave a value bigger than 0x10000. That's... not possible.
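The arithmetic behind that claim can be checked with a few lines of Python. This is only a sketch that mimics the two instructions with plain integers. The constants are copied from the crash report above, and the helper name `top16` is made up for illustration.

```python
FAULT_EAX = 0x00800000    # register value from the crash report
TABLE_BASE = 0x005C9FD8   # displacement in the faulting pmulhw instruction
MASK64 = (1 << 64) - 1


def top16(value64):
    """Mimic 'psrlq mm4, 30h' followed by 'movd eax, mm4'."""
    shifted = (value64 & MASK64) >> 0x30   # logical right shift by 48 bits
    return shifted & 0xFFFFFFFF            # movd copies the low 32 bits


largest_possible = top16(MASK64)                         # all 64 bits set
fault_address = (FAULT_EAX * 8 + TABLE_BASE) & 0xFFFFFFFF
set_bits = bin(FAULT_EAX).count("1")

print(hex(largest_possible))   # 0xffff
print(hex(fault_address))      # 0x45c9fd8
print(set_bits)                # 1
```

No 64-bit input can produce more than 0xFFFF here, so 0x00800000 cannot have come from the shift itself. The value also has exactly one bit set (bit 23), which is what a single flipped bit on an index of zero would look like. That is consistent with a hardware fault, though it does not prove one.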
I couldn't figure out how this could happen, so I wrote back to the user asking if the crash was reproducible. As it turned out, he'd already diagnosed the problem: bad RAM. My guess is that the OS did a context switch in the middle of these instructions, giving EAX the opportunity to be saved to memory and corrupted there. Sometimes the impossible does actually happen... well, at least when hardware failure is involved.
My experience bringing up custom processors tells me that writing and running memory tests first saves a lot of time hunting down the cause of weird crashes like these. But I guess you can't force your users to run extensive memory tests before using your app.
pshufb - 29 06 09 - 19:42
Seeing this, I immediately thought "either overclocked machine or bad RAM". Reminded me of this story about Microsoft's side of analyzing crash dumps: http://blogs.msdn.com/oldnewthing/archiv..
Marcel - 29 06 09 - 23:33
Ghost in the machine, as it were...
@pshufb: maybe not, but it should be; personally, I always carry a memtest86+ bootable CD with me just for that. If that baby doesn't catch a RAM error, then I start unscrewing things.
Mitch 74 (link) - 02 07 09 - 20:18
Hmm, in all my years working with computers, I've only seen bad memory ONCE, and that was with RDRAM (good riddance). No surprise given how hot they ran.
I've had to reseat DIMM sticks a couple of times though.
Rich - 03 07 09 - 01:45
Microsoft considers this a serious enough problem that all versions of Windows from Vista onward now have a memory diagnostic program built in. Handy, if you don't have memtest86.
I've had bad memory twice. The first time, it was also with RDRAM, but it was a PC1066 stick that started failing at PC1066 speeds but still worked fine at PC800. The second time, it was memory in my primary laptop that had gone bad, and it went unnoticed while I couldn't figure out why programs were randomly crashing on my machine. I only figured it out months later when I had a reproducible crash launching Diablo II and got different MD5 checksums on the same file on different runs. Md5sum is not known for being a buggy app.
I started becoming a bit paranoid after a series of rather annoying and hard-to-track-down hardware failures. I've had a couple of bad motherboards, the first one being a PII-era motherboard that would sporadically hard reset under high hard disk I/O loads, and another being an AMD64-based one that first had its secondary IDE port fail and then started hanging on POST. Based on my own experiences and those of people I've talked to, I've come to the conclusion that it's not a good idea to use a motherboard with a chipset made by a company whose core business is GPUs rather than chipsets.
Phaeron - 03 07 09 - 08:31
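The MD5 comparison described in the comment above is easy to script. Below is a minimal Python sketch, not the tool the author used: it hashes the same file several times with the standard `hashlib` module and collects the distinct digests. The function names and the default of five runs are arbitrary choices for illustration.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def distinct_digests(path, runs=5):
    """Hash the same file several times and return the set of digests seen."""
    return {md5_of_file(path) for _ in range(runs)}
```

More than one digest for a file that has not changed points at the hardware path (RAM, disk, or controller) rather than at the hashing code. A single digest does not show the memory is good, since a short test like this touches only a small part of it, and repeated reads are often served from the OS file cache.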
Actually, MS had a diagnostic kit before Vista (see http://oca.microsoft.com/en/windiag.asp), so they consider it important.
Mads - 03 07 09 - 21:38
And a really big thanks to all those cheesy accountants who justified parity memory out of existence.
IanB - 07 07 09 - 01:18
"Based on my own experiences and of those I've talked to, I've come to the conclusion that it's not a good idea to use a motherboard with a chipset made by a company who's core business is GPUs rather than chipsets."
Are you referring to ATI or NVIDIA or both? And what about when ATI was bought by AMD?
Yuhong Bao - 09 07 09 - 06:19
I've had bad luck with both ATI and NVIDIA based chipsets. Intel chipsets have been solid.
Phaeron - 09 07 09 - 15:03
BTW, this led me to dig out this entry from a long time ago about problems with VIA chipsets:
Yuhong Bao - 10 07 09 - 09:00
And, well, ATI (AMD) and NVIDIA are pretty much the only choices for AMD chipsets these days.
Yuhong Bao - 24 07 09 - 14:29