Why the Visual Studio debugger occasionally locks up the entire Windows GUI
A few days ago, I finally solved a mystery that had been annoying the heck out of me for years.
Ever since I moved to Windows XP, I had been seeing a weird problem: occasionally, when a program I was working on crashed with an access violation, the Visual C++ 6.0 debugger would stop responding after I dismissed the exception dialog. Soon thereafter, nearly everything else would also lock up, except for the Alt+Tab popup and console windows (particularly command prompts). The GUI programs weren't completely dead, but they ran really slowly, to the point that I could wait over ten minutes just for Visual Studio to redraw. CPU load was not the problem, or else the laptop fans would have kicked in. Killing the debuggee process didn't work; the command would go through, but nothing would happen. I only knew three solutions to the problem: spamming Shift+F5 into the debugger (took way too long), killing the debugger process using TASKKILL /F /PID (lost work), and logging off (took too long and lost work). Very frustrating.
At times I thought that DirectShow or Spy++ was to blame, since the problem seemed to occur more often when those were involved... but I couldn't nail anything down. I also thought that it was an issue with Visual C++ 6.0, since it seemed to happen less frequently with Visual Studio .NET 2003, but I had it happen on that version too. I even dragged out the kernel debugger at one point and hard broke into the system when it happened, but couldn't see anything out of the ordinary. So, basically, it was one of those rarely occurring but intensely annoying bugs that I couldn't resolve.
Then... it happened with Visual Studio 2005. Target process, then debugger, and finally the whole system frozen. What was unusual this time was that the app that broke was HTML Help, since that's what I have VS launch when I compile VirtualDub's help file... and it hadn't crashed! By chance I thought to attach NTSD to devenv.exe noninvasively (ntsd -pv -p <pid>), which worked since NTSD is a console-mode app... and after dumping the thread stacks and running Sysinternals Process Explorer veeeerrrryyy sloooowwly I finally figured out what had been pissing me off all this time.
In short, the problem is caused by shared mutexes in Windows system DLLs.
The hang-up I ran into most consistently turned out to be caused by that crappy Text Services Framework that comes with Office and Windows XP. It maintains a bunch of per-user, interprocess mutex objects with names like CTF.LBES.MutexDefaultS-1-5-21-790525478-1715562821-839522115-1003 to arbitrate access to shared memory structures. What happens is that one thread in the debuggee process happens to grab some of these mutex objects to draw text, and in the meantime, another thread hits an exception. All threads in the debuggee process, including the one holding the mutexes, are then suspended by the debugger. The debugger then decides to draw some text in its editor, and it hangs trying to get the mutexes... and other processes try to draw text, and they hang too. Except for the command prompts, which are handled by good old csrss.exe and apparently don't use either the same mutexes or the same framework. And all the rest of the processes just sit tight until the timeout on the mutex wait expires. Kill the debugger, and the problem goes away because that unblocks the debuggee, and when a thread is killed the NT kernel makes sure any mutexes it held are released.
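To make the sequence above concrete, here is a minimal, portable C++ sketch of the same blocking pattern. It is only an analogy: it uses std::timed_mutex inside one process rather than a real named Win32 mutex, and the names SharedTextLock, HangResult and simulateDebuggerHang are made up for illustration. The "debuggee" thread takes the lock and is then frozen while holding it, the way a thread would be when the debugger suspends the process; the "debugger" then tries to take the same lock with a timeout.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <mutex>
#include <thread>

// Hypothetical stand-in for one of the shared CTF.* mutexes. A real one is
// a named kernel object visible to every process of the same user; this
// one is process-local and only models the blocking behavior.
struct SharedTextLock {
    std::timed_mutex mutex;
};

struct HangResult {
    bool acquiredWhileSuspended;  // could the "debugger" lock while the holder was frozen?
    bool acquiredAfterRelease;    // could it lock once the holder had let go?
};

HangResult simulateDebuggerHang(std::chrono::milliseconds waitTimeout) {
    assert(waitTimeout.count() > 0);

    SharedTextLock lock;
    std::promise<void> lockTaken;
    std::promise<void> resume;
    std::future<void> resumeSignal = resume.get_future();

    // "Debuggee" thread: grabs the lock to draw text, then gets frozen
    // while still holding it (modeling suspension by the debugger).
    std::thread debuggee([&] {
        std::lock_guard<std::timed_mutex> hold(lock.mutex);
        lockTaken.set_value();
        resumeSignal.wait();
    });
    lockTaken.get_future().wait();

    HangResult result{};

    // "Debugger" wants to draw text too: it waits for the full timeout
    // and comes back empty-handed because the holder cannot run.
    result.acquiredWhileSuspended = lock.mutex.try_lock_for(waitTimeout);
    if (result.acquiredWhileSuspended) lock.mutex.unlock();

    // Let the holder go (resuming or killing it both end with the lock freed).
    resume.set_value();
    debuggee.join();

    result.acquiredAfterRelease = lock.mutex.try_lock_for(waitTimeout);
    if (result.acquiredAfterRelease) lock.mutex.unlock();
    return result;
}
```

Calling simulateDebuggerHang(std::chrono::milliseconds(50)) should report that the first attempt timed out and the second succeeded. In the real case every GUI process of the same user is in the position of the "debugger" here, which is why the whole desktop crawls rather than just one program.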
Ugh.
The second hang-up was a bit more esoteric. HTML Help, the process I was debugging, needed to check the user's Internet Zones permissions information before loading up the initial page in the help file it was viewing. To do this, it grabbed a mutex protecting the permissions data, which is held in a shared memory window. While this was happening, though, some DLLs were loaded into the hh.exe process -- possibly from another thread -- for which Visual Studio 2005 didn't have symbols. So it decided to contact the Microsoft public symbol server -- and instantly blocked on the same mutex trying to set up the HTTP query.
Working around the first one is easy: disable Text Services Framework in Regional and Language Options. You probably don't need it, and CTFMON.EXE is not particularly known for contributing to system stability anyway. The second one can be worked around by unchecking the Microsoft symbol server in VS2005's symbol server options after the DLL symbols have been downloaded; in that case, it will still check the local symbol cache, just not download new PDBs. This is a good idea anyway as the symbol server support has a habit of repeatedly trying to download symbols for DLLs that don't have public symbols or aren't even made by Microsoft.
Unfortunately, I don't see a good way to truly fix this problem; launching the application as a different user than the debugger should work since the mutexes involved seem to be user-specific so far, but I don't know of a good way to do that in Visual Studio. It'd be nice if programmers would stop using shared mutable memory like this, as it punches holes in the protected memory system with regard to isolating crashes, but somehow I think that otherwise they'd just add a service instead, which would be even worse. There are enough background tasks running on the average Windows system as it is.
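The reason a different user account should help can be sketched in a few lines of portable C++. This is a toy model, not the Windows API: NamedMutexTable stands in for the system's named-object lookup, perUserMutexName follows the CTF.LBES.MutexDefault plus SID pattern seen above, and the SIDs used with it are invented placeholders. Opening the same name yields the same lock; a different SID yields a different name and therefore a different lock.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Toy model of a named-mutex namespace: the first open of a name creates
// the object, later opens of the same name return the same object.
class NamedMutexTable {
public:
    std::shared_ptr<std::timed_mutex> open(const std::string& name) {
        std::lock_guard<std::mutex> guard(tableLock_);
        std::shared_ptr<std::timed_mutex>& slot = table_[name];
        if (!slot) slot = std::make_shared<std::timed_mutex>();
        return slot;
    }

private:
    std::mutex tableLock_;
    std::map<std::string, std::shared_ptr<std::timed_mutex>> table_;
};

// Builds a per-user name by appending the user's SID to a fixed prefix,
// matching the shape of the mutex name observed earlier.
std::string perUserMutexName(const std::string& userSid) {
    assert(!userSid.empty());
    return "CTF.LBES.MutexDefault" + userSid;
}

// True when processes running under the two SIDs would end up contending
// for the very same mutex object.
bool sharesLockWith(NamedMutexTable& table,
                    const std::string& sidA,
                    const std::string& sidB) {
    return table.open(perUserMutexName(sidA)) ==
           table.open(perUserMutexName(sidB));
}
```

With a debugger and debuggee under the same (made-up) SID, sharesLockWith returns true; give the debuggee a different SID and it returns false, so a frozen debuggee could no longer starve the debugger's text drawing. Whether every shared lock in the system DLLs is keyed per user this way is an assumption; only the Text Services Framework names were actually observed.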