§ Altirra 1.4 released

Since things have come to a bit of a pause, I've pushed out a new release of Altirra, which is available on the Altirra main page. This version contains some major improvements to disk and sound emulation, as well as further tweaks to graphics and DMA timing for higher emulation accuracy.

Also now available on that page is the first main release of the Altirra Technical Reference Manual, a document containing everything I've learned about the Atari 8-bit hardware so far. It's also the first time I've tried to write a large document in OpenOffice Writer, with mixed success. I'd say OO.o Writer is definitely at the point where you can write good-sized documents in it, but not yet at the point where you can produce a full book or professional-quality PDFs with it. There are just too many restrictions in areas like PDF bookmark handling and outlining to get completely polished output. Nevertheless, the manual hasn't turned out too badly, and I hope it's useful to anyone still working on an 8-bit Atari or interested in the details of how the hardware works.

A few people have reported graphics output problems in this version on low-end integrated graphics chipsets. One reason for this is that, unlike VirtualDub, Altirra enables the Direct3D9 display path by default in order to get hardware-accelerated display of 8-bit data. Unfortunately, this path can be too demanding for really low-end GPUs. I need to figure out what's going on here, because D3D9 is necessary for hardware-accelerated display on Windows Vista and Windows 7, and I eventually want to switch VirtualDub's default to it on those operating systems. In the meantime, if this happens to you, specify /ddraw or /gdi on the command line to force a lower display mode.

§ How I ran AMD CodeAnalyst on an Intel CPU

It's been asked how I managed to run AMD CodeAnalyst on an Intel CPU, since the documentation and the Wikipedia page say that it requires an AMD CPU. Someone suggested that I might have hacked out the CPUID check.

I actually used the following very sneaky technique:

[CodeAnalyst running on Intel CPU]

CodeAnalyst works fine on an Intel CPU, as long as you use Time-Based Sampling (TBS). It will blue-screen the machine if you use Event-Based Sampling (EBS) or Pipeline Simulation, or at least it used to. Call graph profiling might not work either, but I never use that anyway.

As for why CodeAnalyst works on Intel CPUs, only AMD knows for sure, but there are good reasons for allowing it. One reason is that you can analyze runs on a different machine than the one that captured the profile; another is that the vast majority of optimizations benefit execution on any CPU. A third possible reason is simply that it happens to work and there's no reason to spend time breaking it. In any case, I'm glad that this is the case, because CodeAnalyst is free and easy to install, and even though it's not the fanciest sampling profiler, it works.

(Disclaimer: This is version 2.84. Might not work on some future version.)

§ Optimizing for the Intel Atom CPU, part 2

There was a lot more feedback than I expected on the Atom post (http://virtualdub.org/blog/pivot/entry.php?id=286), so it's time for part 2.

First, some more explanation. This routine is one of the scan line renderers in Altirra, so it does indeed execute frequently: 240 * 60 = 14,400 times per second at peak. At ~2000 cycles per call, that's just under 30 million cycles per second (28.8 million), which is about 1.8% of 1.6GHz. If you think this isn't much, you're right, except that we're talking about a CPU designed for low-power netbook use. If it were just a question of attaining target performance this wouldn't be an issue at all, because Altirra already runs at full speed on the machine in question. However, when it comes to optimizing for netbooks, or even just laptops, power consumption and heat production come into play. In other words, I hate it when my laptop becomes hot and loud, and I want programs to use less CPU even if there's more available. Reducing CPU usage further allows the CPU to drop to lower performance states more often, and 30 million cycles is a much bigger deal (over 7% of the CPU) when the CPU is running at 400MHz instead of 1.6GHz. Also, as I noted, this function was #2 on the sampling profile, so if I needed to reduce CPU usage further, this function would absolutely be on the optimization list.
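The CPU budget arithmetic above can be reproduced with a couple of tiny C helpers. This is only a sketch: the function names are made up for illustration, and the ~2000 cycles per call is the approximate figure quoted above, not a constant of the renderer.

```c
#include <assert.h>

/* Cycles per second spent in a routine that runs once per scan line.
   Figures from the text: 240 lines/frame, 60 frames/s, ~2000 cycles/call. */
static double cycles_per_second(double lines_per_frame,
                                double frames_per_second,
                                double cycles_per_call) {
    return lines_per_frame * frames_per_second * cycles_per_call;
}

/* Share of one core consumed at a given clock rate, as a percentage. */
static double percent_of_clock(double cycles_per_sec, double clock_hz) {
    assert(clock_hz > 0.0);
    return 100.0 * cycles_per_sec / clock_hz;
}
```

With the numbers from the text, cycles_per_second(240, 60, 2000) gives 28.8 million cycles per second, which is 1.8% of a 1.6GHz clock but 7.2% of a 400MHz clock; that is why the same routine matters more once the CPU drops into a lower performance state.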

The second thing I should point out is that I haven't actually committed to doing any Atom optimization for either Altirra or VirtualDub. Part of the reason I stumbled upon this was that I was just curious about what was taking up the CPU time, and launching a sampling profile is really easy. The fact that the optimization is so annoying and most likely requires dropping to assembly is enough to give me pause, and I don't have much motivation other than curiosity, since I can always use more powerful machines when I need to. This isn't to say, though, that it's not worth optimizing applications in general for Atom. John pointed out in the previous post that Adobe should optimize Flash for Atom, and I would be very surprised if they weren't already looking at this closely. Inner rendering loops in Mozilla Firefox would be another good target, as web browsing is one of the main uses for netbooks. (Don't ask me to do this, though; I already have enough projects as it is!)

Now, back to our little guinea pig routine.

§ Optimizing for the Intel Atom CPU

I recently picked up a netbook with the Intel Atom CPU in it, and was pleasantly surprised by its performance. The Atom CPU is no rocket, but it does run at 1.6GHz and it wasn't too long ago that the fastest desktop CPUs were still well below 1GHz. Yeah, it's in-order... but so was the Pentium 120 that I had when I started writing VirtualDub, so big deal. Unsurprisingly, the old MPEG-1 files I used to test with still played just fine.

Now, I was a little bit more worried about Altirra, because its system requirements are higher and it has a strict real-time requirement. I was relieved to find out that it runs in real time on the Atom at around 20% of the CPU, but what was surprising was that one particular loop in the video subsystem was taking a tremendous amount of CPU time:

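for(int i=0; i<w4; ++i) {
    // Each source byte goes through the priority table, then the
    // color table, and is written twice (low-res pixel doubling).
    dst[0] = dst[1] = colorTable[priTable[src[0]]];
    dst[2] = dst[3] = colorTable[priTable[src[1]]];
    dst[4] = dst[5] = colorTable[priTable[src[2]]];
    dst[6] = dst[7] = colorTable[priTable[src[3]]];
    src += 4;   // four source pixels consumed...
    dst += 8;   // ...eight output pixels produced
}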

What this loop does is translate from raw playfield and sprite data into 8-bit pixels, first going through a priority table and then a color table. The highest dot clock on the Atari is 7MHz (one-half color clock per pixel), but this handles the low-resolution modes which can only output at 3.5MHz, so each pixel is doubled up. This routine wasn't showing up hot on the systems I had tried previously, but on the Atom-based system it was #2 on the CodeAnalyst profile, right below the CPU core.
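For experimenting with the loop outside of Altirra, here is a self-contained C version. The function name render_lores and the parameter types are assumptions for illustration, since the surrounding code isn't shown; byte-sized table entries are assumed based on the 8-bit output pixels and the 256-byte priority table mentioned in the next paragraph. The caller must supply 4*w4 source bytes and room for 8*w4 output bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical standalone version of the low-res scan line loop.
   src: one byte of raw playfield/sprite data per low-res pixel.
   priTable: maps that byte to a color index (256 entries).
   colorTable: maps the color index to an 8-bit output pixel.
   Each source pixel is written twice because low-res modes output
   at half the highest dot clock. */
static void render_lores(uint8_t *dst, const uint8_t *src, int w4,
                         const uint8_t priTable[256],
                         const uint8_t *colorTable) {
    assert(w4 >= 0);
    for (int i = 0; i < w4; ++i) {
        dst[0] = dst[1] = colorTable[priTable[src[0]]];
        dst[2] = dst[3] = colorTable[priTable[src[1]]];
        dst[4] = dst[5] = colorTable[priTable[src[2]]];
        dst[6] = dst[7] = colorTable[priTable[src[3]]];
        src += 4;   /* four source pixels consumed... */
        dst += 8;   /* ...eight output pixels produced */
    }
}
```

Note the dependency chain per pixel: the source load produces the index for the priority lookup, whose result in turn is the index for the color lookup. That chain is what matters in the discussion of address generation below.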

I hadn't done any Atom optimization before, so I dug around the usual sites for information. Everyone knows the Atom is an in-order core, so lots of branching and cache misses are bad news. However, the loop above is fairly well behaved because the priority table is small (256 bytes) and the color table is even smaller (23 bytes). Looking through the Intel optimization guide, though, this caught my eye:

12.3.2.2 Address Generation

The hardware optimizes the general case of instruction ready to execute must have data ready, and address generation precedes data being ready. If address generation encounters a dependency that needs data from another instruction, this dependency in address generation will incur a delay of 3 cycles.

This has dire consequences for any routine that does heavy table lookups. Address generation interlock (AGI) stalls are a consequence of pipeline designs where address generation is performed by a separate stage ahead of the main execution stage; the benefit is that address generation can overlap execution instead of extending instruction time, but the downside is that a stall has to occur if the data isn't ready in time. In IA-32, this first became a problem in the 80486, where a one-clock stall occurred if you indexed using the result of the previous instruction. AGI stalls became slightly more serious with the Pentium, where you had to ensure that an instruction pair didn't generate an address from the result of the previous pair, usually by putting another pair of instructions in between. The Atom has a much larger window of 3 cycles to cover, which is a lot harder when you only have eight GPRs.
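One generic way to attack AGI stalls in a loop like this is to stop feeding each lookup straight into the next one and instead process the four pixels in stages, so that several independent loads sit between the load that produces an index and the load that uses it. The sketch below is a plain C illustration of that idea, not the fix used in Altirra; the name render_lores_grouped and the byte-sized tables are assumptions. A C compiler is free to reschedule these statements, so whether the generated code actually covers the 3-cycle window has to be checked in the disassembly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical restructuring of the low-res loop for an in-order core.
   In source order, three other priority lookups sit between
   priTable[s0] and the colorTable[p0] load that depends on it, and
   likewise for the other three pixels. Output is identical to the
   straightforward version. */
static void render_lores_grouped(uint8_t *dst, const uint8_t *src, int w4,
                                 const uint8_t priTable[256],
                                 const uint8_t *colorTable) {
    assert(w4 >= 0);
    for (int i = 0; i < w4; ++i) {
        /* Stage 1: four independent source loads. */
        unsigned s0 = src[0], s1 = src[1], s2 = src[2], s3 = src[3];
        /* Stage 2: four priority lookups, independent of each other. */
        unsigned p0 = priTable[s0], p1 = priTable[s1];
        unsigned p2 = priTable[s2], p3 = priTable[s3];
        /* Stage 3: four color lookups, independent of each other. */
        uint8_t c0 = colorTable[p0], c1 = colorTable[p1];
        uint8_t c2 = colorTable[p2], c3 = colorTable[p3];
        /* Stage 4: store each color twice (pixel doubling). */
        dst[0] = dst[1] = c0;
        dst[2] = dst[3] = c1;
        dst[4] = dst[5] = c2;
        dst[6] = dst[7] = c3;
        src += 4;
        dst += 8;
    }
}
```

The cost of this arrangement is register pressure: holding four indices and four colors live at once is exactly what becomes difficult with only eight GPRs, which is where dropping to assembly starts to look attractive.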

But it gets worse.