Sun also gets it

Just got back from a meeting with Sunacle [slides of my presentation as well as an abbreviated write-up of the talk can be found here], and I have to say, I couldn’t be happier. The near-term future is going to rock.

A bit of history: in the late 1980s and throughout the 1990s, Sun had system performance that was very impressive, despite having microprocessor performance that lagged behind the rest of the crowd. For example, throughout the 1990s the battle for top of the heap, as determined by SPECmarks, was fought by Hewlett-Packard (via the PA-RISC processor) and DEC (via the Alpha processor) … and occasionally IBM would get a POWER system in there.

Didn’t know that HP ever made anything besides printers? Never heard of Digital Equipment Corp before? No worries, the point is that the SPEC suite of benchmarks (now termed “SPEC CPU” to differentiate it from other benchmarks they’ve created since then) is designed to test CPU performance and not system performance: it intentionally tries to disengage the processor from the memory system, the IO system, the network, etc., and so only tests the performance of the CPU.

“Why on earth would that be valuable?” you ask.

Good question. I have no good answer, other than “it makes evaluation simpler and puts things on a more even playing field.”

“Yes, but,” you counter, “how can that be ‘even’ if you can’t run a CPU without the surrounding system? Wouldn’t the numbers be more realistic and thus more valuable if you intentionally include the effects of the surrounding system?”

Yeah, see, this is the difference between science and marketing.

So, back to the story. As measured by SPECmarks, the SPARC family of processors (SuperSPARC, UltraSPARC, etc.) all fell way behind the Alpha, the PA-RISC, the POWER, the various MIPS processors, etc. For those of you just joining us, there used to be a lot of general-purpose CPUs to choose from. Then Intel ate everyone’s lunch. Anyway.

The interesting thing is that Sun systems (i.e. the boxes containing the CPU, the memory, the I/O subsystem, and running the OS) had excellent performance, right up there with the rest of the crowd. They could get excellent system performance without needing top-of-the-line processing performance. Think balance; system-level design is a balancing act, and these guys were masters at it. Still are, from what I can tell. Another data point for comparison: IBM’s BlueGene is a supercomputer built of processors that are several generations old — effectively embedded processors with low performance and low power requirements. Though IBM also has its ultra-high-performance POWER line, they don’t use those CPUs in BlueGene. Don’t have to. BlueGene systems are some of the world’s highest performing computers, while simultaneously being some of the most energy-efficient.

It’s all about balance.

So I met with the Sun guys the other day, with a bunch of Oracle people in the room as well. Sun has recently developed some amazing interconnect technology with the potential to transform the way systems are built — again, getting to the heart of the problem, which in today’s systems is the interconnect fabric: interconnect from CPU to DRAM and between CPUs.

Some links to their stuff:

- Proximity Communication (for chip-to-chip communications)

- Silicon Photonics (for system-level communications)

(apologies in advance: the second link requires access to IEEE’s library)

At the meeting I ranted about a handful of important open problems in computer memory systems today, giving the fundamental reasons for those problems — and interconnect is at the heart of most of them. So pretty much all of the problems pointed to limitations that would go away were either of Sun’s technologies to succeed. Woo!

Bottom line: the Sun research guys are attacking precisely the right problems. Expect really cool things out of Sunacle in the next few years.

More on memory systems

Probably should have mentioned this the other day when I was ranting about memory systems:

This is a mini-book we wrote recently … we were asked by the people at Morgan & Claypool to do a follow-up on Memory Systems, something like 50-100 pages instead of 1000. They call these mini-books “synthesis lectures” … neat concept.

Anyway, one of the interesting things is that the last third of the mini-book is our group’s random thoughts on what’s to come in the memory system — current trends, open problems, where we think things will have to go, where we would like to see things go, potential solutions to problems, etc. The “BOMB” architecture we propose is showing up in lots of different forms in lots of different places. Not that we invented it; it’s just a modern take on an old idea … but now a lot of people in the industry seem to have stumbled upon the same idea recently, which is cool. Personally, I think it is the right approach, and the thought that the industry could very well take this direction is exciting.

Here’s the full-on URL if you can’t click the picture above:

http://www.morganclaypool.com/doi/abs/10.2200/S00201ED1V01Y200907CAC007

Intel [finally] gets it

Just got back from a memory-systems workshop that Intel threw last week. They invited a bunch of professor types from around the country to talk to them about memory systems and how Intel might go about improving them.

This is really good news, because for more than 30 years people have been bitching about the memory system, basically saying that memory is slow and is the primary reason that computers don’t go significantly faster. 30 years. But for the past 30 years the community has remained fixated on making processors faster (frankly, it’s a much more entertaining problem to work on, so I can’t blame them), despite the complaints getting louder and more insistent.

To put things in perspective: if you dropped a 10x faster microprocessor into your system today, you’d be lucky to see a 10% improvement on any significant applications. Ones that are small and consume a small amount of data would fit into the cache and would see a significant speedup (like 2x or 3x but still nowhere near 10x), but the applications that matter, i.e. the ones that people pay large sums of money to run on big iron, those apps would be lucky to see a 10% speedup.
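
For the numerically inclined, here is a rough back-of-the-envelope sketch of that arithmetic. It borrows Amdahl's law, which is my framing for this example and not something the numbers above were derived from, and it assumes runtime splits cleanly into a processor-bound part (which speeds up with the CPU) and a memory-bound part (which does not). The function names are made up for illustration.

```python
def overall_speedup(cpu_fraction, cpu_speedup):
    """Amdahl's law: only the processor-bound fraction of runtime gets faster."""
    return 1.0 / ((1.0 - cpu_fraction) + cpu_fraction / cpu_speedup)


def cpu_fraction_for(target_speedup, cpu_speedup):
    """Invert Amdahl's law: what processor-bound fraction yields this speedup?"""
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / cpu_speedup)


if __name__ == "__main__":
    # Fraction of runtime that must be processor-bound to see each
    # overall speedup after dropping in a 10x faster CPU.
    for target in (1.1, 2.0, 3.0):
        fraction = cpu_fraction_for(target, cpu_speedup=10.0)
        print(f"{target:.1f}x overall -> {fraction:.0%} of runtime is CPU-bound")
```

Under those assumptions, a 10% overall gain from a 10x processor means only about a tenth of the runtime was processor-bound to begin with; the rest was spent waiting on everything else, which for big-iron applications mostly means memory.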

Why? The memory system. It dictates your system’s performance today, and, to a large and increasing degree, it dictates your system’s power dissipation as well. This is especially true in enterprise computing, for example, where the memory typically accounts for more than half of the electrical power going into computing resources: more than the CPUs, flash, and disks combined. It is also true in consumer-level systems, where, let’s face it, we haven’t seen a marked improvement in performance for nearly a decade. In the 1990s computer performance was increasing at a ridiculous pace; while we have seen something approaching that in the portable market (cell phones), we haven’t really seen it on the desktop at all. This, by the way, for those of you not in computer architecture, is the “memory wall” … the term for the limit that memory speed ultimately places on system performance.

So people have danced around the problem for years, partly because nobody owned the memory system — we had CPU manufacturers focusing on the processor, and DRAM device manufacturers focusing on the storage technology and the I/O interfaces to their chips, but nobody was taking ownership of the vast space between these two technologies. That vast space is the domain of the memory system — this is where you ask and answer questions like, “So how do I connect that processor to that pile of DRAM devices? One big bus? A lot of little busses? Fast ones? Slow (but less power hungry) ones? Wide ones? Narrow ones? What should be the degree of banking and pipelining? What should be my granularity of access? My queueing mechanism? My scheduling policy? My mapping policy? My device-management policies? My …” etc. etc. etc. you get the point.
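
To make one of those questions concrete, here is a minimal sketch of what a “mapping policy” decides: how the bits of a physical address get carved up into channel, rank, bank, row, and column. Everything specific in it is an assumption for illustration (the field widths, the two layouts, and the names `map_address`, `CHANNEL_INTERLEAVED`, and `ROW_LOCAL`); real controllers pick their own layouts.

```python
from collections import namedtuple

DramAddress = namedtuple("DramAddress", "channel rank bank row column")

# Hypothetical layouts: (field, width in bits), lowest-order field first.
# The input is a cache-block index (byte address with the offset bits removed).
CHANNEL_INTERLEAVED = [("channel", 1), ("column", 7), ("bank", 3),
                       ("rank", 1), ("row", 14)]
ROW_LOCAL = [("column", 7), ("channel", 1), ("bank", 3),
             ("rank", 1), ("row", 14)]


def map_address(block_index, layout):
    """Peel fields off the block index, low-order bits first."""
    fields = {}
    for name, width in layout:
        fields[name] = block_index & ((1 << width) - 1)
        block_index >>= width
    return DramAddress(**fields)


if __name__ == "__main__":
    # Compare where four consecutive cache blocks land under each policy.
    for block in range(4):
        print(block,
              map_address(block, CHANNEL_INTERLEAVED),
              map_address(block, ROW_LOCAL))
```

In this sketch the first layout alternates consecutive blocks between the two channels, while the second keeps them in the same channel, bank, and row. Which one is better depends on the access pattern and on the bus, banking, and scheduling choices, which is exactly why these questions can't be answered one at a time.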

That’s pretty much why we wrote the book, by the way, and why every one of our research group’s PhD theses has dealt with the memory system: we think it is an important problem, and we decided to take ownership of solving it.

The good news is that industry is finally waking up. Intel really gets the problem (they hired a bunch of my students, I would hope they get it) … the workshop last week demonstrated they’ve been thinking about it for a while and are committed to finding and implementing a solution. This is enormously good news for the rest of us — a bunch of academics can only go so far, basically pointing and screaming and begging for someone to put out a fire. When you get the attention of the makers of big iron, the fire trucks are finally on their way.

There’s a similar workshop at Sun next week, and a few other companies are working on the problem. It’s a great time to be working in and using computers. I predict really marked improvements in computer performance in the next few years.