The Quest for More Processing Power, Part One: "Is the single core CPU doomed?"
by Johan De Gelas on February 8, 2005 4:00 PM EST - Posted in CPUs
CHAPTER 4: The Pentium 4 crash landing
The Prescott failure
The Pentium 4 "Prescott" is, despite its innovative architecture, a failure. Intel expected to scale this Pentium 4 architecture to 5 GHz, and derivatives of this architecture were supposed to come close to 10 GHz. Instead, Prescott only reached 3.8 GHz after numerous revisions. Even then, the 3.8 GHz part dissipates up to 115 W, and about 35-50% of that (depending on the source) is lost to leakage power.
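To put the leakage share in absolute terms, here is a minimal Python sketch that simply multiplies out the figures above (115 W total, 35-50% leakage). It assumes the leakage percentage applies to the full 115 W and ignores how leakage varies with temperature and voltage; `leakage_split` is a hypothetical helper, not anything from Intel's documentation.

```python
# Split Prescott's quoted 115 W into leakage and switching (dynamic) power.
# Assumption: the 35-50% leakage share applies to the whole 115 W figure.

def leakage_split(total_watts, leak_fraction):
    """Return (leakage_watts, dynamic_watts) for a total power and leakage share."""
    leak = total_watts * leak_fraction
    return leak, total_watts - leak

for frac in (0.35, 0.50):
    leak, dyn = leakage_split(115.0, frac)
    print(f"{frac:.0%} leakage: {leak:.1f} W leakage, {dyn:.1f} W switching")
```

At the pessimistic end, roughly 57 W of the chip's power does no useful work at all.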
The Prescott project failed, but that doesn't mean that the architecture itself was not any good. In fact, the philosophy behind the enhanced Netburst architecture is very innovative and even brilliant. To understand why we state this, let me quickly refresh your memory on the software side of things.
IPC unfriendly software
First, consider that average code does not allow the CPU to process many instructions in parallel. To give you an idea, we found that video encoding achieves about 0.6-0.8 instructions per clock cycle (IPC) on modern CPUs. Second, note that almost 20% of all instructions are branches, and about 50% are memory operations. In the case of video encoding, you may have fewer than 10% branches and about 60% memory operations. Most of the instructions that are neither branches nor memory operations are additions, or "ADD"s. Some of the memory operations need to use the same units that perform the ADD instructions.
You should also know that many algorithms contain calculations that need the result of a previous one: a dependency. You cannot issue the second calculation until the first is done.
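The effect of dependencies on IPC can be illustrated with a small dataflow model. This is a sketch under idealized assumptions: every instruction takes exactly one cycle, the CPU has unlimited execution units, and register renaming is perfect, so only true (read-after-write) dependencies limit parallelism. The toy programs and the `dataflow_ipc` helper are hypothetical.

```python
# Each entry is (destination register, [source registers it depends on]),
# listed in program order. Assumptions: 1-cycle latency for everything,
# unlimited execution units, perfect renaming (only true dependencies count).

def dataflow_ipc(program):
    ready = {}   # register -> cycle at which its value becomes available
    finish = 0   # cycle at which the last instruction completes
    for dest, srcs in program:
        # An instruction can start once all of its inputs are available.
        start = max((ready.get(s, 0) for s in srcs), default=0)
        ready[dest] = start + 1
        finish = max(finish, ready[dest])
    return len(program) / finish

# A chain: each ADD needs the previous result, so nothing overlaps.
chain = [("r1", []), ("r2", ["r1"]), ("r3", ["r2"]), ("r4", ["r3"])]
# Four independent ADDs: all can execute in the same cycle.
independent = [("r1", []), ("r2", []), ("r3", []), ("r4", [])]

print(dataflow_ipc(chain))        # 1.0
print(dataflow_ipc(independent))  # 4.0
```

Even an infinitely wide core cannot beat the length of the longest dependency chain, which is why real code tops out at a modest IPC.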
Most studies show that, realistically, a sophisticated CPU could reach an IPC of a little more than 2, about twice what today's CPUs achieve.
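Combining the instruction mix with a target IPC shows how much work each class of execution unit must sustain per cycle. The sketch below assumes the general-code mix quoted earlier (20% branches, 50% memory operations) and treats the remaining 30% as "other", mostly ADDs; `per_cycle_demand` is a hypothetical helper.

```python
# Average per-cycle demand for each instruction class at a given IPC.
# Assumed mix: 20% branches, 50% memory operations, remaining 30% "other".
MIX = {"branch": 0.20, "memory": 0.50, "other": 0.30}

def per_cycle_demand(ipc, mix=MIX):
    """Instructions of each class the core must handle per cycle, on average."""
    return {kind: ipc * share for kind, share in mix.items()}

for ipc in (0.7, 2.0):
    demand = per_cycle_demand(ipc)
    print(ipc, {kind: round(v, 2) for kind, v in demand.items()})
```

At today's IPC of about 0.7, memory operations arrive at roughly one every three cycles; at an IPC of 2 the core must start about one memory operation every cycle, which suggests why dedicated load and store address generation is worthwhile.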
Up close and personal
Now, take a look at the diagram of the Prescott architecture below, and let us see how Prescott addresses the problems mentioned above.
Fig 7. Prescott's architecture.
First of all, you want to make sure that memory operations happen quickly. Therefore, Prescott doubles both the L1 data cache and the L2 cache. It also has two dedicated Address Generation Units, one for stores and one for loads.
Because Prescott was built for 4 GHz and beyond, accesses to main RAM are costly in clock cycles (latency), considering that DDR-II 533 runs at a 266 MHz clock. Prescott therefore tries to limit the damage of waiting on cache misses by enlarging Northwood's already big store buffer from 24 to 32 entries and doubling the number of load request buffers, so it can keep many cache misses outstanding simultaneously. An intelligent hardware prefetcher is another way to avoid slowdowns due to high memory latency.
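Why more outstanding misses help can be shown with a deliberately simplified model. It assumes a hypothetical 100 ns DRAM latency (not a figure from this article), a fixed cost per miss, and a core that can keep up to `outstanding` independent misses in flight; bandwidth limits and prefetching are ignored, and the buffer counts are illustrative rather than Prescott's real internals.

```python
import math

# Simplified model: every miss costs the same DRAM latency, and up to
# `outstanding` independent misses can be serviced at the same time.
# Bandwidth limits and prefetching are ignored.

def latency_in_cycles(latency_ns, core_ghz):
    """Convert a memory latency in nanoseconds into core clock cycles."""
    return latency_ns * core_ghz

def stall_cycles(misses, outstanding, latency_cycles):
    """Cycles spent waiting if misses are serviced in batches of `outstanding`."""
    return math.ceil(misses / outstanding) * latency_cycles

lat = latency_in_cycles(100.0, 4.0)  # hypothetical 100 ns latency at 4 GHz
for in_flight in (1, 4, 8):
    print(in_flight, stall_cycles(32, in_flight, lat))
```

Under these assumptions a single miss costs 400 cycles at 4 GHz, and 32 independent misses cost 12,800 cycles when handled one at a time but only 1,600 cycles with eight in flight.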
To battle branch misprediction, Prescott's branch predictor has been tuned and correctly predicts about 10% of the branches that Northwood mispredicts. That results in up to 20% better performance! And of course, the trace cache makes sure that a mispredicted branch does not need to restart the decoding stages. As a result, the misprediction penalty is not 39 stages but 31 stages: the 8 decoding stages do not need to be repeated because, in most cases, the trace cache already holds the decoded instructions.
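A first-order estimate of what mispredictions cost is branch fraction × misprediction rate × penalty, added to the cycles per instruction (CPI). The sketch below uses the article's 20% branch share and its 31- and 39-stage penalties; the 5% baseline misprediction rate is a hypothetical placeholder, and the model ignores everything else that stalls the pipeline.

```python
# Extra cycles per instruction (CPI) lost to branch mispredictions.
# From the text: ~20% branches; 31-cycle penalty, 39 if decoding had to restart.
# The 5% baseline misprediction rate is a hypothetical placeholder.

def mispredict_cpi(branch_frac, miss_rate, penalty_cycles):
    return branch_frac * miss_rate * penalty_cycles

baseline_rate = 0.05                     # hypothetical
improved_rate = baseline_rate * 0.90     # 10% of those mispredictions now avoided

for penalty in (39, 31):
    for label, rate in (("baseline", baseline_rate), ("improved", improved_rate)):
        print(penalty, label, round(mispredict_cpi(0.20, rate, penalty), 3))
```

Both levers shrink the same product, so a better predictor and a shorter effective penalty compound with each other.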
65 Comments
Momental - Wednesday, February 9, 2005
#41, I understood what he meant when he stated that AMD could only be so lucky to have something that was a technological failure, i.e. Prescott, sell as well as it has. Even the article clearly summarizes that Prescott in and of itself isn't a piece of junk per se, only that it has no more room to evolve as Intel had originally hoped.
#36 wasn't saying that it was a flop sales-wise, quite the contrary. The thing has sold like hotcakes!
I, like many others here, literally got dizzy as I struggled to keep up with all of the technical terminology and mathematical formulas. My brain is, as of this moment, threatening to strike if I don't get it a better health and retirement plan along with a shorter work week. ;)
Ivo - Wednesday, February 9, 2005
1. About the multiprocessing: Of course, there are many (important!) applications that are more than satisfied with existing single-CPU performance. Others will benefit from dual CPUs. Matrix 2CPU+2GPU combinations could be essential, e.g. for stereo visualization. Desktop machines with enhanced voice/image analysis capabilities could probably require even more sophisticated CPU matrices. I suppose single- and multi-CPU solutions will coexist in the near future.
2. About the leakage problem: New materials like SOI are part of the solution. Another part is new techniques. Let us take a lesson from nature: our circulatory system consists of tiny capillaries and much thicker arteries. Maybe it could make sense to combine 65 nm transistors, e.g. in the cache memory, with 90 nm transistors in the ALU?
Noli - Wednesday, February 9, 2005
"Netburst architecture is very innovative and even genial"
genius-like?
If by genial you mean 'having a pleasant or friendly disposition', it sounds weird. It can mean 'conducive to growth' in this context but that's not so intuitive because a) it wasn't and b) at best it was only theoretically genial.
Presumably it's not genial as in 'of or relating to the chin' :)
Agreed that "monolithic" was confusing, but it was the Intel dude who said it. I thought it meant "large single unit" rather than "old (as in technology)", as in: increasing processing power by increasing the size and complexity of a single core is now not as efficient as strapping two cores together, a duallithic unit :)
Sorry to be a pedantic twat.
Xentropy - Wednesday, February 9, 2005
Some of the verbiage in that final chapter makes me wonder how much better Prescott might have done if Intel had just left out everything 64-bit and developed an entirely different processor for 64-bit. Especially since we won't have a mainstream OS that even uses those instructions for another few months, and it's already been about a year since release, they could easily have gotten away with putting 64-bit off for the next project. It's pretty obvious by now that even the 32-bit Prescotts have those 64-bit transistors sitting around. Even if not active, they aren't exactly contributing to the power efficiency of the processor.
I think one big reason Intel thinks dual core will be the savior of even the Prescott line is that dual cores running at 3 GHz supposedly require only the same power draw as a single core at 3.6 GHz, and should be just as fast in some situations (multitasking, at least). Dual core at 85% clock speed will be slower for gaming, though, so dual-core Prescott still won't close the gap with AMD for gaming enthusiasts (98% of this site's readership), and may even represent a further drop in performance per watt. Here's hoping for Pentium-M on the desktop. :>
piroroadkill - Wednesday, February 9, 2005
#36 -- You really didn't read the article and get the point of it. It wasn't a failure from a sales point of view, and this article was not written from a sales point of view, but from a technical point of view, and how the Prescott helped in furthering CPU technology.
Thus, a failure.
ViRGE - Wednesday, February 9, 2005
Although I think I sank more than I swam, that was a very good and informative article, Johan. I just have one request for a future article, since I'm guessing the next one is on multi-core tech: will someone at AT run the full AT benchmark suite against an SMP Xeon machine so that we can get a good idea ahead of time of what dual-core performance will be like against single core? My understanding is that the Smithfields aren't going to do much new besides putting 2 cores on one die (i.e. no cache sharing or other new tech), so SMP benchmarks should be fairly close to dual-core benchmarks.
Griswold - Wednesday, February 9, 2005
Case in point as to why the marketing department is the most important (and powerful) part of any highly successful company. It's not the R&D labs who tell you what works and what comes next, it's the PR team.
quidpro - Wednesday, February 9, 2005
Someone needs to make a new Tron movie so I can understand this better.
tore - Wednesday, February 9, 2005
Great article. On page 3 you talk about a BJT transistor with a base, collector and emitter; since all modern CPUs use MOSFETs, shouldn't you talk about a MOSFET with a gate, source and drain?
Questar - Wednesday, February 9, 2005
"The Pentium 4 'Prescott' is, despite its innovative architecture, a failure."
AMD wishes they had a "failure" that sold like Prescott.