Intel's Pentium Extreme Edition 955: 65nm, 4 threads and 376M transistors
by Anand Lal Shimpi on December 30, 2005 11:36 AM EST
Posted in CPUs
Literally Dual Core
One of the major changes with Presler is that unlike Smithfield, the two cores are not a part of the same piece of silicon. Instead, you actually have a single chip with two separate die on it. By splitting the die in two, Intel can reduce total failure rates and even be far more flexible with their manufacturing (since one Presler chip is nothing more than two Cedar Mill cores on a single package).
The chip at the bottom of the image is Presler; note the two individual cores.
In order to find out if there was an appreciable increase in core-to-core communication latency, we used a tool called Cache2Cache, which Johan first used in his series on multi-core processors. Johan's description of the utility follows:
"Michael S. started this extremely interesting thread at the Ace's Hardware Technical forum. The result was a little program coded by Michael S. himself, which could measure the latency of cache-to-cache data transfer between two cores or CPUs. In his own words: 'it is a tool for comparison of the relative merits of different dual-cores.'
"Cache2Cache measures the propagation time from a store by one processor to a load by the other processor. The results that we publish are approximately twice the propagation time. For those interested, the source code is available here."
Armed with Cache2Cache, we looked at the added latency seen by Presler over Smithfield:
CPU | Cache2Cache Latency in ns (Lower is Better)
AMD Athlon 64 X2 4800+ | 101
Intel Smithfield 2.8GHz | 253.1
Intel Presler 2.8GHz | 244.2
Not only did we not find an increase in latency between the two cores on Presler, communication actually occurs faster than on Smithfield. We made sure that it had nothing to do with the faster FSB by clocking the chip at 2.8GHz with an 800MHz FSB and repeating the tests, only to find consistent results.
We're not sure why, but core-to-core communication is faster on Presler than on Smithfield. That being said, a difference of less than 9ns just isn't going to be noticeable in the real world - given that we've already seen that the Athlon 64 X2's 100ns latency doesn't really help it scale better when going from one to two cores.
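To put those figures in perspective, here is a minimal Python sketch of the arithmetic, using only the numbers from the table above. The helper names (`to_cycles`, `one_way_ns`) are ours, purely for illustration, and the halving step relies on Michael S.'s note that the published results are approximately twice the propagation time. Treat the output as a rough estimate, not a measurement.

```python
# Published Cache2Cache results from the table above, in nanoseconds.
LATENCY_NS = {
    "AMD Athlon 64 X2 4800+": 101.0,
    "Intel Smithfield 2.8GHz": 253.1,
    "Intel Presler 2.8GHz": 244.2,
}

# Both Intel chips were tested at 2.8GHz.
INTEL_CLOCK_GHZ = 2.8


def to_cycles(latency_ns, clock_ghz):
    """Convert a latency in ns to core clock cycles (1GHz = 1 cycle per ns)."""
    return latency_ns * clock_ghz


def one_way_ns(published_ns):
    """Estimate one-way propagation time: published values are roughly 2x it."""
    return published_ns / 2


smithfield = LATENCY_NS["Intel Smithfield 2.8GHz"]
presler = LATENCY_NS["Intel Presler 2.8GHz"]

# Absolute and relative advantage of Presler over Smithfield.
delta_ns = smithfield - presler
delta_pct = 100 * delta_ns / smithfield

print(f"Presler advantage: {delta_ns:.1f} ns ({delta_pct:.1f}%)")
print(f"That is about {to_cycles(delta_ns, INTEL_CLOCK_GHZ):.0f} core cycles at 2.8GHz")

for name, published in LATENCY_NS.items():
    print(f"{name}: ~{one_way_ns(published):.1f} ns one-way (estimate)")
```

The gap works out to a few percent of the total round trip, which is consistent with our view that it won't be noticeable in practice.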
84 Comments
yacoub - Tuesday, January 3, 2006
Yet no mention of the Max, where the 4800+ utterly trounces the two Intel chips. Does Max not matter (in which case why bother listing it), or does it matter but you just neglected to mention that (whether on purpose or by accident)?
jjunk - Tuesday, January 3, 2006
It's right there in the chart. As for further discussion, it's not really necessary. Screaming frame rates might look good on the chart but they don't help gameplay. A 10 fps min will definitely be noticeable.
IntelUser2000 - Sunday, January 1, 2006
I don't like that paragraph. It makes it sound like 65nm is all that matters for Presler's power consumption. It will also make people judge 65nm based on Presler, since that's the first CPU on 65nm.
In fact it's not that simple. Taking a CPU that's on a certain process, like Smithfield, and putting it on a smaller process won't mean an instant 40-50% decrease in power consumption. That's called a dumb shrink. The reason Northwood had significantly lower power than Willamette was because Northwood was optimized to lower power consumption.
A CPU that runs well at 130nm may do badly at 90nm and even worse at 65nm, for example. Presler was said to be not Intel's main focus and Intel moved their design teams to Conroe, so the people who were supposed to be optimizing Presler for 65nm all went away and Presler just got a dumb shrink.
Sleep transistors were an optional feature on 65nm, not required. So Presler may not have them.
IntelUser2000 - Monday, January 2, 2006
Why use DDR2-667 with 5-5-5-15 timings?? Most DDR2-667 can do 4-4-4-8 (around there). This is gonna skew the results in AMD's favor, as the DDR400 used is the lowest latency possible.
In reality nobody is gonna use DDR400 at 2-2-2-7 latency or DDR2-667 at 4-4-4-8 latency. Nobody I have ever heard of outside the internet uses RAM at those timings.
Anandtech should either benchmark them all at JEDEC timings or use them all with low latency. I understand they want to be sure the new test system works properly, but using low latency RAM for the comparison system is just not fair.
JEDEC timings for DDR400 are 3-3-3-8. Where is your DDR400 advantage over DDR2 now??
hans007 - Sunday, January 1, 2006
i think that the 9xx series is a big improvement over the 8xx. i have an 8xx myself, the 820, which is the lowest power. the leakage is exponential so the 955 is going to draw a much higher amount than say a 920 will.
i bet the 920 will be a half decent cpu drawing maybe only 70 watts. which isnt TOO terrible in the grand scheme of power. the 920 would only run at 2.8 ghz and have not as high leakage percentage so i think it will be the one to get.
true intel is not better yet, but they are getting there. and their dual cores still cost less.
i also think that intel should be commended for writing the smp code for q4. that is the doom3 engine which will go into a LOT of games. and since it speeds up the amd chips as well, it is a free upgrade for everyone. sure it makes up for a large deficiency in the intel chips, but it is FREE.
and it makes the really cheap 920/820 chips very price competitive, as the 820 chips are very very cheap, about $150 on ebay (which is probably near what oems get them for in bulk, thus the rampant dell 820 deals going on)
jjmcwill - Saturday, December 31, 2005
I do professional software development for a living, using Visual Studio 2003 to build the code for a product I work on. We have over 1000 .cpp files and over 1500 header files.
On my work box: An HP xw6200 workstation with a single 3.0GHz Xeon CPU, 2MB L2 cache, 1G RAM, compilation takes 10:45 for a single project in our solution. On my home system: Socket 754 Athlon 64 3000+, 1.5G RAM, compilation takes 7:30. Both systems build the code off of the exact same external IDE hard drive in a Firewire enclosure. I use it to carry all my work back and forth between work and home.
At some point we'll be investigating Make to launch parallel compiles, and I would be VERY interested in seeing dual-core CPU comparisons which include compilation benchmarks, using Visual Studio 2003 under Windows, using Make -j2 or Make -j3 under windows, and using gcc/make under Linux.
Based on what I've seen with the Xeon, I'm leaning toward an AMD X2 or dual core Opteron for my next upgrade.
Thanks.
Calin - Tuesday, January 3, 2006
I think that an Extreme Edition CPU (while much more expensive) would give better results with hyperthreading enabled than a simple Pentium D and maybe even than an Athlon64 X2 while doing several threads of compile.
Brian23 - Saturday, December 31, 2005
The second valuable post in this thread.
I own an X2 3800 and I'm pleased with the results Anand posted. I won't need to upgrade for a while.
I'm looking forward to AMD implementing something similar to Sun's design: multiple threads running simultaneously. It shouldn't be that hard to do. It's just adding GPRs and a little logic that controls the thread contexts.
Missing Ghost - Saturday, December 31, 2005
Some other web sites report that the CPU becomes too hot with the stock heatsink.
Gary Key - Saturday, December 31, 2005
The initial press release kits that contained the Intel D975XBX motherboard had an issue that created higher than normal idle/load temperatures. We have new boards on the way from Intel. I can promise you that the first results shown in other 955EE reviews do not occur on the 975x boards from Gigabyte and Asus, nor will it occur on the production release Intel D975XBX. I highly recommend a different air cooling system than the stock heatsink but most of the reported results at this time are incorrect.