Socket-AM2 Performance Preview
Without major architectural changes to the new AM2 CPUs, we wanted a quick and easy way to showcase the performance differences between AM2 and Socket-939. What we've got is a massive table below with all of our usual CPU benchmarks and their results for the same CPU in both Socket-939 and AM2 varieties, and the performance benefit offered by AM2:
Benchmark | Socket-939 (DDR-400) | Socket-AM2 (DDR2-800) | % Advantage (Socket-AM2) |
PC WorldBench 5 | 115 | 115 | 0% |
Business Winstone 2004 | 23.3 | 23.2 | -0.4% |
Multimedia Winstone 2004 | 38.4 | 38.9 | 1.3% |
SYSMark 2004 | 220 | 224 | 1.8% |
ICC SYSMark 2004 | 282 | 286 | 1.4% |
OP SYSMark 2004 | 171 | 175 | 2.3% |
3dsmax 7 | 2.38 | 2.38 | 0% |
Adobe Premiere Pro 1.5 (Export w/ Adobe Media Encoder) | 130 s | 128 s | 1.5% |
Adobe Photoshop CS2 | 210.6 s | 210.3 s | 0.1% |
DivX 6.1 | 11.6 fps | 12.0 fps | 3.4% |
WME9 | 35.2 fps | 35.6 fps | 1.1% |
Quicktime 7.0.4 (H.264) | 3.63 min | 3.63 min | 0% |
iTunes 6.0.1.4 (MP3) | 43 s | 43 s | 0% |
Quake 4 - 10x7 (SMP) | 111.3 fps | 117.4 fps | 5.5% |
Call of Duty 2 - 10x7 | 59.3 fps | 60.1 fps | 1.3% |
F.E.A.R. - 10x7 | 92 fps | 94 fps | 2.1% |
Multitasking Test (LAME + WME + Anti Virus + Zip) | 216.3 s | 213.4 s | 1.4% |
ScienceMark 2.0 (Bandwidth) | 5007 MB/s | 6805 MB/s | 36% |
ScienceMark 2.0 (Latency 512-byte stride) | 53.83 ns | 49.77 ns | 7.5% |
We'll start at the bottom of the table and go up from there. Rev F processors feature a 128-bit DDR2-800 memory controller, which works out to a peak theoretical bandwidth to/from memory of 12.8GB/s. As you would expect, that's twice the bandwidth of Rev E CPUs' 128-bit DDR-400 controller at 6.4GB/s. A 36% increase in memory bandwidth according to ScienceMark is therefore to be expected, albeit a bit on the low side. The old DDR-400 memory controller is able to deliver 5GB/s out of a maximum of 6.4GB/s, but now we're only seeing 6.8GB/s out of a maximum of 12.8GB/s with AM2. This is nonetheless a huge step for AMD, as it is the first spin of the Rev F silicon on which we've been able to see such a significant advantage in theoretical memory bandwidth over previous DDR-400 cores.
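To make the bandwidth arithmetic above concrete, here is a minimal Python sketch. It assumes 1 GB = 10^9 bytes and 1 GB = 1000 MB, and the function names are purely illustrative; the bus widths, transfer rates, and ScienceMark figures are the ones quoted in the article.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, mega_transfers_per_s: int) -> float:
    """Peak theoretical bandwidth in GB/s, taking 1 GB as 10^9 bytes (assumption)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9


def efficiency(measured_mb_s: float, peak_gb_s: float) -> float:
    """Fraction of the theoretical peak actually measured (1 GB = 1000 MB assumed)."""
    return measured_mb_s / (peak_gb_s * 1000)


# Figures from the article: 128-bit controllers, DDR-400 vs. DDR2-800.
ddr400_peak = peak_bandwidth_gb_s(128, 400)
ddr2_800_peak = peak_bandwidth_gb_s(128, 800)

# ScienceMark 2.0 bandwidth results from the table, in MB/s.
ddr400_eff = efficiency(5007, ddr400_peak)
ddr2_800_eff = efficiency(6805, ddr2_800_peak)

print(f"DDR-400:  {ddr400_peak:.1f} GB/s peak, {ddr400_eff:.0%} achieved")
print(f"DDR2-800: {ddr2_800_peak:.1f} GB/s peak, {ddr2_800_eff:.0%} achieved")
```

Under these assumptions the DDR-400 controller reaches roughly 78% of its theoretical peak, while the DDR2-800 controller reaches roughly 53%, which is why the 36% gain reads as being on the low side.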
What's even more important than the increase in memory bandwidth is that access latency has been reduced by 7.5% over the DDR-400 memory controller in the Rev E cores. Lower latency and more bandwidth mean that, at bare minimum, performance won't go down. At least, not perceptibly: 0.4% slower in one test that has a 1-2% variability is nothing to worry about.
It also doesn't guarantee that performance will go up, as you can see from the results above. If we only count the overall SYSMark score and leave out the synthetic tests, the real world performance advantage averages out to a little under 1.3%. There are some special cases such as Quake 4 and DivX where performance goes up fairly reasonably, which can be expected since both of those tasks are fairly bandwidth intensive and make good use of both cores. However, similar benchmarks, such as F.E.A.R. and Windows Media Encoder 9, show lower improvements, so the gain is very dependent on the specific application and workload.
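The average quoted above can be reproduced with a short Python sketch. It assumes that "only the overall SYSMark score" means dropping the ICC and OP SYSMark sub-scores, and that the two ScienceMark rows are the synthetic tests; the percentages are copied from the table as published, and the variable names are illustrative.

```python
# % advantage for Socket-AM2, copied from the table.
# Assumption: the ICC/OP SYSMark sub-scores and both ScienceMark rows are excluded.
real_world_advantage = {
    "PC WorldBench 5": 0.0,
    "Business Winstone 2004": -0.4,
    "Multimedia Winstone 2004": 1.3,
    "SYSMark 2004": 1.8,
    "3dsmax 7": 0.0,
    "Adobe Premiere Pro 1.5": 1.5,
    "Adobe Photoshop CS2": 0.1,
    "DivX 6.1": 3.4,
    "WME9": 1.1,
    "Quicktime 7.0.4 (H.264)": 0.0,
    "iTunes 6.0.1.4 (MP3)": 0.0,
    "Quake 4 - 10x7 (SMP)": 5.5,
    "Call of Duty 2 - 10x7": 1.3,
    "F.E.A.R. - 10x7": 2.1,
    "Multitasking Test": 1.4,
}

# Simple unweighted mean across the remaining benchmarks.
average_advantage = sum(real_world_advantage.values()) / len(real_world_advantage)

print(f"{len(real_world_advantage)} benchmarks, "
      f"average advantage {average_advantage:.2f}%")
```

With these 15 rows the unweighted mean comes to about 1.27%, consistent with "a little under 1.3%". A different choice of which rows count as real world would shift the figure slightly.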
It's important to note that until recently, AM2 samples were not able to produce scores even on par with Socket-939, so the fact that we're seeing a performance increase at all is a major step from where we were just a couple of months ago. The real question is, is this all we get?
107 Comments
mino - Tuesday, April 11, 2006 - link
1) 3-cycle L1 on K7/K8 is the fastest required; it follows from the internal structure of the scheduler and the pipeline that a 2-cycle cache would do almost no good. Also they would have to reduce L1 size to 32k+32k, which would hurt. It simply does not make sense to change L1 at all, maybe on K8L, but IMHO 128k+128k would help much more than 2-cycle latency.
2) 17-cycle L2 is PRETTY GOOD for 1M L2 with exclusive structure!!! IMHO it is possible to do 16-cycle, maybe 15, but nowhere near Dothan's 10-cycle. Also remember lower-latency L2 has scaling problems (that's why Intel made Prescott's L2 slower than NW's).
3) Concerning the memory subsystem (caches + memory) on single-socket K8/K8L, the biggest issue is the robustness (number of on-the-fly accesses to memory) and latency of the memory controller. Solving this is not a trivial thing. IMHO adding 2-4M of L3 with random access of ~50 cycles would do.
4) In the >4 sockets front all they need is effective caching of MOESI snoops.
You also forgot that K7/K8 is mostly a KISS architecture. It is just very well balanced, so it has good performance in the end. However, make one wrong change and you are screwed.
KISS == Keep It Simple Silly
About "weak" SIMD implementation on AMD, don't fool yourselves guys. The only x86 architecture faster than K8 on SSE/SSE2 is Netburst aka SIMD-by-Intel.
About Conroe, it has ALUs and FPUs twice as wide as PIII/K7/K8; this means it has huge resources at its disposal to calculate SIMD.
Same goes for K8L 2 quarters later. That said, the K7/K8 core has far more FP power than the P6 architecture. On FP Conroe and K8 are about equal, but K8L will wipe the floor with K8 and Conroe on FP. Conroe will wipe K8 on INT and still be faster than K8L by a decent margin.
Overall we are in for another PIII vs. K7 battle with a single very important change - AMD has a platform it did not have back in the K7 vs. PIII days.
fitten - Thursday, April 13, 2006 - link
I find the K8L a somewhat odd strategy. I guess they are targeting the Itanium market because Opterons already have a good part of the HPC market. Given that the HPC people are the ones that really care about FPU performance and that they are still a fairly small market segment, it seems an odd target. Integer performance rules the roost for servers... web, database, and just about everything else you can think of other than number crunching simulations and the like. Desktop uses for FPU are a few like games and some mathematical stuff. Intel is focusing on integer performance at least as much as FPU with Conroe (Conroe gets a good dose of both), which makes sense to me since so much of the work done on computers, both desktops and servers, is dominated by integer operations. K8L speculation says only FPU horsepower will be added... just doesn't seem like a sound decision to me.
Zoomer - Monday, April 10, 2006 - link
Hey Anand, could you take out 1 of the two modules and do a quick test on that?
With doubled (in theory) bandwidth with DDR2, wouldn't the dual channel mem controller be even more redundant? Perhaps we'll see a new 754-ish socket? :)
Furen - Monday, April 10, 2006 - link
I don't believe we will. Even S1 will be dual-channel, and this is what would have benefited the most from being single-channel (since the pincount would be much lower the package could be much smaller).
BaronMatrix - Monday, April 10, 2006 - link
Looking at the intensive timing and bus speed tweaks USING the SAME RAM as the latest XE955 article, I would have expected the same kind of thing here. Anand doesn't look at lower speed lower latency for whatever chip he used. That RAM will do 3-2-2 at 667. Obviously AMD is more sensitive to latency.
ChristTheGreat - Monday, April 10, 2006 - link
AMD is sensitive to latencies, cause of the memory controller. I'm sure that 3-2-2-9 DDR2 from OCZ would give much more performance on AMD.
Again, this is only a CPU that they use to test, so it's not the true CPU. They wouldn't give us the performance it gives before its launch. That's like killing yourself right now if the performance is poor....
I saw an article saying that AMD could be working on DDR2 latencies. You think that 4-4-4-12 are good timings? 12 = tRAS
"tRAS is the time required before (or delay needed) between the active and precharge commands. In other words, how long the memory must wait before the next memory access can begin."
In fact, you have better frequencies, but looser timings.... What you need is higher frequencies and lower timings.
So we will have to wait till they launch Socket AM2, to know the true performance of AM2.
defter - Monday, April 10, 2006 - link
4-4-4-12 are good timings, even for DDR2-667. It isn't easy to find reasonably priced DDR2-667 that works at those timings with standard voltage.
Some people forget that 99% of consumers won't be using super expensive overvolted 3-3-3-10 DDR2-800 memory just to get a few percent of extra performance. And if you compare an AMD CPU + super fast DDR2-800 against an Intel CPU (which runs fine on DDR2-667 because of the FSB limitation), then you need to take into account the higher price of memory on the AMD system.
Wesley Fink - Monday, April 10, 2006 - link
We are continuing to test the AM2 on different AM2 boards. On another motherboard we could run at 3-3-3 DDR2-800 with the OCZ PC2-8000 memory. Latency was a bit lower and bandwidth a bit higher, but nothing really changed from Anand's conclusions. We have also been running DDR2-667 and DDR2-533 tests with this new super fast OCZ memory and cheaper mainstream DDR2 memory, and we will be sharing those results as soon as testing is complete.
cornfedone - Monday, April 10, 2006 - link
The crap the mobo companies have been shoving out the doors the past couple of years is pure garbage, as any number of hardware review sites have confirmed. It looks like the AM2 mobos might be more half-baked crap. Until you can test the shipping CPUs on a quality mobo that allows proper memory timing, it's difficult to know what AMD's AM2 CPUs will or won't deliver. If I had a dollar for every bogus claim Intel has made, I'd be a billionaire, so I wouldn't hold my breath that Conroe will perform as Intel claims.