Intel's Pentium M on the Desktop - A Viable Alternative?
by Anand Lal Shimpi on February 7, 2005 4:00 PM EST - Posted in CPUs
Floating Point Performance
Just about a year ago, our own Johan De Gelas made an extremely interesting point about one of the weaknesses of the Pentium M - floating point performance. The theory is this - the Pentium 4, Athlon 64 and Pentium M all have very different platforms, with equally different characteristics. Unfortunately, as we've already shown, the Pentium M is quite possibly the worst off with only a single channel 333MHz DDR memory bus. It's also widely known that most floating point intensive applications are highly memory bandwidth limited, meaning that the Pentium M already has an excuse for poor floating point performance - it doesn't have enough memory bandwidth.
But what if we are able to take memory bandwidth out of the equation? This is where a little benchmark called "flops" comes into play. The beauty of flops is that it executes entirely within the L1 cache of the Pentium M, meaning that the benchmark is limited by two things: the performance of the Pentium M's L1 cache, and more importantly, the performance of the Pentium M's floating point and SSE units.
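As a rough illustration of the memory bandwidth limit described above, peak theoretical bandwidth is simply transfers per second times bytes per transfer. The sketch below assumes that "333MHz DDR" means 333 million transfers per second over a standard 64-bit channel; the dual channel DDR400 figure is a hypothetical comparison point rather than a measured platform, and the function name is ours.

```python
def peak_bandwidth_gb_s(mega_transfers_per_sec, bus_width_bits=64, channels=1):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_sec * 1e6 * bytes_per_transfer * channels / 1e9

# Assumption: a single 64-bit channel at 333 MT/s, as on the Pentium M platform.
pentium_m = peak_bandwidth_gb_s(333)

# Hypothetical comparison point: dual channel DDR400 on a desktop platform.
dual_ddr400 = peak_bandwidth_gb_s(400, channels=2)

print(f"Single channel DDR333: {pentium_m:.2f} GB/s")
print(f"Dual channel DDR400:   {dual_ddr400:.2f} GB/s")
```

These are ceilings, not measured throughput, but they give a sense of how far behind a single channel bus starts out.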
The actual tests that flops runs are a mixture of floating point add, subtract, multiply and divide operations. The mix of ADD/SUB, MUL and DIV operations is listed next to each test in the table below.
To give the Pentium M as solid a foundation as possible, we compiled flops using the latest Intel C compilers under Visual Studio .NET, with /O3 and architecture-specific flags. All of the results are expressed in MFLOPs, higher scores being better:
Test (% ADD, SUB, MUL, DIV) | AMD Athlon 64 3200+ (2.0GHz) | AMD Athlon 64 FX-55 (2.6GHz) | Intel Pentium 4 3.2GHz | Intel Pentium M 755 (2.0GHz) |
1 (50,0,43,7) | 1576 | 2057 | 1274 | 899 |
2 (43,29,14,14) | 856 | 1118 | 790 | 492 |
3 (35,12,53,0) | 1388 | 1802 | 2476 | 1470 |
4 (47,0,53,0) | 1244 | 1622 | 2792 | 1601 |
5 (45,0,52,3) | 1477 | 1923 | 2351 | 1019 |
6 (45,0,55,0) | 1466 | 1908 | 2762 | 1607 |
7 (25,25,25,25) | 458 | 595 | 365 | 252 |
8 (43,0,57,0) | 1585 | 2065 | 2566 | 1572 |
Average | 1256 | 1636 | 1922 | 1114 |
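The Average row is a plain arithmetic mean of the eight test scores for each CPU. As a quick sanity check, the short Python sketch below recomputes it from the table; the variable and function names are ours and purely illustrative.

```python
# MFLOPs for tests 1-8, copied from the table above.
scores = {
    "Athlon 64 3200+ (2.0GHz)": [1576, 856, 1388, 1244, 1477, 1466, 458, 1585],
    "Athlon 64 FX-55 (2.6GHz)": [2057, 1118, 1802, 1622, 1923, 1908, 595, 2065],
    "Pentium 4 3.2GHz": [1274, 790, 2476, 2792, 2351, 2762, 365, 2566],
    "Pentium M 755 (2.0GHz)": [899, 492, 1470, 1601, 1019, 1607, 252, 1572],
}

def average_mflops(results):
    """Arithmetic mean of each CPU's test scores, rounded to a whole MFLOP."""
    return {cpu: round(sum(vals) / len(vals)) for cpu, vals in results.items()}

averages = average_mflops(scores)
for cpu, avg in averages.items():
    print(f"{cpu}: {avg}")
```

Because every test counts equally, the division-heavy tests (2 and 7), where all four CPUs score lowest, pull each average down noticeably.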
The first comparison to look at is the Athlon 64 3200+ vs the Pentium M 755, since both CPUs run at the same 2.0GHz clock speed. Despite the Pentium M's improvements to enhance IPC, the Athlon 64 is still able to outperform it at a core level (without the aid of its memory controller) by almost 13%. But here's where the next Athlon 64 score comes into play - while the Pentium M will hit 2.26GHz by the end of this year, the Athlon 64 will be at or above 3.0GHz. So, the headroom of the Athlon 64's architecture gives it a huge performance advantage here in flops, as you can see by the Athlon 64 FX-55 results (remember that the larger L2 cache of the FX-55 has no effect on the flops results as the program runs entirely out of L1).
Next, we have one of the slower Pentium 4s vs. the Pentium M 755. Why not compare to a 3.6GHz or the new 3.8GHz Pentium 4? Well, look at how much the Pentium 4 3.2GHz outperforms the Pentium M 755 - 72% using Intel's 8.1 C++ compiler. When running optimized SSE2/3 code, the Pentium 4 is a much stronger FP performer than the Pentium M could ever be, which is very important for the following reason: the future of desktop applications is in very floating-point intensive media transcoding tasks, and for those applications, the Pentium M just won't cut it. So, to those who feel that Intel will soon ditch NetBurst in favor of the Pentium M's architecture, the results speak for themselves. While elements of the Pentium M architecture will undoubtedly make an appearance in the Pentium 4's successor, its dated P6 execution core will not.
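The percentages quoted above follow directly from the Average row of the flops table. The sketch below recomputes them and also divides each average by clock speed; that per-GHz figure is our own normalization for illustration, not something flops itself reports, and the names used are hypothetical.

```python
# (average MFLOPs, clock speed in GHz) taken from the flops table.
cpus = {
    "Athlon 64 (2.0GHz)": (1256, 2.0),
    "Athlon 64 FX-55": (1636, 2.6),
    "Pentium 4 3.2GHz": (1922, 3.2),
    "Pentium M 755": (1114, 2.0),
}

def lead_percent(score, baseline):
    """How much faster `score` is than `baseline`, in percent."""
    return (score / baseline - 1.0) * 100.0

pentium_m_avg = cpus["Pentium M 755"][0]
a64_lead = lead_percent(cpus["Athlon 64 (2.0GHz)"][0], pentium_m_avg)
p4_lead = lead_percent(cpus["Pentium 4 3.2GHz"][0], pentium_m_avg)
print(f"Athlon 64 (2.0GHz) lead over Pentium M 755: {a64_lead:.1f}%")
print(f"Pentium 4 3.2GHz lead over Pentium M 755: {p4_lead:.1f}%")

# Our own normalization: average MFLOPs delivered per GHz of clock speed.
mflops_per_ghz = {name: avg / ghz for name, (avg, ghz) in cpus.items()}
for name, value in mflops_per_ghz.items():
    print(f"{name}: {value:.0f} MFLOPs per GHz")
```

With these inputs the leads work out to roughly 12.7% and 72.5%, in line with the "almost 13%" and "72%" figures. The two Athlon 64s also land within a couple of MFLOPs per GHz of each other, which is consistent with flops running entirely out of L1.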
77 Comments
fitten - Tuesday, February 8, 2005
Also, it's interesting that there are many benchmarks chosen which are known to stress the weaknesses of the Pentium-M... not that it isn't interesting information. For example, there seems to be a whole lot of FPU intensive benchmarks (around 15 or so, all of which the Pentium-M should lose handily - known before they are even run) so kind of just hammering the point home I guess.
Anyway, the Dothans held up pretty well from what I can see... Most of the time (except for the notable FPU intensive and memory bandwidth intensive benchmarks), the Dothan compares quite well with Athlon64s of the same clock speed that have the advantage of dual channel memory.
fitten - Tuesday, February 8, 2005
The other interesting thing about the Athlon64 vs. Dothan comparison is that even with dual channel memory bandwidth on the Athlon64's side, the single channel memory bandwidth of the Dothan still keeps it very close in many of the benchmarks and can even beat the dual channel Athlon64s at 400MHz higher clock in some.
Anyway, the Pentium-M family is a good start. Some tweaking here and there (improved FPU with better FPU performance and maybe another FPU execution unit, improved memory subsystem to make good use of dual channel) and it will be at least as good as the Athlon64s across the board.
I own three Athlon64 desktops, two AthlonXP desktops, and two Pentium-M laptops and the laptops are by no means "slow" at doing work.
KristopherKubicki - Tuesday, February 8, 2005
teutonicknight: We purposely don't change our test platform too often. Even though we are using a slightly older version of Premiere, it is the same version we have used in our other processor analyses.
Hope that helps,
Kristopher
kmmatney - Tuesday, February 8, 2005
There's also a Celeron version that would have been interesting to review. The small L2 cache should hurt the performance, though. I think the Celeron version uses something like 7 Watts. It would make no sense to put a Celeron-M in such an expensive motherboard, though.
Slaimus - Tuesday, February 8, 2005
I think this indirectly shows how AMD needs to update its caching architecture on the K8. They basically carried over the K7 caches, which is just too slow when paired with its memory controller. Instead of being as large as possible (as evidenced by the exclusive caches) at the expense of latency, the K8 needs faster caches. The memory bandwidth of L2 vs system memory is only about 2 to 1 on the K8, which is to say the L2 cache is not helping the system memory much.
sandorski - Monday, February 7, 2005
I think the Pentium M mythos can now be laid to rest.
mjz5 - Monday, February 7, 2005
to #29:
your 2800 is the 754 pin.
the 3000+ reviewed is the 939 pin which is 1.8. the 3000+ for the 754 is 2.0 ghz
kristof007 - Monday, February 7, 2005
I don't know if anyone else noticed but the charts are a bit off. My A64 2800+ is running at a stock 1.8 ghz .. while in the review the A64 3000+ is running at 1.8 ... weird!
knitecrow - Monday, February 7, 2005
#25
1) Intel and AMD measure TDP differently... and TDP is not the same as actual power dissipation. The actual dissipation of 90nm A64 is pretty darn good.
2) A microprocessor is not made of Lego... you can't rearrange/tweak parts to make it faster. It takes a lot of time, energy and talent to make changes -- even then it may not work for the best. Prescott anyone?
Frankly I’ve been waiting for a good review of P-M's actual performance. I really don't trust those "other" sites.
k00kie - Monday, February 7, 2005