The Dark Knight: Intel's Core i7
by Anand Lal Shimpi & Gary Key on November 3, 2008 12:00 AM EST - Posted in CPUs
Nehalem's Weakness: Cache
Intel opted for a very Opteron-like cache hierarchy with Nehalem: each core gets a small L2 cache, and all of the cores sit behind one large, shared L3 cache. This sort of setup benefits well-threaded applications with large codebases, the type of thing you'd encounter on a database server. The problem is that the CPU launching today, the Core i7, is designed to be used in a desktop.
Let's look at a quick comparison between Nehalem and Penryn's cache setups:
Memory Level | Intel Nehalem | Intel Penryn |
L1 Size / L1 Latency | 64KB / 4 cycles | 64KB / 3 cycles |
L2 Size / L2 Latency | 256KB / 11 cycles | 6MB* / 15 cycles |
L3 Size / L3 Latency | 8MB / 39 cycles | N/A |
Main Memory Latency (DDR3-1600 CAS7) | 107 cycles (33.4 ns) | 160 cycles (50.3 ns) |
*Note: 6MB per 2 cores
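For readers who want to check the nanosecond figures in the table, here is a minimal sketch of the cycles-to-time conversion. The 3.2GHz clock is an assumption borrowed from the gaming comparison later on this page, and the function names are made up for illustration. At 3.2GHz the Nehalem figure works out to the 33.4 ns shown, while 160 cycles comes to 50.0 ns rather than 50.3 ns, which would imply an effective clock closer to 3.18GHz for the Penryn measurement; that is an inference from the numbers, not something stated in the article.

```python
# Convert a latency measured in core clock cycles to nanoseconds.
# Assumption: both CPUs run at roughly 3.2 GHz (the clock used in the
# gaming comparison); the table itself does not state the clock.

def cycles_to_ns(cycles: float, clock_ghz: float) -> float:
    """At f GHz one cycle lasts 1/f nanoseconds."""
    return cycles / clock_ghz


def implied_clock_ghz(cycles: float, nanoseconds: float) -> float:
    """Work backwards from a (cycles, ns) pair to the clock it implies."""
    return cycles / nanoseconds


CLOCK_GHZ = 3.2  # assumed, see note above

for name, cycles in [("Nehalem main memory", 107), ("Penryn main memory", 160)]:
    print(f"{name}: {cycles} cycles = {cycles_to_ns(cycles, CLOCK_GHZ):.1f} ns")

# Clock implied by the Penryn row of the table (160 cycles, 50.3 ns)
print(f"Implied Penryn clock: {implied_clock_ghz(160, 50.3):.2f} GHz")
```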
Nehalem's L2 cache does get a bit faster, but the speed doesn't make up for the lack of size. I suspect that Intel will address the L2 size issue with the 32nm shrink, but until then most applications will have to deal with a significantly reduced L2 cache size per core. The performance impact is mitigated by two things: 1) the fast L3 cache, and 2) the very fast on-die memory controller. Fortunately for Nehalem, most applications can't fit entirely within cache, so even the large 6MB and 12MB L2 caches of its predecessors can't contain everything. That gives Nehalem's L3 cache and memory controller the chance to level the playing field.
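To get a feel for how these tradeoffs interact, here is a rough sketch of an average load latency calculation using the latencies from the table above. The access mix (what fraction of loads is served by each level) is entirely hypothetical and chosen only for illustration; real fractions vary wildly by application, and the model ignores prefetching and overlapping memory requests. It also assumes the table's latencies are total load-to-use figures for each level.

```python
# Rough average-latency model: weight each level's latency by the fraction
# of loads it serves. Latencies come from the table above; the fractions
# are HYPOTHETICAL and only meant to be played with.

def average_latency(levels):
    """levels: list of (fraction_of_loads_served, latency_in_cycles)."""
    total = sum(fraction for fraction, _ in levels)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    return sum(fraction * latency for fraction, latency in levels)


# Hypothetical mix: Penryn's big L2 catches what Nehalem needs L2 + L3 for,
# and both miss to main memory equally often.
nehalem = [(0.90, 4), (0.04, 11), (0.04, 39), (0.02, 107)]  # L1, L2, L3, DRAM
penryn = [(0.90, 3), (0.08, 15), (0.02, 160)]               # L1, L2, DRAM

print(f"Nehalem: {average_latency(nehalem):.2f} cycles per load")
print(f"Penryn:  {average_latency(penryn):.2f} cycles per load")
```

Shifting more of the hypothetical mix toward main memory favors Nehalem's memory controller, while a mix that fits in Penryn's L2 but spills out of Nehalem's favors Penryn. Which way a real application falls depends on its working set.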
The end result, as you'll soon see, is that in some cases Nehalem's architecture manages to take two steps forward and two steps back, resulting in zero net improvement over Penryn. The perfect example is 3D gaming, as you can see below:
Game | Intel Nehalem (3.2GHz) | Intel Penryn (3.2GHz) |
Age of Conan | 123 fps | 107.9 fps |
Race Driver GRID | 102.9 fps | 103 fps |
Crysis | 40.5 fps | 41.7 fps |
Far Cry 2 | 115.1 fps | 102.6 fps |
Fallout 3 | 83.2 fps | 77.2 fps |
Age of Conan and Fallout 3 show significant improvements in performance when not GPU bound, while Crysis and Race Driver GRID see absolutely no benefit from Nehalem. It's almost Prescott-like in that Intel put a lot of architectural innovation into a design that can, at times, offer no performance improvement over its predecessor. Where Nehalem differs from Prescott is that it can offer tremendous performance increases and it's on the very opposite end of the power efficiency spectrum, but we'll get to that in a moment.
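The relative differences in the gaming table are easy to work out by hand, but a short sketch makes the spread obvious. The frame rates are copied from the table above; the function and variable names are just for illustration.

```python
# Percentage change in frame rate going from Penryn to Nehalem,
# using the fps figures from the gaming table above.

results = {
    # game: (Nehalem fps, Penryn fps)
    "Age of Conan": (123.0, 107.9),
    "Race Driver GRID": (102.9, 103.0),
    "Crysis": (40.5, 41.7),
    "Far Cry 2": (115.1, 102.6),
    "Fallout 3": (83.2, 77.2),
}


def percent_change(new: float, old: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


for game, (nehalem_fps, penryn_fps) in results.items():
    print(f"{game}: {percent_change(nehalem_fps, penryn_fps):+.1f}%")
```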
73 Comments
fzkl - Monday, November 3, 2008
"Where Nehalem really succeeds however is in anything involving video encoding or 3D rendering"
We have a new CPU that does video encoding and 3D rendering really well, while at the same time the GPU manufacturers are offloading these applications to the GPU.
The CPU Vs GPU debate heats up more.
Griswold - Tuesday, November 4, 2008
Where's the product that offloads encoding to GPUs - all of them, from both makers - as a publicly available product? I haven't seen that yet. Of course, we haven't seen Core i7 in the wild yet either, but I bet it will be many moons before there is a single encoding suite that is ready for primetime regardless of the card that is sitting in your machine. On the other hand, I can encode my stuff right now with my current Intel or AMD products and will just move it over to the upcoming products without having to think about it.
Huge difference. The debate isn't really a debate yet, if you're doing more than just talking about it.
haukionkannel - Monday, November 3, 2008
Well, if both the CPU and the GPU get better at video encoding, all the better! Even now rendering takes forever.
So there isn't any problem if the GPU helps a CPU that is already good at 3D rendering. Everything that gives more speed is just a bonus!