Intel's Larrabee Architecture Disclosure: A Calculated First Move
by Anand Lal Shimpi & Derek Wilson on August 4, 2008 12:00 AM EST. Posted in GPUs
The Design Experiment: Could Intel Build a GPU?
Larrabee is fundamentally built out of existing Intel x86 core technology, which not only means that the chip design isn't foreign to Intel, but also has serious implications for the future of desktop microprocessors. Larrabee isn't, however, built on Intel's current bread and butter, the Core architecture. Instead, Intel turned to a much older architecture as the basis for Larrabee: the original Pentium.
The original Pentium was manufactured on a 0.80µm process, later shrinking to 0.60µm. The question Intel posed was this: could an updated version of the Pentium core, built on a modern-day process and equipped with a very wide vector unit, make a solid foundation for a high-end GPU?
To test the theory, Intel first took a standard Core 2 Duo with a 4MB L2 cache, running at an undisclosed clock speed (somewhere in the 1.8 - 2.9GHz range, I'd guess). Then, on the same manufacturing process and with roughly the same die area and power consumption, Intel sought to find out how many of these modified Pentium cores it could fit. The number was 10.
So in the space of a dual-core Core 2 Duo, Intel could construct this hypothetical 10-core chip. Let's look at the stats:
 | Intel Core 2 Duo | Hypothetical Larrabee |
# of CPU Cores | 2 out of order | 10 in-order |
Instructions per Issue | 4 per clock | 2 per clock |
VPU Lanes per Core | 4-wide SSE | 16-wide |
L2 Cache Size | 4MB | 4MB |
Single-Stream Throughput | 4 per clock | 2 per clock |
Vector Throughput | 8 per clock | 160 per clock |
Note that what we're comparing here is peak operation throughput: not how fast either chip can actually execute anything, just how many operations it can retire per clock.
Running a single instruction stream (e.g. a single-threaded application), the Core 2 can process as many as four operations per clock, since it can issue four instructions per clock and it isn't execution unit constrained. The 10-core design, however, can only issue two instructions per clock, and thus the peak execution rate for a single instruction stream is two operations per clock, half the throughput of the Core 2. That's fine, however, since you'll actually want to be running vector operations on this core and leave your single-threaded tasks to your Core 2 CPU anyway, and here's where the proposed architecture spreads its wings.
With two cores, each able to execute 4 concurrent SSE operations per clock, you've got a throughput of 8 ops per clock on Core 2. On the 10-core design? 160 ops per clock, an increase of 20x in roughly the same die area and power budget.
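To make the arithmetic concrete, here's a minimal Python sketch of the peak numbers from the table. The `ChipConfig` class and its field names are our own illustration, not anything Intel disclosed, and the model assumes peak vector throughput is simply cores multiplied by vector lanes, while a single instruction stream is limited by one core's issue width. It says nothing about real-world performance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChipConfig:
    """Hypothetical container for the per-clock figures in the table."""
    name: str
    cores: int
    issue_width: int    # instructions issued per clock, per core
    vector_lanes: int   # vector operations per clock, per core

    def single_stream_ops_per_clock(self) -> int:
        # A single instruction stream runs on one core, so its peak
        # rate is that core's issue width (assumes no other bottleneck).
        return self.issue_width

    def vector_ops_per_clock(self) -> int:
        # Peak vector throughput: every core keeps every lane busy.
        return self.cores * self.vector_lanes


# Figures taken from the comparison table.
core2 = ChipConfig("Core 2 Duo", cores=2, issue_width=4, vector_lanes=4)
larrabee = ChipConfig("Hypothetical 10-core design", cores=10,
                      issue_width=2, vector_lanes=16)

speedup = larrabee.vector_ops_per_clock() / core2.vector_ops_per_clock()

if __name__ == "__main__":
    for chip in (core2, larrabee):
        print(f"{chip.name}: "
              f"{chip.single_stream_ops_per_clock()} ops/clock single-stream, "
              f"{chip.vector_ops_per_clock()} ops/clock vector")
    print(f"Vector throughput ratio: {speedup:.0f}x")
```

The ratio works out to the same 20x quoted above. Keep in mind these are peak rates only: they assume every lane on every core is kept busy on every clock.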
On paper this could actually work. If you had enough of these cores, you could get the vector throughput necessary to actually build a reasonable GPU. Of course there are issues like adapting the x86 instruction set for use in a GPU, getting all of the cores to communicate with one another and actually keeping all of these execution resources busy - but this design experiment showed that it was possible.
Thus Larrabee was born.
101 Comments
ocyl - Monday, August 4, 2008 - link
Larrabee will be shipped when Diablo III is, and it will mark the beginning of the end for DirectX.
Calling it first here at AnandTech.
Thanks go to Anand and Derek for the very well written article. You are the ones who keep tech journalism alive.
erikespo - Monday, August 4, 2008 - link
"At 143 mm^2, Intel could fit 10 Larrabee-like cores so let's double that. Now we're at 286mm^2 (still smaller than GT200 and about the size of AMD's RV770) and 20-cores. Double that once more and we've got 40-cores and have a 572mm^2 die, virtually the same size as NVIDIA's GT200 but on a 65nm process."
this math is way off
143 mm^2 is 20449mm.. if they fit 10 there that is 2044.9 per core
286mm^2 is 81796mm.. that is 4X the space so 40 cores in 286^2
and 572mm^2 is 327184mm is 160 cores..
double length will double area.. doubling length and width will quadruple area.
bauerbrazil - Monday, August 4, 2008 - link
Hahahaha, YOUR math is way off!!!
Jesus.
erikespo - Monday, August 4, 2008 - link
I see where the article and you got your math..you both did 143mm^2 / 10 and got 14.3 then divided 286^2 by 14.3 and got 20.. this math is only acting on the one number..
I know this because the area of 14.3 is 204.49 mm. 10 of those would be 2044.9mm. but the area of 143mm^2 is 20449mm.
WeaselITB - Monday, August 4, 2008 - link
Wow ... No.
143mm^2 is NOT equivalent to 143^2 mm ... Your analysis is flawed.
If we use your example, 2mm^2 is NOT 2mm x 2mm ... it's actually root(2)mm x root(2)mm ... 4mm^2 is 2mm x 2mm, not 4mm x 4mm (that'd be 16mm).
Maybe you should examine in depth that Wikipedia article you linked earlier ...
Thanks,
-Weasel
MamiyaOtaru - Monday, August 4, 2008 - link
143mm^2 is NOT equivalent to 143^2 mm
^^THIS
That's it in a nutshell. mm² doesn't mean you square 143, it refers to Square Millimeters, a unit of area (unlike Millimeters, a unit of distance).
Revised mspaint illustration: http://img379.imageshack.us/my.php?image=squaremmh...
erikespo - Monday, August 4, 2008 - link
Anandtech Comment Section.. Forever record of my retardedness
erikespo - Monday, August 4, 2008 - link
Dang.. Many apologies..got my square area and squared numbers confused..
WeaselITB - Monday, August 4, 2008 - link
[quote]4mm^2 is 2mm x 2mm, not 4mm x 4mm (that'd be 16mm).[/quote]
Dang, that was supposed to read "(that'd be 16mm^2)."
Thanks,
-Weasel
erikespo - Monday, August 4, 2008 - link
another way to look at it is how many 143mm^2 squares does it take to make up 286mm^2?
only 2 would only be 143mm x 286mm
since 10 cores fit into 143 x 143, 20 will fit into 143 x 286mm
286 x 286 (which is double that of 143 x 286mm) the 286mm^2 would fit 40