Intel's Larrabee Architecture Disclosure: A Calculated First Move
by Anand Lal Shimpi & Derek Wilson on August 4, 2008 12:00 AM EST - Posted in GPUs
The Design Experiment: Could Intel Build a GPU?
Larrabee is fundamentally built out of existing Intel x86 core technology, which not only means that the chip design isn't foreign to Intel, but also has serious implications for the future of desktop microprocessors. Larrabee isn't, however, built on Intel's current bread and butter, the Core architecture. Instead, Intel turned to a much older architecture as the basis for Larrabee: the original Pentium.
The original Pentium was manufactured on a 0.80µm process, later shrinking to 0.60µm. The question Intel posed was this: could an updated version of the Pentium core, built on a modern-day process and equipped with a very wide vector unit, make a solid foundation for a high-end GPU?
To test the theory, Intel first took a standard Core 2 Duo with a 4MB L2 cache, running at an undisclosed clock speed (somewhere in the 1.8 - 2.9GHz range, I'd guess). Then, on the same manufacturing process and in roughly the same die area and power budget, Intel sought to find out how many of these modified Pentium cores it could fit. The number was 10.
So in the space of a dual-core Core 2 Duo, Intel could construct this hypothetical 10-core chip. Let's look at the stats:
 | Intel Core 2 Duo | Hypothetical Larrabee |
# of CPU Cores | 2 out-of-order | 10 in-order |
Instructions per Issue | 4 per clock | 2 per clock |
VPU Lanes per Core | 4-wide SSE | 16-wide |
L2 Cache Size | 4MB | 4MB |
Single-Stream Throughput | 4 per clock | 2 per clock |
Vector Throughput | 8 per clock | 160 per clock |
Note that what we're comparing here is peak operation throughput: not how fast either chip can actually execute real code, just how many operations it can retire per clock.
Running a single instruction stream (e.g. a single-threaded application), the Core 2 can process as many as four operations per clock, since it can issue four instructions per clock and it isn't execution unit constrained. The 10-core design, however, can only issue two instructions per clock per core, so its peak execution rate for a single instruction stream is two operations per clock, half the throughput of the Core 2. That's fine, since you'll actually want to be running vector operations on this design and leave your single-threaded tasks to your Core 2 CPU anyway, and here's where the proposed architecture spreads its wings.
With two cores, each able to execute 4 concurrent SSE operations per clock, you've got a throughput of 8 ops per clock on Core 2. On the 10-core design, with each core driving a 16-wide vector unit? 160 ops per clock, an increase of 20x in roughly the same die area and power budget.
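The peak figures in the table follow from simple multiplication: number of cores times vector lanes per core. Here is a minimal sketch of that arithmetic using only the numbers quoted above. The `Chip` class and its field names are made up for illustration, and this models peak per-clock rates only, not real-world performance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    """Peak per-clock figures from the comparison table (hypothetical helper)."""
    name: str
    cores: int
    issue_width: int   # instructions issued per clock, per core
    vector_lanes: int  # vector operations per clock, per core

    def single_stream_throughput(self) -> int:
        # A single instruction stream runs on one core, so issue width is the cap.
        return self.issue_width

    def vector_throughput(self) -> int:
        # Assumes every core keeps all of its vector lanes busy each clock.
        return self.cores * self.vector_lanes


core2 = Chip("Core 2 Duo", cores=2, issue_width=4, vector_lanes=4)
larrabee = Chip("Hypothetical Larrabee", cores=10, issue_width=2, vector_lanes=16)

for chip in (core2, larrabee):
    print(f"{chip.name}: {chip.single_stream_throughput()} ops/clock single-stream, "
          f"{chip.vector_throughput()} ops/clock vector")

ratio = larrabee.vector_throughput() // core2.vector_throughput()
print(f"Vector throughput advantage: {ratio}x")
```

The single-stream number ignores core count on purpose: one thread can only ever use one core, which is why the 10-core design loses that comparison while winning the vector one.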
On paper this could actually work. If you had enough of these cores, you could get the vector throughput necessary to actually build a reasonable GPU. Of course there are issues like adapting the x86 instruction set for use in a GPU, getting all of the cores to communicate with one another and actually keeping all of these execution resources busy - but this design experiment showed that it was possible.
Thus Larrabee was born.
101 Comments
erikespo - Monday, August 4, 2008 - link
http://en.wikipedia.org/wiki/Square_%28geometry%29
helpful page to take you back to first grade
and excuse my decimal point.. it is 204.49mm total per core or 14.3mm^2
erikespo - Monday, August 4, 2008 - link
Explain.
lets use smaller numbers for you 2mm^2 is 2mm by 2 mm or 4 total mm
double that and it is 4mm^2 or 4 mm by 4 mm or 16mm total..
we are talking about area or 2 dimensions not 1 dimension.
Same math applies to the article
MamiyaOtaru - Monday, August 4, 2008 - link
No, you're way off. 2mm² is TWO square millimeters. (a rectangle 1x2 for example). Double that would be 4mm², which could either be 1x4 or 2x2.
NUMBERmm² doesn't mean NUMBERxNUMBER mm, it means exactly what it says: NUMBER mm².
Using your smaller numbers: 2mm² is not "4 total mm"; it is TWO mm². Saying it is 4 total mm doesn't even make sense. You _can't_ measure area in millimeters. You measure it in square millimeters, and there are two of them (_2_mm²).
Here's an mspaint visual (if links work): http://img105.imageshack.us/my.php?image=squaremma...
You're so sure you're right on this, it's really depressing :(
darkequitus - Monday, August 4, 2008 - link
I did not appreciate the writer creaming over every digital page they wrote. especially when Larrabee's performance is mainly at the moment based on Intel hype and nothing real.
ZootyGray - Monday, August 4, 2008 - link
THANK YOU.
Somebody finally said it.
The others prefer Eutopian illusion - aka the curse aka ntel antitrust. ntel has no grafx and the fools in the public buy "inside' and nvid and ati aren't exactly friends of the curse.
welcome to the matrix. wakey wakey
ZootyGray - Monday, August 4, 2008 - link
and a 16 pager on maybe might could be should be = wannabe "employ-boy"- payday ? hooyeh. This is so disappointing for me. Credibility sags to a new low.
strikeback03 - Tuesday, August 5, 2008 - link
Someone whose two posts contain about 10 complete words and no complete thoughts says Anandtech's credibility has sagged to a new low?
ZootyGray - Tuesday, August 5, 2008 - link
haha yeh - lots of room for thinking.
or - if no thinkeez - ya gots der 16 pg inundation (that's a big word like marmalade) all based on nothing-is-real - you like that kind of brainwash? we don't know anything; but here's the tekspex?
btw - did u get it? the matrix idea? watch the movie. cos here it is. pardon my loaded cryptic literacy.
thx
if you don't get it - well, that's what they want - a world of sleeping mob. never mind, that's just my concern.
The Preacher - Monday, August 4, 2008 - link
I don't really care about how good it will be executing some software renderer but I feel it is going to kick ass in scientific calculations. Matrix operations, FFT/convolution, tremendous bandwidth, double precision... I may write C++/x86 assembly code directly for it and I may put this into a rack of servers and use it through MPI. Give me a compiler with vector intrinsic functions for it and my dreams just came true! :)
elerick - Monday, August 4, 2008 - link
I have been a daily reader of another hardware review site for years. I read nearly every article that headlines and find many of them quite lacking. Today I got wind of your review of Larrabee. It was very well written and produced an amazing amount of tech knowledge not really commonly reviewed. I'm glad to have found this site, and I never create an account but today I felt obligated to. Great work.
PS: any news on that AMD / Fusion? or is that just them being intimidated by Intel's Larrabee?