DirectCompute, OpenCL, and the Future of CAL

As a journalist, GPGPU stuff is one of the more frustrating things to cover. The concept is great, but the execution makes it difficult to accurately cover, exacerbated by the fact that until now AMD and NVIDIA each had separate APIs. OpenCL and DirectCompute will unify things, but software will be slow to arrive.

As it stands, neither AMD nor NVIDIA have a complete OpenCL implementation that's shipping to end-users for Windows or Linux. NVIDIA has OpenCL working on the 8-series and later on Mac OS X Snow Leopard, and AMD has it working under the same OS for the 4800 series, but for obvious reasons we can’t test a 5870 in a Mac. As such it won’t be until later this year that we see either side get OpenCL up and running under Windows. Both NVIDIA and AMD have development versions that they're letting developers play with, and both have submitted implementations to Khronos, so hopefully we’ll have something soon.

It’s also worth noting that OpenCL is based around DirectX 10 hardware, so even after someone finally ships an implementation we’re likely to see a new version in short order. AMD is already talking about OpenCL 1.1, which would add support for the hardware features that they have from DirectX 11, such as append/consume buffers and atomic operations.

DirectCompute is in comparatively better shape. NVIDIA already supports it on their DX10 hardware, and the beta drivers we’re using for the 5870 support it on the 5000 series. The missing link at this point is AMD’s DX10 hardware; even the beta drivers we’re using don’t support it on the 2000, 3000, or 4000 series. From what we hear the final Catalyst 9.10 drivers will deliver this feature.

Going forward, one specific issue for DirectCompute development will be that there are three levels of DirectCompute, derived from DX10 (4.0), DX10.1 (4.1), and DX11 (5.0) hardware. The higher the version the more advanced the features, with DirectCompute 5.0 in particular being a big jump as it’s the first hardware generation designed with DirectCompute in mind. Among other notable differences, it’s the first version to offer double precision floating point support and atomic operations.

AMD is convinced that developers should and will target DirectCompute 5.0 due to its feature set, but we’re not sold on the idea. To say that there’s a “lot” of DX10 hardware out there is a gross understatement, and all of that hardware is capable of supporting at a minimum DirectCompute 4.0. Certainly DirectCompute 5.0 is the better API to use, but the first developers testing the waters may end up starting with DirectCompute 4.0. Releasing something written in DirectCompute 5.0 right now won’t do developers much good at the moment due to the low quantity of hardware out there that can support it.

With that in mind, there’s not much of a software situation to speak about when it comes to DirectCompute right now. Cyberlink demoed a version of PowerDirector using DirectCompute for rendering effects, but it’s the same story as most DX11 games: later this year. For AMD there isn’t as much of an incentive to push non-game software as fast or as hard as DX11 games, so we’re expecting any non-game software utilizing DirectCompute to be slow to materialize.

Given that DirectCompute is the only common GPGPU API that is currently working on both vendors’ cards, we wanted to try to use it as the basis of a proper GPGPU comparison. We did get something that would accomplish the task, unfortunately it was an NVIDIA tech demo. We have decided to run it anyhow as it’s quite literally the only thing we have right now that uses DirectCompute, but please take an appropriately sized quantity of salt – it’s not really a fair test.

NVIDIA’s ocean demo is a fairly simple proof of concept program that uses DirectCompute to run Fast Fourier transforms directly on the GPU for better performance. The FFTs in turn are used to generate the wave data, forming the wave action seen on screen as part of the ocean. This is a DirectCompute 4.0 program, as it’s intended to run on NVIDIA’s DX10 hardware.

The 5870 has no problem running the program, and in spite of whatever home field advantage that may exist for NVIDIA it easily outperforms the GTX 285. Things get a little more crazy once we start using SLI/Crossfire; the 5870 picks up speed, but the GTX 295 ends up being slower than the GTX 285. As it’s only a tech demo this shouldn’t be dwelt on too much beyond the fact that it’s proof that DirectCompute is indeed working on the 5800 series.

Wrapping things up, one of the last GPGPU projects AMD presented at their press event was a GPU implementation of Bullet Physics, an open source physics simulation library. Although they’ll never admit it, AMD is probably getting tired of being beaten over the head by NVIDIA and PhysX; Bullet Physics is AMD’s proof that they can do physics too. However we don’t expect it to go anywhere given its very low penetration in existing games and the amount of trouble NVIDIA has had in getting developers to use anything besides Havok. Our expectations for GPGPU physics remains the same: the unification will come from a middleware vendor selling a commercial physics package. If it’s not Havok, then it will be someone else.

Finally, while AMD is hitting the ground running for OpenCL and DirectCompute, their older APIs are being left behind as AMD has chosen to focus all future efforts on OpenCL and DirectCompute. Brook+, AMD’s high level language, has been put out to pasture as a Sourceforge project. Compute Abstract Layer (CAL) lives on since it’s what AMD’s OpenCL support is built upon, however it’s not going to see any further public development with the interface frozen at the current 1.4 standard. AMD is discouraging any CAL development in favor of OpenCL, although it’s likely the High Performance Computing (HPC) crowd will continue to use it in conjunction with AMD’s FireStream cards to squeeze every bit of performance out of AMD’s hardware.

The First DirectX 11 Games Eyefinity
Comments Locked

327 Comments

View All Comments

  • Ryan Smith - Wednesday, September 23, 2009 - link

    We do have Cyberlink's software, but as it uses different code paths, the results are near-useless for a hardware review. Any differences could be the result of hardware differences, or it could be that one of the code paths is better optimized. We would never be able to tell.

    Our focus will always be on benchmarking the same software on all hardware products. This is why we bent over backwards to get something that can use DirectCompute, as it's a standard API that removes code paths/optimizations from the equation (in this case we didn't do much better since it was a NVIDIA tech demo, but it's still an improvement).
  • DukeN - Wednesday, September 23, 2009 - link

    I have one of these and I know it outperforms the GTX 280 but not sure what it'd be like against one of these puppies.
  • dagamer34 - Wednesday, September 23, 2009 - link

    I need my bitstream Dolby Digital TrueHD/DTS HD Master Audio bistreaming codecs!!! :)
  • ew915 - Wednesday, September 23, 2009 - link

    I don't see this beating the GT300 as for so it should beat the GTX295 by a great margin.
  • tamalero - Wednesday, September 23, 2009 - link

    dood, you forgot the 295 is a DUAL CHIP?
  • SiliconDoc - Wednesday, September 23, 2009 - link

    roflmao - Gee no more screaming the 4850x2 and the 4870x2 are best without pointing out the two gpu's needed to get there.
    --
    Nonetheless, this 5870 is EPIC FAIL, no matter what - as we see the disappointing numbers - we all see them, and it's not good.
    ---
    Problem is, Nvidia has the MIMD multiple instructions breakthrough technology never used before that according to reports is an AWESOME advantage, lus they are moving to DDR5 with a 512 bit bus !
    --
    So what is in the works is an absolute WHOMPING coming down on ati that BIG GREEN NVIDIA is going to deliver, and the poor numbers here from what was hoped for and hyped over (although even PREDICTED by the red fan Derek himself in one portion of one sorrowful and despressed sentence on this site) are just one step closer to that nail in the coffin...
    --
    Yes I sure hope ati has something major up it's sleeve, like 512 bit mem bus increased card coming, the 5870Xmem ...
    I find the speculation that ATI "mispredicted" the bandwidth needs to be utter non-sense. They are 2-3 billion in the hole from the last few years with "all these great cards" they still lose $ on every single sale, so they either cannot go higher bit width, or they don't want to, or they are hiding it for the next "strike at NVidia" release.
  • erple2 - Friday, September 25, 2009 - link

    So you're comparing this product with a not yet release product and saying that the not yet released product is going to trounce it, without any facts to back it up? Do you have the hardware? If not, then you're simply ranting.

    Will the GT300 beat out the 5870? I dunno, probably. If it didn't, that would imply that the move from GT200 to GT300 was a major disappointment for NVidia.

    I think that EPIC FAIL is completely ludicrous. I can see "epic fail" applied to the Geforce FX series when it came out. I can also see "epic fail" for the Radeon MAXX back in the day. But I don't see the 5870 as "epic fail". If you look at the card relative to the 4870 (the card it replaces), it's quite good - solid 30% increase. That's what I would expect from a generation improvement (that's what the gt200's did over the 9800's, and what the 8800 did over the 7900, etc).

    BTW, I'm seeing the 5870 as pretty good - it beats out all single card NVidia by a reasonable and measureable amount. Sounds like ATI has done well. Or are you considering anything less than 2x the performance of the NVidia cards "epic fail"? In that case, you may be disappointed with the GT300, as well. In fact, I'll say that the GT300 is a total fail right now. I mean jeez! It scores ZERO FPS in every benchmark! That's super-epic fail. And I have the numbers to back that statement up.

    Since you are making claims about the epic fail nature of the 5870 based on yet to be released hardware, I can certainly play the same game, and epic fail anything you say based on those speculative musings.
  • SiliconDoc - Monday, September 28, 2009 - link

    Well the GT200 was 60.96% increase average. AT says so.

    http://www.anandtech.com/video/showdoc.aspx?i=3334...">http://www.anandtech.com/video/showdoc.aspx?i=3334...

    So, I guess ati lost this round terribly, as NVidia's last just beat them by more than double your 30%.

    Great, EPIC FAIL is correct, I was right, and well...
  • Finally - Wednesday, September 23, 2009 - link

    Team Green foames out of their mouthes. It's funny to watch.
  • SiliconDoc - Wednesday, September 23, 2009 - link

    Glad you are having fun.
    Just let me know when you disagree, and why. I'm certain your fun will be "gone then", since reality will finally take hold, and instead of you seeing foam, I'll be seeing drool.

Log in

Don't have an account? Sign up now