Cypress: What’s New

With our refresher out of the way, let’s discuss what’s new in Cypress.

Starting at the SPU level, AMD has added a number of new hardware instructions to the SPUs and sped up the execution of other instruction, both in order to improve performance and to meet the requirements of various APIs. Among these changes are that some dot products have been reduced to single-cycle computation when they were previously multi-cycle affairs. DirectX 11 required operations such as bit count, insert, and extract have also been added. Furthermore denormal numbers have received some much-needed attention, and can now be handled at full speed.


Perhaps the most interesting instruction added however is an instruction for Sum of Absolute Differences (SAD). SAD is an instruction of great importance in video encoding and computer vision due to its use in motion estimation, and on the RV770 the lack of a native instruction requires emulating it in no less than 12 instructions. By adding a native SAD instruction, the time to compute a SAD has been reduced to a single clock cycle, and AMD believes that it will result in a significant (>2x) speedup in video encoding.

The clincher however is that SAD not an instruction that’s part of either DirectX 11 or OpenCL, meaning DirectX programs can’t call for it, and from the perspective of OpenCL it’s an extension. However these APIs leave the hardware open to do what it wants to, so AMD’s compiler can still use the instruction, it just has to know where to use it. By identifying the aforementioned long version of a SAD in code it’s fed, the compiler can replace that code with the native SAD, offering the native SAD speedup to any program in spite of the fact that it can’t directly call the SAD. Cool, isn’t it?

Last, here is a breakdown of what a single Cypress SP can do in a single clock cycle:

  • 4 32-bit FP MAD per clock
  • 2 64-bit FP MUL or ADD per clock
  • 1 64-bit FP MAD per clock
  • 4 24-bit Int MUL or ADD per clock
  • SFU : 1 32-bit FP MAD per clock

Moving up the hierarchy, the next thing we have is the SIMD. Beyond the improvements in the SPs, the L1 texture cache located here has seen an improvement in speed. It’s now capable of fetching texture data at a blistering 1TB/sec. The actual size of the L1 texture cache has stayed at 16KB. Meanwhile a separate L1 cache has been added to the SIMDs for computational work, this one measuring 8KB. Also improving the computational performance of the SIMDs is the doubling of the local data share attached to each SIMD, which is now 32KB.


At a high level, the RV770 and Cypress SIMDs look very similar

The texture units located here have also been reworked. The first of these changes are that they can now read compressed AA color buffers, to better make use of the bandwidth they have. The second change to the texture units is to improve their interpolation speed by not doing interpolation. Interpolation has been moved to the SPs (this is part of DX11’s new Pull Model) which is much faster than having the texture unit do the job. The result is that a texture unit Cypress has a greater effective fillrate than one under RV770, and this will show up under synthetic tests in particular where the load-it and forget-it nature of the tests left RV770 interpolation bound. AMD’s specifications call for 68 billion bilinear filtered texels per second, a product of the improved texture units and the improved bandwidth to them.

Finally, if we move up another level, here is where we see the cause of the majority of Cypress’s performance advantage over RV770. AMD has doubled the number of SIMDs, moving from 10 to 20. This means twice the number of SPs and twice the number of texture units; in fact just about every statistic that has doubled between RV770 and Cypress is a result of doubling the SIMDs. It’s simple in concept, but as the SIMDs contain the most important units, it’s quite effective in boosting performance.


However with twice as many SIMDs, there comes a need to feed these additional SIMDs, and to do something with their products. To achieve this, the 4 L2 caches have been doubled from 64KB to 128KB. These large L2 caches can now feed data to L1 caches at 435GB/sec, up from 384GB/sec in RV770. Along with this the global data share has been quadrupled to 64KB.


RV770 vs...


Cypress

Next up, the ROPs have been doubled in order to meet the needs of processing data from all of those SIMDs. This brings Cypress to 32 ROPs. The ROPs themselves have also been slightly enhanced to improve their performance; they can now perform fast color clears, as it turns out some games were doing this hundreds of times between frames. They are also responsible for handling some aspects of AMD’s re-introduced Supersampling Anti-Aliasing mode, which we will get to later.

 

Last, but certainly not least, we have the changes to what AMD calls the “graphics engine”, primarily to bring it into compliance with DX11. RV770’s greatly underutilized tessellator has been upgraded to full DX11 compliance, giving it Hull Shader and Domain Shader capabilities, along with using a newer algorithm to reduce tessellation artifacts. A second rasterizer has also been added, ostensibly to feed the beast that is the 20 SIMDs.

A Quick Refresher on the RV770 DirectX11 Redux
Comments Locked

327 Comments

View All Comments

  • ClownPuncher - Wednesday, September 23, 2009 - link

    Absolutely, I can answer that for you.

    Those 2 "ports" you see are for aesthetic purposes only, the card has a shroud internally so those 2 ports neither intake nor exhaust any air, hot or otherwise.
  • Ryan Smith - Wednesday, September 23, 2009 - link

    ClownPuncher gets a cookie. This is exactly correct; the actual fan shroud is sealed so that air only goes out the front of the card to go outside of the case. The holes do serve a cooling purpose though; allow airflow to help cool the bits of the card that aren't hooked up to the main cooler; various caps and what have you.
  • SiliconDoc - Wednesday, September 23, 2009 - link

    Ok good, now we know.
    So the problem now moves to the tiny 1/2 exhaust port on the back, did you stick your hand there and see how much that is blowing ? Does it whistle through there ? lol
    Same amount of air(or a bit less) in half the exit space... that's going to strain the fan and or/reduce flow, no matter what anyone claims to the contrary.
    It sure looks like ATI is doing a big favor to aftermarket cooler vendors.

  • GhandiInstinct - Wednesday, September 23, 2009 - link

    Ryan,

    Developers arent pushing graphics anymore. Its not economnical, PC game supports is slowing down, everything is console now which is DX9. what purpose does this ATI serve with DX11 and all this other technology that won't even make use of games 2 years from now?

    Waste of money..
  • ClownPuncher - Wednesday, September 23, 2009 - link

    Clearly he should stop reviewing computer technology like this because people like you are content with gaming on their Wii and iPhone.

    This message has been brought to you by Sarcasm.
  • Griswold - Wednesday, September 23, 2009 - link

    So you're echoing what nvidia recently said, when they claimed dx11/gaming on the PC isnt all that (anymore)? I guess nvidia can close shop (at least the gaming relevant part of it) now and focus on GPGPU. Why wait for GT300 as a gamer?

    Oh right, its gonna be blasting past the 5xxx and suddenly dx11 will be the holy grail again... I see how it is.
  • SiliconDoc - Wednesday, September 23, 2009 - link

    rofl- It's great to see red roosters not crowing and hopping around flapping their wings and screaming nvidia is going down.
    Don't take any of this personal except the compliments, you're doing a fine job.
    It's nice to see you doing my usual job, albiet from the other side, so allow me to compliment your fine perceptions. Sweltering smart.
    But, now, let's not forget how ambient occlusion got poo-pooed here and shading in the game was said to be "an irritant" when Nvidia cards rendered it with just driver changes for the hardware. lol
    Then of course we heard endless crowing about "tesselation" for ati.
    Now it's what, SSAA (rebirthed), and Eyefinity, and we'll hear how great it is for some time to come. Let's not forget the endless screeching about how terrible and useless PhysX is by Nvidia, but boy when "open standards" finally gets "Havok and Ati" cranking away, wow the sky is the limit for in game destruction and water movement and shooting and bouncing, and on and on....
    Of course it was "Nvidia's fault" that "open havok" didn't happen.
    I'm wondering if 30" top resolution will now be "all there is!" for the next month or two until Nvidia comes out with their next generation - because that was quite a trick switching from top rez 30" DOWN to 1920x when Nvidia put out their 2560x GTX275 driver and it whomped Ati's card at 30" 2560x, but switched places at 1920x, which was then of course "the winning rez" since Ati was stuck there.
    I could go on but you're probably fuming already and will just make an insult back so let the spam posting IZ2000 or whatever it's name will be this time handle it.
    BTW there's a load of bias in the article and I'll be glad to point it out in another post, but the reason the red rooster rooting is not going beyond any sane notion of "truthful" or even truthiness, is because this 5870 Ati card is already percieved as " EPIC FAIL" !
    I cannot imagine this is all Ati has, and if it is they are in deep trouble I believe.
    I suspect some further releases with more power soon.



  • Finally - Wednesday, September 23, 2009 - link

    Team Green - full foam ahead!
    *hands over towel*
    There you go. Keep on foaming, I'm all amused :)
  • araczynski - Wednesday, September 23, 2009 - link

    is DirectX11 going to be as worthless as 10? in terms of being used in any meaningful way in a meaningful amount of games?

    my 2 4850's are still keeping me very happy in my 'ancient' E8500.

    curious to see how this compares to whatever nvidia rolls out, probably more of the same, better in some, worse in others, bottom line will be the price.... maybe in a year or two i'll build a new system.

    of course by that time these'll be worthless too.
  • SiliconDoc - Wednesday, September 23, 2009 - link

    Well it's certainly going to be less useful than PhysX, which is here said to be worthless, but of course DX11 won't get that kind of dissing, at least not for the next two months or so, before NVidia joins in.
    Since there's only 1 game "kinda ready" with DX11, I suppose all the hype and heady talk will have to wait until... until... uhh.. the 5870's are actually available and not just listed on the egg and tiger.
    Here's something else in the article I found so very heartwarming:
    ---
    " Wrapping things up, one of the last GPGPU projects AMD presented at their press event was a GPU implementation of Bullet Physics, an open source physics simulation library. Although they’ll never admit it, AMD is probably getting tired of being beaten over the head by NVIDIA and PhysX; Bullet Physics is AMD’s proof that they can do physics too. "
    ---
    Unfortunately for this place,one of my friends pointed me to this little expose' that show ATI uses NVIDIA CARDS to develope "Bullet Physics" - ROFLMAO
    -
    " We have seen a presentation where Nvidia claims that Mr. Erwin Coumans, the creator of Bullet Physics Engine, said that he developed Bullet physics on Geforce cards. The bad thing for ATI is that they are betting on this open standard physics tech as the one that they want to accelerate on their GPUs.

    "ATI’s Bullet GPU acceleration via Open CL will work with any compliant drivers, we use NVIDIA Geforce cards for our development and even use code from their OpenCL SDK, they are a great technology partner. “ said Erwin.

    This means that Bullet physics is being developed on Nvidia Geforce cards even though ATI is supposed to get driver and hardware acceleration for Bullet Physics."
    ---
    rofl - hahahahahha now that takes the cake!
    http://www.fudzilla.com/content/view/15642/34/">http://www.fudzilla.com/content/view/15642/34/
    --
    Boy do we "hate PhysX" as ati fans, but then again... why not use the nvidia PhysX card to whip up some B Physics, folks I couldn't make this stuff up.

Log in

Don't have an account? Sign up now