A Quick Refresher on the RV770

Cypress is a direct evolution of the RV770 design, so before we talk about what’s new with Cypress we’re going to quickly rehash RV770’s internal workings. Understanding how RV770 was built is necessary to understand what Cypress changes, so if you’re completely unfamiliar with RV770, please take a look at our expanded discussion of RV770 from last year. For the rest of you, let’s get started.

At the center of the RV770 is the Stream Processing Unit (SPU), a single arithmetic logic unit. The RV770 has 800 of these, packaged together in groups of 5 that we call a Streaming Processor (SP). An SP contains a register file, a branch predictor, and the aforementioned 5 SPUs, with the 5th SPU being a more complex unit capable of transcendental functions along with the base functions of an ALU. The SP is the smallest unit that can do individual work; every SPU in an SP must execute the same instruction.

AMD groups every 16 SPs together with texture units, L1 cache, shared memory, and controlling logic. This combined block is what AMD calls a SIMD, and RV770 has 10 of them. These 10 SIMDs form the core computational power of the RV770, and work alongside various specialized units such as ROPs, rasterizers, L2 cache, and tessellators to form a complete chip.
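To make the hierarchy concrete, the unit counts quoted above can be checked with a few lines of arithmetic. This is only a sanity-check sketch; the constant and function names are ours and not anything from AMD.

```python
# Figures taken from the text: 5 SPUs per SP, 16 SPs per SIMD, 10 SIMDs.
SPUS_PER_SP = 5
SPS_PER_SIMD = 16
SIMDS = 10


def rv770_unit_counts():
    """Return the SIMD, SP, and SPU totals implied by the figures above."""
    sps = SIMDS * SPS_PER_SIMD   # 10 * 16 = 160 SPs
    spus = sps * SPUS_PER_SP     # 160 * 5 = 800 SPUs
    return {"simds": SIMDS, "sps": sps, "spus": spus}


print(rv770_unit_counts())
```

The result matches the 800 SPUs quoted for RV770, spread across 160 SPs.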

To utilize the computational power of the hardware, instruction threads are issued to the SPs. These threads are grouped into wavefronts, where there are 64 threads per wavefront. To maximize the utilization of the GPU, threads need to be organized so that they can feed all 5 SPUs in a SP an instruction every clock cycle. Doing this requires extracting instruction level parallelism (ILP) out of programs being passed to the GPU, which is the difficult task of AMD’s compiler.

If SPUs go unused, then the performance of the chip suffers due to underutilization. This design gives AMD a great deal of theoretical computational power, but it is always a challenge to fully exploit it.
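A toy model shows why ILP matters so much here. The sketch below greedily packs operations into 5-wide bundles, where an operation can only issue once everything it depends on has issued in an earlier bundle. It is purely illustrative: the function names and the dependency format are hypothetical, it ignores the fact that the 5th SPU is a special unit, and it is in no way a description of AMD’s actual compiler.

```python
SLOTS_PER_SP = 5  # five SPUs per SP, per the text


def pack_bundles(deps):
    """Greedily pack operations into bundles of at most SLOTS_PER_SP.

    deps maps each operation name to the set of operations it depends on.
    An operation is only eligible once all of its dependencies were issued
    in earlier bundles.
    """
    done = set()
    remaining = dict(deps)
    bundles = []
    while remaining:
        ready = [op for op, needs in remaining.items() if needs <= done]
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        bundle = ready[:SLOTS_PER_SP]
        bundles.append(bundle)
        done.update(bundle)
        for op in bundle:
            del remaining[op]
    return bundles


def utilization(bundles):
    """Fraction of SPU slots that were filled across all bundles."""
    return sum(len(b) for b in bundles) / (len(bundles) * SLOTS_PER_SP)


# Five independent operations fill one bundle completely.
independent = {op: set() for op in "abcde"}
# Five operations in a strict chain can only issue one per bundle.
chain = {"a": set(), "b": {"a"}, "c": {"b"}, "d": {"c"}, "e": {"d"}}

print(utilization(pack_bundles(independent)))
print(utilization(pack_bundles(chain)))
```

In this model the independent case fills every slot, while the fully dependent chain leaves four of the five SPUs idle in every bundle, which is the underutilization described above.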

327 Comments

  • poohbear - Wednesday, September 23, 2009 - link

    is it just me or is anyone else disappointed? next gen cards used to double the performance of previous gen cards, this card beats em by a measly 30-40%. *sigh* times change i guess.
  • AznBoi36 - Wednesday, September 23, 2009 - link

    It's just you.

    The next generations never doubled in performance. Rather they offered a bump in framerates (15-40%) along with better texture filtering, AA, AF etc...

    I'd rather my games look AMAZING at 60fps rather than crappy graphics at 100fps.
  • SiliconDoc - Monday, September 28, 2009 - link

    Golly, another red rooster lie, they just NEVER stop.
    Let's take it right from this site, so your whining about it being nv zone or fudzilla or whatever shows ati is a failure in the very terms claimed is not your next, dishonest move.
    ---
    NVIDIA w/ GT200 spanks their prior generation by 60.96% !

    That's nearly 61% average increase at HIGHEST RESOLUTION and HIGHEST AA AF settings, and it right here @ AT - LOL -

    - and they matched the clock settings JUST TO BE OVERTLY UNFAIR ! ROFLMAO AND NVIDIA'S NEXT GEN LEAP STILL BEAT THE CRAP OUT OF THIS LOUSY ati 5870 EPIC FAIL !
    http://www.anandtech.com/video/showdoc.aspx?i=3334...
    --
    roflmao - that 426.70/7 = 60.96 % INCREASE FROM THE LAST GEN AT THE SAME SPEEDS, MATCHED FOR MAKING CERTAIN IT WOULD BE AS LOW AS POSSIBLE ! ROFLMAO NICE TRY BUT NVIDIA KICKED BUTT !
    ---
    Sorry, the "usual" is not 15-30% - lol
    ---
    NVIDIA's last usual was !!!!!!!!!!!! 60.69% INCREASE AT HIGHEST SETTINGS !
    -
    Now, once again, please, no lying.
  • piroroadkill - Wednesday, September 23, 2009 - link

    No, it's definitely just you
  • Griswold - Wednesday, September 23, 2009 - link

    Its just you. Go buy a clue.
  • ET - Wednesday, September 23, 2009 - link

    Should probably be removed...

    Nice article. The 5870 doesn't really impress. It's the price of two 4890 cards, so for rendering power that's probably the way to go. I'll be looking forward to the 5850 reviews.
  • Zingam - Wednesday, September 23, 2009 - link

    Good but as seen it doesn't play Crysis once again... :D

    We shall wait for 8Gb RAM DDR 7, 16 nm Graphics card to play this damned game!

  • BoFox - Wednesday, September 23, 2009 - link

    Great article!

    Re: Shader Aliasing nowhere to be found in DX9 games--
    Shader aliasing is present all over the Unreal3 engine games (UT3, Bioshock, Batman, R6:Vegas, Mass Effect, etc..). I can imagine where SSAA would be extremely useful in those games.

    Also, I cannot help but wonder if SSAA would work in games that use deferred shading instead of allowing MSAA to work (examples: Dead Space, STALKER, Wanted, Bionic Commando, etc..), if ATI would implement brute-force SSAA support in the drivers for those games in particular.

    I am amazed at the perfectly circular AF method, but would have liked to see 32x AF in addition. With 32x AF, we'd probably be seeing more of a difference. If we're awed by seeing 16x AA or 24x CFAA, then why not 32x AF also (given that the increase from 8 to 16x AF only costs like 1% performance hit)?

    Why did ATI make the card so long? It's even longer than a GTX 295 or a 4870X2. I am completely baffled at this. It only has 8 memory chips, uses a 256-bit bus, unlike a more complex 512-bit bus and 16 chips found on a much, much shorter HD2900XT. There seems to be so much space wasted on the end of the PCB. Perhaps some of the vendors will develop non-reference PCB's that are a couple inches shorter real soon. It could be that ATI rushed out the design (hence the extremely long PCB draft design), or that ATI deliberately did this to allow 3rd-party vendors to make far more attractive designs that will keep us interested in the 5870 right around the time of GT300 release.

    Regarding the memory bandwidth bottleneck, I completely agree with you that it certainly seems to be a severe bottleneck (although not too severe that it only performs 33% better than a HD4890). A 5870 has exactly 2x the specifications of a 4890, yet it generally performs slower than a 4870X2, let alone dual-4890 in Xfire. A 4870 is slower than a 4890 to begin with, and is dependent on Crossfire.

    Overall, ATI is correct in saying that a 5870 is generally 60% faster than a 4870 in current games, but theoretically, a 5870 should be exactly 100% faster than a 4890. Only if ATI could have used 512-bit memory bandwidth with GDDR5 chips (even if it requires the use of a 1024-bit ringbus) would the total memory bandwidth be doubled. The performance would have been at least as good as two 4890's in crossfire, and also at least as good as a GTX295.

    I am guessing that ATI wants to roll out the 5870X2 as soon as possible and realized that doing it with a 512-bit bus would take up too much time/resources/cost, etc.. and that it's better to just beat NV to the punch a few months in advance. Perhaps ATI will do a 5970 card with 512-bit memory a few months after a 5870X2 is released, to give GT300 cards a run for its money? Perhaps it is to "pacify" Nvidia's strategy with its upcoming next-gen that carry great promises with a completely revamped architecture and 512 shaders, so that NV does not see the need to make its GT300 exceed the 5870 by far too much? Then ATI would be able to counter right afterwards without having to resort to making a much bigger chip?

    Speculation.. speculation...
  • Lakku - Wednesday, September 23, 2009 - link

    Read some of the other 5870 articles that cover SSAA image quality. It actually makes most modern games look worse, but that is through no fault of ATi, just the nature of the SS method that literally AA's everything, and in the process, can/does blur textures.
  • strikeback03 - Wednesday, September 23, 2009 - link

    I don't know much about video games, but in photography it is known that reducing the size of an image reduces the appearance of sharpness as well, so final sharpening should be done at the output size.
