A Quick Refresher on the RV770

Because Cypress is a direct evolution of the RV770 design, we’re going to start with a quick rehash of RV770’s internal workings before we talk about what’s new. You need to understand how RV770 was built to understand what Cypress changes, so if you’re completely unfamiliar with RV770, please take a look at our expanded discussion of RV770 from last year. For the rest of you, let’s get started.

At the center of the RV770 is the Stream Processing Unit (SPU), a single arithmetic logic unit (ALU). The RV770 has 800 of these, packaged together in groups of 5 to form what we call a Streaming Processor (SP). An SP contains a register file, a branch predictor, and the aforementioned 5 SPUs, with the 5th SPU being a more complex unit capable of transcendental functions in addition to the base functions of an ALU. The SP is the smallest unit that can do individual work; the 5 SPUs in an SP cannot work independently and must all execute as part of the same instruction from a single thread.

AMD groups every 16 SPs together with texture units, L1 cache, shared memory, and control logic. This combined block is what AMD calls a SIMD, and RV770 has 10 of them. These 10 SIMDs form the computational core of the RV770, and they work with various specialized units such as ROPs, rasterizers, L2 cache, and tessellators to form a complete chip.
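The unit counts above can be checked with a little arithmetic. The following is only a sketch that derives the SP and SIMD counts from the figures quoted in the text; the function and constant names are made up for illustration and do not come from AMD.

```python
# Figures quoted in the text for RV770.
TOTAL_SPUS = 800    # individual ALUs on the chip
SPUS_PER_SP = 5     # SPUs packaged into one Streaming Processor (SP)
SPS_PER_SIMD = 16   # SPs grouped into one SIMD


def rv770_hierarchy(total_spus=TOTAL_SPUS,
                    spus_per_sp=SPUS_PER_SP,
                    sps_per_simd=SPS_PER_SIMD):
    """Return (sp_count, simd_count, spus_per_simd) for the given figures."""
    sp_count, leftover = divmod(total_spus, spus_per_sp)
    if leftover:
        raise ValueError("SPUs do not divide evenly into SPs")
    simd_count, leftover = divmod(sp_count, sps_per_simd)
    if leftover:
        raise ValueError("SPs do not divide evenly into SIMDs")
    return sp_count, simd_count, spus_per_sp * sps_per_simd


sp_count, simd_count, spus_per_simd = rv770_hierarchy()
print(f"SPs: {sp_count}, SIMDs: {simd_count}, SPUs per SIMD: {spus_per_simd}")
# prints "SPs: 160, SIMDs: 10, SPUs per SIMD: 80"
```

The 10 SIMDs quoted above fall out of the 800 SPU figure directly: 800 / 5 = 160 SPs, and 160 / 16 = 10 SIMDs.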

To utilize the computational power of the hardware, instruction threads are issued to the SPs. These threads are grouped into wavefronts of 64 threads each. To maximize the utilization of the GPU, each thread’s instructions need to be organized so that all 5 SPUs in an SP are fed work every clock cycle. Doing this requires extracting instruction level parallelism (ILP) out of the programs being passed to the GPU, which is the difficult task of AMD’s compiler.

If SPUs go unused, the performance of the chip suffers due to underutilization. This design gives AMD a great deal of theoretical computational power, but fully exploiting it is always a challenge.
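To make the utilization problem concrete, here is a toy sketch of packing a thread's operations into 5-wide bundles. It is not AMD's compiler: the greedy strategy, the function names, and the dependency format are all assumptions chosen for illustration, and it ignores the fact that only the 5th SPU can handle transcendental functions.

```python
SLOTS_PER_SP = 5  # SPUs available to one thread per clock, per the text


def pack_bundles(deps, slots=SLOTS_PER_SP):
    """Greedily pack operations into bundles of at most `slots` ops.

    `deps` maps each op name to the set of ops whose results it needs.
    An op can only go into a bundle once every op it depends on has been
    placed in an earlier bundle.
    """
    done = set()
    remaining = list(deps)  # preserves program order
    bundles = []
    while remaining:
        ready = [op for op in remaining if deps[op] <= done][:slots]
        if not ready:
            raise ValueError("dependency cycle or unknown op in deps")
        bundles.append(ready)
        done.update(ready)
        remaining = [op for op in remaining if op not in ready]
    return bundles


def utilization(bundles, slots=SLOTS_PER_SP):
    """Fraction of SPU slots that received an op across all bundles."""
    if not bundles:
        return 0.0
    used = sum(len(bundle) for bundle in bundles)
    return used / (len(bundles) * slots)


# Ten independent ops: every bundle fills all 5 slots.
independent = {f"op{i}": set() for i in range(10)}

# Five ops where each needs the previous result: one op per bundle.
chain = {"a": set(), "b": {"a"}, "c": {"b"}, "d": {"c"}, "e": {"d"}}

print(utilization(pack_bundles(independent)))  # prints 1.0
print(utilization(pack_bundles(chain)))        # prints 0.2
```

In this toy model, code with plenty of independent operations keeps all 5 SPUs busy, while a chain of dependent operations leaves 4 of every 5 slots idle. That gap is the ILP the compiler has to find.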


327 Comments


  • SiliconDoc - Sunday, September 27, 2009 - link

    I'll be watching you for the very same conclusion when NVidia launches soft and paper.
    I'll bet ten thousand bucks you don't say it.
    I'll bet a duplicate amount you're a red rager fan, otherwise YOU'D BE HONEST, NOT HOSTILE !
  • rennya - Thursday, September 24, 2009 - link

    It may be a paper launch in the US, but here somewhere in South East Asia I can already grab a Powercolor 5870 1GB if I so desire. Powercolor is quite aggressive here promoting their ATI 5xxx wares, just like Sapphire was when the 4xxx series came out.
  • SiliconDoc - Thursday, September 24, 2009 - link

    I believe you. I've also seen various flavors of cards not available here in the USA, banned by the import export deals and global market and manufacturer and vendor controls and the powers that be, and it doesn't surprise me when it goes the other way.
    Congratulations on actually having a non fake launch.
  • Spoelie - Wednesday, September 23, 2009 - link

    "The engine allows for complete hardware offload of all H.264, MPEG-2 and VC1 decoding".

    This has afaik never been true for any previous card of ATi, and I doubt it has been tested to be true this time as well.

    I have detailed this problem several times before in the comment section and never got a reply, so I'll summarize: ATi's UVD only decodes level 4 AVC (i.e. bluray) streams, if you have a stream with >4 reference frames, you're out of luck. NVIDIA does not have this limitation.
  • lopri - Wednesday, September 23, 2009 - link

    Yeah and my GTX 280 has to run full throttle (3D frequency) just to play 720p content, and the temp climbs the same as if it were a 3D game. Yeah it can decode some *underground* clips from Japan, big deal. Oh and it does that for only H.264. No VC-1 love there. I am sure you'd think that is not a big deal, but the same applies to those funky clips with 13+ reference frames. Not a big deal. Especially when AMD can decode all 3 major codecs effortlessly (performance 2D frequency instead of 3D frequency).
  • rennya - Thursday, September 24, 2009 - link

    G98 GPUs (like 8400GS discrete or 9400 chipset) or GT220/G210 can also do MPEG2/VC-1/AVC video decoding.

    The GPU doesn't have to run full throttle either, as long as you stick to the 18x.xx drivers.
  • SJD - Wednesday, September 23, 2009 - link

    Ryan,

    Great article, but there is an inconsistency. You say that thanks to there only being 2 TMDS controllers, you can't use both DVI connectors at the same time as the HDMI output for three displays, but then go on to say later that you can use the DVI(x2), DP and HDMI in any combination to drive 3 displays. Which is correct?

    Also, can you play HDCP protected content (a Blu-Ray disc for example) over a panel connected to a Display Port connector?

    Otherwise, thanks for the review!
  • Ryan Smith - Wednesday, September 23, 2009 - link

    It's the former that is correct: you can only drive two TMDS devices. The article has been corrected.

    And DP supports HDCP, so yes, protected content will play over DP.
  • SJD - Friday, September 25, 2009 - link

    Thanks for clarifying that Ryan - It confirms what I thought.. :-)
  • chowmanga - Wednesday, September 23, 2009 - link

    I'd like to see a benchmark using an amd cpu. I think it was the Athlon II 620 article that pointed out how Nvidia hardware ran better on AMD cpus and AMD/ATI cards ran better on Intel cpus. It would be interesting to see if the 5870 stacks up against Nv's current gen with other setups.
