Better Image Quality: CSAA & TMAA

NVIDIA’s next big trick for image quality is that they’ve revised Coverage Sample Anti-Aliasing. CSAA, which was originally introduced with the G80, is a lightweight method of better determining how much of a polygon actually covers a pixel. By merely testing polygon coverage and storing the results, the ROP can get more information without the expense of fetching and storing additional color and Z data as done with a regular sample under MSAA. The quality improvement isn’t as pronounced as just using more multisamples, but coverage samples are much, much cheaper.


32x CSAA sampling pattern

For the G80 and GT200, CSAA could only test polygon edges. That’s great for resolving aliasing at polygon edges, but it doesn’t solve other kinds of aliasing. In particular, GF100 will be waging a war on billboards – flat geometry that uses textures with transparency to simulate what would otherwise require complex geometry. Fences, leaves, and patches of grass in fields are three very common uses of billboards, as they are “minor” visual effects that would be very expensive to do with real geometry, and would benefit little from the quality improvement.

Since billboards are faking geometry, regular MSAA techniques do not remove the aliasing within the billboard. To resolve that, DX10 introduced alpha to coverage functionality, which allows MSAA to anti-alias the fake geometry by using the alpha mask as a coverage mask for the MSAA process. The end result of this process is that the GPU creates varying levels of transparency around the fake geometry, so that it blends better with its surroundings.
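To make the idea of using an alpha value as a coverage mask concrete, here is a minimal sketch in Python. It is an illustration only: the function names are hypothetical, the "fill the lowest bits first" mask layout is an assumption (real hardware typically uses dithered or otherwise distributed patterns), and colors are reduced to a single float for simplicity.

```python
# Illustrative sketch of alpha to coverage, not NVIDIA's or DX10's actual logic.
# Assumption: the mask simply sets the lowest `covered` bits; real hardware
# chooses which samples to cover using its own (often dithered) pattern.

def alpha_to_coverage_mask(alpha, num_samples):
    """Convert a texture alpha value in [0, 1] into a per-pixel sample mask."""
    alpha = min(max(alpha, 0.0), 1.0)          # clamp out-of-range alpha
    covered = int(alpha * num_samples + 0.5)   # round to a whole sample count
    return (1 << covered) - 1                  # set that many low bits


def resolve_pixel(mask, num_samples, billboard_color, background_color):
    """Average the samples: covered ones take the billboard color."""
    covered = bin(mask).count("1")
    uncovered = num_samples - covered
    return (covered * billboard_color + uncovered * background_color) / num_samples


if __name__ == "__main__":
    # A half-transparent texel with 4 samples covers 2 of them,
    # so the resolved pixel is an even blend of billboard and background.
    mask = alpha_to_coverage_mask(0.5, 4)
    print(bin(mask), resolve_pixel(mask, 4, 1.0, 0.0))
```

The point of the sketch is that the blend a pixel ends up with is determined entirely by how many samples the mask covers, which is why the number of available samples matters so much for the smoothness of the result.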

It’s a great technique, but it wasn’t done all that well by the G80 and GT200. In order to determine the level of transparency to use on an alpha to coverage sampled pixel, the anti-aliasing hardware on those GPUs used MSAA samples to test the coverage. With up to 8 samples (8xQ MSAA mode), the hardware could only compute 9 levels of transparency, which isn’t nearly enough to establish a smooth gradient. The result was that while alpha to coverage testing allowed for some anti-aliasing of billboards, the result wasn’t great. The only way to achieve really good results was to use super-sampling on billboards through Transparency Super-Sample Anti-Aliasing, which was ridiculously expensive given that when billboards are used, they usually cover most of the screen.

For GF100, NVIDIA has made two tweaks to CSAA. First, additional CSAA modes have been unlocked – GF100 can do up to 24 coverage samples per pixel as opposed to 16. The second change is that the CSAA hardware can now participate in alpha to coverage testing, a natural extension of CSAA’s coverage testing capabilities. With this ability CSAA can test the coverage of the fake geometry in a billboard along with MSAA samples, allowing the anti-aliasing hardware to fetch up to 32 samples per pixel. This gives the hardware the ability to compute 33 levels of transparency, which while not perfect allows for much smoother gradients.
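The jump from 9 to 33 levels follows from simple counting: with N samples, anywhere from 0 to N of them can be covered, giving N + 1 distinct transparency values. The short Python sketch below works through that arithmetic for the 8-sample and 32-sample cases. The function names are hypothetical and the round-to-nearest quantization is an assumption made for illustration, not a description of the hardware.

```python
# Illustrative arithmetic only; function names and rounding are assumptions.

def transparency_levels(num_samples):
    """All transparency values expressible with num_samples coverage samples."""
    return [k / num_samples for k in range(num_samples + 1)]


def quantize_alpha(alpha, num_samples):
    """Snap an ideal alpha value to the nearest expressible level."""
    alpha = min(max(alpha, 0.0), 1.0)
    return int(alpha * num_samples + 0.5) / num_samples


if __name__ == "__main__":
    for n in (8, 32):
        levels = transparency_levels(n)
        step = levels[1] - levels[0]
        print(f"{n} samples: {len(levels)} levels, step {step}, "
              f"alpha 0.3 becomes {quantize_alpha(0.3, n)}")
```

With 8 samples the gap between adjacent levels is 1/8 (12.5%), while with 32 samples it shrinks to 1/32 (about 3.1%), which is why the gradient looks much smoother yet can still show some banding.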

The example NVIDIA has given us for this is a pair of screenshots taken from a field in Age of Conan, a DX10 game. The first screenshot is from a GT200 based video card running the game with NVIDIA’s 16xQ anti-aliasing mode, which is composed of 8 MSAA samples and 8 CSAA samples. Since the GT200 can’t do alpha to coverage testing using the CSAA samples, the resulting grass blades are only blended with 9 levels of transparency based on the 8 MSAA samples, giving them a dithered look.


Age of Conan grass, GT200 16x AA

The second screenshot is from GF100 running in NVIDIA’s new 32x anti-aliasing mode, which is composed of 8 MSAA samples and 24 CSAA samples. Here the CSAA and MSAA samples can be used in alpha to coverage, giving the hardware 32 samples from which to compute 33 levels of transparency. The result is that the blades of grass are still somewhat banded, but overall much smoother than what the GT200 produced. Bear in mind that since 8x MSAA is faster on the GF100 than it was on GT200, and CSAA has very little overhead in comparison (NVIDIA estimates 32x has 93% of the performance of 8xQ), the entire process should be faster on GF100 even if it were running at the same speeds as GT200. Image quality improved, and at the same time the performance improved too.


Age of Conan grass, GF100 32x AA

The ability to use CSAA on billboards left us with a question however: isn’t this what Transparency Anti-Aliasing was for? The answer as it turns out is both yes and no.

Transparency Anti-Aliasing was introduced on the G70 (GeForce 7800GTX) and was intended to help remove aliasing on billboards, exactly what NVIDIA is doing today with MSAA. The difference is that while DX10 has alpha to coverage, DX9 does not – and DX9 was all there was when G70 was released. Transparency Multi-Sample Anti-Aliasing (TMAA) as implemented today is effectively a shader replacement routine to make up for what DX9 lacks. With it, DX9 games can have alpha to coverage testing done on their billboards in spite of DX9 not having this feature, allowing for image quality improvements on games still using DX9. Under DX10 TMAA is superseded by alpha to coverage in the API, but TMAA is still alive and well due to the large number of older games using DX9 and the large number of games yet to come that will still use DX9.

Because TMAA is functionally just enabling alpha to coverage on DX9 games, all of the changes we just mentioned to the CSAA hardware filter down to TMAA. This is excellent news, as TMAA has delivered lackluster results in the past – it was better than nothing, but only Transparency Super-Sample Anti-Aliasing (TSAA) really fixed billboard aliasing, and only at a high cost. Ultimately this means that a number of cases in the past where only TSAA was suitable are suddenly opened up to using the much faster TMAA, in essence making good billboard anti-aliasing finally affordable on newer DX9 games on NVIDIA hardware.

As a consequence of this change, TMAA’s tendency to have fake geometry on billboards pop in and out of existence is also solved. Here we have a set of screenshots from Left 4 Dead 2 showcasing this in action. The GF100 with TMAA generates softer edges on the vertical bars in this picture, which is what stops the popping seen on the GT200.


Left 4 Dead 2: TMAA on GT200


Left 4 Dead 2: TMAA on GF100


115 Comments


  • dentatus - Monday, January 18, 2010 - link

    " Im sure ATi could pull out the biggest, most expensive, hottest and fastest card in the world"- they have, its called the radeon HD5970.

    Really, in my Australia, the ATI DX11 hardware represents nothing close to value. The "biggest, most expensive, hottest and fastest card in the world" a.k.a HD5970 weighs in at a ridiculous AUD 1150. In the meantime the HD5850 jumped up from AUD 350 to AUD 450 on average here.

    The "smaller, more affordable, better value" line I was used to associating with ATI went out the window the minute their hardware didn't have to compete with nVidia DX11 hardware.

    Really, I'm not buying any new hardware until there's some viable alternatives at the top and some competition to burst ATI's pricing bubble. That's why it'd be good to see GF100 make a "G80" impression.
  • mcnabney - Monday, January 18, 2010 - link

    You have no idea what a market economy is.

    If demand outstrips supply prices WILL go up. They have to.
  • nafhan - Monday, January 18, 2010 - link

    It's mentioned in the article, but nvidia being late to market is why prices on ATI's cards are high. Based on transistor count, etc. There's plenty of room for ATI to drop prices once they have some competition.
  • Griswold - Wednesday, January 20, 2010 - link

    And thats where the article is dead wrong. For the most part, the ridiculous prices were dictated by low supply vs. high demand. Now, we finally arrived at decent supply vs. high demand and prices are dropping. The next stage may be good supply vs normal demand. That, and no second earlier, is when AMD themselves could willingly start price gouging due to no competition.

    However, the situation will be like this long after Thermi launched for the simple reason, that there is no reason to believe that Thermi wont have yield issues for quite some time after they have been sorted out for AMD - its the size of chipzilla that will give it a rough time for the first couple of months, regardless of its capabilities.
  • chizow - Monday, January 18, 2010 - link

    I'm sure ATI would've if they could've instead of settling for 2nd place most of the past 3 years, but GF100 isn't just about the performance crown, its clearly setting the table for future variants based on its design changes for a broader target audience (think G92).
  • bupkus - Monday, January 18, 2010 - link

    So why does NVIDIA want so much geometry performance? Because with tessellation, it allows them to take the same assets from the same games as AMD and generate something that will look better. With more geometry power, NVIDIA can use tessellation and displacement mapping to generate more complex characters, objects, and scenery than AMD can at the same level of performance. And this is why NVIDIA has 16 PolyMorph Engines and 4 Raster Engines, because they need a lot of hardware to generate and process that much geometry.

    Are you saying that ATI's viability and funding resources for R&D are not supported by the majority of sales which traditionally fall into the lower priced hardware which btw requires smaller and cheaper GPUs?
  • Targon - Wednesday, January 20, 2010 - link

    Why do people not understand that with a six month lead in the DX11 arena, AMD/ATI will be able to come out with a refresh card that could easily exceed what Fermi ends up being? Remember, AMD has been dealing with the TSMC issues for longer, and by the time Fermi comes out, the production problems SHOULD be done. Now, how long do you think it will take to work the kinks out of Fermi? How about product availability(something AMD has been dealing with for the past few months). Just because a product is released does NOT mean you will be able to find it for sale.

    The refresh from AMD could also mean that in addition to a faster part, it will also be cheaper. So while the 5870 is selling for $400 today, it may be down to $300 by the time Fermi is finally available for sale, with the refresh part(same performance as Fermi) available for $400. Hmmm, same performance for $100 less, and with no games available to take advantage of any improved image quality of Fermi, you see a better deal with the AMD part. We also don't know what the performance will be from the refresh from AMD, so a lot of this needs to take a wait and see approach.

    We have also seen that Fermi is CLEARLY not even available for some leaked information on the performance, which implies that it may be six MORE months before the card is really ready. Showing a demo isn't the same as letting reviewers tinker with the part themselves. Really, if it will be available for purchase in March, then shouldn't it be ready NOW, since it will take weeks to go from ready to shipping(packaging and such)?

    AMD is winning this round, and they will be in the position where developers will have been using their cards for development since NVIDIA clearly can't. AMD will also be able to make SURE that their cards are the dominant DX11 cards as a result.

  • chizow - Monday, January 18, 2010 - link

    @bupkus, no, but I can see a monster strawman coming from a mile away.
  • Calin - Monday, January 18, 2010 - link

    "Because with tessellation, it allows them to take the same assets from the same games as AMD and generate something that will look better"

    No it won't.
    If the game will ship with the "high resolution" displacement mappings, NVidia could make use of them (and AMD might not, because of the geometry power involved). If the game won't ship with the "high resolution" displacement maps to use for tesselation, then NVidia will only have a lot of geometry power going to waste, and the same graphical quality as AMD is having.

    Remember that in big graphic game engines, there are multiple "video paths" for multiple GPU's - DirectX 8, DirectX 9, DirectX 10, and NVidia and AMD both have optimised execution paths.
