Lucid's Multi-GPU Wonder: More Information on the Hydra 100
by Derek Wilson on August 22, 2008 4:00 PM EST - Posted in GPUs
Barriers to Entry and Final Words
Depending on the patents Lucid holds, neither NVIDIA nor ATI may be able to build a competing piece of hardware/software for use in their own solutions. And then there is the question: what will NVIDIA and ATI attempt to do in order to be anticompetitive (err, I mean to continue to promote their own multi-GPU solutions and the platforms surrounding them)?
Because both NVIDIA and ATI already engage in anti-competitive practices by artificially limiting the functionality of their hardware on competing platforms, it doesn't seem like a stretch to think they'll try something here as well. But can they break it?
Maybe and maybe not. At a really crappy level, they could detect whether or not the Hydra hardware is in the system and refuse to do anything 3D. If they're a little nicer, they could detect whether the Hydra driver is running and refuse to render 3D while it is active. Beyond that, there doesn't seem to be much room to do anything like they've been doing. The Lucid software and hardware are completely transparent to the game, the graphics driver, and the graphics hardware; none of those components needs to know anything for this to work.
As AMD and NVIDIA work closely with graphics card and motherboard vendors, they could try to strong-arm Lucid out of the market by threatening (overtly or not) the supply of their silicon to certain OEMs. This could be devastating to Lucid, as we've already seen what the mere fear of an implied threat can do to software companies in the Assassin's Creed situation: faced with the choice of applying an already available fix or pulling support for DX10.1 (which only AMD supports), the developer pulled it. This type of thing seems like the largest unknown to us.
Of course, while it seems like an all-or-nothing situation that would serve no purpose but to destroy the experience of end users, NVIDIA and ATI have lots of resources to throw at this sort of "problem," and I'm sure they'll try their best to come up with something. Maybe one day they'll wake up and realize (especially if one starts to dominate the other) that Microsoft and Intel got slammed with antitrust suits for very similar practices.
Beyond this, Lucid still needs to get motherboard OEMs to place the Hydra 100 on their boards, or to get graphics hardware vendors to build cards with the chip on them. This increases cost, and OEMs are very sensitive to cost increases. At the same time, a platform that can run both AMD and NVIDIA solutions in multi-GPU configurations has added value, as does a single-card multi-GPU solution that gets better performance than even the ones from AMD and NVIDIA.
The parts these guys sell will still have to compete in the retail market, so they can't price themselves out of competition. More performance is great, but they have to worry about price/performance and their own costs. We think this will be most attractive to high-end motherboard vendors. And we really hope Intel adopts it and uses it instead of nForce 100 or nForce 200 chips to enable flexible multi-GPU. Assuming it works, of course.
Anyway, Lucid's Hydra 100 is a really cool idea, and we really hope it works like Lucid says it will. Most of the theory seems sound, and while we've seen it in action, we need to put it to the test and look hard at latency and scaling. We really, really want to get excited, so we really, really need hardware.
57 Comments
haplo602 - Sunday, August 24, 2008 - link
The more I read about this Hydra thing, the more I believe it will turn out to be a hoax. Look at the thing in a logical way:
1. we want to achieve multi-GPU scaling as best as possible
2. we cannot manipulate the scene data, since we don't know what the rendered scene actually is (we can't identify objects in a reasonable way)
3. the existing cards are already fast enough at actually rendering the scene
This boils down to an engine that offloads the actual scene set-up. If you look at the current SLI/CF mechanics, they work either in AFR mode or in split-render mode. ATI/NVIDIA know enough about graphics to get to the same ideas Lucid did; however, they abandoned the approach for some reason. That reason is consistency.
You cannot pick objects from a scene in any reliable way. Of course there are ways to separate objects - after all, the programmer will usually send one stream of rendering commands per object, etc. But that is not the rule.
You cannot do scene set-up on separate objects (things like removing objects, or parts of them, that are not visible) unless you are using some kind of z-buffer manipulation at the end.
I know too little about shader programs to say how they work, but they also seem like a major issue in splitting a scene.
The ATI/NVIDIA approach is the only reasonable one, and the only reason they don't scale linearly is the scene set-up step. Each card has to do the same scene set-up every frame, so this is the one thing that cannot be parallelised in a reasonable way, and it is what lowers the gain in performance.
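To put some rough (made-up) numbers on that: if the set-up work is repeated on every card and only the rendering part is split, the speedup flattens quickly. A toy model, with purely hypothetical per-frame costs:

```python
# Toy model of multi-GPU scaling when every card repeats the same scene
# set-up each frame and only the rendering work is split across cards.
# All costs below are invented for illustration.

def speedup_with_duplicated_setup(setup_ms: float, render_ms: float, num_gpus: int) -> float:
    """Speedup vs. one GPU when set-up is repeated per card and only rendering scales."""
    single_gpu_frame = setup_ms + render_ms
    multi_gpu_frame = setup_ms + render_ms / num_gpus  # set-up is not parallelised
    return single_gpu_frame / multi_gpu_frame

if __name__ == "__main__":
    SETUP_MS = 4.0    # hypothetical per-frame scene set-up cost
    RENDER_MS = 12.0  # hypothetical per-frame rendering cost
    for gpus in (1, 2, 3, 4):
        print(f"{gpus} GPU(s): {speedup_with_duplicated_setup(SETUP_MS, RENDER_MS, gpus):.2f}x")
    # With these numbers, 2 GPUs give ~1.6x and 4 GPUs only ~2.3x --
    # the duplicated set-up step caps the scaling.
```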
If Lucid found a way to do the scene set-up only once and split it into the relevant parts for each card, they will have grave issues with the optimised rendering paths for different DX/OGL/card versions. At some point, they will exhibit the same issues current CF/SLI does.
ATI/NVIDIA can simply implement this in software by making a GPU hypervisor engine.
Clauzii - Sunday, August 24, 2008 - link
Good post! Thumbs up :)
pool1892 - Sunday, August 24, 2008 - link
ya, to me it is sort of the other way round - and still i agree. i am not sure what to expect; this is a technique i could imagine working. but it seems to be a job for much stronger hardware - there is pattern recognition, on-the-fly optimization and balancing (different games will clearly be limited by different stages of the hardware rendering pipeline), qos (no latencies and sync) and many other things.
i have a hard time believing that this little programmable chip can do that amount of work without utilizing the cpu and without local memory besides 16+16k of L1, while it has to handle massive throughput.
so either they have found a REALLY clever trick or amd and nvidia could do the same, from a much better position, being in control of the complete environment. and well: why haven't they?
LOPOPO - Sunday, August 24, 2008 - link
If this thing works as it claims... I would not be surprised. We know the problem with SLI/CF: management, pure and simple. The fact that we are all so astounded by this box speaks volumes about how much we are used to being screwed by Nvidia and ATI/AMD. It is obvious that Hydra allocates system resources far better than current solutions. The fact that it can do this and draw 5W (supposedly) just goes to show you how flawed SLI/CF really are.
This seemingly impending paradigm shift is occurring because card makers have a one-track mind - bigger is better. Add more memory... add more speed... more stream processors... throw in ridiculous names and that equals success - but not really. For them (AMD/Nvidia) yes; for you... somewhat, depending on how you shop. Nowadays performance demands are higher than ever, and AMD/Nvidia solutions always = more power draw, which creates more heat, which must be dissipated, which of course necessitates a larger profile card and cooler. Extremely inefficient.
It appears as if these newcomers are not trying to fit a square peg into a round hole. Can or could established card makers do this or something like it? Of course. But why bother, when the consumer is perfectly happy spending ridiculous amounts of money for an extra 10 fps? AMD/Nvidia keep costs down and maximize profit - it's all good for them. Consumers, on the other hand, rarely see the big picture. Such is the way this sector of the economy works: faster, more memory, die shrinks... never smarter, leaner, more efficient, and never the ever-elusive dynamic software/hardware architecture that adjusts to given tasks. Those are my two cents, and all of the above is contingent on the validity of Lucid's claims. I hope they are more valid than Nvidia's claims of 60% scaling in Crysis.
jeff4321 - Saturday, August 23, 2008 - link
C'mon, how can they perform better than AMD's Crossfire or NVIDIA's SLI? Teams at AMD and NVIDIA know the intimate details of their boards. They know what they're doing.
Besides, someone could implement this kind of solution w/o hardware (the hardware is probably there to prevent folks from running the software w/o the Company getting revenue). Most likely what this hardware and software is doing is that their API interception code is directing all of the underlying cards to render parts of the frame to a surface on the framebuffer. The framebuffer is transferred to system memory. And then, depending on how you want to do things, you composite in system memory, or you direct the video card that is driving the video buffer to treat the system memory surface as an overlay surface.
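Roughly, in sketch form - with the "GPUs" stood in for by plain functions and framebuffers by nested lists; none of the names here are any real driver API, just an illustration of split-frame rendering plus compositing in system memory:

```python
# Illustrative sketch: each "card" renders its slice of the frame to its own
# surface, the surfaces land in "system memory", and they are composited
# there into one frame. Everything is a stand-in, not a real API.

WIDTH, HEIGHT = 8, 4  # tiny frame so the result is easy to print

def render_slice(gpu_id: int, x_start: int, x_end: int, height: int):
    """Pretend-render a vertical slice: each 'GPU' fills its region with its id."""
    return [[gpu_id for _ in range(x_start, x_end)] for _ in range(height)]

def composite(slices, width, height):
    """Combine the per-GPU surfaces in 'system memory' into one frame."""
    frame = [[None] * width for _ in range(height)]
    for (x_start, x_end), surface in slices:
        for y in range(height):
            frame[y][x_start:x_end] = surface[y]
    return frame

if __name__ == "__main__":
    # Split the frame into two vertical slices, one per hypothetical GPU.
    regions = [(0, WIDTH // 2), (WIDTH // 2, WIDTH)]
    surfaces = [((x0, x1), render_slice(gpu, x0, x1, HEIGHT))
                for gpu, (x0, x1) in enumerate(regions)]
    for row in composite(surfaces, WIDTH, HEIGHT):
        print(row)  # left half drawn by "GPU 0", right half by "GPU 1"
```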
All of this doesn't require magic hardware (unless you want to go really fast). This is how SLI and Crossfire work. Since AMD and NVIDIA designed their hardware and software, they can add hardware acceleration magic (things like synchronizing the two boards' scanout, directly transferring scanout data through the sli or crossfire cable, or making groups of boards look like one). Unfortunately for Lucid, I doubt that AMD or NVIDIA gave them any secret sauce so Lucid cannot leverage the hardware acceleration.
Their ASIC is just a PCIe switch with an endpoint device for software security.
whatthehey - Saturday, August 23, 2008 - link
I'm glad you're so incredibly knowledgeable that you can say what something does and how it works without ever seeing it or working on the project. Obviously nVidia and ATI don't want to give away their secrets, just like Lucid isn't going to give away theirs. Will this work? We don't know for sure yet. Is it better than SLI and Crossfire? We don't know that either. What I do know for certain is that there are plenty of games that are GPU limited that still don't get better than 30 to 50% scaling with current SLI/Crossfire. More than that, I know that most games don't come anywhere near even 50% scaling when going from dual GPUs to quad GPUs.
I think the whole point of this chip is to do the compositing and splitting up of rendering tasks "really fast". I also think that the current ATI and nVidia solutions are less than ideal, given we need custom profiles for every game in order to see any benefit. What I'm most worried about is that the Lucid chip will just transfer the need for custom profiles from nVidia and ATI over to Lucid - a completely unproven company at this point.
For now, I'm interested in seeing concrete numbers and independent testing. The world is full of successful inventions that were deemed impossible or "smoke and mirrors" by dullards that just couldn't think outside the box. This Hydra chip may turn out to be exactly what you state, but I'm more inclined to wait and see rather than trusting on people like you to tell us what can and can't be done.
shin0bi272 - Saturday, August 23, 2008 - link
I'm with whatthehey. You are lucky to get a 40 or 50% performance boost with current multi-GPU solutions, and IIRC the game has to support either Crossfire or SLI. So if you are running, say, UT3 with AMD's Crossfire, you are SOL for getting ANY boost. BUUUUT if the Hydra tech works as advertised (or even close to it) it will be night and day compared to current solutions.
Even if this chip ends up exclusive to Intel's mobos, it will outperform either solution from AMD/NVIDIA, since it isn't alternating whole frames or portions of the screen via hardware over a tiny bridge (which adds latency). This chip is sort of like the hardware XOR chip on a RAID 5 card in that it just makes a decision on which card to send data to. The Hydra's ONLY job is to intercept the commands being sent to the graphics card(s) and send each one to the card that's not working as hard or is ready for a new operation. That doesn't take a lot of power or time, as long as the software is efficient in telling the chip what graphics card(s) you have.
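Something like this, conceptually - the commands and cost estimates are made up, and this is only the balancing idea, not Lucid's actual algorithm:

```python
# Toy load balancer in the spirit of "send the work to the card that's not
# working as hard". Command names and costs are invented for illustration.

import heapq

def dispatch(commands, num_gpus):
    """Assign each (name, estimated_cost) command to the least-loaded GPU."""
    # Min-heap of (accumulated_load, gpu_id) so the idlest card pops first.
    loads = [(0.0, gpu) for gpu in range(num_gpus)]
    heapq.heapify(loads)
    assignment = {gpu: [] for gpu in range(num_gpus)}
    for name, cost in commands:
        load, gpu = heapq.heappop(loads)        # idlest card right now
        assignment[gpu].append(name)
        heapq.heappush(loads, (load + cost, gpu))
    return assignment

if __name__ == "__main__":
    draw_calls = [("terrain", 5.0), ("characters", 3.0), ("water", 2.0),
                  ("particles", 1.5), ("ui", 0.5)]
    for gpu, work in dispatch(draw_calls, num_gpus=2).items():
        print(f"GPU {gpu}: {work}")
```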
I read another comment that said: "the hydra is a tensilica diamond based programmable risc controller with custom logic around it running at 225mhz. it uses about 5watt."
For an explanation of RISC vs. CISC, visit: http://cse.stanford.edu/class/sophomore-college/pr...
This chip does essentially 1 thing and does it very very very fast.
pool1892 - Sunday, August 24, 2008 - link
i made the tensilica 5watt risc chip comment - and the thing that is most interesting to me is that it is programmable to an extent. it is maybe best to imagine a dsp with a multitude of presets, each of which accelerates a different load. if i understand it correctly, hydra will auto-optimize itself to suit different applications. this way you get near-dsp throughput for many different usage models (that is, different games) and you do not need the special units big fpga chips have. i just wonder where this optimization takes place, since hydra only has 16+16k of memory - and lucid talks about very low cpu utilization. (we are talking about a basic AI engine or really large table lookups)
risc vs cisc is not really the issue here; there are no real cisc chips left in the market (macro/micro ops and so on - this has been gone since the pentiumpro and the "weird shift from alpha to athlon"TM^^)
jeff4321 - Saturday, August 23, 2008 - link
If it is strictly a software solution (where they call into DX for the multiple boards and eventually the rendered data makes it into system memory and the master board outputs the frame from system memory), of course it will work. Will it be fast and responsive? I don't know. If it is, you will see the same improvement in SLI or Crossfire because NVIDIA or ATI will figure out how the Lucid software is configuring their device. If you look at the block diagrams in the article, Lucid uses application profiles to determine how to configure the devices.
A good comparison to Lucid's system is ATI's Software Crossfire (the Crossfire solution after the master-slave boards, but before the Crossfire X cable like NVIDIA's SLI). Since ATI no longer runs this way, the Crossfire X solution is probably better. I doubt that ATI would stop using the software approach to multi-GPU solutions unless there were a benefit; the Crossfire X port makes the silicon bigger and it makes the board cost more because of the board traces and physical port.
I doubt that their hardware does any compositing for the video stream. That would involve reverse engineering how each device driver talks to the board. Not impossible, just unlikely because of the effort. (Also, interacting with the ATI and NVIDIA device drivers would be quite dangerous, because each device driver assumes that it is in control of the hardware. The Lucid hardware or software, if it talked to the hardware directly, would make the driver and the board incoherent and lead to a system crash.)
The smoke and mirrors in all this is the requirement for their ASIC. The actual approach is the tried-and-true solution for graphics hardware: the computation of the color values for each pixel is (mostly) independent of adjacent pixels; therefore, you just add more hardware to make it faster.
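For example, when each pixel's color depends only on its own coordinates (as in this simple made-up shader), the frame can be cut into bands, shaded independently in any order, and reassembled - which is exactly why adding more hardware helps:

```python
# Demonstration that per-pixel color computation is independent: the frame
# can be cut into row bands, shaded separately, and reassembled. The
# gradient "shader" below is invented purely for illustration.

WIDTH, HEIGHT = 16, 8

def shade_pixel(x: int, y: int) -> int:
    """Made-up shader: a simple gradient depending only on (x, y)."""
    return (x * 255 // (WIDTH - 1) + y * 255 // (HEIGHT - 1)) // 2

def shade_rows(y_start: int, y_end: int):
    """Shade a horizontal band -- no data from other bands is needed."""
    return [[shade_pixel(x, y) for x in range(WIDTH)]
            for y in range(y_start, y_end)]

if __name__ == "__main__":
    # Two "cards" each shade half the rows; order of completion doesn't matter.
    top = shade_rows(0, HEIGHT // 2)
    bottom = shade_rows(HEIGHT // 2, HEIGHT)
    frame = top + bottom
    # Same result as shading the whole frame on one "card".
    reference = shade_rows(0, HEIGHT)
    print("halves match single pass:", frame == reference)
```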
JarredWalton - Sunday, August 24, 2008 - link
You know, doing it in software makes SLI and CF more CPU limited than single GPUs, so unless you're really GPU limited, scaling isn't as good as it could be. The whole point of this ASIC seems to be to handle the compositing and assignment of tasks in hardware, thus making it faster and relieving the CPU of such tasks. That's not smoke and mirrors to me... at least, not if it works.
It seems like we're still six months or so away from seeing actual hardware in our hands. My impression is also that their goal is to get the hardware to split up generic DX/OGL streams even if it doesn't have a profile, though with a profile it could do a better job. Also, judging by the images we've been shown (http://www.dailytech.com/Chipmaker+Hydras+Stunning... with more details at http://www.pcper.com/article.php?aid=607), the breaking up of tasks and compositing is FAR more involved than what SLI and CF are doing, and probably makes more sense. (I wasn't at IDF, so I didn't see this in person.)
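Conceptually it's the difference between a tuned per-game profile and a generic fallback - something like the sketch below, where every game name and split ratio is invented for illustration and has nothing to do with Lucid's (or anyone's) real configuration data:

```python
# Hypothetical illustration of "use a tuned profile if we have one,
# otherwise fall back to a generic split". All entries are made up.

TUNED_PROFILES = {
    "some_flight_sim": {"mode": "split_frame", "ratio": (0.6, 0.4)},
    "some_shooter":    {"mode": "alternate_frame", "ratio": (0.5, 0.5)},
}

GENERIC_FALLBACK = {"mode": "split_frame", "ratio": (0.5, 0.5)}

def pick_strategy(executable: str) -> dict:
    """Return a tuned per-game profile when known, else the generic default."""
    return TUNED_PROFILES.get(executable, GENERIC_FALLBACK)

if __name__ == "__main__":
    for game in ("some_flight_sim", "brand_new_title"):
        strategy = pick_strategy(game)
        print(f"{game}: {strategy['mode']} with ratio {strategy['ratio']}")
```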
"Tried and true" has a few synonyms you might want to put in there instead. "Conservative" is one, and so is "stagnation". Just like AMD stagnated with Athlon 64, NVIDIA and ATI seem to be dragging their heels when it comes to true innovation in the GPU industry. GPGPU is the most interesting thing to come out in the past few years, and what do we get? Two proprietary approaches to GPGPU, so that developers need to code for either NVIDIA *or* ATI -- or do twice as much work to support both.
That's a lot like SLI, where NVIDIA wants us to use their GPUs with *their* chipset, and they have been aggressive in preventing other companies from supporting SLI without help from NVIDIA. (ATI is only marginally better - unless something has changed and CF now runs on SLI chipsets without a custom BIOS? But at least ATI will license the tech to Intel.) It would hardly be surprising if a third party were to come out and say "*BEEP* you guys! I'm going to do this in an agnostic fashion and let the users decide."
Whether or not the Lucid Hydra chip works, I can't imagine anyone outside of NVIDIA and ATI employees actually wanting it to fail. You might as well bury your head in the sand and scream loudly that you want all competition and progress to stop. (It won't, of course, but at least if your head is buried you won't be able to tell the difference.)