A very early look at the future of Catalyst
AMD let us spend some time with a very early prototype driver that attempts to implement a software frame pacing algorithm.
Today is a very interesting day for AMD. It marks both the release of the reference design of the Radeon HD 7990 graphics card, a dual-GPU Tahiti behemoth, and the first sample of a change to the CrossFire technology that will improve animation performance across the board. Both stories are incredibly interesting and as it turns out both feed off of each other in a very important way: the HD 7990 depends on CrossFire and CrossFire depends on this driver.
If you already read our review (or any review that is using the FCAT / frame capture system) of the Radeon HD 7990, you likely came away somewhat unimpressed. The combination of two AMD Tahiti GPUs on a single PCB with 6GB of frame buffer SHOULD have been an incredibly exciting release for us and would likely have become the single fastest graphics card on the planet. That didn't happen though, and our results clearly state why that is the case: AMD CrossFire technology has some serious issues with animation smoothness, runt frames and giving users what they are promised.
Our first results using our Frame Rating performance analysis method were shown during the release of the NVIDIA GeForce GTX Titan card in February. Since then we have been in constant talks with the folks at AMD to figure out what was wrong, how they could fix it, and what it would mean to gamers to implement frame metering technology. We followed that story up with several more that showed the current state of performance on the GPU market using Frame Rating and that painted CrossFire in a very negative light. Even though some outlets accused us of being biased, or claimed that AMD wasn't doing anything incorrectly, we stuck by our results and, as it turns out, so does AMD.
Today's preview of a very early prototype driver shows that the company is serious about fixing the problems we discovered.
If you are just catching up on the story, you really need some background information. The best place to start is our article published in late March that goes into detail about how game engines work, how our completely new testing methods work and the problems with AMD CrossFire technology very specifically. From that piece:
It will become painfully apparent as we dive through the benchmark results on the following pages, but I feel that addressing the issues that CrossFire and Eyefinity are creating up front will make the results easier to understand. As we showed you for the first time in Frame Rating Part 3, AMD CrossFire configurations have a tendency to produce a lot of runt frames, and in many cases nearly perfectly in an alternating pattern. Not only does this mean that frame time variance will be high, but it also tells me that the performance gained by adding a second GPU is completely useless in this case. Obviously the story would become then, “In Battlefield 3, does it even make sense to use a CrossFire configuration?” My answer based on the below graph would be no.
An example of a runt frame in a CrossFire configuration
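To make the runt problem concrete, here is a minimal Python sketch of how an "observed" frame rate can fall well below the raw frame count when every other frame is a runt. The function name, the sample scanline counts and the 21-scanline runt cutoff are all illustrative assumptions, not values taken from our capture tooling, and vertical blanking is ignored for simplicity.

```python
def frame_rates(scanlines_per_frame, screen_height=1440, refresh_hz=60,
                runt_threshold=21):
    """Return (raw_fps, observed_fps) for a captured run.

    scanlines_per_frame: how many scanlines each rendered frame occupied
    in the captured output. runt_threshold is an assumed cutoff: frames
    occupying fewer scanlines than this are treated as runts.
    """
    total_lines = sum(scanlines_per_frame)
    # Time covered by the capture, assuming one full screen per refresh.
    seconds = total_lines / (screen_height * refresh_hz)
    raw_fps = len(scanlines_per_frame) / seconds
    visible = [n for n in scanlines_per_frame if n >= runt_threshold]
    observed_fps = len(visible) / seconds
    return raw_fps, observed_fps


# Hypothetical capture: a nearly full frame followed by a 10-line sliver,
# repeated, which mimics the alternating pattern described above.
capture = [1430, 10] * 30
raw, observed = frame_rates(capture)
print(f"raw: {raw:.0f} FPS, observed: {observed:.0f} FPS")
```

In this made-up capture half of the rendered frames contribute almost nothing to what is seen on screen, so the observed rate is half of the raw rate.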
NVIDIA's solution for getting around this potential problem with SLI was to integrate frame metering, a technology that balances frame presentation to the user and to the game engine in a way that enabled smoother, more consistent frame times and thus smoother animations on the screen. For GeForce cards, frame metering began as a software solution but was actually integrated as a hardware function on the Fermi design, taking some load off of the driver.
Until today, AMD did not integrate any kind of frame metering on multi-GPU solutions and simply rendered frames as quickly as possible when the game engine asked them to. That might seem like the best answer without doing any analysis and that is likely the same conclusion AMD came to. But as it turns out, as we have proven in our various benchmark results and video comparison, that just isn't true. All animations are not created equal.
AMD came to me last week with a prototype driver that integrates a software frame metering or frame pacing technology. What is important here is that AMD is having to rebuild the driver pipeline around this software model and as such it is going to take some time to get it 100% correct. Also, because the company started work on this over a month ago, the base driver version for this prototype driver is something in the 13.2 stack – not the 13.5 used in our Radeon HD 7990 review.
What changes in the new driver? A new algorithm is being implemented that measures frame render times on a continuous basis to determine how long that frame should be displayed on the screen. AMD is calling this measurement the game's "heartbeat" and that information is used to insert a delay into the Present() call return going back to the game. The Present() call is used by the game to know when a frame has been rendered and another is ready to be taken by the GPU for work.
Previously, in GPU-bound instances, AMD was actually sending Present() complete calls at almost the same time, to which the game replied with data that was similarly close together. When both GPUs rendered the frames, they rendered them at about the same speed (since the scenes are so similar) and thus they were presented in a nearly completely overlapped way, resulting in the very small slivers of frames shown on the screen: runts. The fix essentially adds an offset to the frames being rendered.
In this diagram, the unmetered display output shows runts because of unevenly paced frames. The metered output adds a little delay but produces a better overall animation.
As the workload changes AMD is able to update the frame delay offset in real time. If frames begin to take longer to render due to a change in scenery, then the driver will add more delay into the next present call in preparation to have balanced frame presentation on the screen. It can seem counter-intuitive to introduce latency into the game engine pipe to make things smoother, but in truth we are oversimplifying the problem in our explanation.
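The pacing idea can be sketched as a toy simulation. This is not AMD's algorithm, only a simplified model under stated assumptions: two GPUs alternate frames, the hypothetical `simulate_afr` function stands in for the driver, the "heartbeat" is a smoothed render-time estimate, and the delay added before the next submission is that estimate divided by the number of GPUs. A real driver would only know a frame's render time after it completes; the model uses it immediately to stay short.

```python
def simulate_afr(render_ms, num_gpus=2, paced=False,
                 submit_gap_ms=0.5, smoothing=0.2):
    """Return the completion time (ms) of each frame in a toy AFR model."""
    gpu_free = [0.0] * num_gpus   # when each GPU can start its next frame
    next_submit = 0.0             # earliest time the game may submit again
    estimate = None               # smoothed render time (the "heartbeat")
    completions = []
    for i, cost in enumerate(render_ms):
        gpu = i % num_gpus        # alternate frames between the GPUs
        start = max(next_submit, gpu_free[gpu])
        done = start + cost
        gpu_free[gpu] = done
        completions.append(done)
        # Simplification: update the estimate with this frame's cost now.
        if estimate is None:
            estimate = cost
        else:
            estimate = (1 - smoothing) * estimate + smoothing * cost
        # Unpaced: Present() returns almost immediately.
        # Paced: hold the return back so the GPUs start half a frame apart.
        delay = estimate / num_gpus if paced else submit_gap_ms
        next_submit = start + delay
    return completions


def gaps(times):
    """Intervals between consecutive frame completions."""
    return [b - a for a, b in zip(times, times[1:])]


frames = [20.0] * 8               # eight frames, 20 ms of GPU work each
unpaced = simulate_afr(frames, paced=False)
paced = simulate_afr(frames, paced=True)
print("unpaced gaps:", gaps(unpaced))
print("paced gaps:  ", gaps(paced))
```

In this model the unpaced run alternates between very short and very long gaps, while the paced run delivers a frame at an even interval. Both runs average the same number of frames over time; the paced one simply finishes slightly later.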
I asked AMD about a "polling time" associated with this new measurement and was told that in fact it was continuous because of its complete integration into the rendering pipeline. This will likely add some CPU overhead in the driver but it would appear pretty minimal compared to the work that a typical GPU driver is handling already.
There is still a lot of work to be done on the prototype driver that AMD is showing here today, including tweaking the algorithm for individual games and fine-tuning the implementation. But for a first attempt and a very quick turnaround, we are pretty impressed with the results on the following pages.
Many more results on the coming pages…
AMD is still planning on releasing this driver in beta form in the summer, but I wouldn't be surprised to see the schedule moved up a bit given the pressure of the Radeon HD 7990 release and the better-than-expected results thus far. AMD continues to promise the ability to enable and disable this feature in the control panel as well as on a per-game basis, something that NVIDIA hasn't done yet. There are debates on whether or not there are actually input latency benefits to AMD's current method, and we are still looking for a way to test that at PC Perspective.
Download the 250MB MP4 from Mega.co.nz
Reports from most users are telling us that you NEED to download these files for a solid comparison!
Crysis 3 – 13.5 beta vs Prototype 2 Comparison
One thing to note: this fix does not yet address Eyefinity + CrossFire problems. The prototype and the current implementation of the fix are only going to address single monitor configurations due to the differences in how the multiple rendered images are composited. Resolutions up to 2560×1600 are handled by a hardware compositor, while Eyefinity resolutions of 5760×1080 and above use a software implementation that is apparently much more complex (and causes quite a few graphical issues we'll dive into later).
How We Tested
Our testing was done with the exact same setup as our recently published Radeon HD 7990 review, except that this time I have dropped the results from the Radeon HD 7970s in CrossFire in favor of the new HD 7990 results with the Prototype 2 driver. Due to limited time and the fact that the Eyefinity results were unaffected, you are only going to see 2560×1440 results for now!
Test System Setup | |
CPU | Intel Core i7-3960X Sandy Bridge-E |
Motherboard | ASUS P9X79 Deluxe |
Memory | Corsair Dominator DDR3-1600 16GB |
Hard Drive | OCZ Agility 4 256GB SSD |
Sound Card | On-board |
Graphics Card | AMD Radeon HD 7990 6GB, NVIDIA GeForce GTX TITAN 6GB, NVIDIA GeForce GTX 690 4GB |
Graphics Drivers | AMD: 13.5 beta (HD 7990), AMD: Frame Pacing Prototype 2 (HD 7990), NVIDIA: 314.07 |
Power Supply | Corsair AX1200i |
Operating System | Windows 8 Pro x64 |
What you should be watching for
- HD 7990 13.5 beta vs HD 7990 Prototype 2 – Here's the big question – what changes and by how much? Ideally we want to see more consistent frame times in our Frame Rating system.
- HD 7990 Prototype 2 vs GTX 690 – If the driver works as we were promised, how does it affect the performance compared to the GTX 690?
- HD 7990 Prototype 2 vs GTX Titan – Same here for the Titan!
I feel honored that we’re on the eve of having a new and much better way of measuring performance.
Ryan, I don’t doubt that eventually you’ll be known as the father of frame rating.
Keep up the awesome work, and ignore the bias comments.
What about 120 Hz monitors? With a 120 Hz output I suppose results should be better, since shorter frames would be shown in a single refresh cycle instead of creating tearing.
Have you asked them about all the tearing?
Obviously the fluidity of the game play is significantly improved, however it seems like tearing is much worse.
BF3 was tearing the entire time it seemed.
What do you mean “tearing is much worse”?
“Tearing” is essential part of game playing experience. More tears – better experience.
I mean download the BF3 vid and watch for tearing at 50% and 20% speeds. It’s bad, and I would definitely argue that tearing like in this example is degrading the game experience.
Well, the thing is that you either have vsync on preventing the tearing or you have max framerate.
The issue with vsync is that your performance will have to be in direct relation to your display’s frequency, so with a standard 60 Hz display your fps would be either 60, 30, 20, 15, 12, 10… and with a 120 Hz display you would have 120, 60, 40, 30, 24, 20, 17.14, 15, 13.33, 12, 10.91, 10…
With vsync on it’s WYSIWYG, meaning that the shown and perceived framerate are identical, as a full frame will be shown each time, so no “runt” will ever exist.
After saying all this, I now notice that you didn’t properly understand the problem discussed with multi-GPU setups. Look at the “monitor output by dual gpus” graphic and notice that the frame-metered solution doesn’t show a full frame, because of its nature. What it does is cut the amount of frame shown from each GPU to approximately the same size, and this more homogeneous frame time is what gives fluidity to the experience.
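The vsync figures quoted in this comment follow from dividing the refresh rate by a whole number of refresh cycles per frame. A small Python sketch, assuming plain double-buffered vsync with steady frame times (the function name is just illustrative):

```python
def vsync_rates(refresh_hz, count=6):
    """Frame rates reachable when each frame is held for 1..count refreshes."""
    return [round(refresh_hz / n, 2) for n in range(1, count + 1)]


print(vsync_rates(60))        # 60 Hz display, first six steps
print(vsync_rates(120, 12))   # 120 Hz display, first twelve steps
```

Other buffering schemes, such as triple buffering, are not covered by this simple division.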
Would this be the same for HD 7970 CrossFire? If so, I’m going to upgrade to them xD
Kudos where it’s due. Great review.
Unlike Hardocp, who didn’t even mention the prototype driver and slammed the 7990 in its current driver state, at least pcper mentioned the future fix, and they even went further by showing results in a separate review. That’s about as fair as you can be.
This driver will be years late for non-vsync xfire users (vsync + rp dfc fixed this issue long ago for me), but better late than never for those who game without that configuration. Really, one of nvidia’s biggest multi-GPU advantages is about to disappear, as results even this early are very encouraging.
I think that you didn’t understand the issue very well… First of all, it’s not only a CrossFire issue: the new drivers will fix everything about frame generation, after which the frame is put into the frame buffer from the pipeline… so it will also be an improvement for single-card configs. The v-sync function will also work better and will benefit, as it’s always tied to the frame buffer, which is where vsync pulls the frames from.
What was the average load power consumption with the prototype driver?
I would expect it to be lower, since a metered gpu setup could be doing only 50-60% of the work of an unmetered setup, as shown in the first page of the article.
I didn't test exactly, but it should be the same. We are rendering the same number of frames, they are just spaced differently.
You are rendering fewer frames, because the metered GPU will try not to render any frames that will be runts (in a best case scenario).
In your example image, in page 1, the unmetered GPU will render ~6 frames per draw time vs ~4 frames on the metered GPU per draw time.
Instead of having both GPUs at 100% on an unmetered setup, the metered setup works more in tandem, which should reflect less power consumption. 🙂
Since the prototype driver is not affecting Fraps FPS much at all, it is not affecting total frames rendered much at all, and in turn the power usage is hardly going to be affected.
Frame metering doesn’t require a lot of adjustments to work, at least in their current state, so I doubt there is much of a difference in power usage.
The prototype driver is affecting “Observed FPS”.
The difference between drivers in “Observed FPS” can be used as an estimate for the amount of discarded or runt frames. The only way to eliminate those is by having one GPU wait until the other is almost done rendering a frame, a “tandem” setup. If you are having one GPU waiting, it’s not consuming power.
You guys are looking at frame averages (fps) without looking at the frame distribution underneath. The prototype driver achieves the same fps numbers doing *less* work than the normal driver, since its *efficiency* is higher.
Looking at the frame variance charts, we can estimate the time each GPU is waiting in the prototype driver — e.g., for BF3, each will wait around 10ms. In the same chart, we can see that the normal driver keeps both GPU always at 100%, that’s why frames vary ~[0,20]ms.
So, not only does the normal driver spend time rendering frames that won’t be seen, it renders them inconsistently.
Bottom line, from the moment a driver introduces any kind of time delay into the pipeline, there *has* to be reduced power usage, since a watt equals a joule per second.
You should be using total FPS from Fraps, when talking about the work the GPU does. It doesn’t matter if it is a runt or not, the GPU still rendered the whole frame. It just gets overwritten by the 2nd GPU’s image. In terms of power usage, Fraps has the more accurate picture.
In terms of what we find useful, that is where Observable FPS comes into play.
Yes, you are correct! 🙂 The unmetered vs metered pic on the first page of the article is misleading and led me to believe there could be more delays introduced than there really are.
As you said, and is shown by total fps from fraps, both setups are rendering the same amount of frames (so, doing the same work), the metered setup just has a different “starting point” (see my comment below).
Still, I guess most of the time the power consumption between both (metered vs unmetered) will be equal, as when rendering a constant framerate both GPUs will be at 100% usage with one starting rendering when the other has its frame ~50% complete.
Only when there are more complex scenes, where the framerate starts dipping (more time to render each frame), is there a need to pause one of the GPUs until the other is at ~50% frame completion. The reverse case is also an interesting conundrum. 🙂
You guys have been doing such impressive work. Hands down…
Are you guys able to get 120hz through your testing setup with reduced blanking?
From what I understand, saving the data from a 60hz run requires a RAID 0 setup with SSDs, otherwise the storage can’t keep up. I believe THG required 5 SSDs in RAID 0 just to be able to store the data fast enough so they can analyze it later.
I’m guessing there are technical roadblocks for 120hz to be analyzed like this. The FCAT card may also not be capable of taking in data that fast either.
Well, they managed to do it with a 4K display, so it should be possible on the storage side, but the FCAT card still has to be able to handle 120hz.
Interesting, thanks.
Is there any way to get hold of the prototype driver?
I don’t want to give you a big head or anything, but might you have helped AMD with a vital bit of debugging/testing that they never thought of? What does this do to 3x 7970s? I also want to know if they will backport the update to 6870/6970/6990s if this is possible. If you can do anything else like this to give them more data to help out AMD card users, we will be eternally grateful, though I’m not sure NVidia will be happy about it.
I’ve read another review about the 7990’s performance, FCAT, the prototype driver, etc., basically the same thing, although there was something I noticed.
In their benchmarks the hardware (FRAPS) FPS was also quite a bit higher for the prototype driver compared to the basic Catalyst on the 7990. While this difference is easy to understand for the observable FPS, I found no explanation for the extra FRAPS FPS.
The prototype driver just “arranges” the frames in a smoother way, so why does it appear to produce more frames?
The review is on Tom’s Hardware. You can notice that there are a lot of differences in frame rates (both observable and FRAPS) between the 7990’s prototype driver and the normal one.
Is it possible to compare a video of a 690 vs. 7990?
I installed my 7990 last night and went straight to BF3…Frame Rates sucked! I was expecting much more from this card. I am reading where it is a driver issue and that the driver to fix these issues should have been released already. Is it out? How do I fix this issue so I get the card to work better than the GTX 570 I used to have??