Overclocking Intel's New 45nm QX9650: The Rules Have Changed
by Kris Boughton on December 19, 2007 2:00 AM EST - Posted in CPUs
The Origins of Static Read Control Delay (tRD)
With over a year of experience overclocking the Core 2 family of processors, we have learned a thing or two. One of the most important items we've learned is that higher FSB settings do not necessarily mean better performance. Understandably, this may come as a shock to some. For whatever reason, even a lot of well-regarded, seasoned overclockers seem to place great value in achieving the highest possible FSB. Based on what we know, we always establish our base target MCH overclock at the same spot - 400MHz FSB with a tRD of 6. The only other potential base MCH target value even worth considering is 450MHz with a tRD of 7, which should only be used when extra memory speed is needed or when a low maximum CPU multiplier becomes a limiting factor. Without getting into too much detail, let's examine what we mean by this.
When it comes to overclocking, the MCH functions as a hybrid of sorts. Like a CPU, it has an upper frequency limit and more voltage can often raise this limit. On the other hand, since it interfaces with memory it also behaves somewhat like memory with internal "timings" whose absolute values derive from the established FSB.
Consider the case of memory rated to run DDR-800 at CAS 3. We can calculate the absolute CAS (Column Address Strobe) delay in a few quick steps. DDR-800, which is in fact double data rate as the name suggests, runs at a base frequency of 400MHz or 400 million cycles per second. Inverting this value tells us the number of seconds per cycle (2.50ns). Finally, multiplying this by the CAS rating tells us the total delay time of 7.5ns (3 x 2.5ns). Likewise, setting a CAS value of 4 results in an absolute CAS delay of 10ns. We can see now why higher CAS values give way to lower memory bandwidths - in the case described above the MCH spends more time "waiting" for data to become available when the memory is set to CAS 4.
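For readers who want to try other memory speeds and timings, the arithmetic above can be sketched in a few lines of Python. This is only an illustration of the calculation described in the text; the function name `absolute_delay_ns` and its parameters are our own invention for this example, not part of any tool.

```python
def absolute_delay_ns(base_clock_mhz: float, cycles: int) -> float:
    """Convert a timing expressed in clock cycles into nanoseconds.

    One cycle lasts 1 / frequency seconds; with the frequency in MHz
    that works out to 1000 / base_clock_mhz nanoseconds.
    """
    cycle_time_ns = 1000.0 / base_clock_mhz
    return cycles * cycle_time_ns


# DDR-800 is double data rate, so its base clock is half the rating.
ddr_rating = 800
base_clock_mhz = ddr_rating / 2  # 400MHz

for cas in (3, 4):
    delay = absolute_delay_ns(base_clock_mhz, cas)
    print(f"DDR-{ddr_rating} CAS {cas}: {delay:.1f}ns")
```

Running it reproduces the 7.5ns and 10ns figures worked out above for CAS 3 and CAS 4.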
tRD in hiding…we promise we didn't make up the horrible "Performance Level" moniker
Arguably, the most important MCH setting when it comes to performance tweaking is the Static Read Control Delay (tRD) value. Like the memory CAS latency (CL), this value is relative to FSB. Case in point, a tRD value of 6, calculated in the same manner as used before, tells us that the MCH sets a read delay of 15ns at an FSB of 400MHz. This means that in addition to the time required for the CPU to issue a request for data in memory to the MCH, the time the MCH spends translating and issuing the command to the memory, and the time the memory requires in retrieving the requested data, the MCH will spend an additional 15ns simply waiting for valid data to become available before fulfilling the CPU's original read request. Obviously, anything that can minimize this wait will be beneficial in improving memory read bandwidth and quite possibly overall system performance.
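The same conversion applies to tRD. The short Python sketch below assumes, as the text does, that tRD counts FSB base clock cycles, and compares the two base MCH targets mentioned earlier (400MHz with a tRD of 6 and 450MHz with a tRD of 7). The helper name `trd_delay_ns` is hypothetical and exists only for this example.

```python
def trd_delay_ns(fsb_mhz: float, trd: int) -> float:
    """Absolute read delay in nanoseconds for a tRD value at a given FSB.

    Assumes tRD is counted in FSB base clock cycles, so one cycle
    lasts 1000 / fsb_mhz nanoseconds.
    """
    return trd * 1000.0 / fsb_mhz


# The two base MCH targets discussed in the article: (FSB in MHz, tRD).
targets = [(400, 6), (450, 7)]

for fsb_mhz, trd in targets:
    delay = trd_delay_ns(fsb_mhz, trd)
    print(f"{fsb_mhz}MHz FSB, tRD {trd}: {delay:.2f}ns")
```

Under that assumption, 400MHz with a tRD of 6 gives the 15ns quoted above, while 450MHz with a tRD of 7 works out to roughly 15.6ns. The higher FSB does not shorten the absolute read delay, which is consistent with the point that a higher FSB alone does not guarantee better performance.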
Until recently, direct tRD manipulation by the user was not even possible. In fact, for the longest time BIOS engineers had no choice but to accept this setting as essentially "hard-coded", making MCH performance rather lackluster. The only way to increase memory subsystem performance was to run at higher FSB settings or tighten primary memory timings. At some point, the MCH design teams got tired of the CPU people hogging all the glory and in a well-calculated effort to boost MCH performance exposed this setting for external programming.
The outside world's first introduction to variable tRD settings came when a few overclockers noticed that setting lower MCH "straps" allowed for higher memory bandwidths. What they didn't know at the time was that they had unintentionally stumbled upon tRD. Tricking the motherboard into detecting an installed CPU as an 800 FSB (200MHz) part forced the MCH into setting a lower tRD value than if the FSB were 1066 (266MHz). Consequently, overclocking the system to the same higher FSB value with the lower strap setting yielded higher memory performance. Oftentimes the effect was significant enough that real-world performance was higher even with a lower final FSB. The tradeoff was apparent, however: a lower strap meant a lower maximum FSB. The MCH tRD value, just like a memory timing, must eventually be loosened in order to scale higher. What's more, as is the case with memory, additional voltage can sometimes allow the MCH to run with tighter "timings" at higher speeds.
Eventually the inevitable next step in memory performance tuning became a reality. The option to adjust tRD independent of MCH strap selection became part of every overclocker's arsenal. Nowadays the MCH strap setting does little more than determine which memory multiplier ratios are available for use. Although tRD adjustments are now possible in many BIOS implementations, some motherboard manufacturers choose to obfuscate their true nature by giving the setting confusing, proprietary names like "Transaction Booster" and the like. Don't let these names fool you; in the end they all do the same thing: manipulate tRD.
56 Comments
Aivas47a - Wednesday, December 19, 2007 - link
Great article. You guys have really been distinguishing yourselves with in-depth work on overclocking the last few months: exploring obscure bios settings, tinkering with "extreme" cooling -- keep it up!
My experience with a qx9650 so far is very similar to yours: easy scaling to 4 ghz, difficult scaling after that with 4.2 ghz being the practical max for regular operation (folding, etc.).
One issue I will be interested to see you address in the future is fsb overclocking on yorkfield. So far I am seeing yorkfield top out at lower fsb (450-460) than was possible for kentsfield on a comparable P35 or X38 platform. That is not so significant for the unlocked Extreme Edition chips, but could make it difficult to achieve the magic 4 ghz with the q9550 and especially the q9450.
Doormat - Wednesday, December 19, 2007 - link
Though its somewhat disappointing on the rumors that Intel has postponed the launch of their QuadCore desktop chips from January to March.
Sunrise089 - Wednesday, December 19, 2007 - link
I agree with everyone else - really top notch stuff here.
1 glaring typo though, from the first page: "Moving to a smaller node process technology allows for the potential of one or two things to happen. " - the "or" should be an "of"
ChronoReverse - Wednesday, December 19, 2007 - link
It seems that ATI cards have less of a drop going from XP to Vista (down to zero and even negative sometimes). It might be instructive to use that for the charts that compare Vista to XP for 3D (e.g., the 3Dmark06 benchmark).
melgross - Wednesday, December 19, 2007 - link
Capacitors have their capacitance turned into reactance at higher frequencies. Anything that qualifies, in a circuit, as a capacitor, such as two wires riding in parallel, will have, to a greater or lesser extent, the same problem in the design.
Reactance rolls off high frequencies. More power is required to offset that.
This is the same problem whether dealing with low frequencies in an audio circuit (where it may be less of a problem), or a high performing computer. It's almost impossible to eliminate all stray capacitance from a circuit, and more circuitry becomes capacitive at higher frequencies. This will only increase as a problem as we get to smaller processes, such as 32nm.
andyleung - Wednesday, December 19, 2007 - link
I am very interested in the performance of these new CPUs. They are Quad-Core and they are good enough to perform some heavy duty business tasks. Wondering how they work with JEE performance.
BLHealthy4life - Wednesday, December 19, 2007 - link
This article is a perfect example of what makes Anandtech so great. Anandtech has the most brilliant and most technically savvy guys on the internet.
Very rarely will you fine any other website review pieces of hardware with such intricate detail for hardware specs and the technology behind it.
Great work guys!
BL
kkak52 - Wednesday, December 19, 2007 - link
really an informative article.... good work!
Bozo Galora - Wednesday, December 19, 2007 - link
A 10+ article, especially the vdroop section.
Its nice to see something on AT like the old days thats cuts through the BS and actually gives real usable info.
Quite a tour de force.
Nice work.