Intel's Pentium Extreme Edition 955: 65nm, 4 threads and 376M transistors
by Anand Lal Shimpi on December 30, 2005 11:36 AM EST
Posted in: CPUs
Dual Core and Hyper Threading: Detriment or Not?
A question that we've always had is whether or not the inclusion of Hyper Threading support on Intel's dual-core Extreme Edition processors actually improves performance. To answer that question, we have to look at two separate situations: multithreaded application performance and multitasking performance.
For multithreaded application performance, we can now turn to a number of benchmarks. We'll start off with 3dsmax 7 (higher numbers are better for the composite score, lower numbers are better for the rest of the numbers):
3dsmax 7 | Composite Score | 3dsmax 5 rays | CBALLS2 | SinglePipe2 | UnderWater |
HT Enabled | 3.0 | 12.922s | 17.297s | 83.515s | 119.641s |
HT Disabled | 2.51 | 14.937s | 21.141s | 102.734s | 141.641s |
Here, the performance advantage is clear - enabling Hyper Threading provides Intel with another 14-19% over the base dual core Presler. The same applies to almost all of the media encoding tests (if minutes or seconds are specified, lower numbers mean better performance):
Media Encoding | DVD Shrink | WME9 | H.264 | iTunes |
HT Enabled | 7.1m | 46.5fps | 9.96m | 38s |
HT Disabled | 8.0m | 38.6fps | 8.53m | 40s |
Our Quicktime 7 H.264 encoding test, which took longer with HT enabled (9.96 minutes versus 8.53), is an outlier from what we've seen of the impact of HT on multithreaded applications. The rest of the applications show a clear benefit to being able to execute four threads simultaneously, even if the execution resources of the cores are shared with the remaining two threads.
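To make the comparison concrete, here is a minimal sketch that turns the two tables above into per-test HT gains. The convention is our assumption, since the text does not state how its percentages were derived: for timed tests we report the reduction in run time relative to HT disabled, and for score or fps results we report the increase. The names `RESULTS` and `ht_gain_percent` are illustrative only.

```python
# Results copied from the two tables above:
# (HT enabled, HT disabled, True if higher numbers are better)
RESULTS = {
    "3dsmax composite": (3.0, 2.51, True),
    "3dsmax 5 rays (s)": (12.922, 14.937, False),
    "CBALLS2 (s)": (17.297, 21.141, False),
    "SinglePipe2 (s)": (83.515, 102.734, False),
    "UnderWater (s)": (119.641, 141.641, False),
    "DVD Shrink (min)": (7.1, 8.0, False),
    "WME9 (fps)": (46.5, 38.6, True),
    "H.264 (min)": (9.96, 8.53, False),
    "iTunes (s)": (38.0, 40.0, False),
}


def ht_gain_percent(ht_on, ht_off, higher_is_better):
    """Percent improvement from enabling HT; negative means HT was slower."""
    if higher_is_better:
        return (ht_on / ht_off - 1.0) * 100.0
    # For timings, report the reduction in run time relative to HT disabled.
    return (1.0 - ht_on / ht_off) * 100.0


for name, (on, off, higher_is_better) in RESULTS.items():
    gain = ht_gain_percent(on, off, higher_is_better)
    print(f"{name:20s} {gain:+6.1f}%")
```

Under this convention the H.264 row is the only one that comes out negative, which matches its description as the outlier.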
Armed with the latest SMP patches for Call of Duty 2 and Quake 4 (SMP was enabled in both games), we can also take a look at the impact of HT on Presler:
Gaming | Call of Duty 2 | Quake 4 |
HT Enabled | 68.4 | 142.3 |
HT Disabled | 69.3 | 142.3 |
Call of Duty 2 is another example where HT actually reduces performance, but given that enabling SMP itself reduces performance in this game, we wouldn't draw any conclusions from its data. Quake 4, on the other hand, shows no difference in performance with HT on or off.
From what we've seen, enabling HT will improve performance in most individual multithreaded applications, even if you have a dual core processor. The degree of improvement will vary from application to application, but generally speaking, it's going to be positive (if there is any change at all).
The more interesting situation is what happens when you're multitasking - does Hyper Threading really help on top of the inherent benefits of a dual core processor? To find out, we put together a couple of multitasking scenarios aided by a tool that Intel provided us to help all of the applications start at the exact same time. We're not necessarily concerned with the actual performance of these applications, but rather with the impact that the number of simultaneous applications has on each other and how that varies with HT being enabled or not.
We took five applications (Grisoft AVG Anti-Virus 7, Lame MP3 Encoder 3.97a, Windows Media Encoder 9, Info-ZIP extraction utility and Splinter Cell: Chaos Theory) and used various combinations of them to try to figure out if there are multitasking benefits to a dual core processor with Hyper Threading enabled. Note that some of these applications are multithreaded themselves, so just because we chose five applications doesn't mean that there are only five threads of execution; in reality, there are many more.
We tested four different scenarios:
- A virus scan + MP3 encode
- The first scenario + a Windows Media encode
- The second scenario + unzipping files, and
- The third scenario + our Splinter Cell: CT benchmark.
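The scenario construction and the simultaneous start can be sketched roughly as follows. This is not the tool Intel supplied, which is not described here; it is a hypothetical stand-in that assumes each application can be launched as a command line. The names `APPS`, `build_scenarios` and `run_together` are ours.

```python
import subprocess
import threading
import time

# Short labels for the five applications, in the order they are added.
APPS = ["AVG", "LAME", "WME", "ZIP", "SCCT"]


def build_scenarios(apps):
    """Cumulative scenarios: the first two apps, then one more app each time."""
    return [apps[:n] for n in range(2, len(apps) + 1)]


def run_together(commands):
    """Start every command at nearly the same moment.

    `commands` maps a label to an argv list and must not be empty.
    Returns a dict of label -> elapsed seconds for that command.
    """
    barrier = threading.Barrier(len(commands))
    elapsed = {}

    def worker(name, argv):
        barrier.wait()  # all workers are released together
        start = time.perf_counter()
        subprocess.run(argv, check=False)
        elapsed[name] = time.perf_counter() - start

    threads = [threading.Thread(target=worker, args=item)
               for item in commands.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return elapsed
```

Calling `build_scenarios(APPS)` yields the four scenarios listed above, and each one would be passed to `run_together` with the real command line for every label filled in.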
AMD Athlon 64 X2 4800+ | AVG | LAME | WME | ZIP | Total |
AVG + LAME | 22.9s | 13.8s | | | 36.7s |
AVG + LAME + WME | 35.5s | 24.9s | 29.5s | | 90.0s |
AVG + LAME + WME + ZIP | 41.6s | 38.2s | 40.9s | 56.6s | 177.3s |
AVG + LAME + WME + ZIP + SCCT | 42.8s | 42.2s | 46.6s | 65.9s | 197.5s |
Intel Pentium EE 955 (no HT) | AVG | LAME | WME | ZIP | Total |
AVG + LAME | 24.8s | 13.7s | | | 38.5s |
AVG + LAME + WME | 39.2s | 22.5s | 32.0s | | 93.7s |
AVG + LAME + WME + ZIP | 47.1s | 37.3s | 45.0s | 62.0s | 191.4s |
AVG + LAME + WME + ZIP + SCCT | 40.3s | 47.7s | 58.6s | 83.3s | 229.9s |
Intel Pentium EE 955 (HT Enabled) | AVG | LAME | WME | ZIP | Total |
AVG + LAME | 25.0s | 13.3s | | | 38.3s |
AVG + LAME + WME | 34.4s | 21.6s | 30.2s | | 86.2s |
AVG + LAME + WME + ZIP | 41.5s | 28.1s | 37.7s | 54.2s | 161.5s |
AVG + LAME + WME + ZIP + SCCT | 51.4s | 33.0s | 45.3s | 71.1s | 200.8s |
As you can see, once you get beyond two simultaneous applications, the Presler setup with HT enabled takes less time to complete the tasks than the Presler system without HT. However, including the Athlon 64 X2 4800+ in the picture, we see that despite only being able to execute two threads at the same time, it does about as good a job as the Presler HT system, which can execute twice as many threads. But to get the full picture, we have to measure one last data point: Splinter Cell performance.
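The totals behind that comparison can be reproduced from the per-application times in the three tables above. The sketch below is illustrative only (the names `TIMES`, `totals` and `ht_saving_percent` are ours); it sums each scenario and reports how much total time HT saves on the Pentium EE 955.

```python
# Completion times in seconds, copied from the tables above.
# Each tuple holds AVG, LAME, WME, ZIP for the apps present in that scenario.
TIMES = {
    "Athlon 64 X2 4800+": [
        (22.9, 13.8),
        (35.5, 24.9, 29.5),
        (41.6, 38.2, 40.9, 56.6),
        (42.8, 42.2, 46.6, 65.9),
    ],
    "Pentium EE 955 (no HT)": [
        (24.8, 13.7),
        (39.2, 22.5, 32.0),
        (47.1, 37.3, 45.0, 62.0),
        (40.3, 47.7, 58.6, 83.3),
    ],
    "Pentium EE 955 (HT)": [
        (25.0, 13.3),
        (34.4, 21.6, 30.2),
        (41.5, 28.1, 37.7, 54.2),
        (51.4, 33.0, 45.3, 71.1),
    ],
}


def totals(system):
    """Total seconds per scenario for one system, rounded to 0.1 s."""
    return [round(sum(scenario), 1) for scenario in TIMES[system]]


def ht_saving_percent():
    """Percent of total time saved by enabling HT, one value per scenario."""
    off = totals("Pentium EE 955 (no HT)")
    on = totals("Pentium EE 955 (HT)")
    return [round((1.0 - a / b) * 100.0, 1) for a, b in zip(on, off)]


for system in TIMES:
    print(system, totals(system))
print("HT saving (%):", ht_saving_percent())
```

One small discrepancy is worth noting: summing the rounded Athlon 64 X2 figures for the three-application scenario gives 89.9s rather than the 90.0s in the table, presumably because the published total was computed from unrounded measurements.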
In the fourth scenario, we ran a total of five applications: AVG, Lame, WME, InfoZip and Splinter Cell. The first four applications took a total of 197.5 seconds to complete on the Athlon 64 X2 4800+ system, ever so slightly quicker than the 200.8 seconds of the Presler HT system. However, that does not take into account Splinter Cell performance - now let's see how our fifth application fared:
Splinter Cell: CT | Average | Min | Max |
Intel Pentium EE 955 (no HT) | 71.0 fps | 27.8 fps | 128.1 fps |
Intel Pentium EE 955 (HT enabled) | 77.2 fps | 32.5 fps | 139.6 fps |
AMD Athlon 64 X2 4800+ | 66.9 fps | 10.5 fps | 185.0 fps |
The Athlon 64 X2 4800+ is actually faster in the Splinter Cell: CT benchmark without anything else running, but here we see a very different story. Although its 66.9 fps average frame rate is reasonably competitive with the Presler HT system, its minimum frame rate is barely over 10 fps - approximately 1/3 that of the Presler HT.
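As a quick check of those ratios, here is a small sketch using the frame rates from the table above. The names `SCCT_FPS` and `ratio` are illustrative only.

```python
# Splinter Cell: CT frame rates (fps) while the other four applications run,
# copied from the table above.
SCCT_FPS = {
    "Pentium EE 955 (no HT)": {"avg": 71.0, "min": 27.8, "max": 128.1},
    "Pentium EE 955 (HT)": {"avg": 77.2, "min": 32.5, "max": 139.6},
    "Athlon 64 X2 4800+": {"avg": 66.9, "min": 10.5, "max": 185.0},
}


def ratio(metric, system, baseline):
    """Value of `metric` on `system` as a fraction of the same metric on `baseline`."""
    return SCCT_FPS[system][metric] / SCCT_FPS[baseline][metric]


amd, ht = "Athlon 64 X2 4800+", "Pentium EE 955 (HT)"
print(f"avg: {ratio('avg', amd, ht):.2f}  min: {ratio('min', amd, ht):.2f}")
```

The average frame rate works out to about 87% of the Presler HT figure, while the minimum is only about 32%, which is the gap the paragraph above describes.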
While the regular Presler setup without HT managed to pull in higher frame rates than the AMD system, it did so while performing significantly worse in the remaining four applications. The Presler HT vs. Athlon 64 X2 comparison is important because the two are virtually tied in the performance of the first four applications - but juggling all five of the applications is better done on the Presler HT system.
We would say that if implemented properly, the benefits of an SMT system like Hyper Threading are definitely a good companion to a dual core desktop processor. The usable limit, even for today's applications and usage models, is far from just two threads.
84 Comments
Betwon - Saturday, December 31, 2005
No. Don't you think that future versions of the patch will be written by Intel?
Viditor - Saturday, December 31, 2005
Doubtful (but who knows)...I can't see Intel spending 100s of millions with every developer (or even 1 developer) for the long term, just to keep tweaking their patches. It's just not a very smart long term strategy (and Intel is quite smart).
Betwon - Saturday, December 31, 2005
You're just guessing. We find that good quality code can provide better performance for both AMD and Intel.
Intel can often benefit more, because the performance potential of Intel is high.
Now, you cannot find another SMP game that improves the fps of an SMP CPU so much.
If you find it, please tell us.
There is no one who found it.
Viditor - Saturday, December 31, 2005
Now it's you who's guessing...
Betwon - Saturday, December 31, 2005
No. It is true.
Viditor - Saturday, December 31, 2005
OK...prove it!
Betwon - Saturday, December 31, 2005
For example: we saw a test (from AnandTech).
With good quality code, AMD becomes faster than before, but Intel becomes much faster than before.
They use Intel's compiler.
Betwon - Saturday, December 31, 2005
When not using Intel's compiler, AMD becomes slow.
Viditor - Saturday, December 31, 2005
I know you've often quoted from the spec.org site...
I suggest you revisit there and look at the difference between AMD systems using Intel compilers and the PathScale or Sun compilers. In general, the Spec scores for AMD improve by as much as 30% when not using an Intel compiler...especially in FP.
http://www.swallowtail.org/naughty-intel.html
defter - Saturday, December 31, 2005
This is not true, for example:
FX-57, Intel compiler, SpecInt base 1862:
http://www.spec.org/osg/cpu2000/results/res2005q2/...
FX-57, Pathscale compiler, 1745: http://www.spec.org/osg/cpu2000/results/res2005q2/...
Opteron 2.8GHz, Intel compiler, SpecInt base 1837: http://www.spec.org/osg/cpu2000/results/res2005q3/...
Opteron 2.8GHz, Sun compiler, SpecInt base 1660: http://www.spec.org/osg/cpu2000/results/res2005q4/...
In SpecFP Intel compiler produces slightly slower results, but the difference isn't 30%:
Opteron 2.8GHz (HP hardware), Intel compiler, SpecFP base 1805: http://www.spec.org/osg/cpu2000/results/res2005q3/...
Opteron 2.8GHz (HP hardware), Pathscale compiler, SpecFP base 2052: http://www.spec.org/osg/cpu2000/results/res2005q3/...
Opteron 2.8GHz (Sun hardware), Sun compiler, SpecFP base 2132: http://www.spec.org/osg/cpu2000/results/res2005q4/...
So let's see:
Intel vs Sun compiler:
- Intel compiler is 10.7% faster in SpecInt
- Sun compiler is 18.1% faster in SpecFP
Intel vs Pathscale compiler:
- Intel compiler is 6.7% faster in SpecInt
- Pathscale compiler is 13.7% faster in SpecFP
It is quite surprising that Intel's compiler gives the best results for AMD's processors in many situations.