AMD submits suboptimal SPECpower benchmarks for Intel
Performance benchmarks are the equivalent of the Indianapolis 500 for the computer industry, if not more important. Rarely are the top performance numbers attainable in the real world, but winning is a critical symbolic victory in the war of perception. The winner of the benchmarks will be perceived to have a technological lead across their entire line of products while the loser is perceived to be inferior across their entire line of products.
Benchmarks are like races and the winner is always determined by the best performance. The only fair way to conduct benchmarks is to have each player put forth their best player to achieve the highest scores possible within a common set of rules and parameters. By this measure, AMD’s recent submission of SPECpower energy efficiency benchmarks on behalf of Intel servers which portray Intel in a sub-optimal light while ignoring superior scores for Intel is highly inappropriate.
AMD says that their system gets a score of 731 while the Intel system that AMD submitted gets a score of 561. AMD claims this is a legitimate comparison because the systems share many commonalities and they defend their behavior by saying:
“if we were trying to show worst case scenario we would have turned off Power Management on the Intel server”
But the fact that AMD could have turned in even worse performance numbers for the Intel system is totally irrelevant. If anything, submitting plausible performance numbers on the Intel system is far more insidious because it is a more effective deceit.
The obvious problem with AMD’s explanation is that the Intel system submitted by AMD shows a poorly optimized hardware configuration for Intel in terms of energy efficiency. The system uses Fully Buffered memory DIMMs which are notoriously inefficient for the Intel system when the most efficient Intel systems use unbuffered DIMMs. Intel launched the San Clemente 5100 series chipset exactly one year ago which uses the same unbuffered memory as AMD. While it’s true that Intel initially went with Fully Buffered memory two years ago, it was obviously a mistake given the power consumption and Intel quickly fixed that mistake last year. With Nehalem-EP two-socket servers coming at the end of this year, Intel will only use ECC unregistered or registered DIMMs and they will completely shun FBDIMMs. The bottom line is that anyone looking to build an efficient Intel-based server today should use unbuffered memory and AMD conveniently forgot to do that.
The other possible problem in AMD’s explanation is that they used the “same JVM command line options” for the Server Side Java (SSJ) benchmark. While this sounds like a fair comparison, it’s common for different CPUs from different vendors to require different command line options to achieve optimum results. This may explain why the AMD-submitted Intel server was 5% slower than comparable Intel systems submitted by other vendors. Combined with the suboptimal Intel hardware and possibly suboptimal software configuration, we can see the likely reason why the AMD-submitted Intel system did so poorly.
To get an accurate picture of what’s really going on, we need to look at the best possible SPECpower scores for AMD and Intel to determine who the actual winner is. The table below compares the official SPECpower_ssj2008 energy efficiency benchmarks of the top dual-processor servers from Intel and AMD.
| System (SPECpower_ssj2008) | Vendor | DIMMs | Peak SSJ (ssj_ops) | Peak watts | Score (ssj_ops/watt) |
|---|---|---|---|---|---|
| Intel L5430 2.66 GHz | PowerLeader | 2 | 285,970 | 161 | 1,135 |
| Intel L5420 2.5 GHz | SuperMicro | 2 | 279,209 | 174 | 990 |
| Intel L5420 2.5 GHz | HP | 2 | 282,281 | 189 | 930 |
| Intel L5430 2.66 GHz | Fujitsu Siemens | 4 | 293,162 | 220 | 827 |
| AMD 2384 2.7 GHz | AMD | 4 | 338,577 | 264 | 860 |
| AMD 2384 2.7 GHz | AMD | 4 | 335,116 | 264 | 827 |
Because the SPECpower rules don’t specify a minimum number of memory DIMMs for a server, AMD should have submitted servers with 2 DIMMs like everyone else to get the best scores. Instead, AMD submitted servers with 4 DIMMs, which gives them a handicap of 3.58 watts at idle rising to 8.42 watts at peak. However, AMD neutralized that handicap by using the newest Western Digital GreenPower 3.5″ hard drive, which saves 4 watts at idle and 6 watts when active compared to the hard drives used by the other systems.
For the sake of comparison, I estimated the scores of the 3 Intel systems that used only 2 DIMMs by adding 3.58 watts at idle, gradually increasing to 8.42 watts at peak Server Side Java (SSJ) load, to simulate the power consumption of a 4-DIMM server. When these servers are upgraded to 4 DIMMs, they also gain memory performance, which translates to higher overall performance. Based on the peak SSJ scores in the table above, I estimated a 2.5% boost in peak SSJ performance, which slightly counteracts the effect of the higher power consumption on performance per unit of energy. Based on this estimate (which should NOT be taken as official SPECpower_ssj2008 scores), I calculated that the top three systems in the table above would have achieved scores of 1006, 889, and 836, which is still higher than the AMD servers. But if those top three Intel servers had used the same energy-efficient hard drives used in the AMD servers, the scores would revert to something similar to the unadjusted numbers.
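To make the adjustment above concrete, here is a minimal Python sketch of the same idea. It relies on the fact that the overall SPECpower_ssj2008 score is the sum of ssj_ops across all load levels divided by the sum of average watts across those levels, including active idle. Everything else is an assumption made for illustration: the linear ramp of the extra DIMM power between idle and peak, the linear throughput and power curves, and the 100-watt idle figure (the table does not list idle power). The output will not reproduce the official or estimated scores quoted above.

```python
# Sketch only: the per-load-level curve below is a hypothetical placeholder,
# not data from any published SPECpower_ssj2008 result.

LOADS = [i / 10 for i in range(11)]  # active idle (0.0) through 100% load (1.0)


def overall_score(ssj_ops, watts):
    """Overall ssj_ops/watt: total ssj_ops over all levels / total average watts."""
    return sum(ssj_ops) / sum(watts)


def extra_dimm_watts(load, idle_extra=3.58, peak_extra=8.42):
    """Assumed linear ramp of the added DIMM power from idle to peak load."""
    return idle_extra + (peak_extra - idle_extra) * load


def simulate_four_dimms(ssj_ops, watts, perf_boost=0.025):
    """Apply the estimated throughput boost and the added DIMM power per level."""
    new_ops = [ops * (1 + perf_boost) for ops in ssj_ops]
    new_watts = [w + extra_dimm_watts(load) for load, w in zip(LOADS, watts)]
    return new_ops, new_watts


# Hypothetical 2-DIMM server: throughput and power both scale linearly with load.
PEAK_OPS, PEAK_WATTS = 285_970, 161  # peak figures from the first table row
IDLE_WATTS = 100                     # assumed; the table does not list idle power

ops_2dimm = [PEAK_OPS * load for load in LOADS]
watts_2dimm = [IDLE_WATTS + (PEAK_WATTS - IDLE_WATTS) * load for load in LOADS]

base = overall_score(ops_2dimm, watts_2dimm)
adjusted = overall_score(*simulate_four_dimms(ops_2dimm, watts_2dimm))
print(f"2-DIMM estimate: {base:.0f}, simulated 4-DIMM estimate: {adjusted:.0f}")
```

With real per-level figures from a published result in place of the placeholder curves, the same two functions would give the kind of unofficial estimate described above.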
UPDATE 11/13/2008 – The AMD results were actually not Barcelona, but AMD Shanghai results. I had thought they were Barcelona but I didn’t realize that I was looking at yet-to-be-launched Shanghai performance numbers.
In conclusion, AMD has made huge strides to nearly close the SPECpower gap, but they’re behaving inappropriately by comparing their products to suboptimal benchmarks that they themselves submitted. That’s unfortunate because this new controversy has overshadowed the huge progress made by AMD. Had AMD launched these mid 2 GHz Barcelona processors on time a year earlier, they would have been extremely competitive all through 2008 but it was not to be and they suffered for it.
AMD made some huge clock-for-clock, core-for-core performance gains with their quad-core Barcelona chips in Server Side Java (SSJ) performance compared to their older dual-core Opteron chips. Even when I factor out the clock speed difference, a Barcelona quad-core 2.7 GHz system is still 3.14 times faster than an Opteron dual-core 2.4 GHz system in Server Side Java performance. These gains have allowed AMD to become very competitive against Intel’s aging Penryn chips, although AMD cannot claim the title. It’s also interesting to note that AMD Barcelona made huge improvements in web server performance and it is beating Intel Penryn servers on dual-socket SPECweb_2005. However, Intel Penryn class CPUs still win by a large margin in the single-processor server market.
The reason for this disparity between single- and dual-socket servers is that Intel’s memory architecture is constraining their dual-socket performance, but that limitation will soon disappear with Intel’s Nehalem Microarchitecture server CPUs which should launch by the end of this year. Intel’s Nehalem CPUs have completely closed the memory performance gap with QuickPath architecture and they’ve even managed to leapfrog AMD’s memory architecture with 50% more memory channels and higher performance DDR3 memory. Coupled with the improvements in the Nehalem execution engine, there is little doubt that Intel will be regaining a comfortable lead in the Server and Desktop markets.
AMD will narrow that gap with their Shanghai processors if they can launch on time but few analysts predict Shanghai will come close to beating Nehalem. Whether or not AMD can launch Shanghai on time or get close enough to Intel Nehalem to be reasonably competitive remains to be seen.
Great call, George!
There’s a heated debate going on in the Yahoo AMD finance boards by AMD fanatics who are somewhat unsuccessfully trying to defend AMD’s tactics.
Hopefully, respected technology journalists and analysts will see through this deception.
This reminds me of the Barcelona simulated AMD benchmark fiasco.
Those simulations weren’t off in terms of performance; but they were deceptive by comparing late 2006 Intel products to mid 2009 AMD products.
Also, AMD failed to disclose the fact that they left out Intel auto-parallelization enhanced scores when they launched Barcelona. It was an unfortunate choice given that they should have focused on their real and huge gains in SPECjbb and SPECweb instead of cherry-picked ones in SPECint.
I don’t think AMD was aiming for the title, they probably did a test for their customers. Customers like honest companies.
Banana, there’s nothing honest about submitting suboptimal scores from your competitor to fool the public into believing that the inferior scores are somehow painting an accurate picture.
Performance evaluation is a tricky matter and it gets even trickier when you try to compare entirely different hardware architectures and software setups, like SPEC’s benchmarks do. That’s why the full story always lies in the combination of result and disclosure of the test setup. There’s no single right way to apply the published results alone for a platform comparison. Most testers (being vendors of the tested hardware) try to achieve the best results for that one specific benchmark run alone and thus use setups which are seldom realistic. For the SPECCPU benchmarks, for example, the setups always have all memory slots populated, because power consumption is of no importance there. Therefore the "best vs. best" comparison doesn’t have much practical value either.
When interested in CPU performance, for example, a casual observer may come to the conclusion that Intel’s Xeon X5482 beat itself in SPECint_rate_base performance by as much as 19% through old age alone recently:
http://www.spec.org/cpu2006/results/res2007q4/cpu2006-20071115-02621.html
http://www.spec.org/cpu2006/results/res2008q4/cpu2006-20080915-05362.html
Both tests used identical chipsets and memory setups – it’s just the compiler which was bumped by one major revision.
On a side note, regarding SPECweb_2005: The real king of the hill there is Sun’s UltraSPARC T2. I seem to remember hinting at that back when only a few SPECCPU results were available for that processor and you tried to use them as evidence for the Xeons’ superiority above that newcomer.
I don’t think that trait ranks high in a marketing executive’s agenda. AMD has of course chosen the scenario of similar setups in which their product looks best. If someone gets fooled by the numbers despite the tester’s prominent name, the full configuration disclosure and the large amount of different results from other testers, they don’t deserve any better. On the other hand, the attempt to compare nearly identical setups bears some validity in itself and I wish there were independent testers using the SPEC benchmarks in a similar fashion.
"In conclusion, AMD has made huge strides to nearly close the SPECpower gap, but they’re behaving inappropriately by comparing their products to suboptimal benchmarks that they themselves submitted. "
Bologna. AMD didn’t erase Intel’s previous scores on SPEC page or something like that. The best Intel scores are still there for comparison.
There’s absolutely nothing wrong with what AMD did and I think the score they submitted is curious, to say the least, since Intel’s partners avoid FBDIMM for SPECpower scores like the plague.
Also, configurations with the best scores tend to be the most esoteric and are not representative of typical usage. Good to see AMD submitted a configuration that is closer to real usage patterns.
Intel is avoiding FBDIMM like the plague now and it’s gone from all future product lines in the one and two socket market. Intel has had unbuffered DDR2 solutions for a year and they only use the FBDIMM solutions for very high end platforms like Stoakley, which uses DDR2-800 FBDIMM with very high DIMM count. The tradeoff with Stoakley is that you get much higher performance which offsets the additional power consumption. But AMD didn’t use a modern Stoakley FBDIMM solution and their command line options were optimized for AMD and not Intel, so it was hardly an optimal FBDIMM comparison.
Those systems are becoming rare, though, as people simply don’t need that many DIMMs.
Having a best versus best comparison with a given set of rules and parameters is the fairest way to determine a victor. It’s not perfect, but it’s superior to arbitrarily cherry picking the best system from your own product line and cherry picking a suboptimal system from your competitor’s product line which is essentially what AMD did.
Once again George you miss the mark. Just what do you have against AMD?
Using your own words "The only fair way to conduct benchmarks is to have each player put forth their best player to achieve the highest scores possible within a common set of rules and parameters. By this measure, AMD’s recent submission of SPECpower energy efficiency benchmarks on behalf of Intel servers which portray Intel in a sub-optimal light while ignoring superior scores for Intel is highly inappropriate."
After looking through the results myself, the systems seem to be configured as identically as possible for an apples-to-apples comparison. That sounds like a common set of rules and parameters to me. How are you going to state that ignoring superior Intel scores is inappropriate when you are comparing those 1000+ results on an OEM highly optimized box vs a generic Supermicro platform? Is that fair and balanced? No it’s not. Comparing 1U vs 2U systems is not fair, they are targeted at different segments. 2U systems usually include larger power supplies because of larger case/system fans, greater expansion capability, and more drives. You and many others miss the whole goal of the SPECpower benchmark as I see it. This is one case where the best score is not the performance king. It is intended to show you the power efficiency of a platform and comparing two completely different platform types is just wrong.
You left out that while the Intel system uses FB-DIMM, you don’t mention they are of the low power variant. You also complain that "AMD should have submitted servers with 2 DIMMs like everyone else to get the best scores." Because of their IMC and NUMA architecture, running 2 DIMMs (equal to 1 per socket) would kill the performance of the AMD system, not achieve the "best scores" like your buddies at Intel. Nehalem is going to be the same way because of its IMC also. If you look at the other Intel published results, you’ll also see that the JVM flags used are pretty representative of what everyone else is using for their submissions. AMD did appear to throw more memory at the JVM but JVMs love more memory so you can’t say they crippled it. The performance still seems to be low but maybe that’s the best they could get on that platform? If these results were so unfair
What about that San Clemente platform anyway? I couldn’t find any Dell or HP systems using it. If the major OEMs aren’t even offering them then the volume must be pretty low.
Me, look again and tell me if there’s an HP in it.
I don’t care if AMD used low voltage FBD. They didn’t:
1. Use FBDIMM DDR2-800 with a Stoakley platform chipset
2. Optimize SSJ software for Intel.
3. Use an unbuffered system, which is Intel’s most efficient platform.
Yes George, we all know you don’t care about fair and impartial reporting. In case you didn’t notice, AMD didn’t use DDR2-800 memory on their platform as well so in both cases the performance is lower. How did they not optimize the software for Intel? I see these flags used
-Xms1700m -Xmx1700m -Xns1500m -XXaggressive -Xlargepages -Xgc:genpar -XXcallprofiling -XXgcthreads=2 -XXtlasize:min=4k,preferred=1024k -XXthroughputcompaction
on pretty much every Intel submission with a few variations in the heap sizes.
AMD used these flags
-Xms3600m -Xns3200m -Xmx3600m -XXaggressive -XXcallprofiling -XXthroughputCompaction -Xgc:genpar -XXgcthreads:4 -XXtlasize:min=4k,preferred=512k
I see them giving more memory to the JVM because they used 16GB of memory instead of 8GB like a lot of the other submissions. They didn’t use the -Xlargepages flag, but if you look at the JRockit JVM documentation, the way the -XXaggressive flag is implemented means that the JVM uses large pages anyway. Those are the two things that would have the most impact on performance so I don’t see how they didn’t optimize the platform. Almost everything else is identical.
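The flag-by-flag comparison above can be checked mechanically. The following is a small Python sketch that diffs the two command lines exactly as quoted in this comment. The normalization rules are assumptions made for the comparison only, not documented JVM behavior: flag names are compared case-insensitively, and `:` and `=` are treated as equivalent separators.

```python
import re

# Flag strings copied from the comment above.
TYPICAL_INTEL = ("-Xms1700m -Xmx1700m -Xns1500m -XXaggressive -Xlargepages "
                 "-Xgc:genpar -XXcallprofiling -XXgcthreads=2 "
                 "-XXtlasize:min=4k,preferred=1024k -XXthroughputcompaction")
AMD_SUBMITTED = ("-Xms3600m -Xns3200m -Xmx3600m -XXaggressive -XXcallprofiling "
                 "-XXthroughputCompaction -Xgc:genpar -XXgcthreads:4 "
                 "-XXtlasize:min=4k,preferred=512k")

# Heap-size flags carry their value directly after the name, e.g. -Xms1700m.
HEAP_FLAG = re.compile(r"^(-X(?:ms|mx|ns))(\d+[kmg]?)$", re.IGNORECASE)


def parse_flags(cmdline):
    """Map each lowercased flag name to its lowercased value ('' if none)."""
    flags = {}
    for token in cmdline.split():
        match = HEAP_FLAG.match(token)
        if match:
            name, value = match.group(1), match.group(2)
        else:
            # Assumption: ':' and '=' both separate a flag name from its value.
            parts = re.split(r"[:=]", token, maxsplit=1)
            name = parts[0]
            value = parts[1] if len(parts) > 1 else ""
        flags[name.lower()] = value.lower()
    return flags


def diff_flags(a, b):
    """Return flags only in a, flags only in b, and flags whose values differ."""
    only_a = set(a) - set(b)
    only_b = set(b) - set(a)
    changed = {k: (a[k], b[k]) for k in set(a) & set(b) if a[k] != b[k]}
    return only_a, only_b, changed


intel = parse_flags(TYPICAL_INTEL)
amd = parse_flags(AMD_SUBMITTED)
only_intel, only_amd, changed = diff_flags(intel, amd)

print("Only in the typical Intel submission:", sorted(only_intel))
print("Only in the AMD-submitted run:", sorted(only_amd))
for name in sorted(changed):
    print(f"{name}: {changed[name][0]} -> {changed[name][1]}")
```

Under these assumptions the diff isolates the heap sizes, the GC thread count, the preferred TLA size, and the missing -Xlargepages flag as the only differences between the two command lines.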
Again, the use of an unbuffered platform is irrelevant, they are comparing two similar systems with identical configs. Though Intel is avoiding it at all costs now, FB-DIMM systems are the overwhelming majority of Intel systems out in the wild now. San Clemente is a niche market. No one is going to move their infrastructure onto the new platform with Nehalem so close.
It’s not Intel’s fault for using DDR2-800 when AMD can’t. It’s not Intel’s fault for making a more efficient unbuffered system than AMD. It’s not Intel’s fault Barcelona was so late. It’s not Intel’s fault for making a faster processor. I’m not going to play this game of trying to cherry pick just the right Intel system to make AMD look good.
I don’t condemn your reasons, Mr. Ou. AMD may pull a bad trick overnight (heck even the players with more proven record do it sometimes) and you are right on their heels but for now it’s all ok. It’s AMD’s right to publish benchmark data which shows their competitors’ weaknesses, because, as you have said, they aren’t their fault. If Intel feels they should do the same, it’s all ok too. That’s called competition.
All AMD is showing with those benchmarks is not to buy an obsolete 2 year old motherboard chipset, older DDR2-667 FBDIMMs, and which command line options to avoid. That would be like Intel building an older AMD dual-core Opteron system with suboptimal software configuration and then trying to pass it off as a fair comparison.
AMD should focus on their own products rather than trying to attack older Intel platforms and end up looking desperate and dishonest. AMD should focus on their positive points which I highlighted at the end of my article. AMD should focus on their Shanghai launch.
Fairness is certainly one of the major concerns for SPEC and they achieve it quite elegantly through the full disclosure rules. They neither demand nor encourage a comparison policy anywhere and that’s for a good reason: an attempt to maintain fairness that way would be futile. Insisting on "best vs. best" being the only fair type of comparison is, with all due respect, kind of silly, given that the benchmark vendor doesn’t even suggest anything like that anywhere in the documentation.
AMD’s tests were undoubtedly valid (otherwise they wouldn’t have been published) with the unsurprising result that their platform emerged as the winner from the given comparison of setups. Since the full configuration disclosure is there for everybody to read, they can decide for themselves if they’re more likely to run their server with 2 RDIMMs totalling 8GB of RAM and excessive user privileges or 8 low-power FBDIMMs totalling 16GB and a more conservative set of optimizations.
That’s how SPEC benchmarks work and your problem is that the best results aren’t based on configurations which are any more realistic or less cherry-picked than AMD’s. In fact, I think, they are often worse.
To suggest anything other than best-versus-best in a benchmark is ludicrous. We can sit here all day long coming up with suboptimal configurations for Intel or AMD. But when AMD puts out optimal scores for AMD and suboptimal scores for Intel and then puts the two side by side in a comparison that they’re showing the press or the public (sometimes via full page ad), it’s straight up dishonest.
SPEC has rules for disclosure on comparisons and AMD has flouted those rules in the past.
http://blogs.zdnet.com/Ou/?p=753
AMD has a much better optimized configuration published than the one they’re comparing to the Intel setup. They’re not comparing best vs. worst, but almost identical configurations. The best results on the other hand are hardly based on realistic configurations you would find in the wild.
Referring to your sensationalist reporting on some past incident doesn’t help you build a case, but only shows you having a history of picking up every opportunity you get to attack AMD, casting reasonable doubt on your impartiality.
I thought that may have had something to do with your ZDNet contract, but I realize now that it’s a personal matter.
"AMD has a much better optimized configuration published than the one they’re comparing to the Intel setup."
You admit it’s AMD’s best versus a mediocre Intel system that AMD picked. So you’re essentially admitting that I am right so your criticism of me is hypocritical.
The point is that you can’t have identical systems and if they wanted identical, they should have used the same type of memory since both vendors support the same memory and have done so for at least a year. The point is that if we’re going to do a comparison, a competitive analysis, you cannot have one vendor pick the best system for himself and pick the mediocre system from his competitor. If you’re going to have a comparison, a contest, you need to let both vendors pick their optimized systems. For this reason, AMD’s comparison is dishonest.
I don’t see why this should be even considered.
I thought I understood SPEC to be a third party. The competing parties should both configure the best of what they have, submit to SPEC, then SPEC should handle publishing.
Is this too simplified? Does SPEC not have the resources to manage this? Do they need additional capital?
Can AMD not pull the best of the best from this central repository of results hosted by SPEC?
Obviously I think AMD’s marketing is trying to pull every trick in the book to sell the world on the greatness of a subpar product.
Anyone can submit any results, but it’s dubious to compare your own best system against your competitor’s mediocre system which you submitted.
So AMD is allowed to submit results on behalf of Intel or vice versa if everything was disclosed. The problem is that what they’ve disclosed shows that AMD put up a mediocre Intel system at best with mediocre optimizations while they turned in the best possible results for the AMD server.
How do you read "AMD’s best" into a sentence that states they have better? Where have I said that Intel’s system is just mediocre compared to that?
It is your own self-established premise that the SPEC benchmarks should only be used in a "best vs. best" (or in your own words "Indy-500-type") contest. AMD doesn’t claim to enter that contest with the results in question nor does SPEC state anywhere that this is the purpose of their benchmark suites. It is this self-righteous definition of ground rules alone which justifies your accusation of deceit.
Real customers may very well choose not to opt for the 5100 chipset. It’s currently impossible to equip it with more than 24GB of memory and it obviously isn’t the best performer, since Intel continues to use the 5400 or 5000P chipsets for benchmarks in which power consumption is not measured. It is almost certain that real customers will equip their systems with more than the 8GB of memory that testers use for the best power efficiency results. It is also very unlikely they will allow memory page locking for the JVM account. Those real customers may very well be interested in a comparison between systems which better resemble their actual setup and therefore AMD’s submission is completely valid.
It’s laughable to accuse AMD of deceit when the full configuration disclosure is there for anyone to examine and more than exposing any dishonesty on AMD’s side, you’re accusing what you call "the public" of ignorance on a large scale.
Apparently those who haven’t been exposed to years of technology should learn the following concepts.
1. Never accept literature from a specific company as gospel truth.
2. Ask for a demonstration of a product before accepting any of its promises.
3. Hit the forums to see other reviews of a company’s products. If they have had a bad rep in the past, they might be doomed to have one again.
Example being, I keep trying to turn down Panda Security because they spammed their home users with their own Anti-spam product.
I think that AMD has some newer products in the pipe which will begin to make them competitive with the Intel of today, while the i7 series now has Intel looking untouchable once again. For the moment, AMD is playing the marketing game and losing at it.
George, thanks for giving AMD some credit for the strides they have made. While they need to be slapped around a bit for the repeated offenses of submitting old Intel results, maybe if we compliment them for what they are doing right, they will continue to strive to be better instead of paying the poor schmucks in the advertising department.
Any benchmark should point to a real strength. Look at the benchmarks you ran to reach your conclusion. It is like comparing MS Office with OpenOffice: startup time and the speed of opening a document have nothing to do with the real value of OpenOffice, such as price, customization, cheap (no-cost) upgrades, a reasonably mature product, and support for all languages. If you compare the price of building a computer with Linux, OpenOffice, and fully Linux-compatible hardware against a Vista + Office 2007 counterpart, Linux will probably give you a better machine (in terms of configuration, not experience, which is subjective).
I do not want to say that you are endorsed by Microsoft, but your benchmark points to only some of the facts.
AMD does not currently have enough raw performance to compete with Intel. But the benchmarks they ran are in the area where they are better: power management. At their frequency they compete better with Intel. AMD’s products will probably be priced similarly to (or a bit higher than) an older Intel CPU, and the market they address is much lower-end. The AMD ATI 4850 is well known as an *almost* "top of the hill" video card with very good pricing. The market AMD is addressing is servers with lower power consumption.
And I am completely sure that AMD does not want to trick anyone here, only to prove that Intel’s products in that range are poor performers. AMD certainly cannot compete with the Nehalem architecture, but it has no need to.
It is fair to admit that the benchmark shown by AMD was right for low-range 2-socket CPUs, not necessarily on raw performance, but on power consumption.
Seen that way, your comments appear biased, though they work as a sensational title and content.
Shame on AMD and its marketing department for trying to deceive.
Shame on AMD and its management for condoning this behavior.
Hooray to tech journalists like Sylvie and George for calling AMD out on the carpet.