Performance benchmarks are the equivalent of the Indianapolis 500 for the computer industry, if not more important. Rarely are the top performance numbers attainable in the real world, but winning is a critical symbolic victory in the war of perception. The winner of the benchmarks will be perceived to have a technological lead across their entire line of products while the loser is perceived to be inferior across their entire line of products.
Benchmarks are like races, and the winner is always determined by the best performance. The only fair way to conduct benchmarks is to have each player put forth their best entry to achieve the highest scores possible within a common set of rules and parameters. By this measure, AMD’s recent submission of SPECpower energy efficiency benchmarks on behalf of Intel servers, which portrays Intel in a suboptimal light while ignoring superior Intel scores, is highly inappropriate.
AMD says that their system gets a score of 731 while the Intel system that AMD submitted gets a score of 561. AMD claims this is a legitimate comparison because the systems share many commonalities and they defend their behavior by saying:
“if we were trying to show worst case scenario we would have turned off Power Management on the Intel server”
But the fact that AMD could have turned in even worse performance numbers for the Intel system is totally irrelevant. If anything, submitting plausible performance numbers on the Intel system is far more insidious because it is a more effective deceit.
The obvious problem with AMD’s explanation is that the Intel system submitted by AMD uses a hardware configuration that is poorly optimized for energy efficiency. The system uses Fully Buffered DIMMs (FBDIMMs), which are notoriously power-hungry, whereas the most efficient Intel systems use unbuffered DIMMs. Intel launched the San Clemente 5100 series chipset exactly one year ago, and it uses the same unbuffered memory as AMD. While it’s true that Intel initially went with Fully Buffered memory two years ago, that was obviously a mistake given the power consumption, and Intel quickly fixed it last year. With Nehalem-EP two-socket servers coming at the end of this year, Intel will only use ECC unregistered or registered DIMMs and will completely shun FBDIMMs. The bottom line is that anyone looking to build an efficient Intel-based server today should use unbuffered memory, and AMD conveniently forgot to do that.
The other possible problem in AMD’s explanation is that they used the “same JVM command line options” for the Server Side Java (SSJ) benchmark. While this sounds like a fair comparison, it’s common for different CPUs from different vendors to require different command line options to achieve optimum results. This may explain why the AMD-submitted Intel server was 5% slower than comparable Intel systems submitted by other vendors. Taken together, the suboptimal Intel hardware and the possibly suboptimal software configuration are the likely reasons why the AMD-submitted Intel system did so poorly.
To get an accurate picture of what’s really going on, we need to look at the best possible SPECpower scores for AMD and Intel to determine who the actual winner is. The table below compares the official SPECpower_ssj2008 energy efficiency benchmarks of the top dual-processor servers from Intel and AMD.
Because the SPECpower rules don’t specify a minimum number of memory DIMMs for a server, AMD should have submitted servers with 2 DIMMs like everyone else to get the best scores. Instead, AMD submitted servers with 4 DIMMs, which gives them a handicap of 3.58 watts at idle rising to 8.42 watts at peak. However, AMD neutralized that handicap by using the newest Western Digital GreenPower 3.5″ hard drive, which saves 4 watts at idle and 6 watts when active compared to the hard drives used by the other systems.
But for the sake of comparison, I took the 3 Intel systems that used only 2 DIMMs and estimated their scores after adding 3.58 watts at idle, gradually increasing to 8.42 watts at peak Server Side Java (SSJ) load, to simulate the power consumption of a 4-DIMM server. When these servers are upgraded to 4 DIMMs, however, they also gain memory performance, which translates to higher overall performance. Based on the peak SSJ scores in the table above, I estimated a 2.5% boost in peak SSJ performance, which slightly counteracts the negative effect of the higher power consumption on performance per watt. Based on this estimate (which should NOT be taken as official SPECpower_ssj2008 scores), I calculated that the top three systems in the table above would have achieved scores of 1006, 889, and 836, which are still higher than those of the AMD servers. But if those top three Intel servers had used the same energy-efficient hard drives used in the AMD servers, the scores would revert to something similar to the unadjusted numbers.
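For readers who want to try this kind of adjustment themselves, here is a rough Python sketch of the method. It is only an illustration under several assumptions: the per-level ssj_ops and wattage figures are made-up placeholders rather than numbers from any SPECpower submission, the wattage ramp between idle and peak is assumed to be linear, and the 2.5% performance boost is assumed to scale every load level equally. The helper names (`ramp`, `apply_deltas`, `overall_score`) are mine. The overall score is computed as total ssj_ops divided by total average watts across all measurement levels, including active idle.

```python
# Sketch of the 2-DIMM -> 4-DIMM adjustment described above.
# All measurement values are HYPOTHETICAL placeholders, not SPECpower results.

# Eleven measurement levels: active idle (0.0), then 10% .. 100% target load.
LOADS = [i / 10 for i in range(11)]


def ramp(idle_watts, peak_watts):
    """Per-level wattage delta, ASSUMING a linear ramp from idle to peak."""
    return [idle_watts + (peak_watts - idle_watts) * load for load in LOADS]


def apply_deltas(watts, *deltas):
    """Add one or more per-level wattage deltas to a measured power curve."""
    return [w + sum(d[i] for d in deltas) for i, w in enumerate(watts)]


def overall_score(ssj_ops, watts):
    """Overall ssj_ops/watt: total ssj_ops over total average watts.
    Active idle contributes power but zero ssj_ops."""
    return sum(ssj_ops) / sum(watts)


# Hypothetical 2-DIMM server: throughput and power both rise with load.
base_ops = [300_000 * load for load in LOADS]
base_watts = [120 + 130 * load for load in LOADS]

dimm_delta = ramp(3.58, 8.42)    # extra power for two more DIMMs
drive_delta = ramp(-4.0, -6.0)   # savings from a low-power hard drive
PERF_BOOST = 0.025               # estimated gain from the extra DIMMs

# ASSUMPTION: the boost applies equally at every load level.
boosted_ops = [ops * (1 + PERF_BOOST) for ops in base_ops]

baseline = overall_score(base_ops, base_watts)
four_dimm = overall_score(boosted_ops, apply_deltas(base_watts, dimm_delta))
four_dimm_green = overall_score(
    boosted_ops, apply_deltas(base_watts, dimm_delta, drive_delta)
)

print(f"2 DIMMs, as measured:        {baseline:.1f}")
print(f"4 DIMMs, estimated:          {four_dimm:.1f}")
print(f"4 DIMMs + low-power drive:   {four_dimm_green:.1f}")
```

With these placeholder inputs, the extra DIMM power costs more than the 2.5% performance boost gives back, and the drive savings then push the score back up. Swapping in the real per-level numbers from a published result would give a proper estimate, though still not an official score.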
UPDATE 11/13/2008 – The AMD results were actually not Barcelona, but AMD Shanghai results. I had thought they were Barcelona but I didn’t realize that I was looking at yet-to-be-launched Shanghai performance numbers.
In conclusion, AMD has made huge strides to nearly close the SPECpower gap, but they’re behaving inappropriately by comparing their products against suboptimal Intel results that they themselves submitted. That’s unfortunate because this new controversy has overshadowed the huge progress made by AMD. Had AMD launched these mid-2 GHz Barcelona processors on time a year earlier, they would have been extremely competitive all through 2008, but it was not to be and they suffered for it.
AMD made some huge clock-for-clock, core-for-core performance gains with their quad-core Barcelona chips in Server Side Java (SSJ) performance compared to their older dual-core Opteron chips. Even when I factor out the clock speed difference, a Barcelona quad-core 2.7 GHz system is still 3.14 times faster than an Opteron dual-core 2.4 GHz system in Server Side Java performance. These gains have allowed AMD to become very competitive against Intel’s aging Penryn chips, although AMD cannot claim the title. It’s also interesting to note that AMD Barcelona made huge improvements in web server performance and is beating Intel Penryn servers on dual-socket SPECweb_2005. However, Intel Penryn class CPUs still win by a large margin in the single-processor server market.
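To make the clock normalization concrete, here is a small Python sketch of the arithmetic. It assumes one particular reading of the 3.14 figure: that it is the system-level throughput ratio after dividing out the 2.7 GHz versus 2.4 GHz clock difference, and that it has not already been divided by core count. Under that reading, the raw (unnormalized) ratio and the per-core, per-clock gain follow directly. The function names are mine and purely illustrative.

```python
def raw_ratio_from_normalized(normalized_ratio, new_ghz, old_ghz):
    """Undo clock normalization: raw speedup = normalized speedup * clock ratio."""
    return normalized_ratio * (new_ghz / old_ghz)


def per_core_ratio(system_ratio, new_cores, old_cores):
    """Divide a system-level speedup by the increase in core count."""
    return system_ratio / (new_cores / old_cores)


# ASSUMPTION: 3.14x is the clock-normalized, system-level SSJ ratio.
normalized = 3.14
raw = raw_ratio_from_normalized(normalized, new_ghz=2.7, old_ghz=2.4)
per_core = per_core_ratio(normalized, new_cores=4, old_cores=2)

print(f"Raw system speedup:          {raw:.2f}x")
print(f"Per-core, per-clock speedup: {per_core:.2f}x")
```

If the 3.14 figure was instead meant as a raw or per-core number, the same two helpers still apply, just with a different starting point.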
The reason for this disparity between single- and dual-socket servers is that Intel’s memory architecture is constraining their dual-socket performance, but that limitation will soon disappear with Intel’s Nehalem microarchitecture server CPUs, which should launch by the end of this year. Intel’s Nehalem CPUs have completely closed the memory performance gap with the QuickPath architecture, and they’ve even managed to leapfrog AMD’s memory architecture with 50% more memory channels and higher performance DDR3 memory. Coupled with the improvements in the Nehalem execution engine, this leaves little doubt that Intel will regain a comfortable lead in the server and desktop markets.
AMD will narrow that gap with their Shanghai processors if they can launch on time, but few analysts predict Shanghai will come close to beating Nehalem. Whether AMD can launch Shanghai on time, or get close enough to Intel Nehalem to be reasonably competitive, remains to be seen.