Archive

Archive for the ‘Intel’ Category

The facts about AMD’s ACP power rating

November 14th, 2008 20 comments

One issue that I’m frequently asked about is AMD’s alternative energy efficiency rating called Average CPU Power (ACP).  AMD created this new standard on its own last year and despite controversy, they’ve managed to get the media to accept this new definition of energy efficiency.  AMD has told the press that AMD’s ACP rating is equivalent to Intel’s TDP rating and I know this because this is what AMD spokesperson John Taylor told me when I was Editor at Large at ZDNet.  Here’s an excerpt of what he emailed me on September 10th 2007:

“We believe that for the datacenter customer, AMD ACP is a more useful metric when configuring/budgeting. As Intel TDP is the only available metric for Intel, it is the most comparable to AMD ACP. The trick is: how does intel formulate its TDP? My sense is it is defined very closely to AMD ACP, which reinforces a “yes” answer to your question. But I’d prefer Intel confirm that.”

Well, I’ve not only spoken to Intel, which emphatically denies this; I also know for a fact that this comparison is not correct, and I’ve gotten David Kanter (one of the leading microprocessor analysts in the world) to corroborate this.  I asked David Kanter to review AMD’s claim that Intel’s TDP rating was “most comparable” to AMD ACP and he explained:

“It’s pretty hard to justify the comparison between the two from a technical perspective.  AMD’s ACP is defined in a very different way from Intel’s TDP, according to my understanding.”

We can easily verify that AMD’s ACP rating is not comparable to Intel’s TDP rating by looking at the actual power consumption of the newest AMD Shanghai based servers versus Intel’s current servers.  Take the official published SPECpower measurements for an AMD “Shanghai” 2384 2.7 GHz system with an ACP rating of 75 watts and an Intel L5430 system with a TDP rating of 50 watts, with comparable components in the rest of the server.  If AMD’s claim that ACP is most comparable to Intel’s TDP rating were true, we would expect a power difference of roughly 50 watts (25W per processor).  But according to the official SPECpower benchmarks, which are optimized for low power consumption, the Intel L5430 server peaks at 161 watts while the AMD 2384 based server peaks at 264 watts.

That’s a delta of more than 100 watts, and when we account for the fact that the Intel server has an additional North Bridge memory controller to deal with, the actual difference between the CPUs is even greater than 100 watts.  We can disregard the fact that the AMD server has two more memory DIMMs, which consume an additional 8.4 watts of power, because AMD uses hard drives that use 6 watts less power than those in the Intel system.  This strongly suggests that an AMD TDP rating of 95 watts is far more likely to explain the roughly 100 watts of additional power consumption relative to the Intel system with 50 watt TDP processors, so calling the “Shanghai” 2384 processor a 75 watt part simply doesn’t reflect the actual efficiency of the chip.
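The arithmetic behind this comparison can be sketched in a few lines of Python.  All wattage figures come from the paragraphs above; the last step, which attributes the entire adjusted gap to the processors and ignores the Intel North Bridge, is a simplifying assumption made only for illustration.

```python
# Figures quoted in the post (official SPECpower_ssj2008 peak wall power).
SOCKETS = 2
AMD_ACP_W = 75        # Opteron 2384 ACP rating, per processor
INTEL_TDP_W = 50      # Xeon L5430 TDP rating, per processor
AMD_PEAK_W = 264      # AMD server, peak watts at the wall
INTEL_PEAK_W = 161    # Intel server, peak watts at the wall
EXTRA_DIMM_W = 8.4    # AMD server's two extra DIMMs at peak
DRIVE_SAVING_W = 6.0  # AMD server's lower-power hard drives


def expected_delta_w():
    """Gap we would expect if ACP and TDP were directly comparable."""
    return SOCKETS * (AMD_ACP_W - INTEL_TDP_W)


def adjusted_delta_w():
    """Measured gap, corrected for the DIMM and hard drive differences."""
    return (AMD_PEAK_W - INTEL_PEAK_W) - EXTRA_DIMM_W + DRIVE_SAVING_W


def implied_rating_w():
    """Assumption for illustration: the whole adjusted gap comes from the CPUs."""
    return INTEL_TDP_W + adjusted_delta_w() / SOCKETS


print(f"expected gap:                 {expected_delta_w()} W")
print(f"adjusted measured gap:        {adjusted_delta_w():.1f} W")
print(f"implied per-processor rating: {implied_rating_w():.1f} W")
```

Under that assumption the implied per-processor figure lands near 100 watts, which is much closer to a 95 watt rating than to 75 watts.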

Based on what the experts say and what the evidence suggests, AMD claiming that their ACP rating is comparable to Intel’s TDP rating is simply incorrect and the ACP rating is effectively overstating the energy efficiency of AMD processors.  This is the equivalent of a car company that came up with its own Miles Per Gallon (MPG) fuel efficiency rating to inflate its actual MPG rating.

Now to be clear, TDP was never meant to be an energy efficiency rating but it is in effect exactly that when it’s used in the context of advertising and press releases.  TDP is actually an engineering metric that lets server manufacturers figure out what kind of cooling they have to design into their systems to accommodate a CPU.  A far more appropriate measure that people should be looking at for servers is the SPECpower rating.  But if the media insists on quoting CPU wattage ratings, they should use a consistent measurement and the only one that is comparable and accepted by the entire industry is TDP.

Categories: AMD, Benchmarks, Energy efficiency, Intel Tags:

AMD Shanghai 45nm launch – Server benchmarks roundup

November 14th, 2008 4 comments

AMD launched its 45nm Shanghai processors today for the server market ahead of Intel’s Nehalem processor launch.  AMD lists a series of benchmarks here but they omit many of the better results from Intel.  To get the full official results, here are the benchmarks based on the best scores available from AMD and Intel as of November 13th 2008.

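CPU         | GHz   | Sockets | Cores | SPECint | SPECfp | SPECjbb | SPECweb | SPECpower | SAP
Intel X5482 | 3.2   | 2       | 4     | 156     | 93.4   | -       | -       | -         | -
AMD 2384    | 2.7   | 2       | 4     | 136     | 118    | 352700  | -       | 860       | -
Intel L5430 | 2.66  | 2       | 4     | -       | -      | -       | -       | 1135      | -
Intel X5470 | 3.33  | 2       | 4     | -       | -      | 316728  | -       | -         | -
Intel X7460 | 2.66  | 4       | 6     | 294     | 156    | -       | -       | -         | -
AMD 8384    | 2.7   | 4       | 4     | 249     | 210    | -       | -       | -         | -
AMD 2356    | 2.3   | 2       | 4     | -       | -      | -       | 30007   | -         | -
Intel X5460 | 3.166 | 2       | 4     | -       | -      | -       | 29591   | -         | -
Intel X7460 | 2.66  | 8       | 6     | -       | -      | -       | -       | -         | 9200
AMD 8384    | 2.7   | 8       | 4     | -       | -      | -       | -       | -         | 7010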

It appears that AMD has made some important gains: it has taken the lead in SPECjbb, SPECweb, and virtualization (due to Nested Paging), and it has expanded its lead in SPECfp.  AMD still trails in SPECint, SPECpower, and SAP but this is an important victory for AMD, which was plagued with delays in 2007 and 2008 with its previous Barcelona processor.  Shanghai is a major milestone for AMD because it required a shift to a whole new 45nm immersion lithography process and it shows that AMD can launch a product on time and put Barcelona behind them.

However, Intel is expected to launch its Nehalem-EP processors for the mainstream two-socket server market within a few months and Nehalem is expected to be a huge jump in performance on the server platform.  While the new triple-channel DDR3 unbuffered memory subsystem doesn’t do too much for the Intel i7 Nehalem desktop processors, it is expected to unleash a huge increase in performance for Intel Nehalem servers.

Intel’s Penryn class of processors launched last year will be the last generation of Intel processors to use the Front Side Bus (FSB) and a single North Bridge memory controller located on the motherboard.  The FSB and single North Bridge memory controller weren’t a problem for most of Intel’s product line but they significantly hampered Penryn’s performance in the two-socket market at higher clock speeds.  Even that didn’t hurt Intel much, since AMD had trouble launching its Barcelona products early enough and at high enough clock speeds to threaten Penryn on the high end, and only now are the newest AMD Shanghai processors competing head to head with Intel Penryn.  Some people wondered why Intel stuck with the FSB architecture for so long, but the timing seems to have been appropriate given the outcome in the last two years.

Intel’s Nehalem will be too fast to run on FSB architecture which is why Intel designed it with QuickPath.  Nehalem will have no such memory bandwidth limitations as it transitions to QuickPath architecture with a memory controller on each microprocessor.  Nehalem also catches up with AMD on nested paging support for improved virtualization, but the late timing doesn’t seem to have hurt Intel given the fact that virtualization hypervisors are only now beginning to support nested paging.  So the race is on to see how quickly AMD can ramp up Shanghai processors and how long it takes Intel to launch Nehalem.

Categories: AMD, Benchmarks, Intel, Processors, Virtualization Tags:

SAP benchmarks for AMD Shanghai released

November 12th, 2008 6 comments

I just came across some certified SAP server benchmarks for the soon to be launched AMD 45nm Shanghai processor in comparison to Intel’s 65nm Tigerton and 45nm Dunnington processors.

System               | CPU          | Cores | Sockets | Clock    | Score
HP ProLiant DL785 G5 | Opteron 8384 | 4     | 8       | 2.7 GHz  | 7010
IBM System x3950 M2  | Xeon X7350   | 4     | 8       | 2.93 GHz | 6615
IBM System x3950 M2  | Xeon X7460   | 6     | 8       | 2.66 GHz | 9200

While I’m not 100% certain that the Opteron 8384 is a Shanghai processor, the fact that it has 128 KB L1 cache and 512 KB L2 cache per core, and 6 MB L3 cache per processor is a pretty good indicator that it is Shanghai.

The Intel Dunnington server probably holds the edge because of its single-die monolithic 6-core CPU, whereas the Shanghai only has 4 cores.

Also, more Shanghai results here.

Categories: AMD, Benchmarks, Intel Tags:

AMD submits suboptimal SPECpower benchmarks for Intel

November 9th, 2008 25 comments

Performance benchmarks are the equivalent of the Indianapolis 500 for the computer industry, if not more important.  Rarely are the top performance numbers attainable in the real world, but winning is a critical symbolic victory in the war of perception.  The winner of the benchmarks will be perceived to have a technological lead across their entire line of products while the loser is perceived to be inferior across their entire line of products.

Benchmarks are like races and the winner is always determined by the best performance.  The only fair way to conduct benchmarks is to have each vendor put forth their best system to achieve the highest scores possible within a common set of rules and parameters.  By this measure, AMD’s recent submission of SPECpower energy efficiency benchmarks on behalf of Intel servers, which portray Intel in a sub-optimal light while ignoring superior scores for Intel, is highly inappropriate.

AMD says that an AMD server gets a score of 731 while the Intel server that AMD submitted on behalf of Intel gets a score of 561.  AMD claims this is a legitimate comparison because the systems share many commonalities and they defend their behavior by saying:

“if we were trying to show worst case scenario we would have turned off Power Management on the Intel server”

But the fact that AMD could have turned in even worse performance numbers for the Intel system is totally irrelevant.  If anything, submitting plausible performance numbers on the Intel system is far more insidious because it is a more effective deceit.

The obvious problem with AMD’s explanation is that the Intel system submitted by AMD shows a poorly optimized hardware configuration for Intel in terms of energy efficiency.  The system uses Fully Buffered memory DIMMs which are notoriously inefficient for the Intel system when the most efficient Intel systems use unbuffered DIMMs.  Intel launched the San Clemente 5100 series chipset exactly one year ago which uses the same unbuffered memory as AMD.  While it’s true that Intel initially went with Fully Buffered memory two years ago, it was obviously a mistake given the power consumption and Intel quickly fixed that mistake last year.  With Nehalem-EP two-socket servers coming at the end of this year, Intel will only use ECC unregistered or registered DIMMs and they will completely shun FBDIMMs.  The bottom line is that anyone looking to build an efficient Intel-based server today should use unbuffered memory and AMD conveniently forgot to do that.

The other possible problem in AMD’s explanation is that they used the “same JVM command line options” for the Server Side Java (SSJ) benchmark.  While this sounds like a fair comparison, it’s common for different CPUs from different vendors to require different command line options to achieve optimum results.  This may explain why the AMD-submitted Intel server was 5% slower than comparable Intel systems submitted by other vendors.  Combined with the suboptimal Intel hardware and possibly suboptimal software configuration, we can see the likely reason why the AMD-submitted Intel system did so poorly.

To get an accurate picture of what’s really going on, we need to look at the best possible SPECpower scores for AMD and Intel to determine who the actual winner is.  The table below compares the official SPECpower_ssj2008 energy efficiency benchmarks of the top dual-processor servers from Intel and AMD.

System (SPECpower_ssj2008) | Vendor          | DIMMs | Peak SSJ | Peak watts | Score
Intel L5430 2.66 GHz       | PowerLeader     | 2     | 285,970  | 161        | 1,135
Intel L5420 2.5 GHz        | SuperMicro      | 2     | 279,209  | 174        | 990
Intel L5420 2.5 GHz        | HP              | 2     | 282,281  | 189        | 930
Intel L5430 2.66 GHz       | Fujitsu Siemens | 4     | 293,162  | 220        | 827
AMD 2384 2.7 GHz           | AMD             | 4     | 338,577  | 264        | 860
AMD 2384 2.7 GHz           | AMD             | 4     | 335,116  | 264        | 827

Because the SPECpower rules don’t really specify the minimal number of memory DIMMs that a server should have, AMD should have submitted servers with 2 DIMMs like everyone else to get the best scores.  Instead, AMD submitted servers with 4 memory DIMMs, which gives them a handicap ranging from 3.58 watts at idle to 8.42 watts at peak.  However, AMD neutralized that handicap by using the newest Western Digital GreenPower 3.5″ hard drive, which saves between 4 watts at idle and 6 watts when active compared to the hard drives used by the other systems.

But for the sake of comparison, I estimated the scores of the 3 Intel systems that used only 2 DIMMs and added 3.58 watts in idle and gradually increased up to 8.42 watts at peak Server Side Java (SSJ) loads to simulate power consumption for a 4-DIMM server.  But when these servers get upgraded to 4 DIMMs, they have higher memory performance which translates to higher overall performance.  Based on the peak SSJ scores in the table above, I estimated a 2.5% boost in peak SSJ performance which slightly counteracts the negative effects of the higher power consumption in terms of performance per unit energy.  Based on this estimate (which should NOT be taken as official SPECpower_ssj2008 scores), I calculated that the top three systems in the table above would have achieved a score of 1006, 889, and 836 which is still higher than the AMD servers.  But if those top three Intel servers had used the same energy efficient hard drives used in the AMD servers, the score reverts back to something similar to the unadjusted numbers.
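The adjustment described above can be sketched roughly as follows.  This is not the spreadsheet used for the estimate: the helper names are made up, the linear ramp between idle and peak is an assumption about what “gradually increased” means, applying the 2.5% boost at every load level is an assumption, and the per-level sample data is HYPOTHETICAL (only the 285,970 peak SSJ and 161 watt peak figures come from the table).  The overall SPECpower_ssj2008 score is the sum of ssj_ops across all load levels divided by the sum of average power across those levels, including active idle.

```python
def overall_score(ssj_ops, watts):
    """Overall ssj_ops/watt: total ops over total power, active idle included."""
    return sum(ssj_ops) / sum(watts)


def simulate_extra_dimms(ssj_ops, watts,
                         idle_extra_w=3.58, peak_extra_w=8.42,
                         perf_boost=0.025):
    """Estimate a 4-DIMM run from a 2-DIMM run.

    Both lists are ordered from active idle (index 0) to 100% load (last).
    Extra power ramps linearly from idle_extra_w to peak_extra_w (assumed),
    and throughput is scaled by perf_boost at every level (assumed).
    """
    steps = len(watts) - 1
    new_watts = [
        w + idle_extra_w + (peak_extra_w - idle_extra_w) * i / steps
        for i, w in enumerate(watts)
    ]
    new_ops = [ops * (1 + perf_boost) for ops in ssj_ops]
    return new_ops, new_watts


# HYPOTHETICAL per-level data: the real runs publish measured values for
# each load level; here they are simply interpolated for illustration.
PEAK_OPS = 285_970   # from the table (Intel L5430, PowerLeader)
PEAK_W = 161.0       # from the table
IDLE_W = 100.0       # made-up idle power, NOT a published figure

levels = [i / 10 for i in range(11)]          # 0% (idle) .. 100%
ops = [PEAK_OPS * lv for lv in levels]
watts = [IDLE_W + (PEAK_W - IDLE_W) * lv for lv in levels]

base = overall_score(ops, watts)
adj_ops, adj_watts = simulate_extra_dimms(ops, watts)
adjusted = overall_score(adj_ops, adj_watts)
print(f"2-DIMM score (hypothetical data): {base:.0f}")
print(f"4-DIMM estimate:                  {adjusted:.0f}")
```

With real per-level measurements substituted for the interpolated lists, the same two functions would reproduce this kind of estimate; the numbers printed here are illustrative only and are not SPECpower results.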

UPDATE 11/13/2008 – The AMD results were actually not Barcelona, but AMD Shanghai results.  I had thought they were Barcelona but I didn’t realize that I was looking at yet-to-be-launched Shanghai performance numbers.

In conclusion, AMD has made huge strides to nearly close the SPECpower gap, but they’re behaving inappropriately by comparing their products to suboptimal benchmarks that they themselves submitted.  That’s unfortunate because this new controversy has overshadowed the huge progress made by AMD.  Had AMD launched these mid 2 GHz Barcelona processors on time a year earlier, they would have been extremely competitive all through 2008 but it was not to be and they suffered for it.

AMD made some huge clock-for-clock core-for-core performance gains with their quad-core Barcelona chips in Server Side Java (SSJ) performance compared to their older dual-core Opteron chips.  Even when I factor out the clock speed difference, a Barcelona quad-core 2.7 GHz system is still 3.14 times faster than an Opteron dual-core 2.4 GHz system in Server Side Java performance. These gains have allowed AMD to become very competitive against Intel’s aging Penryn chips although AMD cannot claim the title.  It’s also interesting to note that AMD Barcelona made huge improvements in web server performance and it is beating Intel Penryn servers on dual-socket SPECweb_2005.  However, Intel Penryn class CPUs still win by a large margin in the single-processor server market.

The reason for this disparity between single- and dual-socket servers is that Intel’s memory architecture is constraining their dual-socket performance, but that limitation will soon disappear with Intel’s Nehalem Microarchitecture server CPUs which should launch by the end of this year. Intel’s Nehalem CPUs have completely closed the memory performance gap with QuickPath architecture and they’ve even managed to leapfrog AMD’s memory architecture with 50% more memory channels and higher performance DDR3 memory.  Coupled with the improvements in the Nehalem execution engine, there is little doubt that Intel will be regaining a comfortable lead in the Server and Desktop markets.

AMD will narrow that gap with their Shanghai processors if they can launch on time but few analysts predict Shanghai will come close to beating Nehalem.  Whether or not AMD can launch Shanghai on time or get close enough to Intel Nehalem to be reasonably competitive remains to be seen.

Tomshardware botches Intel Atom energy efficiency tests badly

July 14th, 2008 46 comments

Update 8/20/2008 – I need to fix my mistake because I quoted some initial informal numbers from Jack that didn’t include the 3.5″ HDD or dual-threaded tests.  Now that I have his full data set, I need to make some corrections.  I apologize for my mistake and I will fix it below, but my insistence that Tomshardware is off by a LOT has not changed, though I was wrong in saying the Intel 945GC/Atom board could run on 802.3af.  Tomshardware, on the other hand, not only did NOT correct the mistake despite acknowledging my emails to them, they went ahead and published more results using an 850W power supply, which is horrendously stupid.

For those who are going to say that Thorn, who criticized me in the comment section, was right all along despite the fact that he did not run ANY tests: he was still wrong by a long shot because he insisted that the Tomshardware numbers aren’t that far off.  It turns out that Tomshardware was off by 25.6W compared to the Sparkle 220W power supply measurement, and my hasty post based on informal data that mistakenly excluded the hard drives was off by 11W.  Let this be a lesson to me for posting informal data and once again, I apologize to my readers for my mistake.

After praising Tomshardware for doing a good job fixing their flash storage efficiency article, I must point out that Tomshardware has made another horrendous error by claiming that an Intel D945GCLF system with an Intel Atom 1.6 GHz processor consumes 59W idle power.  Mr. Dandumont, who authored the article, claims that he used an “80 Plus” power supply, implying that he was getting at least 80% efficiency on the PSU (Power Supply Unit), but that is a huge mistake.

UPDATE 7/21/2008 – Tomshardware’s Senior Editor Matthieu Lamelot has responded that they used the Tagan EasyCon U15 530 W power supply.  My 600W guess was fairly close and it means that Tomshardware loaded their PSU to roughly 3.6% load and that translates to roughly 30% efficiency which makes their benchmarks very inaccurate and misleading.  Tomshardware should at least test with a good 80 Plus 220W PSU but ideally they should test with a sub-100W PicoPSU.
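As a rough sanity check on those numbers, here is a minimal sketch of the load and efficiency arithmetic.  The 530W rating, the 3.6% load figure, and the 59W idle reading are taken from this post; the function names are just for illustration.

```python
def psu_output_w(rated_w, load_fraction):
    """DC output power implied by a PSU's rating and its load fraction."""
    return rated_w * load_fraction


def psu_efficiency(output_w, wall_w):
    """Efficiency = DC output power / AC input power measured at the wall."""
    return output_w / wall_w


RATED_W = 530.0        # Tagan EasyCon U15
LOAD_FRACTION = 0.036  # roughly 3.6% load
WALL_IDLE_W = 59.0     # Tomshardware's reported idle power at the wall

output_w = psu_output_w(RATED_W, LOAD_FRACTION)
efficiency = psu_efficiency(output_w, WALL_IDLE_W)
print(f"DC output: {output_w:.1f} W, implied efficiency: {efficiency:.0%}")
```

The implied efficiency comes out in the low 30% range, consistent with the “roughly 30%” figure above.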

The 80 Plus rating only claims greater than 80% efficiency for loads between 20% and 100% of rated output power, and my guess is that Mr. Dandumont used a 600+ watt PSU, which means he was likely loading the PSU at less than 3% of its capacity.  A 3% load on a PSU translates to a horrendous efficiency of less than 30%.

Tomshardware claims that the same system with a 7200 RPM 3.5″ hard drive consumes 59W idle and 62W peak using an energy efficient power supply but this isn’t even in the ballpark in terms of accuracy.  Tomshardware needs to correct this error and start using some more appropriate and smaller power supplies for testing computers that draw less than 50 watts of output power.

Update 8/20/2008 – My PhD friend Jack who is a very knowledgeable and meticulous tester tested the Intel D945GCLF with a Sparkle SPI220LE 220W PSU and got a measurement of 34W idle and 36.4W peak input power consumption.  Since the “80 Plus” SPI220LE gets around 73% efficiency at 19.2W output load according to Silent PC Review, we can reasonably estimate at 75% efficiency that the entire system actually consumes ~27W output power from the PSU.

Without the hard drive, the system peaked at 27.2W, which means the output power from the PSU would be ~20W, which is too much for 802.3af PoE.  The Intel 945GC chipset is unfortunately a bit power hungry, so when the next version of Atom arrives with an on-die controller and graphics built into the CPU, we can hopefully see the output power requirement dive down below 10W, but until then, 802.3af is out of the question.
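The conversion from wall power to PSU output power, and the comparison against PoE, can be sketched as below.  The wall readings and the 75% efficiency estimate come from this post; the 12.95W figure is the power 802.3af guarantees at the powered device and is added here for reference, not taken from the measurements.

```python
POE_AF_DEVICE_BUDGET_W = 12.95  # 802.3af power available at the powered device
PSU_EFFICIENCY = 0.75           # estimate used above for the SPI220LE at this load


def dc_output_w(wall_w, efficiency=PSU_EFFICIENCY):
    """Estimate DC power delivered to the system from the AC wall reading."""
    return wall_w * efficiency


with_hdd_w = dc_output_w(36.4)     # peak wall reading with the 3.5" hard drive
without_hdd_w = dc_output_w(27.2)  # peak wall reading without the hard drive

print(f"with HDD:    ~{with_hdd_w:.1f} W DC")
print(f"without HDD: ~{without_hdd_w:.1f} W DC")
print("fits 802.3af:", without_hdd_w <= POE_AF_DEVICE_BUDGET_W)
```

Even without the hard drive, the estimated DC draw is well above what an 802.3af port can deliver.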

The SPI220LE is above 80% efficiency when the output load level is above 20% and here we’re only at around 8.1% load level so the efficiency drops.  I’ve done some testing with the Sparkle 55W Open Frame PSU and found that it runs at around 80% efficiency for the Intel 945GC/Atom board despite the fact that it’s not 80 Plus certified.  This is because the 55W PSU is running at optimum load levels.

Categories: Benchmarks, Energy efficiency, Intel, Via Tags:

Beware of Intel NIC driver updates

June 10th, 2008 7 comments

I had a pretty harsh experience tonight. I have one of Intel’s server grade NICs in my ISA server at work. It has a lot of IP addresses bound to the external adapter. We updated the NIC’s driver due to some odd behavior we were seeing (some of the ports were sometimes not being detected after a cold boot). Well, the installer decided to pick one of the IP addresses assigned to each port to make the primary IP address, and to drop all of the other IP addresses bound to the adapter. So instead of the 5 minutes of downtime we expected, I got to spend an hour re-typing the critical IP addresses, and I’ll spend another hour tomorrow typing in the non-critical IP addresses. Let this be a warning: if you plan on updating the drivers for your Intel NIC, plan on possibly needing to re-configure it afterwards!

J.Ja

Categories: Intel, Networking Tags:

Intel Atom on 945 chipset motherboards have arrived!

June 6th, 2008 3 comments

The Intel Atom on 945 chipset motherboards have arrived (thanks to my friend Max for the tip) and they’re quite affordable! $77 with shipping in stock here. This should make an awesome embedded device or home server since the power consumption is so incredibly low.

This is a 4W TDP 45nm CPU that averages under a watt idle. The only thing that disappoints me is the big honking heat sink and fan on the GPU/chipset while the CPU takes a tiny bit of space with a tiny heat sink and no fan. The chipset uses an older manufacturing process which is why it’s so relatively big compared to the tiny 45nm CPU.  However, I’m pretty sure that you could remove that fan from the heat sink for the GPU/chipset especially if you don’t plan on using the GPU with 3D gaming.

Categories: Intel, Motherboards, Processors Tags:

Intel “Allendale” 2.2 GHz dual-core running at 2.93 GHz at stock voltage

June 6th, 2008 1 comment

I’m just building a new PC using an Intel E2200 “Allendale” Core 2 2.2 GHz processor on an XFX nForce 630i model MG-630i-7159 motherboard.  This is a very similar configuration to my $400 computer build list.  Just for the heck of it I changed the FSB from 800 MHz to 1064 MHz using stock voltage and the next thing I know I’m running at 2.93 GHz.  I haven’t tried pushing it higher yet, but pushing the FSB to 1200 MHz would jack the CPU up to 3.3 GHz, which might require a voltage boost.
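For anyone wondering where those clock speeds come from, here is a small sketch of the arithmetic.  It assumes the usual quad-pumped Intel FSB (the advertised FSB speed is four times the base clock) and derives the multiplier of 11 from the stock figures above (2.2 GHz at 800 MHz FSB); the function name is just for illustration.

```python
def core_clock_ghz(fsb_mhz, multiplier):
    """Core clock for a quad-pumped FSB: (FSB / 4) * multiplier."""
    base_clock_mhz = fsb_mhz / 4
    return base_clock_mhz * multiplier / 1000


# Stock: 2200 MHz core on an 800 MHz FSB (200 MHz base clock).
multiplier = 2200 / (800 / 4)

for fsb in (800, 1064, 1200):
    print(f"FSB {fsb} MHz -> {core_clock_ghz(fsb, multiplier):.2f} GHz")
```

The 1064 MHz setting works out to 266 MHz x 11, and 1200 MHz to 300 MHz x 11.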

But before I try a faster speed, I ran a quick benchmark using wPrime 1.53 and compared it to my Intel E6600 2.4 GHz system.  The E2200 @ 2.93 GHz completed the 32M test in 28.362 seconds while my E6600 at the stock 2.4 GHz took 34.125 seconds.  Even though this is an older 65nm dual-core value line CPU, I’m very impressed with it and I’m very pleased with this motherboard’s ability to overclock.
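To put the two wPrime times in perspective, this short sketch compares the measured speedup with the clock speed ratio.  Both times and both clock speeds are the ones quoted above; nothing else is assumed.

```python
E2200_OC_SECONDS = 28.362   # E2200 overclocked to 2.93 GHz, wPrime 32M
E6600_SECONDS = 34.125      # E6600 at stock 2.4 GHz, wPrime 32M

speedup = E6600_SECONDS / E2200_OC_SECONDS
clock_ratio = 2.93 / 2.4

print(f"measured speedup: {speedup:.3f}x")
print(f"clock ratio:      {clock_ratio:.3f}x")
```

The overclocked E2200 is about 20% faster on this test, slightly less than the clock speed advantage alone would suggest.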

The system has a 400 GB Seagate hard drive, uses the NVIDIA 7150 embedded graphics, and has an 18x DVD burner.  The total system idles at an impressive 48 watts at the wall, and at its overclocked setting of 2.93 GHz under load using wPrime 1.53 it drew an impressive 80 watts and seems to be stable so far.  The embedded graphics performance seems to be decent but I need to do a more thorough review of that.

However, this system seems to lock up after it wakes out of S3 Sleep state and I log in to the computer.  I thought it had something to do with the overclock setting but it didn’t.  The system locks up even under all stock settings.  Just to put this into context, I have had similar problems with an Abit motherboard with an Intel 945 chipset as well.  The problem probably isn’t chipset related because I’ve had other motherboards with the same chipset work just fine.  I’ll have to check with XFX to see why this is being problematic.  I haven’t tried updating the BIOS yet but that may be my next immediate step.  I’m not sure if this is an implementation issue with XFX or an NVIDIA driver issue.

So far, it looks like a very stable motherboard for overclocking and the CPU will handle the higher clock speeds just fine, but this motherboard from XFX has a defective S3 sleep state.  So if this was for a server or a media center that constantly stayed on, then it would be a great system since it uses so little power in idle.  But this would be a problem for a normal PC since S3 sleep state operates at a very efficient 2 watts and allows instant-on.

Developing …

Categories: Build enthusiasts, Hardware, Intel Tags:

Build list for a nice $400 computer

April 29th, 2008 13 comments

It’s been a while since I’ve put out a PC build list so I’m going to start with a value edition with an embedded NVIDIA graphics motherboard and Intel dual-core CPU.  Note that this is more than powerful enough for any media center or office computer and even some casual low-end graphics gaming.  It’s a nice small computer that’s designed to be very quiet and fast.


Component                                                          | Price ($)
MSI P6NGM-FIH (NVIDIA GeForce 7150) HDMI Micro ATX                 | 84
Intel Pentium E2180 Allendale 2GHz dual-core                       | 70
2 GB DDR2-800 DIMM                                                 | 41
Cooler Master Elite 340 – SMALL MicroATX tower (in store pickup)   | 40
FSP300-60GLN 300W efficient power supply                           | 44
Western Digital 500 GB SATA hard drive (lowest power consumption)  | 90
LG 20X DVD burner, SATA                                            | 32
Sub total (including shipping)                                     | 401

Later this week I’ll put up a more powerful value system that can game well on any LCD up to 22 inches.

Categories: Build enthusiasts, Intel, NVIDIA Tags:

Intel’s Atom CPU flexes multithreading muscle

April 23rd, 2008 14 comments

Intel’s new sub-2.5 watt Atom CPU (codenamed Silverthorne) is showing it can run two process threads at the same time better than competing processors.  The Atom’s hyperthreading feature (in-order SMT) allows the single-core processor to be seen as two logical processors by an operating system.  When both logical processors are used, which happens whenever more than one task is running on the computer, substantial gains in performance can be made.

According to page 17 of these slides presented at IDF, the Atom is showing a whopping 39% performance gain on SPECint_rate2000 when running two process threads, at a mere 17% increase in power consumption.  Hexus.net found that the Atom’s hyperthreading feature allowed it to improve CINEBENCH 9.5 performance by more than 53%.  It should be noted that SPECint_rate2000 doesn’t really stress the memory subsystem so one would expect the SPECint_rate2006 gains on Atom hyperthreading to be lower.  However it’s not really practical to expect a small mobile or embedded device to have 4 Gigabytes of RAM with high memory bandwidth.
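What the SPECint_rate2000 figures mean for energy efficiency can be worked out directly.  This sketch uses only the 39% and 17% figures quoted above; the function name is just for illustration.

```python
def perf_per_watt_gain(perf_gain, power_gain):
    """Relative change in performance per watt given fractional gains."""
    return (1 + perf_gain) / (1 + power_gain) - 1


gain = perf_per_watt_gain(perf_gain=0.39, power_gain=0.17)
print(f"performance per watt improves by about {gain:.1%}")
```

In other words, running the second thread improves performance per watt by roughly 19% on that benchmark.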

Some, like Linus Torvalds, have criticized these Hyperthreading results on the grounds that good Hyperthreading performance could be viewed from the glass-half-empty perspective as poor single-threaded performance.  I disagree with him because having a second logical processor is very beneficial to a computer, especially when a single process locks up.  Modern compilers also have auto-parallelization features that will try to take advantage of the second processor for single-threaded applications.

The closest competitor for the Intel Atom is Via’s Isaiah processor.  Recent benchmarks from Eeepcnew.de seem to have indicated that Via’s C7 Isaiah processor performs approximately 39% better on raw integer performance and approximately 3.5% better on raw floating point performance.  While the Isaiah processor performs well and has reasonable power consumption for a desktop and some larger notebooks, it consumes nearly 10 times more power than the Intel Atom.  Furthermore, the results from Eeepcnew.de are for single threading so once you factor in SMT performance gains, the Atom may actually perform better than the Isaiah.

Categories: Intel, Processors, Via Tags: