By William Van Winkle
Barcelona is compatible with HyperTransport 1.0, although its successors will make the jump to HyperTransport 3.0. With three HT 1.0 links in the CPU, Barcelona supports a link clock of up to 2.6 GHz, effectively yielding a maximum throughput of 20.8 GBps. In part because of this common link architecture, the new Opteron is fully backward compatible with any Socket F Opteron system. We remain skeptical of the real-world appeal of this upgradeability, especially in light of some Split Plane elements we’ll discuss in a bit. Rev F systems are strong enough on their own that it seems unlikely many customers will upgrade them; in general, such upgrades only happen regularly in HPC circles. The real benefit of backward compatibility lies elsewhere.
“We all know that probably 90% of customers aren’t going to want to upgrade a production system,” says Fruehe, “but for channel partners, the key is that it gives you the ability to have something very interesting to bring to them and open up that conversation. How many times do you go in to talk to a customer about X, they’re not interested in X, but while you’re there you uncover the opportunity for Y? Partners always say about AMD that they like that opportunity to go offer something that gets them in the door and gets attention. The upgradeability gets that attention. They may not necessarily upgrade, but it opens the dialogue to talk about other things.”
THE WHOLE POWER PICTURE
The bickering between AMD and Intel on power savings vacillates between amusing, intriguing, and annoying. Which matters more: draw at the component socket or draw from the wall? Measure at the socket, and Intel wins. Measure from the wall, at least according to AMD’s PowerPoint presentation numbers, and AMD wins. This is no small point, given that a difference of a few dozen watts spread over scores or hundreds of machines can add up to a substantial delta on the corporate power bill at year’s end. So let’s break this down.
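The sketch below puts rough numbers on the “few dozen watts” point. It converts a per-server wattage difference into a yearly cost across a fleet. The 40W delta, the 200-server fleet, and the $0.10 per kWh rate are hypothetical inputs chosen for illustration. The calculation also assumes the servers draw that difference around the clock.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: 24x7 operation, no leap day


def annual_fleet_cost(delta_watts: float, servers: int, usd_per_kwh: float) -> float:
    """Yearly cost (USD) of a constant per-server power difference across a fleet."""
    kwh_per_year = delta_watts * servers * HOURS_PER_YEAR / 1000  # watt-hours -> kWh
    return kwh_per_year * usd_per_kwh


if __name__ == "__main__":
    # Hypothetical inputs: 40 W extra per box, 200 boxes, $0.10 per kWh.
    cost = annual_fleet_cost(40, 200, 0.10)
    print(f"${cost:,.2f} per year")
```

The sketch ignores cooling overhead and assumes a flat draw, so treat it as an order-of-magnitude estimate.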
There are four primary power-sucking component sets on an Intel Xeon 5300 platform: CPUs, memory modules, northbridge, and southbridge. Assume that a customer wants a strong compromise between 2P, quad-core performance and power savings, perhaps edging a bit toward the latter. The best Xeon pick would be a pair of 2.0 GHz L5335 CPUs (50W TDP each). The 5300 northbridge draws 32.4W and the southbridge 12.4W. Eight FB-DIMM modules draw 83.3W at idle. Add it up and you’ve got about a 228W draw. For the sake of argument, we’re going to ignore that CPUs almost always draw less than their stated TDP or maximum wattage, just as we’ll ignore that a server on a 24x7 or even 10x5 duty cycle draws more memory power whenever it runs above idle. On the AMD side, Barcelona’s lowest-power group draws 68W per chip, which looks like a disadvantage. However, there’s only one chipset component, and it draws a mere 15W. Best of all, eight DDR2 modules, operating at the same 667 MHz frequency, pull only 14.3W. That looks like a nearly 6X memory power savings on paper, but be aware that real-world numbers bring the gap much closer...although not too close. Seeing a Xeon platform suck down twice the wattage of an equivalent Opteron box at idle is not uncommon.
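The component figures above can be tallied as two simple power budgets. The Barcelona total is not stated in the text. The sketch simply sums the quoted numbers. The CPU entries are TDP ratings and the memory entries are idle draw, so these are nameplate budgets rather than measurements. The dictionary keys are just descriptive labels.

```python
# Figures quoted in the text, in watts. CPU entries are TDP, memory entries
# are idle draw, so these are rough nameplate budgets, not measurements.
XEON_L5335_PLATFORM = {
    "2 x Xeon L5335 (50 W TDP each)": 2 * 50.0,
    "5300 northbridge": 32.4,
    "southbridge": 12.4,
    "8 x FB-DIMM, idle": 83.3,
}

BARCELONA_PLATFORM = {
    "2 x Barcelona, lowest-power group (68 W each)": 2 * 68.0,
    "single chipset component": 15.0,
    "8 x DDR2-667, idle": 14.3,
}


def platform_total(budget: dict[str, float]) -> float:
    """Sum every component line in a power budget."""
    return sum(budget.values())


if __name__ == "__main__":
    for name, budget in [("Xeon", XEON_L5335_PLATFORM), ("Barcelona", BARCELONA_PLATFORM)]:
        print(f"{name}: {platform_total(budget):.1f} W")
    ratio = XEON_L5335_PLATFORM["8 x FB-DIMM, idle"] / BARCELONA_PLATFORM["8 x DDR2-667, idle"]
    print(f"memory-only ratio: {ratio:.1f}x")
```

The roughly 5.8x memory-only ratio appears to be where the “6X” figure comes from. The whole-platform budgets differ by far less, about 228W versus 165W.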
AnandTech recently did a benchmark analysis (posted July 17th, 2007) of this power draw situation and found that, with similar configurations and a given set of benchmark applications, AMD’s Opteron Rev F platform emerged as the performance-per-watt leader over dual-core Woodcrest. The article’s conclusion notes: “What is interesting to note is that AMD’s advantage isn’t at the processor level, but instead it’s related to the fact that they don’t use fully buffered DIMMs.” That same AnandTech article shows that 8x1GB DDR2 modules consume about 8W at idle, well under AMD’s own marketing numbers. On the Intel side, 8x1GB FB-DIMM modules drew about 69W, again under AMD’s numbers. Curiously, AnandTech shows about 45W more “unaccounted for” power draw on the Intel configuration than on the AMD machine. Mind you, these were dual-core Woodcrest and Rev F systems. But given that the Xeon 5300 makes no power-saving changes over the 5100 while Barcelona makes several improvements over the Rev F Opteron, we would expect the power picture to look even brighter for AMD with its latest generation.
Here’s the odd question, though: All of these comparison scenarios use eight 1GB modules to arrive at 8GB, which AMD’s Fruehe states is the most common entry point for servers today. Why not use a 4x2GB configuration? To investigate this, we turned to AMD’s own power/cost estimator at enterprise.amd.com/Flash/PlatformPower.html. Using the same 2P platform described above with eight memory modules, AMD figures an annual cost benefit (at 13 cents per kWh) of $127.54 in its favor. Moving to six modules drops the figure to $105.68, and going to four leaves only $83.81 in AMD’s favor. The implications of this narrowing gap were too significant to overlook, so we dug deeper and turned to the ever-reliable genius on all things memory-related, Michael Schuette, vice president of technology development at OCZ Technology.
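Working backward from AMD’s estimator output shows what wattage gap each dollar figure implies. The sketch assumes the tool models 24x7 operation at the quoted 13 cents per kWh with no cooling overhead. Those are our assumptions, not documented behavior of AMD’s estimator. If the tool folds in cooling or other overhead, the underlying gaps would be smaller.

```python
HOURS_PER_YEAR = 24 * 365   # assumption: the estimator models 24x7 operation
USD_PER_KWH = 0.13          # rate quoted for AMD's estimator


def implied_watt_gap(annual_savings_usd: float, usd_per_kwh: float = USD_PER_KWH) -> float:
    """Constant wattage difference that would produce a given yearly saving."""
    kwh_per_year = annual_savings_usd / usd_per_kwh
    return kwh_per_year * 1000 / HOURS_PER_YEAR  # kWh per year -> average watts


if __name__ == "__main__":
    for modules, savings in [(8, 127.54), (6, 105.68), (4, 83.81)]:
        print(f"{modules} modules: ${savings} -> ~{implied_watt_gap(savings):.1f} W")
```

Under those assumptions, the gaps come out to roughly 112W, 93W, and 74W. That works out to about 19W less for every pair of modules removed, which is consistent with memory driving much of the difference.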
“Most 4x2GB solutions would use double-sided DIMMs, so the number of ranks is essentially the same as in an 8x1GB solution using single-sided DIMMs. Only one rank can be active at any time, and the number of open pages is the same. On the other hand, if you were to use 2GB single-sided modules, the Intel MMU would not support the 11-bit column addressing required to see the full page width of 16K. Therefore, this is not a possibility for any Intel CPU regardless of chipset to the best of my knowledge. The same restrictions play into the 2x4GB configs, which, by definition, would have to have some issues with addressing, whereas AMD’s controllers can certainly handle them.”
Said more plainly, assuming double-sided modules built with 1Gb chips, there is no noteworthy difference in performance or power consumption between 8GB of system memory configured as 8x1GB or as 4x2GB. In theory, you could reduce a 2GB module’s power consumption by building it single-sided with 2Gb chips, but Intel’s platform won’t address this configuration.
Multiple sources told us that there was a hefty price premium on higher-capacity modules, but a little e-tail browsing disproved this. Currently, a 667 MHz Kingston 2GB FB-DIMM kit (2x1GB) sells for $167.49 on Newegg. Kingston’s 4GB kit (2x2GB) sells for $308.99, a $26 savings over buying two 2x1GB kits to reach 4GB at 667 MHz. This isn’t a huge deal, but it does save the customer a few bucks and, more importantly, leaves more DIMM slots free for future expansion. One exception to the nearly ubiquitous 8x1GB platform analysis is the excellent work done at The Tech Report last year (http://techreport.com/reviews/2006q4/xeon-vs-opteron/index.x?pg=11) analyzing Woodcrest, Clovertown, and Rev F parts in a four-module configuration.
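Schuette’s point that 8x1GB and 4x2GB setups present the same number of ranks can be checked with a little arithmetic. The sketch assumes x8 DRAM chips on a 64-bit data bus, so eight chips form one rank, and it ignores ECC chips. Under that assumption, “single-sided” and “double-sided” map to one and two ranks respectively. The function names are illustrative only.

```python
def ranks_per_module(module_gb: int, chip_gbit: int, chips_per_rank: int = 8) -> int:
    """Ranks on a module, assuming x8 chips filling a 64-bit bus (ECC ignored).

    One rank holds chips_per_rank * chip_gbit gigabits of capacity.
    """
    rank_gbit = chips_per_rank * chip_gbit
    module_gbit = module_gb * 8  # gigabytes -> gigabits
    if module_gbit % rank_gbit:
        raise ValueError("module capacity is not a whole number of ranks")
    return module_gbit // rank_gbit


def total_ranks(modules: int, module_gb: int, chip_gbit: int) -> int:
    return modules * ranks_per_module(module_gb, chip_gbit)


if __name__ == "__main__":
    configs = {
        "8 x 1GB, 1Gb chips (single-sided)": (8, 1, 1),
        "4 x 2GB, 1Gb chips (double-sided)": (4, 2, 1),
        "4 x 2GB, 2Gb chips (single-sided)": (4, 2, 2),
    }
    for label, (n, gb, gbit) in configs.items():
        print(f"{label}: {total_ranks(n, gb, gbit)} ranks")
```

The first two configurations both come to eight ranks, which matches the quote. Only the 2Gb-chip option halves the rank count, and that is the configuration the article says Intel’s platform won’t address.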
While Rev F emerges as a clear winner, drawing substantially less platform power than Xeon while running an equivalent number of threads, there is one important caveat. Yes, Xeon will draw more power from the wall outlet in most scenarios, but also consider how long it takes to perform a task. If the user has an application that plays to Xeon’s strengths and the Xeon machine can complete a task in, say, half the time, then the total energy consumed for that task could be lower on Xeon.
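The caveat comes down to energy being power multiplied by time. The sketch compares the energy one job costs on two machines. The 230W and 170W draws and the run times are hypothetical round numbers, not measurements of any specific Xeon or Opteron system.

```python
def energy_wh(watts: float, seconds: float) -> float:
    """Energy used by a job at constant draw, in watt-hours."""
    return watts * seconds / 3600


def faster_box_uses_less_energy(fast_watts: float, fast_seconds: float,
                                slow_watts: float, slow_seconds: float) -> bool:
    """True when the hungrier-but-faster machine spends less energy on the job."""
    return energy_wh(fast_watts, fast_seconds) < energy_wh(slow_watts, slow_seconds)


if __name__ == "__main__":
    # Hypothetical job: the 230 W box finishes in one hour, the 170 W box needs two.
    fast = energy_wh(230, 3600)
    slow = energy_wh(170, 7200)
    print(f"faster box: {fast:.0f} Wh, slower box: {slow:.0f} Wh")
```

In this example the faster machine wins on energy whenever its run time is below 170/230 (about 74%) of the slower machine’s. At equal run times, the lower-wattage machine always wins.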
Given all this, you could argue that AMD has already won the battle for power efficiency, at least until Intel makes the jump to 45nm later this year. Barcelona tilts the scales even further in AMD’s favor with a set of power-focused updates to the Opteron architecture. In fact, AMD’s Fruehe calls power savings Barcelona’s “secret weapon, our single biggest advantage.” As noted above, AMD deserves praise for cramming four Opteron cores into the same package and wattage spec as the dual-core Rev F. What remains unclear is whether, or how much, AMD had to compromise on processor frequency to make this happen. Phenom parts using essentially the same core will arrive in the fourth quarter at higher speeds but also with a higher power envelope. Given that the server market is generally more sensitive to power consumption than the desktop crowd, AMD’s strategy here seems wise, especially since Barcelona can’t slip into the same wattage specs as Xeon without incurring a big frequency drop. This doesn’t mean, however, that AMD slashed performance in order to drop its power numbers. We’ve already discussed several ways in which Barcelona optimizes performance over previous Opterons. In a similar fashion, AMD implemented four major updates to Barcelona’s power-saving and thermal control technologies in order to double the core count while keeping processor wattage unchanged.
The first significant update is Independent Dynamic Core Technology. With all previous multi-core processors, you essentially throttle the entire processor based on utilization: when utilization falls, clock speeds can drop accordingly. In a dual-core processor, for example, the two cores run in clock-speed lockstep, set by whichever core is busier. If one core is running at 75% utilization and the other is idle, both clocks are set to the speed appropriate for 75% utilization, even though one core is doing nothing.
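To make the lockstep-versus-independent contrast concrete, the sketch below computes per-core clocks both ways for the 75%/idle example. The 2.0 GHz top clock, the 1.0 GHz floor, and the linear mapping from utilization to clock are all hypothetical simplifications. Real processors choose from a small set of discrete p-states.

```python
F_MAX_GHZ = 2.0   # hypothetical top clock
F_MIN_GHZ = 1.0   # hypothetical lowest p-state clock


def needed_clock(utilization: float) -> float:
    """Clock a core needs for a load between 0.0 and 1.0, floored at F_MIN_GHZ."""
    return max(F_MIN_GHZ, utilization * F_MAX_GHZ)


def lockstep_clocks(utils: list[float]) -> list[float]:
    """Shared clock: every core runs at the speed the busiest core needs."""
    shared = max(needed_clock(u) for u in utils)
    return [shared] * len(utils)


def independent_clocks(utils: list[float]) -> list[float]:
    """Per-core clocks: each core picks the speed its own load needs."""
    return [needed_clock(u) for u in utils]


if __name__ == "__main__":
    utils = [0.75, 0.0]  # one core at 75% utilization, one idle
    print("lockstep:   ", lockstep_clocks(utils))     # [1.5, 1.5]
    print("independent:", independent_clocks(utils))  # [1.5, 1.0]
```

With four cores and only one busy, the gap widens. Lockstep drags three idle cores up to the busy core’s speed, while independent clocking lets them sit at the floor.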
With Barcelona, each core can set its own clock speed. Depending on the scenario, this can yield considerable power savings. (Devil’s advocate note: If a quad-core processor is consistently running at least two cores in a reduced power state, perhaps a dual-core chip would prove a more cost-effective solution.)
Barcelona’s second power innovation is called CoolCore, a fancy name for what engineers usually call coarse- and fine-grained gating. The idea is simple enough: areas of the processor that aren’t being used can be shut down so they’re not drawing power. As any gamer knows, a processor is going to run faster the cooler it is; hence the whole active cooling sub-industry. CoolCore looks at large chunks of logic that can be shut down when not in use (coarse-grained gating). There are also smaller, more discrete areas where Barcelona can turn off small groups of transistors, and even single transistors, that aren’t being used (fine-grained gating).
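Coarse- and fine-grained gating can be pictured as two levels of switching off idle logic. The toy model below uses hypothetical units (an FPU, integer ALUs, and a memory controller with separate read and write logic) and made-up wattages, not AMD data. It simply sums whatever is still powered under each gating policy.

```python
# Hypothetical per-subunit draw in watts, for illustration only (not AMD data).
UNITS: dict[str, dict[str, float]] = {
    "fpu": {"adder": 2.0, "multiplier": 3.0},
    "integer": {"alu0": 1.0, "alu1": 1.0},
    "mem_ctrl": {"read_logic": 1.5, "write_logic": 1.5},
}


def gated_power(active: set[tuple[str, str]], coarse: bool, fine: bool) -> float:
    """Sum the draw of every subunit that is still powered.

    coarse: switch off a whole unit when none of its subunits is busy.
    fine:   inside a busy unit, switch off each idle subunit.
    """
    total = 0.0
    for unit, subunits in UNITS.items():
        unit_busy = any((unit, sub) in active for sub in subunits)
        if coarse and not unit_busy:
            continue  # whole unit gated off
        for sub, watts in subunits.items():
            if fine and (unit, sub) not in active:
                continue  # idle subunit gated off
            total += watts
    return total


if __name__ == "__main__":
    # Integer work plus memory reads: the FPU, alu1 and write logic sit idle.
    busy = {("integer", "alu0"), ("mem_ctrl", "read_logic")}
    print("no gating:    ", gated_power(busy, coarse=False, fine=False))  # 10.0
    print("coarse only:  ", gated_power(busy, coarse=True, fine=False))   # 5.0
    print("coarse + fine:", gated_power(busy, coarse=True, fine=True))    # 2.5
```

Coarse gating removes the idle FPU entirely. Fine gating then trims the idle ALU and the unused write logic inside units that are otherwise busy.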
“If you turn off the power of one transistor, it helps cool down adjacent transistors,” says AMD’s Fruehe. “So by taking them down across different areas on the chip, it’s going to help bring down the overall temperature of the processor. In addition, on the memory controller in the processor, you’ve got the ability, if you’re doing writes, to turn off the read logic, and, if you’re doing reads, to turn off the write logic. That allows you to reduce the power consumption of the memory controller.”
The third power ace up Barcelona’s sleeve is called Dual Dynamic Power Management, a feature that often goes by the name Split Plane. In prior AMD designs, a single power plane feeds the CPU, fueling both the cores and the memory controller. With Dual Dynamic Power Management, a Barcelona processor can split that plane, sending one supply to the cores and another to the memory controller. This allows the memory controller to run at a different power level than the cores. Users can raise the memory controller’s clock speed to improve its performance and lower memory latency. In the end, they get finer control over the power settings for the CPU and memory controller, and over the amount of power being consumed. This should pay particular dividends when Barcelona’s K10 core eventually makes its way into the mobile market.
There is a caveat, though. It’s true that Barcelona is backward compatible with all prior Socket F motherboards. But to get the benefits of Split Plane, you need a motherboard built to include the feature, meaning only current and future releases. Again, if you drop a Barcelona chip into a platform that doesn’t support Dual Dynamic Power Management, it will run just fine; you’ll just have the cores and memory controller running off the same power plane as they have for years. Similarly, an Opteron Rev F processor will run perfectly in a Split Plane motherboard, only without the extra power perks.
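One way to see why separate planes can help is the standard CMOS approximation that dynamic power scales with capacitance times voltage squared times frequency. The sketch below is a toy model, not a description of AMD’s regulator design. The operating points and capacitance values are hypothetical. It also assumes, as a simplification, that a shared plane must run at whichever voltage the more demanding side needs.

```python
def dynamic_power(c_eff: float, volts: float, ghz: float) -> float:
    """Standard CMOS dynamic-power approximation: proportional to C * V^2 * f."""
    return c_eff * volts ** 2 * ghz


# Hypothetical operating points, not Barcelona specifications.
CORES = {"c_eff": 10.0, "volts": 0.9, "ghz": 1.0}     # cores throttled to a low p-state
MEM_CTRL = {"c_eff": 2.0, "volts": 1.1, "ghz": 2.0}   # memory controller kept fast


def platform_power(split_plane: bool) -> float:
    if split_plane:
        core_v, mc_v = CORES["volts"], MEM_CTRL["volts"]
    else:
        # Simplification: one shared plane must satisfy whichever side needs more voltage.
        core_v = mc_v = max(CORES["volts"], MEM_CTRL["volts"])
    cores = dynamic_power(CORES["c_eff"], core_v, CORES["ghz"])
    mc = dynamic_power(MEM_CTRL["c_eff"], mc_v, MEM_CTRL["ghz"])
    return cores + mc


if __name__ == "__main__":
    print(f"shared plane: {platform_power(False):.2f} (arbitrary units)")  # 16.94
    print(f"split plane:  {platform_power(True):.2f} (arbitrary units)")   # 12.94
```

In this toy case, all of the saving comes from the throttled cores dropping to their own lower voltage. The memory controller’s draw is identical in both configurations.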
However, with Dual Dynamic Power Management supported in both the CPU and motherboard, expect lower power draw and a 3% to 10% performance gain. The upside of this backward and forward compatibility is that it will help resellers minimize the inventory they need to carry and better standardize their parts. The only possible downside to the Split Plane scene is that it takes some wind out of the sails of that “you can upgrade those old Rev F boards!” argument.
Barcelona’s last major power improvement zone is a new feature set called Enhanced AMD PowerNow!, designed to “deliver performance on demand while minimizing power consumption.” This capability centers on new processor performance states (p-states) that are much more proactive in managing core clock speeds, and AMD believes this will yield some added power efficiency. Additionally, admins will appreciate that these latest PowerNow! modes depend less on driver support for directing p-state changes than in the K8 generation. With Barcelona, PowerNow! adjusts p-states at the optimal time once directed to do so by the operating system, rather than letting the driver blindly force the CPU into a particular p-state, as was done with the earlier Opteron Rev F.
VIRTUALLY ADVANCED
You’ve probably learned by now that virtualization is the process of allocating hardware resources from a single computer to run multiple, simultaneous “guest” operating systems, each with its own set of applications. According to AMD, most servers operate below 15% of their resource capacity. (Whether quad- and octo-core processors will drop that number further is an interesting question.) So it only makes sense to run multiple virtual servers within a single physical server, making far more efficient use of that hardware and its attendant expenses.
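The consolidation argument can be reduced to simple packing arithmetic. The sketch treats AMD’s 15% figure as each server’s load. It assumes all servers and hosts are identical, that loads simply add up, and that hosts are kept under a hypothetical 75% ceiling. Hypervisor overhead and load peaks are ignored.

```python
import math


def guests_per_host(guest_util_pct: int, host_ceiling_pct: int) -> int:
    """How many lightly loaded servers fit on one host without passing the ceiling."""
    return host_ceiling_pct // guest_util_pct


def hosts_needed(servers: int, guest_util_pct: int, host_ceiling_pct: int) -> int:
    """Physical hosts required to absorb a group of identical servers."""
    return math.ceil(servers / guests_per_host(guest_util_pct, host_ceiling_pct))


if __name__ == "__main__":
    # 20 servers at 15% each, consolidated onto hosts kept under a hypothetical 75% ceiling.
    print(guests_per_host(15, 75), "guests per host")   # 5
    print(hosts_needed(20, 15, 75), "physical hosts")   # 4
```

Real capacity planning also has to account for memory, I/O, and peak rather than average load. This sketch only captures the utilization headroom the article points to.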
In conventional x86 virtualization scenarios, the system runs a hypervisor, also called a virtual machine monitor (VMM). The hypervisor sits above the hardware layer but below the guest OSes and brokers communication between these two levels so that virtual machines don’t stumble into resource conflicts. This process often requires the hypervisor to simulate the I/O responses that normally come from the hardware layer, and that simulation in turn can lead to substantial response latencies. The more hypervisor, OS, and CPU vendors cooperate on designs that make virtualization more efficient, the better the system performs when juggling these guest configurations and the hardware transitions between them.
AMD started optimizing its processors for virtualization as of Stepping F in the K8 family, for both AM2 and Socket F packages. AMD Virtualization (AMD-V) gets a new twist under Barcelona with a feature called Rapid Virtualization Indexing, also called nested paging. The crux of the matter is page faults. In the context of computer memory, a page is a block of memory used to transfer data between physical memory and external storage, such as a hard drive. Each page has its own address location in physical memory. When a program tries to access this address, a memory management unit in the CPU oversees the access. If the page isn’t where it’s supposed to be, the memory management unit cries foul and generates a page fault interrupt. The operating system is then in charge of resolving the problem or, as we’ve all seen, crashing the program.
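Under nested paging, address translation happens in two stages. The guest’s own page tables map guest virtual addresses to guest physical addresses. The hypervisor’s nested tables then map guest physical addresses to host physical addresses. Without nested paging, hypervisors typically maintain shadow page tables in software instead. The toy model below shows both stages and where a page fault arises. It assumes 4 KB pages and single-level tables. Real x86 tables have multiple levels, translations are cached in TLBs, and the guest’s tables themselves live in guest-physical memory, so a real walk is considerably more involved.

```python
PAGE_SIZE = 4096  # assumption: 4 KB pages


class PageFault(Exception):
    """Raised when a page-table lookup finds no mapping."""


def translate(table: dict[int, int], addr: int, stage: str) -> int:
    """Map an address through a single-level table of page number -> frame number."""
    page, offset = divmod(addr, PAGE_SIZE)
    if page not in table:
        raise PageFault(f"{stage} page table has no entry for page {page:#x}")
    return table[page] * PAGE_SIZE + offset


def nested_translate(guest_pt: dict[int, int], nested_pt: dict[int, int], gva: int) -> int:
    """Guest virtual -> guest physical (guest table) -> host physical (nested table)."""
    gpa = translate(guest_pt, gva, "guest")
    return translate(nested_pt, gpa, "nested")


if __name__ == "__main__":
    guest_pt = {0x10: 0x2}    # guest maps its page 0x10 to guest frame 0x2
    nested_pt = {0x2: 0x7F}   # hypervisor backs guest frame 0x2 with host frame 0x7F
    print(hex(nested_translate(guest_pt, nested_pt, 0x10123)))  # 0x7f123
    try:
        nested_translate(guest_pt, nested_pt, 0x20000)
    except PageFault as fault:
        print("page fault:", fault)
```

The page offset passes through both stages unchanged. Only the page and frame numbers are remapped.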
Copyright © 2007 RAM Magazine. All rights reserved.
Do not duplicate or redistribute in any form.