by William Van Winkle and Chris Angelini
 
 


AMD: From the Top, Down

Back when AMD launched its Opteron processor, the company didn't have much credibility amongst enterprise customers. In multi-processor environments AMD had been represented by the Athlon MP, and it was Intel, not AMD, that offered the most compelling server and workstation hardware. It naturally took a while for the AMD64 initiative to gain traction, but once it did, selling the Athlon 64 six months later was comparatively a piece of cake.

The Next Desktop King?
The Athlon 64 X2 is AMD's foray into dual-core desktop processors. The X2 and the dual-core Opteron are remarkably similar in architecture and performance, save for one socket pin and, usually, different cache sizes.

AMD is applying that same top-down approach to dual-core processing, and with good reason. Breaking the server and workstation markets down by software profile shows clearly where multiplying processor power pays off. On the server side, you have high-performance computation engines, DCC (digital content creation) servers, database software, data mining, Java servers, Web servers, and customer relationship management software, all of which fall into the compute-intensive, multi-threaded category most sensitive to processing performance. The workstation segment covers DCC applications, computer-aided engineering, video editing, and electronic design automation, again all compute-intensive usage models that scale well across multiple processors. So even if software vendors haven't specifically optimized their applications for dual-core hardware, the performance benefits are already built in.

Like Intel's Pentium D, the dual-core Opteron places two physical cores on a single slab of silicon; AMD's die is made up of 233 million transistors and measures 199 square millimeters. The similarities between the two architectures end with those dimensions, though. AMD insists that dual-core was on its mind back in 1999, during the original Opteron's design phase. Specific implementation decisions matter from a sales perspective, of course, but by laying the foundation for dual-core early in the architecture's lifetime, AMD helped ensure it would be able to execute. Note that dual-core Opteron processors were ready and shipping on the day AMD forecast, whereas the original Opteron launch was pushed back several times.

Under the X2 Hood
This X2 processor die shot reveals just how pivotal cache is to AMD's performance architecture. Over half of the die real estate is now given over to L2 cache.

What exactly goes into gluing a pair of server chips onto a single die and getting them to work in an aging socket interface? AMD's Damon Muzny says the most important ingredient was a 90nm manufacturing node, instrumental in getting the processor down to a reasonable size. The two onboard cores interface with a system request queue and crossbar, which allows them to communicate with each other at full operating frequency. In turn, the crossbar manages transactions with a single on-die memory controller and three HyperTransport links operating at 1 GHz.

Remember that AMD recently raised the speed of its HyperTransport links to 1 GHz. Whether or not that move was made in anticipation of dual-core technology, there's now more than enough throughput for both cores to share.
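A little arithmetic shows how much headroom that is. The Python sketch below computes peak one-direction bandwidth for a HyperTransport link at 1 GHz and at the 800 MHz used on Socket 754 platforms. The 16-bit link width, the double-data-rate assumption (two transfers per clock), and the function name are our own assumptions for illustration.

```python
def ht_bandwidth_gb_s(clock_mhz: float, width_bits: int = 16) -> float:
    """Peak one-direction bandwidth of a HyperTransport link, in GB/s.

    Assumes a 16-bit-wide link and double data rate (two transfers per clock).
    """
    transfers_per_s = clock_mhz * 1e6 * 2
    return transfers_per_s * width_bits / 8 / 1e9


old = ht_bandwidth_gb_s(800)    # Socket 754 link speed
new = ht_bandwidth_gb_s(1000)   # 1 GHz links
print(f"{old:.1f} -> {new:.1f} GB/s per direction (+{new / old - 1:.0%})")
```

Run as written, it prints a 25% jump, from 3.2 to 4.0 GB/s in each direction.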

"HyperTransport is already way overbuilt for the amount of traffic on it," says AMD's Damon Muzny. "Case in point: When we stepped from 800 MHz x 2 to 1,000 MHz x 2, going from the 754-pin to 939-pin, how much performance increase did you see from that jump? Basically none, right? And that's good, because that means it wasn't a bottleneck in the first place. So when you add a second core, it's not going to be throttled by an underbuilt interface."

The single memory controller, which is still limited to two 72-bit data paths and supports modules up to DDR400, takes a slightly larger hit. Quite simply, splitting the available bandwidth between two cores creates resource conflicts that can't be resolved without altering the processor's pin-out. AMD claims the resulting impact is minimal, but tests indicate actual throughput losses exceed 10 percent. That's not to say overall performance drops in kind. Instead, the new Opteron gains processing power at the expense of a little memory performance.
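To put the shared controller in perspective, the sketch below works out peak bandwidth for dual-channel DDR400. It assumes 64 of each path's 72 bits carry data (the remaining 8 carry ECC) and treats an even split between two fully memory-bound cores as a worst case. Real workloads rarely saturate the controller from both cores at once, which helps explain why measured losses land around 10 percent rather than 50. The function name is hypothetical.

```python
def ddr_peak_bandwidth_gb_s(mega_transfers: int, channels: int = 2,
                            data_bits: int = 64) -> float:
    """Peak DRAM bandwidth in GB/s.

    Each 72-bit path carries 64 data bits plus 8 ECC bits; only the
    data bits count toward usable bandwidth.
    """
    return mega_transfers * 1e6 * channels * data_bits / 8 / 1e9


total = ddr_peak_bandwidth_gb_s(400)   # DDR400 = 400 million transfers/s
per_core_worst_case = total / 2        # both cores saturating memory at once
print(f"{total:.1f} GB/s shared; {per_core_worst_case:.1f} GB/s per core if split evenly")
```

This prints 6.4 GB/s shared and 3.2 GB/s per core under the even-split assumption.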

"In seriously memory-intense applications, you may lose 10% performance because the controller is shared among cores," says Muzny, "but that's still worlds better than having the memory controller on the northbridge. It's like going to the race track and being told I'll have to take the wings off my Ferrari so I don't get as much downforce, and people say now I'm going to suck through the turns. Then I find out my competition is a Ford Mustang. Now how bad do I suck?"

Dual-core Opteron processors, like all of AMD's other 90nm products, feature SSE3 instruction support, an improved memory controller capable of handling mismatched memory capacities, and PowerNow! functionality. And because none of AMD's previous processors offered a way to squeeze extra efficiency out of idle execution resources, à la Intel's Hyper-Threading, the second physical core gives the dual-core Opteron an even larger boost in threaded software.


An Infrastructure In Place

Without question, the most telling evidence of AMD's dual-core foresight is the way its new Opteron fits right into today's infrastructure. The Socket 940 interface had an uncertain future when the platform launched in 2003, since pundits knew the forthcoming Socket 939 implementation would lift the registered-memory requirement for desktops and workstations. However, the interface's persistence is now a boon to the early adopters who bet on it.

Dual-Core for Workstations
AMD's dual-core Opteron may be built much like an X2, but only Opteron can use registered memory and scale to support multiple-processor configurations.

The transition isn't completely transparent. In order for a Socket 940 motherboard to support dual-core, it must first work with the latest 90nm Opteron single-core processors. And even then, it needs a BIOS update to properly recognize the additional processing cores. Nearly all of the most recent boards qualify, but early models, such as ASUS' popular SK8V, haven't yet received the requisite patches.

Additionally, about one year ago, when AMD started producing dual-core CPUs, the company notified its motherboard partners that processor power delivery requirements would increase from 60W to 80W. Board manufacturers adjusted their designs accordingly, but some early units may still support only 60W, so check the specifications if in doubt.

How exactly is AMD able to guarantee the integrity of dual-core operation on a motherboard originally designed for a single-core processor? By keeping power consumption within the bounds of the platform's specification and retaining the original pin-out, AMD sidesteps the physical limitations. Moreover, because the memory controller (traditionally a northbridge function) is integrated on the die, communication between the two cores never leaves the chip, so there aren't any external components susceptible to timing issues.

Future iterations of the dual-core Opteron will likely break AMD's infrastructure harmony. Greater processing power calls for more bandwidth than a shared memory controller can deliver, and better memory technologies will warrant a platform upgrade. But the current crop of Opteron processors should carry the Socket 940 legacy through 2006.


New Names, New Prices

AMD's existing nomenclature defines three classes of server and workstation processors: the 100-series for single-socket configurations, the 200-series for dual-socket systems, and the 800-series for four- to eight-way setups. A number of models exist within each class, all sporting similar features at different clock speeds. Historically, each speed bump added two to the model number within a class. So an Opteron 252, running at 2.6 GHz, followed the Opteron 250 at 2.4 GHz, and both would populate dual-processor motherboards.

Dual-core changes that naming scheme in a bid to delineate the new products from their single-core counterparts. The three classes persist; however, dual-core models now start at x65, running at 1.8 GHz, increment by five to x70 at 2.0 GHz, and end at x75 at 2.2 GHz. Clearly, AMD is subject to the same manufacturing complexities as Intel, which force somewhat lower dual-core clock speeds.
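The naming rules above are regular enough to decode mechanically. The sketch below is a hypothetical helper, not an AMD tool: it maps the leading digit to the socket class, looks up the three dual-core suffixes, and applies the single-core rule of two model numbers per 200 MHz, anchored at the Opteron 250's 2.4 GHz. Limiting single-core suffixes to x40 through x52 is our assumption.

```python
# Socket class from the model number's leading digit.
CLASSES = {1: "1-way", 2: "2-way", 8: "4- to 8-way"}

# Dual-core suffixes and their clock speeds in GHz.
DUAL_CORE = {65: 1.8, 70: 2.0, 75: 2.2}


def decode_opteron(model: int) -> tuple[str, int, float]:
    """Return (socket class, core count, clock in GHz) for an Opteron model number."""
    series, suffix = divmod(model, 100)
    if series not in CLASSES:
        raise ValueError(f"unknown series in model {model}")
    if suffix in DUAL_CORE:
        return CLASSES[series], 2, DUAL_CORE[suffix]
    if suffix % 2 == 0 and 40 <= suffix <= 52:
        # Single-core rule: each +2 in the suffix is a 200 MHz speed bump,
        # anchored at x50 = 2.4 GHz.
        ghz = round(2.4 + (suffix - 50) / 2 * 0.2, 1)
        return CLASSES[series], 1, ghz
    raise ValueError(f"suffix {suffix} does not fit the naming scheme")


print(decode_opteron(152), decode_opteron(165))
```

For example, `decode_opteron(152)` returns `("1-way", 1, 2.6)` and `decode_opteron(165)` returns `("1-way", 2, 1.8)`.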

You wouldn't guess it based on pricing, though. AMD is asking a much steeper premium for dual-core than Intel is. And while that's how the server and workstation markets generally operate, the dual-core stack will undoubtedly give some of your customers a case of sticker shock. Then again, they're not being forced to buy a new motherboard, power supply, or memory subsystem, either.

In essence, the lowest-clocked dual-core processor in each of the three classes starts at the same price as the fastest single-core model. Thus an Opteron 152 (2.6 GHz) and an Opteron 165 (1.8 GHz) will both cost you $637. Move up to the 200-series and the equivalent pair runs a cool $851. By the time you're buying eight-way chips, the Opteron 852 and 865 cost no less than $1,514 each.

The plan is to slowly waterfall dual-core price points down to single-core levels, replacing single-core models along the way. In a recent presentation, Pat Patla, director of server and workstation marketing at AMD, said the timing of those price moves will depend on market conditions, perhaps a reference to the gradual transition toward threaded software and the ensuing acceptance of dual-core hardware.


Striking Back On the Desktop

As 2005 progresses, AMD will complement the Opteron lineup with a capable dual-core desktop contender called the Athlon 64 X2. The chip is scheduled to launch on June 1st, with the focus on system-builder availability in Q3 '05 and a retail entry in the fourth quarter, well behind Intel's desktop offering. That's all right by AMD, though, since the mainstream space lags in threaded software.

When it finally does emerge, expect the Athlon 64 X2 to use AMD's current model-number nomenclature, beginning with the 4200+ at 2.2 GHz with dual 512KB L2 caches. The flagship will be the Athlon 64 X2 4800+, purring along at 2.4 GHz and wielding a pair of 1MB caches.

All Socket 939 motherboards, even those with older AGP-based chipsets, will support the dual-core Athlon 64 X2 with a simple BIOS update. The chips fall within AMD's 110W power envelope, meaning existing heatsinks suffice, too. They'll be manufactured at 90nm with SSE3 support, mixed-memory support, and voltage improvements. Cool'n'Quiet and Enhanced Virus Protection are, of course, still standard features.

For all of the Athlon 64 X2's redeeming qualities, be warned. Initial forecasts show the X2 lineup priced significantly higher than single-core models. Whereas Intel looks like it's trying to push dual-core into the mainstream, AMD will hold the technology up at the enthusiast level through 2005. Then Athlon 64 X2 processors are expected to slowly displace single-core offerings completely.

Until that happens, AMD is counting on its stable of Athlon 64 processors to battle Intel's Pentium D almost exclusively. Mainstream customers who don't run threaded software will actually find the Athlon 64 a speedier option. But as media encoding and content creation programs become more popular, the lower-clocked dual-core Pentiums really shine.

Gamers are the one influential, upgrade-hungry market that doesn't yet benefit from multiple processing cores. Games are currently single-threaded and perform best on the fastest single-core CPU, although it's a safe bet that this will change soon enough.

"We actually see that four threads is something that fits nicely with a lot of games," says Intel's Austin. "We think this will help in the realm of physics to provide more realistic collisions between two objects and less predetermination about what will happen in the event of a collision. If I jump through a window, instead of having the glass pattern look identical no matter how fast I'm running, it'll be able to calculate a glass spread in realtime."

If AMD were to phase out single-core processors completely in favor of slower dual-core models, the company would risk losing favor in one of its strongest segments: enthusiasts. So while Intel pushes its premier gamer product, the Extreme Edition, into dual-core, AMD is keeping the Athlon 64 FX a single-core chip running at aggressive clock frequencies. At least one more FX model is planned for a Q2 '05 launch, solidifying AMD's position amongst enthusiasts before the dual-core exodus. Sadly, AMD will leverage the FX's now-unique position to raise its asking price.


Software: Piecing the Puzzle Together


When AMD's Opteron first debuted, it offered excellent 32-bit performance with the promise of 64-bit computing later on. Though that forecast was certainly attractive, nobody bought the chip exclusively for what it would do two years down the road. With dual-core processing, there's no need to wait for a software foundation to support the hardware. Any program designed for multi-processing today will run faster, and single-threaded applications run concurrently should similarly improve.

Server, workstation, and desktop operating systems alike already provide basic dual-core support by recognizing the hardware and scheduling multi-threaded applications accordingly. A recent presentation by Margaret Lewis, senior software strategist at AMD, pointed out that the Linux 2.6 kernel has improved thread handling that exploits dual-core. Solaris 10 is NUMA-aware by default, meaning it reduces the latency penalty of reaching memory attached to another processor by allocating each thread's memory from the bank local to the processor running it. The 32-bit version of Windows XP is not NUMA-aware, but its advanced scheduler handles pre-emptive multitasking smoothly when several single-threaded tasks run at once. It can also dispatch multiple threads from a single application, so programs such as Windows Media Encoder 9 enjoy healthy gains. Windows XP x64 Edition and Windows Server 2003 x64 Edition, on the other hand, are fully optimized for dual-core: they are NUMA-aware, use the same advanced scheduler, and dispatch multiple threads properly.

The third-party software scene is a somewhat different story. In general, writing a threaded application is more difficult than writing a single-threaded program. When there's a sizable gain to be realized, developers will make the extra effort, which is why threaded software has proliferated in the server and workstation markets. Threaded desktop apps are rarer, but the availability of dual-core hardware is expected to motivate the programming community. Games are the classic mainstream example: you currently can't find a threaded game. However, with the introduction of dual-core processors, several developers have announced plans to put the technology to use in upcoming titles, most notably Epic founder Tim Sweeney, who recently announced that Unreal Engine 3 will embrace multi-threaded code.


Intel and AMD Dual-Core Roadmaps


Given the flak from AMD about targeting the wrong market segment first, Intel has been assiduous about making sure everyone knows about the dual-core designs it has scheduled for every major segment. Smithfield (Pentium D and Pentium Extreme Edition), of course, was Intel's first dual-core release, and it will be succeeded in the first half of next year by Presler. Presler's main claim to fame will be the move to 65nm, which will enable 2MB of L2 cache for each core as well as frequencies expected to finally top 4 GHz.

In the fourth quarter of 2005, over one year after initial sampling, Intel should release Montecito, the dual-core Itanium MP successor to Madison-9M. While still built on 90nm, Montecito will use 1.72 billion transistors, many of them devoted to its 24MB of L3 cache, 12MB per core. Montecito will be replaced by Montvale in 2006 or so, and early buzz on Montvale's successor, Tukwila (formerly named Tanglewood), calls for up to 16 cores on one processor. Additionally, Tukwila is slated to be the first processor to use CSI, Intel's response to HyperTransport. In the Itanium DP family, the dual-core Millington will be based on Montecito but optimized for low-voltage systems.

The real surprise is that Intel will not update the Xeon line for dual-core until 2006. The MP variety will kick off dual-core with the 90nm Paxville followed by Tulsa on 65nm. Paxville is essentially two Pentium 4s spliced together, so a dual-processor Paxville will be very much like a modern quad-processor Xeon. Eventually, Whitefield will replace Tulsa and be based on Tukwila's architecture, including CSI. On the DP side, the 65nm Dempsey (Q1 '06) will handle dual-core in a smarter fashion. Whereas Paxville continues the same battle for the bus found in Smithfield, Dempsey will add arbitration logic to help each core go about bus access in the most efficient way possible.

Not least of all, Intel will unleash Yonah for notebooks in early 2006 as a dual-core, 65nm part.

As for AMD, the company has pretty much fired its guns in the server and workstation segment for 2005. The Egypt, Denmark, and Italy parts you see in the graphic are the x65, x70, and x75 introduced in April.

On the desktop, Toledo has now been branded as the Athlon 64 X2 and will start shipping in June at model numbers 4200+, 4400+, 4600+, and 4800+. AMD notes that the FX will remain the fastest chip around for single-threaded apps, but the X2 will be the new desktop flagship for multi-threaded work. There are also indications that Toledo will appear in mobile implementations. The mobile Sempron evolution, "Roma," will remain single-core.

There is some unconfirmed speculation that a dual-core part named Windsor will appear in the first half of 2006 with a new socket type called M2 that will supplant 939. Other sources indicate that M2 will be used for mobile desktop replacement chips, in particular one called Trinidad. AMD has said nothing publicly about M2 or essentially anything beyond the end of this year.

NOTE: The above projections were assembled from Intel and AMD official road maps as well as reputable and current Web sources. Naturally, some specifics may change before actual hardware starts shipping.


Selling Dual-Core Now

The last thing we wanted to do with this article was lay a whole bunch of technical information on you and conclude with: "Well, now you only have to wait six to twelve months for applications that put some real value into these platforms." Had that been the case, we wouldn't have covered it. As we've seen, most market segments can benefit from dual-core technology today, even if the only benefit is smoother multitasking of single-threaded applications.

Fortunately, the prior work Intel has done with promoting Hyper-Threading throughout the industry will now pay double dividends. You can demonstrate to customers that a wealth of applications already exist that will show sizable performance improvements thanks to dual-core.

"You get your dual-thread performance primarily in multimedia applications—video editing, sound editing, Dr. Divx, Windows Media Encoder—basically throughout the digital media category," says Kingston's Tekunoff. "There will be some very nice performance bumps in that area. Of course, once other applications get ported over, the whole market will move in that direction, but right now the server and workstation markets have much more multi-threaded software than the desktop."

In the mainstream consumer world, everyday applications such as Windows Media Encoder and Windows Movie Maker leverage multi-threading, as do CyberLink's PowerDirector and Ahead's Nero. According to AMD, Adobe Premiere Pro shows gains of up to 60% over single-core, and there are at least two dozen multi-threaded filters built into Photoshop. Novell's SUSE LINUX 9.3 now supports multi-threading, and, naturally, content creation applications such as 3ds Max, LightWave, and Canopus ProCoder are all multi-threaded. What will vary from app to app, and become a greater concern going forward, is how many threads have been built into each title.
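One way to avoid baking a fixed thread count into an application is to ask the operating system how many hardware threads are available and size the work to match. The Python sketch below shows the idea; the names `choose_worker_count`, `process_chunk`, and `run`, and the cap of eight threads, are hypothetical, while `os.cpu_count()` and `concurrent.futures.ThreadPoolExecutor` are standard-library calls. One caveat: in CPython, pure-Python CPU-bound work does not run in parallel across threads because of the global interpreter lock, so a real workload would swap in `ProcessPoolExecutor` or native code; threads simply keep the sketch short.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def choose_worker_count(max_useful_threads: int = 8) -> int:
    """Ask the OS how many logical processors exist, capped where scaling stops paying off."""
    available = os.cpu_count() or 1  # cpu_count() can return None
    return max(1, min(available, max_useful_threads))


def process_chunk(chunk: list[int]) -> int:
    # Stand-in for real work: sum of squares over one slice of the input.
    return sum(x * x for x in chunk)


def run(data: list[int], max_useful_threads: int = 8) -> int:
    workers = choose_worker_count(max_useful_threads)
    if workers == 1 or len(data) < workers:
        return process_chunk(data)  # single-threaded code path
    size = -(-len(data) // workers)  # ceiling division: one chunk per worker
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(process_chunk, chunks))


print(choose_worker_count(), run(list(range(1000))))
```

The first number printed depends on the machine; the result of `run` is the same whichever code path is taken, which is the point of picking the path at runtime.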

"We advocate that when a software vendor says they're going to go thread, they don't necessarily pick two or four threads," says Intel's Jeff Austin. "They have some functionality within their code that takes a look at what's available in the hardware—they can find that out from the operating system. Now, there is a law of diminishing returns on some of these applications, but if it's something that can take advantage of four threads or even eight threads, go down the right code path to bring the width of the number of threads as appropriate."

"Rendering is required across the board in our products: 3ds Max, Combustion, and others," echoes Pierre Bouchard, director of product development for Autodesk media entertainment systems. "Every application that uses parallelism and can do MIPS processing will most take advantage of dual-core technology. The challenge is to get more out of the parallelism of the applications."

To that end, Intel has established a business unit called the Software Solutions Group that is focused on enabling and enhancing applications for Intel's platforms, including the new dual-core offerings.

The SSG provides developers with a slew of tools to ease the threading development process. The program also offers university-style training classes for developers as well as development platforms, and to date has "graduated" over 150 applications.

With dual-core now becoming de rigueur, developers will turn to multi-threading as a matter of course. Artificial intelligence and natural language processing are likely to be two areas that benefit the most, but even Microsoft Office has some multi-thread capability for printing, spell checking, and other background tasks.

As might be expected with no direct competition in the segment, AMD is urging resellers to slant dual-core toward professional users. A dual-core desktop system, the company argues, can be sold as a workstation, or you can sell an Opteron workstation as an "uber-station": a true dual-socket, quad-core workstation.

"The fact that AMD is first to market for workstations and servers will give them a bit of an edge in the short term because all the applications will be tested and validated using their technology," says Tekunoff. "But what are things going to look like one year out? I wouldn't want to guess."

Intel's Jeff Austin expects that by the end of 2006, more than 70% of his company's Pentium-class mobile and desktop shipments will be dual-core. In the server space, where applications are already ubiquitously multi-threaded, the dual-core ship rate should exceed 85%.

"We believe there are great end-user benefits from this dual-core capability and these platforms," says Austin, "and that there will be wonderful demand for them. It's a great opportunity for anyone who provides system solutions."

In the end, dual-core processors may be the best thing to happen to the desktop, workstation, and small server space in the last decade. Rarely have we seen innovations able to deliver 50% to 90% performance improvement at a single stroke, and the scalability of multi-threading is bounded only by fabrication processes and socket architectures. So whether your customers are looking for faster performance or greater productivity and efficiency, dual-core is an ideal technology on which to base your solutions, and the options within this field will only grow in variety and value in the coming months.
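One standard way to relate a 50% to 90% gain on two cores to the software itself is Amdahl's law, which we bring in here for illustration. If a fraction p of a program's runtime can be spread across n cores, the speedup is 1 / ((1 - p) + p/n). The sketch below inverts that formula for two cores; the function names are our own.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Speedup on n cores when fraction p of the runtime parallelizes perfectly."""
    return 1.0 / ((1.0 - p) + p / n)


def required_parallel_fraction(speedup: float, n: int) -> float:
    """Invert Amdahl's law: the fraction p needed to reach `speedup` on n cores."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n)


for gain in (0.5, 0.9):
    p = required_parallel_fraction(1.0 + gain, 2)
    print(f"{gain:.0%} faster on two cores needs about {p:.0%} parallel code")
```

Under this model, a 50% gain needs about 67% of the runtime to be parallel and a 90% gain about 95%, so the top of that range is realistic mainly for software that is almost entirely threaded.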


 
         
   
   
Copyright © 2007 RAM Magazine. All rights reserved.
Do not duplicate or redistribute in any form.