Feature Story
If you’re a long-time reader of Reseller Advocate Magazine, you know we tend to cover hardware technology once it’s available, once it has been transformed into product that you can turn around and talk about with your customers. But today is special. Intel, the largest graphics vendor in the world yet ironically shut out of the add-in card market for years, is making a bold push toward its own high-performance solution that it hopes will give AMD and NVIDIA a run for their money. We’re taking an early look at the architecture in question now that the company is starting to talk details.
Intel has been in this position before. Ten years ago, the company saw big potential in the AGP interface, enough that it was willing to build a graphics architecture specifically to take advantage of texturing over the bus. Unfortunately, the performance penalty of moving data over AGP rather than from onboard memory was high enough to render that card, the Intel 740, uncompetitive next to rival designs. Ever since, Intel’s admittedly overwhelming success in graphics has come from its integrated logic: the low-end stuff that’s not much for gaming, yet manages surprising stability in day-to-day use.

Apparently, though, Intel is not satisfied with a spectator’s seat in the performance graphics market. More than a year ago, the company quietly confirmed the existence of a project called Larrabee and hinted at a few of its attributes. Larrabee would be a “many-core” architecture with a number of in-order cores on a single die. It would incorporate a large L2 cache, too. But back then, nobody could really answer the million-dollar question: “What exactly is Larrabee being built to do?” At the time, we could only speculate. Now, Intel is giving us a more concrete idea of what it has in mind.
The ramifications of Intel’s decision will be widely felt if its architecture successfully makes the transition to product and then garners enough support from the software developer community. After all, Larrabee is based on the x86 architecture, unlike AMD’s Radeon or NVIDIA’s GeForce GPUs. But while the technology diverges from familiar graphics products, it’s also unlike today’s most popular CPUs.
To begin, Larrabee’s x86 cores employ in-order execution, which means instructions are issued and executed strictly in program order. In contrast, Intel’s Core 2 Duo employs out-of-order execution. Whereas an in-order design stalls when the next instruction is not yet ready to be dispatched, an out-of-order architecture fills those gaps with later instructions that are ready. The trade-off is one of complexity. Because Larrabee is in-order, its cores can be made substantially smaller, allowing more of them on a die and improving aggregate throughput.

Intel thus embarked on a design experiment. How many Larrabee cores could it fit on a die with a size and power budget similar to the 45nm Core 2 Duo? The answer turned out to be 10. The chip, armed with a 4MB L2 cache and a vector processing unit in each core able to handle 16 32-bit operations per clock, could theoretically achieve 160 vector operations per clock (10 cores × 16) versus Core 2 Duo’s eight. Why is vector throughput so important? That’s what gives Larrabee so much floating-point muscle versus Intel’s own desktop processors. So, while Larrabee is in many ways CPU-like, it manages to cram 20 times as many vector operations per clock into a comparable die.
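The throughput comparison is simple multiplication, and a few lines of Python make it explicit. This is only a back-of-the-envelope sketch: the function name is ours, and the breakdown of Core 2 Duo’s eight operations into two cores with four 32-bit lanes each is our assumption, not a figure Intel supplied for this story.

```python
# Peak vector throughput per clock, using the figures quoted in the article.
# Assumption (ours): Core 2 Duo's "eight" is 2 cores x 4 32-bit lanes each.

def peak_vector_ops_per_clock(cores: int, lanes_per_core: int) -> int:
    """Theoretical upper bound on 32-bit vector operations per clock."""
    return cores * lanes_per_core

larrabee = peak_vector_ops_per_clock(cores=10, lanes_per_core=16)
core2duo = peak_vector_ops_per_clock(cores=2, lanes_per_core=4)
ratio = larrabee / core2duo

print(f"Larrabee test design: {larrabee} ops/clock")
print(f"Core 2 Duo:           {core2duo} ops/clock")
print(f"Ratio:                {ratio:.0f}x")
```

Keep in mind these are peak figures. They say nothing about clock speed or how often real code keeps all 16 lanes busy.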
Larrabee In Numbers
Additionally, Intel wraps Simultaneous Multi-Threading into Larrabee (think Hyper-Threading, as included in the Atom and new Nehalem processors). A single core can work on four threads, whereas each Core 2 Duo core handles just one, for a pair per chip. The architecture is updated to include 64-bit extensions as well, something the original Pentium, from which Larrabee’s cores are derived, never had at its disposal.

Move out beyond the individual core level and you get a better idea of how Larrabee processors will be arranged, even if Intel isn’t yet talking specifics when it comes to product configurations. Cores and memory will communicate over a 1,024-bit (that’s 512 bits in each direction) ring bus. Now, remember back to AMD’s R520 GPU (it was ATI back in those days), which powered the Radeon X1800-series cards. That graphics processor employed a 512-bit ring bus to deliver lots of memory bandwidth with low latencies across the chip. With the RV670 GPU, AMD shrank the ring bus to 256 bits. And when it launched RV770, the ring bus had been completely replaced by a 256-bit hub approach. The problem with the ring bus was its complexity, namely the number of transistors it consumed. But Intel’s decision to adopt a ring bus suggests the need for plenty of fast access to memory. Indeed, maintaining cache coherency and giving the cores access to blocks of fixed-function logic will likely put that bandwidth to good use.

That said, there’s actually very little fixed-function silicon in the architecture. Intel is advocating a highly programmable model that can be handled almost exclusively by the x86 cores. Texturing is the exception; that’s addressed by fixed-function logic able to perform standard operations like decompression and anisotropic filtering. The texture sampler is attached to the ring bus and communicates with the cores through L2 cache. Why go fixed-function when everything else is programmable? Without the sampler, Intel says filtering operations would take 12 times longer, and decompression would take 40 times longer.
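To get a feel for why filtering is worth dedicated silicon, consider bilinear filtering, the basic building block that anisotropic filtering repeats several times per pixel. The sketch below is a generic textbook formulation, not Intel’s sampler; the function name and the list-of-rows texture layout are our own assumptions for illustration.

```python
import math

def bilinear_sample(texture, u, v):
    """Sample a 2D texture (list of rows of floats) at normalized (u, v)."""
    height, width = len(texture), len(texture[0])

    # Clamp coordinates to [0, 1], then map them into texel space.
    u = min(max(u, 0.0), 1.0)
    v = min(max(v, 0.0), 1.0)
    x = u * (width - 1)
    y = v * (height - 1)

    # Find the four surrounding texels, clamping at the texture edge.
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0

    # Four memory fetches per sample...
    t00, t10 = texture[y0][x0], texture[y0][x1]
    t01, t11 = texture[y1][x0], texture[y1][x1]

    # ...followed by three linear interpolations.
    top = t00 + (t10 - t00) * fx
    bottom = t01 + (t11 - t01) * fx
    return top + (bottom - top) * fy

checker = [[0.0, 1.0],
           [2.0, 3.0]]
print(bilinear_sample(checker, 0.5, 0.5))  # midpoint of the four texels
```

Even this simplest case costs address math, four fetches, and three interpolations for a single sample of a single channel. Doing that in software for every pixel of every frame adds up quickly.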
Well, Larrabee faces a similarly uphill battle. Fortunately, the war is being waged by Intel rather than AGEIA. Nevertheless, before hardware based on Larrabee is able to succeed, it needs to work on existing games (which means supporting DirectX and OpenGL) before developers start getting fancy by writing to the hardware directly. We won’t bore you with the specifics of how Intel will achieve compatibility with today’s rasterized 3D apps other than to say DirectX and OpenGL instructions are to be handled by a software renderer, which could cost performance. Of course, the silver lining is that if anyone can develop the software tools needed to make Larrabee perform well with a run-time compiler, it would be Intel.

Certainly more exciting is the potential of Larrabee when ISVs start writing to the hardware itself using C. General-purpose GPU, physics processing, and HPC applications will all be possible as a result of the architecture’s massive floating-point horsepower.

Earlier this year we attended an event at NVIDIA’s headquarters where the company introduced its latest Tesla computing solutions. One of its big messages at that event was how much better suited many-core processors are to the HPC world than multi-core processors. The example given was a 100 teraflop datacenter. According to NVIDIA, it’d take 1,429 servers armed with quad-core CPUs to achieve that level of performance at a total cost of nearly $6 million. Using 1U Tesla configurations, each equipped with four of its add-in cards, it would only take 25 servers to achieve the same compute power for less than $400,000. Ironically, now it’s NVIDIA’s competition emerging with a plan to go many-core. Just as the GT200 and its 240 shader processors help power through software compiled with CUDA, so too should Larrabee enable even greater flexibility through what Intel is calling the Larrabee native interface.
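NVIDIA’s datacenter comparison is easier to appreciate once it’s broken down per server. The sketch below reworks the numbers quoted above. Note that the dollar figures are NVIDIA’s rounded claims (“nearly” and “less than”), so we treat them as upper bounds, and the variable names are ours.

```python
# Per-server breakdown of NVIDIA's 100-teraflop datacenter example.
# Costs are rounded upper bounds taken from NVIDIA's claims, not exact prices.
TARGET_TFLOPS = 100

cpu_servers, cpu_cost = 1_429, 6_000_000    # quad-core CPU servers, "nearly $6 million"
tesla_servers, tesla_cost = 25, 400_000     # 1U Tesla servers, "less than $400,000"
CARDS_PER_TESLA_SERVER = 4

cpu_tflops_per_server = TARGET_TFLOPS / cpu_servers
tesla_tflops_per_server = TARGET_TFLOPS / tesla_servers
tesla_tflops_per_card = tesla_tflops_per_server / CARDS_PER_TESLA_SERVER

server_ratio = cpu_servers / tesla_servers
cost_ratio = cpu_cost / tesla_cost

print(f"CPU server:   {cpu_tflops_per_server:.3f} TFLOPS each")
print(f"Tesla server: {tesla_tflops_per_server:.1f} TFLOPS each "
      f"({tesla_tflops_per_card:.1f} per card)")
print(f"Servers needed: {server_ratio:.0f}x fewer with Tesla")
print(f"Total cost:     roughly {cost_ratio:.0f}x lower with Tesla")
```

By NVIDIA’s own math, then, each Tesla server stands in for roughly 57 CPU servers. These are vendor-supplied peak figures, so treat the ratios as a best case rather than a measured result.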
Intel Making It Worth Your While
Before you see software developers go out of their way to program specifically for Intel’s native interface, the company is going to have to prove that Larrabee is not only here, but here to stay, just as NVIDIA is trying to do with CUDA. To that end, Intel says it is working closely with top ISVs to help hash out what they need Larrabee to do. For developers uninterested in optimizing for Larrabee, games will treat the hardware like any other DirectX or OpenGL graphics card. Those who do take a step further have full access to the cores’ guts and can bend the architecture in any way they want. But it’ll take a concerted effort from Intel’s developer relations team to get the big names in entertainment behind Larrabee.

Fortunately, the flexibility of Larrabee should help ease its entry into the highly competitive and fast-moving graphics market dominated today by AMD and NVIDIA. What makes the architecture unique is that, because Larrabee is made of x86 cores and completely programmable, it’s very easily adaptable through software. For instance, adding support for the next version of DirectX is expected to be as straightforward as updating a driver. Of course, that raises another concern: Will Intel’s driver team be able to support the work its architects are doing now? If the company expects to compete against whatever powerful GPUs are around at the beginning of 2009, it’d better make sure Larrabee gets better software support than some of its integrated graphics cores have seen.
In a recent email, NVIDIA sought to point out how Intel’s work on Larrabee validates its own many-core parallel processors, of which it says there will be 150 million by the time Intel is able to start shipping hardware. AMD is also interested in having its graphics products handle general-purpose GPU workloads. The company’s FireStream boards are packaged just like NVIDIA’s Tesla: without display outputs. And AMD has its own software development kit, including Brook+ (an open-source, C-based stream programming language), a Core Math Library, and a Performance Library optimized for video transcoding. AMD is also supporting OpenCL, a computing language initially developed by Apple.

It might be another year before we see an add-in card centering on Larrabee. By then, AMD and NVIDIA will no doubt have graphics architectures that put today’s GT200 and RV770 cores to shame in games. They may or may not have made headway into the still-young stream computing market. But either way, both graphics giants are feeling the pressure of what Larrabee could mean if it’s successful.

No doubt Intel faces significant challenges as it attempts to create both a hardware architecture that can compete for graphics gold and the software infrastructure needed for Larrabee to realize its full potential. Nevertheless, this is a major project for Intel, and one that the channel will want to keep an eye on as shipping product edges closer and closer. After all, this is something that’ll interest everyone, from gamers to SMBs to the enterprise folks in datacenters.
Copyright © 2008 RAM Magazine. All rights reserved.
Do not duplicate or redistribute in any form.