By Chris Angelini
When you benchmark the performance of your desktop offerings, you test applications that customers will use. Media encoding programs, games, and office productivity suites are all perfectly pertinent metrics. Additionally, most of those tests are readily available, inexpensive (even free in many cases), and fairly simple to run. An afternoon of focused configuration should yield plenty of data you can use for marketing or drawing comparisons against other systems. There's no reason you wouldn't want to put white boxes through their paces, right?

Benchmark data for servers is equally valuable, especially in the face of multi-core processors, evolved storage technologies, and a wave of fresh software. After all, how can you upsell the latest and greatest unless you can prove it's actually better?

As it turns out, the worlds of desktop and server benchmarking are completely different. In fact, you'll find that most hardware vendors and top-tier OEMs actually have separate teams for measuring performance at each pole. What makes server testing special? Almost everything. The hardware is designed for a more demanding customer, as is the software. And because many servers are closely tied to the client systems they serve, the benchmark environment often involves an entire room full of machines instead of just one.

Prefacing Server Performance

In the past, though, procuring performance results from server hardware wasn't a problem. The system builders involved were intimately familiar with the technology and understood which applications would benefit from multiple processors, more memory, SCSI storage, and so on. As a case in point, there used to be a local vendor here in town that focused solely on CAD workstations. Architects and designers would order custom configurations optimized for whichever professional app they were using, paying incredible premiums for the company's consulting expertise.
Predictably, the place folded after less specialized shops started springing up, selling copycat boxes without the shocking price tag. Granted, there's no telling how those other outfits handled support when software incompatibilities materialized or certified drivers needed to be updated. But the point is that there is a similar trend going on in the server space, where VARs of all sizes are getting easy access to server hardware.

We presented Trevor Lawless of Intel's server platforms group with a fairly simple market observation: It has become increasingly easy for inexperienced VARs to build white box servers. (In 30 minutes, we pieced together a dual-socket Paxville machine capable of handling eight threads simultaneously.) You take a white box chassis from Tyan, Supermicro, or even Intel, drop in a pair of processors, stir in a few memory modules, and attach storage to taste. In one afternoon, you can have a fully functional and incredibly powerful machine online. Anyone who has ever built a desktop can handle the job.

But most of the configuration decisions are made somewhat arbitrarily. If you're one of those resellers toying with white box servers, it seems as though there is real value in being able to delineate the benefits of doubling memory in a database server or adding a second processor to a 1U box serving up Web pages. Without years of experience working with servers and enterprise software, how do you know the best combinations of processing speed, memory capacity, and disk I/O? That's the question we set out to answer.

"Many applications are very much sensitive to server configuration," says Intel's Lawless, "which is why many system builders look to test their hardware."
Of course, if you already benchmark server hardware, then you probably already know that the tests run on servers are much more complicated than desktop metrics. A 10-minute run of Windows Media Encoder 9 might yield a good idea of how a dual-core Pentium D does against a Pentium 4, but it doesn't tell you anything about how well SAP or SQL Server 2005 would run.

Lawless continues, "There's no doubt that server benchmarking is an in-depth endeavor, perhaps even an order of magnitude more intensive than desktop testing. Setup is much more involved, as are configuration requirements. Even then, it might take days or weeks to generate meaningful results."

Michael Majdalany, administrator of the TPC (Transaction Processing Performance Council), concurs. "There is definitely a big learning curve when it comes to server benchmarking. The test systems themselves are large and might require several connected clients in order to function properly. With that said, we provide complete instructions to anyone with the desire to benchmark, including large OEMs and smaller system builders."

A World of Synthetics

Hopefully, the words of warning from two seasoned benchmarking vets haven't totally turned you off to the thought of testing your server builds. Verifiable performance numbers are, in fact, an important part of understanding the intricacies of configuration, such as how SAS might work better for you than SATA or why memory modules should always be installed in pairs on dual-channel platforms.

Just as we've previously broken down desktop PC tests into synthetic and real-world, so too can server metrics be divided up. The former category, as you know, is ideal for isolating a particular subsystem, serving up performance data on a memory subsystem's bandwidth or a network controller's raw throughput. Conversely, the latter utilizes real applications, scripted or otherwise timed, to paint a more general picture of how a server might respond in a business environment.
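
The "scripted or otherwise timed" approach can be sketched in a few lines of Python. This is only a minimal illustration, not a tool from any vendor or organization mentioned in this article: the names `time_workload`, `summarize`, `speedup`, and `sample_workload` are hypothetical, and the sample workload is a stand-in for whatever application script a reseller would actually run on the server under test.

```python
import statistics
import time


def time_workload(workload, runs=5):
    """Run `workload` several times; return each run's wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Summarize repeated runs; the median is less skewed by one slow run than the mean."""
    return {
        "runs": len(timings),
        "median_s": statistics.median(timings),
        "min_s": min(timings),
        "max_s": max(timings),
    }


def speedup(baseline_s, candidate_s):
    """Return how many times faster the candidate configuration finished than the baseline."""
    if candidate_s <= 0:
        raise ValueError("candidate time must be positive")
    return baseline_s / candidate_s


def sample_workload():
    # Hypothetical stand-in for a real scripted application run.
    sum(i * i for i in range(200_000))


if __name__ == "__main__":
    print(summarize(time_workload(sample_workload)))
```

Running the same script on two configurations (say, before and after doubling memory) and passing the two median times to `speedup` gives a single ratio to compare them by. Repeating each run several times matters because a single measurement can be thrown off by background activity on the machine.
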
Because server testing is so complex and because software vendors dedicate resources to development rather than testing, it appears that synthetic tests are more common in the enterprise.

SPEC

The SPEC (Standard Performance Evaluation Corporation) is one of the best-known players in server benchmarking. It publishes several tests used to compare everything from compute power to system-wide handling of Java servers. The organization's CPU2000 version 1.3 benchmark, for instance, is easy enough for novice lab techs to run. It relays scores for CPU integer and floating-point performance, which may help resellers determine at a very low level whether to use Opteron or Xeon processors and when to make the move over to dual-core.

SPEC is very thorough in its documentation. A quick visit to SPEC's Web site (www.spec.org) reveals descriptions for each of the 26 components of CPU2000, along with graphed memory footprints.

According to Intel's Trevor Lawless, SPEC's jbb2005 is even better for smaller resellers because it evaluates the performance of server-side Java on a system level by emulating a three-tier client and server system. Instead of attaching physical client workstations, jbb2005 generates driver threads. As a result, the test is pretty much self-contained. SPEC claims that the benchmark is good for measuring CPU performance, memory subsystems, and the scalability of multi-processor systems.

SPEC publishes a number of other benchmarks intended to measure different server usage models, giving resellers the freedom to customize their test suites. SPECweb2005 helps evaluate the performance of Web servers. MAIL2001 does the same for mail servers using SMTP and POP3 protocols. SPEC's jAppServer2004 is another Java-based metric designed to measure the performance of J2EE 1.3 application servers. And an entire sub-category of HPC (high performance computing) tests aids in the measurement of larger compute clusters.
The SPEC tests perform their jobs admirably given an otherwise limited landscape of available benchmarks. But there are still caveats of which resellers should be aware. Firstly, the older metrics still require connected client machines for loading. That means you'll want some sort of lab set up with the necessary hardware to network the server in question, simulating a production environment. And then there's the issue of price. Whereas many desktop benchmarks are free and others might run $50 or so, retail copies of the less expensive server tests cost $500 each. Step up to the jAppServer2004 or HPC2002 test suite and you'll pay $2,000 and $3,000, respectively.

TPC

Alternatively, you can take the free route by running any of the TPC's three prominent server benchmarks: TPC-App, TPC-C, or TPC-H. The first is an application server and Web services benchmark that simulates the activities of a B2B transactional application server. Next, TPC-C emulates a computing environment where a number of users execute transactions against a database centered on the activities of an order-entry environment. TPC-H is a decision-support test, consisting of ad-hoc queries and concurrent data modifications, according to the TPC's descriptions.

The TPC tests are perhaps the most difficult to run. There are no executables to speak of. Instead, the TPC publishes specifications for each metric, free of charge, on its Web site (www.tpc.org). The 100-plus-page documents spell out each benchmark's requirements, and after going through the trio, we can safely say that most VARs will probably want to focus their efforts elsewhere unless they're selling into larger enterprises.

"There can be a big learning curve for resellers looking to run the TPC benchmarks," admits Michael Majdalany, TPC administrator. "And testing large servers with multiple clients can take quite a long time."
But that doesn't stop top-tier vendors from running the tests on their own server designs. "The TPC consists of 24 members," Majdalany says. "It's a non-profit, self-policing organization. In order to publish results from one of our tests, a server vendor must first have that test audited. If at any point after the results become public another manufacturer is unable to recreate those scores, they can challenge."

In that way, published TPC results are kept honest and comparisons between your equipment and competing solutions can be made more accurately. It's the only way to fly if you're interested in going head to head with tier-ones selling $50,000+ machines.

Welcome to the Real World

For all of the theoretical maximums and focused stresses of synthetic testing, you can't really draw general conclusions about the performance of your servers without making some sort of parallel to more believable workloads: tasks that represent what your customer will actually do with your equipment.

Real-world tests aren't as widely available when you move from the desktop to enterprise. At least, that's the impression we got after talking to a couple of professionals in the benchmarking field. Intel's Trevor Lawless made it clear that picking the right benchmark is the first critical step of application-based testing.

"In the absence of readily available real-world metrics, many folks resort to testing server equipment using desktop software. Those tests aren't at all representative of high-end performance. A mail server won't do well at all in DOOM 3; that's not what it's meant for."

Fortunately, Lawless was able to turn us on to a couple of viable solutions put out by well-known software developers. Business solutions vendor SAP (www.sap.com) actually hosts a number of benchmark tests, most of which are run online. Financial services supplier SunGard (www.sungard.com) also recently enabled its Adaptiv credit and market risk management software with a benchmark workload that Intel uses internally for testing. A custom GUI times the process as a simulation engine analyzes a hypothetical portfolio, really emphasizing the benefits of threaded processors. Thirdly, Lotus NotesBench (www.notesbench.org) simulates the behavior of Domino workstation-to-server or server-to-server operations. As with many of the other benchmarks we've discussed, NotesBench is highly involved, as evidenced by a 144-page user's guide.

We dug up a couple of other benchmarks on our own. The first, interestingly enough, comes from Microsoft. It's obscurely named the SQLIOStress utility (a quick Google search will take you right to it) and is designed to simulate the read/write patterns of a heavily loaded SQL Server database. The SQLIOStress test isn't designed as a performance metric so much as it's meant to tax the storage subsystem of your servers, exposing any weaknesses.

Cinebench (www.cinebench.com) is another popular measure, which centers on MAXON's CINEMA 4D ray-tracing software. The processor test renders a scene with 35 light sources and 16 shadow maps, taking advantage of multi-processor configurations when they're available. Time to completion is the resulting score, reported in seconds.
That sort of measurement won't apply to all of your customers, but for those with plans to render, Cinebench is both free and easy to run.

A Rock and a Hard Place

So it turns out that the most representative server benchmarks are incredibly involved and the easiest ones only go so far in hashing out true performance. The good news is that benchmark developers, such as the TPC, are looking for ways to streamline evaluation methods, and hardware vendors, such as Intel, are currently working with ISVs to enable more real-world test scenarios. Even if you cautiously shy away from server testing today, it'll definitely be something to revisit down the road.
Copyright © 2007 RAM Magazine. All rights reserved.
Do not duplicate or redistribute in any form.