Thursday, April 14, 2005

More on performance vs. load testing

I recently got some comments/questions related to my previous blog entry on performance vs. load vs. stress testing. Many people are still confused as to exactly what the difference is between performance and load testing. I've been thinking more about it and I'd like to propose the following question as a litmus test to distinguish between these two types of testing: are you actively profiling your application code and/or monitoring the server(s) running your application? If the answer is yes, then you're engaged in performance testing. If the answer is no, then what you're doing is load testing.

Another way to look at it is to see whether you're doing more of a white-box type testing as opposed to black-box testing. In the white-box approach, testers, developers, system administrators and DBAs work together in order to instrument the application code and the database queries (via specialized profilers for example), and the hardware/operating system of the server(s) running the application and the database (via monitoring tools such as vmstat, iostat, top or Windows PerfMon). All these activities belong to performance testing.

The black box approach is to run client load tools against the application in order to measure its responsiveness. Such tools range from lightweight, command-line driven tools such as httperf, openload, siege, Apache Flood, to more heavy duty tools such as OpenSTA, The Grinder, JMeter. This type of testing doesn't look at the internal behavior of the application, nor does it monitor the hardware/OS resources on the server(s) hosting the application. If this sounds like the type of testing you're doing, then I call it load testing.

In practice, though, the two terms are often used interchangeably, and I am as guilty as anyone else of doing this, since I called one of my recent blog entries "HTTP performance testing with httperf, autobench and openload" instead of calling it more precisely "HTTP load testing". I didn't have access to the application code or the servers hosting the applications I tested, so I wasn't really doing performance testing, only load testing.

I think part of the confusion is that no matter how you look at these two types of testing, they have one common element: the load testing part. Even when you're profiling the application and monitoring the servers (hence doing performance testing), you still need to run load tools against the application, so from that perspective you're doing load testing.

As far as I'm concerned, these definitions don't have much value in and of themselves. What matters most is to have a well-established procedure for tuning the application and the servers so that you can meet your users' or your business customers' requirements. This procedure will use elements of all the types of testing mentioned here and in my previous entry: load, performance and stress testing.

Here's one example of such a procedure. Let's say you're developing a Web application with a database back-end that needs to support 100 concurrent users, with a response time of less than 3 seconds. How would you go about testing your application in order to make sure these requirements are met?

1. Start with 1 Web/Application server connected to 1 Database server. If you can, put both servers behind a firewall, and if you're thinking about doing load balancing down the road, put the Web server behind the load balancer. This way you'll have one of each device that you'll use in a real production environment.

2. Run a load test against the Web server, starting with 10 concurrent users, each user sending a total of 1000 requests to the server. Step up the number of users in increments of 10, until you reach 100 users.
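Here's a minimal sketch of what the ramp in step 2 might look like if you scripted it yourself in Python instead of using one of the load tools. The function names (simulate_user, run_step, run_ramp) are made up for illustration, and send_request stands for any callable that issues one request to your server. Real tools such as httperf or JMeter do far more than this (think times, connection handling, reporting), so treat it as an outline of the idea only.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def simulate_user(send_request, requests_per_user):
    """One simulated user: send requests back to back, return latencies in seconds."""
    latencies = []
    for _ in range(requests_per_user):
        start = time.perf_counter()
        send_request()  # hypothetical callable that performs one request
        latencies.append(time.perf_counter() - start)
    return latencies


def run_step(send_request, users, requests_per_user):
    """Run one load step with `users` simulated users running concurrently."""
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = [pool.submit(simulate_user, send_request, requests_per_user)
                   for _ in range(users)]
        latencies = [t for f in futures for t in f.result()]
    elapsed = time.perf_counter() - started
    return {
        "users": users,
        "requests": len(latencies),
        "reply_rate": len(latencies) / elapsed,  # replies per second
        "max_latency": max(latencies),           # slowest single request
    }


def run_ramp(send_request, start=10, stop=100, step=10, requests_per_user=1000):
    """Step the user count from `start` up to `stop` in increments of `step`."""
    return [run_step(send_request, users, requests_per_user)
            for users in range(start, stop + 1, step)]
```

To point it at a real server you could pass something like lambda: urllib.request.urlopen(url).read() as send_request, where url is your staging address (an assumption on my part, use whatever client fits your application). With the 3-second requirement from the example, you would then look at max_latency for each step.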

3. While you're blasting the Web server, profile your application and your database to see if there are any hot spots in your code/SQL queries/stored procedures that you need to optimize. I realize I'm glossing over important details here, but this step is obviously highly dependent on your particular application.

Also monitor both servers (Web/App and Database) via command line utilities mentioned before (top, vmstat, iostat, netstat, Windows PerfMon). These utilities will let you know what's going on with the servers in terms of hardware resources. Also monitor the firewall and the load balancer (many times you can do this via SNMP) -- but these devices are not likely to be a bottleneck at this level, since they usually can deal with thousands of connections before they hit a limit, assuming they're hardware-based and not software-based.

This is one of the most important steps in the whole procedure. It's not easy to make sense of the output of these monitoring tools; you need somebody who has a lot of experience in system/network architecture and administration. On Sun/Solaris platforms, there is a tool called the SE Performance Toolkit that tries to ease this task via built-in heuristics that kick in when certain thresholds are reached and tell you exactly what resource is being taxed.
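To give a flavor of what threshold-based heuristics could look like, here is a rough Python sketch that parses the default output of Linux vmstat and flags a few conditions. It is not how the SE Performance Toolkit works, and the thresholds (min_idle, max_iowait) are numbers I picked for illustration; a seasoned sysadmin would tune them per system. The parser relies on the standard vmstat header line (the one starting with "r") to name the columns.

```python
def parse_vmstat(text):
    """Turn default `vmstat` output into a list of {column: int} samples."""
    rows = []
    columns = None
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("procs"):
            continue  # blank line or the "procs ---memory---" banner
        if fields[0] == "r":
            columns = fields  # header row: r b swpd free ... us sy id wa st
            continue
        if columns and len(fields) == len(columns):
            rows.append(dict(zip(columns, map(int, fields))))
    return rows


def flag_sample(sample, min_idle=10, max_iowait=20):
    """Return warnings for one sample; thresholds are illustrative assumptions."""
    warnings = []
    if sample["id"] < min_idle:              # id = percent of CPU time idle
        warnings.append("CPU nearly saturated")
    if sample["si"] > 0 or sample["so"] > 0:  # si/so = memory swapped in/out
        warnings.append("swapping")
    if sample["wa"] > max_iowait:            # wa = percent of CPU time waiting on I/O
        warnings.append("high I/O wait")
    return warnings
```

You would capture the output of something like "vmstat 5" while the load test is running, feed it to parse_vmstat, and call flag_sample on each row.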

4. Let's say your Web server's reply rate starts to level off around 50 users. Now you have a repeatable condition that you know causes problems. All the profiling and monitoring you've done in step 3 should have already given you a good idea about hot spots in your application, about SQL queries that are not optimized properly, and about resource status at the hardware/OS level.

At this point, the developers need to take back the profiling measurements and tune the code and the database queries. The system administrators can also increase server performance simply by throwing more hardware at the servers -- especially more RAM at the Web/App server in my experience, the more so if it's Java-based.
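One simple way to pin down where the reply rate "levels off" in step 4 is to look for the first step of the ramp where adding users no longer buys a meaningful gain. The sketch below assumes you have (users, reply_rate) pairs from your load runs; find_level_off and the 5% min_gain cutoff are my own illustrative choices, not something the load tools report for you.

```python
def find_level_off(results, min_gain=0.05):
    """Return the user count after which reply rate stops growing, or None.

    `results` is a list of (users, reply_rate) pairs in increasing user order.
    A step counts as "flat" when the rate grows by less than `min_gain`
    (5% by default) relative to the previous step.
    """
    for (prev_users, prev_rate), (_users, rate) in zip(results, results[1:]):
        if rate < prev_rate * (1 + min_gain):
            return prev_users
    return None


# Made-up numbers for illustration: the rate climbs until 50 users, then stalls.
ramp = [(10, 100), (20, 200), (30, 300), (40, 400),
        (50, 480), (60, 485), (70, 483)]
level_off_at = find_level_off(ramp)
```

With the made-up numbers above, the first flat step is the one from 50 to 60 users, so the function returns 50.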

5. Let's say the application/database code, as well as the hardware/OS environment have been tuned to the best of everybody's abilities. You re-run the load test from step 2 and now you're at 75 concurrent users before performance starts to degrade.

At this point, there's not much you can do with the existing setup. It's time to think about scaling the system horizontally, by adding other Web servers in a load-balanced Web server farm, or adding other database servers. Or maybe do content caching, for example with Apache mod_cache. Or maybe add an external caching server such as Squid.

One very important product of this whole procedure is that you now have a baseline number for your application for this given "staging" hardware environment. You can use the staging setup for nightly performance testing runs that will tell you whether changes in your application/database code caused an increase or a decrease in performance.
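A nightly check against the baseline can be as simple as comparing one number. The sketch below assumes the number you track is the reply rate at a fixed user count; compare_to_baseline and the 10% tolerance are hypothetical choices to keep normal run-to-run noise from raising alarms.

```python
def compare_to_baseline(baseline_rate, nightly_rate, tolerance=0.10):
    """Classify a nightly run against the baseline reply rate.

    Returns (label, change) where change is the relative difference,
    e.g. -0.15 means the nightly run was 15% slower than the baseline.
    """
    change = (nightly_rate - baseline_rate) / baseline_rate
    if change <= -tolerance:
        return "regression", change
    if change >= tolerance:
        return "improvement", change
    return "unchanged", change
```

For example, with a baseline of 400 replies per second, a nightly run at 340 would be classified as a regression, and one at 410 would be treated as unchanged.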

6. Repeat above steps in a "real" production environment before you actually launch your application.

All this discussion assumed you want to get performance/benchmarking numbers for your application. If you want to actually discover bugs and to see if your application fails and recovers gracefully, you need to do stress testing. Blast your Web server with double the number of users for example. Unplug network cables randomly (or shut down/restart switch ports via SNMP). Take out a disk from a RAID array. That kind of thing.

The conclusion? At the end of the day, it doesn't really matter what you call your testing, as long as you help your team deliver what it promised in terms of application functionality and performance. Performance testing in particular is more art than science, and many times the only way to make progress in optimizing and tuning the application and its environment is by trial-and-error and perseverance. Having lots of excellent open source tools also helps a lot.

31 comments:

beza1e1 said...

Nice Post. It helped to get me started on testing my liasis project. Thanks for your writing!

Anonymous said...

It's very informative

Geetika said...

This is what I understand.

Performance: System's Response time
Stress Testing: Reduce the system's hardware and test
Load Testing: Increase the number of users.
Volume testing: Increase the load in terms of the data transferred, even though the user might be one. Just conduct a search on a huge database.

Barakino said...

Hi
First I would like to thank you for your effort in both articles on the subject; it was very illuminating (even though I am new to the subject I believe I followed it in a general sense).
Second, I would like to repeat a question posted in a comment on the previous article, which unfortunately was not answered. When I perform a test (load/performance) and I want to test the app for about 100 users, when I set the tools (never mind which one) to 100 concurrent connections I perform a much larger-scale test than I intended to. I mean 100 "real" users won't click all together at the same time. So, is there a key or some statistical info (like: out of 100 users connected to the same site at the same time, 30 click together at a given time) that can help me convert the number of concurrent connections into real users browsing a web site simultaneously?
thanks

Grig Gheorghiu said...

Barakino,

Thanks for the kind words. As far as the 100-user example, I think if you want to simulate 100 concurrent users, you do need to have 100 concurrent connections. It's true that not all users will click at the same time on the same page, but most of the performance testing tools will allow you to select a page randomly out of a pool of pages that you specify. I believe most tools introduce random delays between the simulated clicks.

Barakino said...

well,
currently I am using tools that do not have those functions inherently.
I would be grateful if you recommended an open source tool which has those functions and is relatively easy to learn for first timers.
thanks :)

Grig Gheorghiu said...

Barakino,

Take a look at some of the tools I mentioned in my HTTP performance testing with httperf, autobench and openload post.

Kurt Schultz said...

I have been using Microsoft Web Application Stress Tool. It is very easy to use, allows you to record scripts by navigating around your web site, has settings for number of concurrent connections, fixed or random delay, etc. and can coordinate running the scripts across multiple client machines if needed in order to increase the load on the server.

Although it is not an open source tool, it is available for free download from Microsoft.

prabhakar said...

Grig,
Nice to see a comparative study on the "Performance" jargon.
An applaudable effort.

Prabhakar

Phoenix said...

Nice article.
But I'm confused about one more aspect of testing, which is benchmark testing. In benchmark testing we need to find a reference point which can be used for further analysis, and in performance testing we need to analyse the performance of the product.
If I have a web application and want to do performance testing on that application, then how can I differentiate it from benchmark testing?

Grig Gheorghiu said...

phoenix -- you may be able to differentiate between performance and benchmark testing by the fact that benchmark testing is usually done with industry-standard tools and it usually follows industry-standard procedures. As an example, check out the TPC benchmarks for database transaction processing at http://www.tpc.org/

Grig

eswar said...

Hi Grig ,
The article really helped me in understanding the difference between performance and load testing.
Basically I'm a black box tester and I'm doing load testing with a tool called webload.
In my case I'm facing a problem when the load is increased beyond 40 users.
Also, the results given by the tool in the QA environment do not help me much once the application has been launched in production.
My application's performance is degrading day by day and I'm not able to identify the key reason for this.
Can you suggest some other tools which will help me find the root cause of this degradation, like network performance tools and database optimizer tools? It would be great if these tools are open ware.

Rajiv Walia said...

HI Grig,
Thanks for very nice articles :)
Can you please explain these three terms in only 3 lines.

Thanks,
Rajiv

Aparna said...

Hi,
How do you do performance testing for a client server application? In other words, what are all the things we need to take into account? We do not want to use any tool for this... How can we achieve performance testing then?

Anonymous said...

Both of your articles were very informative and even though I had a fairly good grasp of the topic, your clear explanations assisted me greatly in explaining the differences to less technical business staff. Cheers, Darren.

Swathi said...

A highly informative and precise article... Can you also let me know what simple, open source tools are available to profile and monitor the CPU, MEMORY, DATABASE while running an application? As you've mentioned, some toolkits should serve the purpose, but is there any way to interpret the results given by the tools in an understandable fashion? If yes, can you please suggest some inexpensive lightweight toolkits that help us with the statistics and analysis w.r.t. Database queries, CPU & Memory?

Grig Gheorghiu said...

Swathi -- if your app runs on Linux, then there's top, vmstat, iostat, mpstat, and others. For database monitoring, if you're using MySQL, there's a tool called mytop (google for it).

On Windows, there's the PerfMon tool that allows you to monitor all kinds of OS-related counters.

Grig

Swathi said...

Thanks for the info Grig! Sorry for not mentioning the type of application being put under test... it's a Java standalone application and I am in search of some useful tools (Java oriented) that give us statistical info on the usage of CPU and Memory.

I found JConsole pretty useful w.r.t. memory, but not many tools w.r.t. CPU or Database. Say I wanted to know whether or not a DB (be it Oracle, MySQL or Derby) is overwhelmed, and I wanted to test how fast the DB can receive inputs, or to identify if a DB is blocking any messages, etc. Are there any tools that could answer such questions? Besides, while doing performance testing with some available Java tools, like JProbe, I found that it identifies bottlenecks in some methods (hotspots), but how do I know the user response time using tools like that? (I don't expect precise answers w.r.t. the tool that I use, but, in general, how does one analyse the tool results?)

Grig Gheorghiu said...

Swathi -- as a general technique, I'd use profiling in your case. There are tons of java profilers that will time each of your methods and will show you the hotspots. Say you identify a database-bound method which is a hotspot. Then you can issue that particular SQL query directly against the database, time it, and try optimizing it. Rinse and repeat :-)

Grig

P1 said...

Hi Grig, thanks for the informative articles. I have one query here. Consider these 2 statements

"1]At this point, the developers need to take back the profiling measurements and tune the code and the database queries. The system administrators can also increase server performance simply by throwing more hardware at the servers -- especially more RAM at the Web/App server in my experience, the more so if it's Java-based.

2. Let's say the application/database code, as well as the hardware/OS environment have been tuned to the best of everybody's abilities. You re-run the load test from step 2 and now you're at 75 concurrent users before performance starts to degrade."

For the things that were done in step one, is perf testing really required to find that out? Aren't the developers, admins etc. supposed to optimize things right in the first place? The only input we gave them is that there is a problem at a 50-user load, and after this input they'll make changes [follow industry best practices etc., which they were supposed to do anyway] and then it works fine with 75 users.

Can you please elaborate on these steps?

Grig Gheorghiu said...

P1 -- yes, you're right, sysadmins and developers should be the main parties responsible for the performance tuning aspect. The testers' role is to measure performance and point out issues that they uncover. However, in an 'agile' environment there is close collaboration between all roles. So ideally sysadmins, developers and testers would collaborate in closing the feedback loop as tightly as possible. So the testers give feedback, the sysadmins and the developers tune the systems and the app, then the testers give more feedback, etc.

Grig

Chad said...

Question:

I'm new to the subject of Load testing and am planning on doing a grad project using some tool against some web site - since I can't get a LoadRunner license I was thinking of using SilkPerformer, but I've not chosen any tool yet - any suggestions?

But the other concern is just as great - do you know where I could find an e-commerce web site to test?

Thanks,

Chad

Anonymous said...

You can use soapui for doing load testing on web services. It is an open source tool and one of the best tools I have seen for web services.

Thejesh Giri Chikmagalur said...

Very Useful Information. Previously I thought Performance = Load + Stress, then on reading different topics in your blog I came to know the difference.

Anonymous said...

I'm trying to decipher info between your several articles on this subject. You said 1000 requests per concurrent user in this article, and in another article, you said 1 or 10 requests per concurrent user. 1000 requests per concurrent user sounds kinda high. Was that a typo?

Grig Gheorghiu said...

Anonymous -- when I said 1,000 connections per user, I was referring to the total number of connections that the simulated user will make to the server during the time of the test run. They are not concurrent connections necessarily. If you read the httperf blog post I wrote, it would refer to this parameter:

num-conns: specifies how many total HTTP connections will be made during the test run -- this is a cumulative number, so the higher the number of connections, the longer the test run

Hope this helps.

Grig

Anonymous said...

Very Informative. Could you please let me know what concurrent user actually means?

Laurent said...

If someone is interested in
drawing charts from a vmstat log,
you could try vmstax at
http://www.michenux.net/blog/?p=1

Pankaj said...

Hello,

I am trying to find the extrapolation techniques that can be used after load testing on a scaled down environment.
I have completed benchmark testing on a scaled down version of the LIVE environment (from a hardware/infra perspective). Now, I am trying to extrapolate the results to suggest the performance of the LIVE environment.
Any suggestions?

Henry said...

I was researching application performance testing and came across this blog. Very useful and informative.

The question that I have is not so much about what performance, load testing is (yes, it is still good to understand what they are) but when to do it and what type of environment to do it in.

I am struggling with the development team right now. They want to do "application performance testing" (which they also refer to as non-functional testing) as part of their iterative development cycle, on code that is "stable" but might not be a release candidate (it has not passed user acceptance testing). The reason is to eliminate any surprises and make sure the code is clean and "performs" as expected when they release it for "performance and load testing". If they don't do it this way, it will be too late or too close to production deployment to have enough time to fix things that the "performance and load testing" unearths.

To perform the test, they do not want to use the development environments. They are requesting a separate, production-like environment in order to get some meaningful results.

So does what they said and wanted to do make sense?

Is this "application performance testing"?

Grig Gheorghiu said...

Henry -- what your dev team is asking makes sense to me. I usually have what is called a staging environment for that type of test. It's a smaller-scale replica of production.