This blog is a companion piece to the Holiday Readiness Podcast by Stephanie Heyward and Andon Salvarinov from HCL Commerce Advisory Services.
The success of any performance testing effort depends on the quality of the tests and of the test results. If the results are of poor quality, believing them carries hidden dangers. One of the worst situations an online commerce operations team can find itself in is an inadvertent “false positive”: results that appear successful and meet the non-functional requirements (NFRs), yet the site fails under real-life traffic.
How do you prevent “false positives”?
There are best practices in performance testing that generally lead to trustworthy test results, such as:
- Ensuring the performance environment is controlled and maintained just like the production environment
- Validating that each test is run under the same test conditions
- Making sure cache warmup is executed prior to each test and is done in a consistent manner
- Conducting a “measured test” but reporting only on its steady-state portion. The steady-state period should be at least 30 minutes long and is usually 1 hour. A measured test is for reporting, whereas an investigative test is for collecting various pieces of data to get to the root cause of a specific issue.
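To make the steady-state rule concrete, here is a minimal Python sketch of trimming ramp-up and ramp-down out of a measured test before reporting. The `Sample` record, the `steady_state` function, and the ramp durations are hypothetical names and values chosen for illustration; only the 30-minute minimum comes from the guidance above.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    elapsed_s: float    # seconds since the test started (assumed field)
    response_ms: float  # response time of one request (assumed field)


def steady_state(samples, ramp_up_s, ramp_down_s, test_length_s,
                 min_length_s=30 * 60):
    """Keep only samples between the end of ramp-up and the start of ramp-down."""
    start = ramp_up_s
    end = test_length_s - ramp_down_s
    # Refuse to report on a window shorter than the 30-minute minimum.
    if end - start < min_length_s:
        raise ValueError("steady-state window is shorter than the required minimum")
    return [s for s in samples if start <= s.elapsed_s < end]


# Example: an 80-minute test, one sample per minute, 10-minute ramps on each side.
samples = [Sample(float(t), 100.0) for t in range(0, 4800, 60)]
reported = steady_state(samples, ramp_up_s=600, ramp_down_s=600, test_length_s=4800)
print(len(reported), "samples fall inside the 1h steady-state window")
```

Raising an error instead of silently reporting on a short window keeps an investigative run from being mistaken for a measured one.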
The ultimate goal of these carefully chosen constraints and suggestions is to produce trustworthy test results that can be collected over time and used to track and monitor any regression in the online commerce implementation, both now and during actual peak.
The reporting format of the test results becomes critical when considering long-term regression monitoring. The data collected needs to be relevant, numerically expressed, and easy to understand. Here is a test reporting framework example used by HCL.
In general, the report should be a single spreadsheet that summarizes the test results with key data such as:
- How much traffic was produced during each test: orders/h and page views/h
- Amount of resources consumed by this traffic (e.g., average CPU utilization)
- Page response times for each transaction, with average, 95th percentile, max and/or standard deviation
- Other specific response times (such as backend calls)
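As an illustration of the numeric summary described above, the following Python sketch computes one spreadsheet row per transaction and an hourly traffic rate. It is not the HCL framework itself: the function names, column names, and the nearest-rank percentile method are assumptions made for this example.

```python
import csv
import io
import math
import statistics


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def per_hour(count, duration_s):
    """Scale a count observed over duration_s seconds to an hourly rate (orders/h, page views/h)."""
    return count * 3600 / duration_s


def summarize(transaction, response_ms):
    """Build one report row of response-time statistics for a transaction."""
    return {
        "transaction": transaction,
        "samples": len(response_ms),
        "avg_ms": round(statistics.fmean(response_ms), 1),
        "p95_ms": percentile(response_ms, 95),
        "max_ms": max(response_ms),
        "stdev_ms": round(statistics.pstdev(response_ms), 1),
    }


def to_csv(rows):
    """Render report rows as CSV text that any spreadsheet can open."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Example with made-up response times for a hypothetical "checkout" transaction.
row = summarize("checkout", [420, 510, 390, 880, 460, 1250, 430, 470])
print(to_csv([row]))
print("orders/h:", per_hour(500, 1800))  # 500 orders observed in 30 minutes
```

Keeping the columns fixed from test to test is what makes the rows comparable later, so it is worth settling on the set of statistics before the first measured test.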
While advanced application performance monitoring (APM) tools that present graphs of the key performance indicators (KPIs) are great for illustrating the dynamics within a test, they are useless for comparing results between tests. You need to report actual numeric values so that you can compare your testing now and in the future. The goal is to make this a repeatable best practice, so you have baseline data to go back to when you need to run regression performance tests after new hardware or new functionality is added.
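Once results are stored as numbers, comparing a new test against a baseline can be a few lines of code. The sketch below is one possible approach, not a prescribed method: the metric names and the 10% tolerance are hypothetical, and it assumes every metric is one where a higher value is worse (response times, CPU utilization). Throughput metrics would need the opposite check.

```python
def find_regressions(baseline, current, tolerance_pct=10.0):
    """Return metrics that grew by more than tolerance_pct relative to the baseline."""
    regressions = {}
    for metric, base in baseline.items():
        # Skip metrics missing from the new run or with no usable baseline value.
        if metric not in current or base == 0:
            continue
        change_pct = (current[metric] - base) / base * 100
        if change_pct > tolerance_pct:
            regressions[metric] = round(change_pct, 1)
    return regressions


# Example with made-up values from two measured tests.
baseline = {"checkout_p95_ms": 800, "avg_cpu_pct": 50}
current = {"checkout_p95_ms": 1000, "avg_cpu_pct": 52}
print(find_regressions(baseline, current))
```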