How to Build Your End-to-End Load Test Scenarios From Analytics & Statistics

8 min read · Mar 24, 2021

When I started out, I ran some completely useless tests. Why? Because my tests simply followed the critical path of my application: login → product page → add to cart → cart page → shipping page → payment page → confirmation → logout, with a 1-second pause between each request. Sure, every retailer dreams of a 100% conversion rate and predictable customers, but… no. Building a realistic scenario for end-to-end testing is quite a sustained run. You need to dig into your analytics tools to find a representation of your customers’ behavior and build your performance tests accordingly.

The scaling is done in 3 steps:

I. Defining the growth coefficient

II. Defining the targeted throughput

III. Defining the user behavior

Bonus: Example of a JMeter implementation

I. Defining the growth coefficient using analytics and statistics

Defining the user growth projection

My response times were pretty good while production struggled to stay stable. Why? Because I had not updated my performance test load for a while. Most online retailers see their customer traffic grow, so you need to project what your load will look like in 3–4 years to be sure that, tomorrow, your production will still be able to sustain it.

My method to calculate the load I need to simulate is the following:

With your analytics tool,

Find the best day of your application, the day on which you get the most customer sessions, like the first day of the sales, the Christmas period, or anything that represents the peak of your business. (Not a spiky moment, like a highly successful commercial operation that took production down.)
Get the monthly number of customers for the last 3 years.
Take the first month’s value, like January 2018, and the last month’s value, like January 2021. Then apply the following formula to find the monthly growth rate:

Apply the rate you found to the last month’s value, then to each following month:

February 2021 = January 2021 x Growth
March 2021 = February 2021 x Growth
Repeat until you reach your 3-year projection.

Spiky part is historical, smooth part is projected
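Here is a minimal Python sketch of this projection. It assumes the monthly growth rate is the compound rate, (last value / first value) ^ (1 / number of months), which is my assumption about the formula. The January 2018 figure is hypothetical; only the 4 800 000 for January 2021 comes from the example.

```python
def monthly_growth_rate(first_value, last_value, months):
    """Compound monthly growth multiplier between two values `months` apart.

    Assumption: growth compounds at a constant rate every month.
    """
    return (last_value / first_value) ** (1.0 / months)


def project(last_value, growth, months):
    """Apply the multiplier month after month, starting from the last known value."""
    projection = []
    value = last_value
    for _ in range(months):
        value *= growth
        projection.append(value)
    return projection


# Hypothetical history: 3 200 000 customers in January 2018,
# 4 800 000 in January 2021 (36 months later).
growth = monthly_growth_rate(3_200_000, 4_800_000, months=36)

# Project 36 more months, from February 2021 to January 2024.
forecast = project(4_800_000, growth, months=36)

print(f"monthly growth multiplier: {growth:.5f}")
print(f"January 2024 projection: {forecast[-1]:,.0f}")
```

The last element of `forecast` is the January 2024 value used as the "final value" in the next step.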

Now we need to calculate the variation rate:

Variation = ((final value - first value) / first value) x 100

where the final value is January 2024 and the first value is January 2021.

In our case:

((7 150 345 - 4 800 000) / 4 800 000) x 100 = 48,97%
This means you expect 48,97% more users in January 2024, based on your monthly growth history.

You now have your growth coefficient: 1,4897
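The same calculation as a short Python sketch, using the two values from the example. The function names are mine, not from any tool.

```python
def variation_percent(first_value, final_value):
    """Relative change between two values, in percent."""
    return (final_value - first_value) / first_value * 100


def growth_coefficient(first_value, final_value):
    """Multiplier to apply to today's load to get the projected load."""
    return 1 + variation_percent(first_value, final_value) / 100


# January 2021 and projected January 2024 values from the example.
variation = variation_percent(4_800_000, 7_150_345)
coefficient = growth_coefficient(4_800_000, 7_150_345)

print(f"variation: {variation:.2f}%")
print(f"growth coefficient: {coefficient:.4f}")
```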

II. Defining the targeted throughput

Again, with your favorite analytics tool,

Find the best day of your application, the day on which you get the most page views. Again, not a spiky moment like a highly successful commercial operation that took production down.

Let’s say it’s during the lockdown: all your stores are closed and all your customers are on your website. You hit 12 500 000 page views in a day!

Now let’s assume your website has decent traffic 10 hours per day (say 10:00 to 20:00). From that, you can estimate the number of requests to simulate for a test.

For example, if you want to run a 10-hour test, which would represent a full day of your traffic:

Take your 12 500 000 page views and multiply them by your growth coefficient, 1,4897:

12 500 000 x 1,4897 = 18 621 250

Which means:

18 621 250 / 10 = 1 862 125 per hour
1 862 125 / 60 ≈ 31 035 per minute
31 035 / 60 = 517,25 per second

You got it! That’s your target performance test throughput: 517,25 requests per second.
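The same arithmetic as a small Python sketch (function and parameter names are mine). It skips the intermediate rounding of the per-minute figure, so it lands at about 517,26 instead of 517,25.

```python
def target_throughput(peak_daily_page_views, growth_coefficient, active_hours):
    """Requests per second to simulate, spreading the projected daily
    page views evenly over the active hours of the day."""
    projected_page_views = peak_daily_page_views * growth_coefficient
    return projected_page_views / (active_hours * 3600)


# Values from the example: 12 500 000 page views, coefficient 1.4897, 10 active hours.
rps = target_throughput(12_500_000, 1.4897, 10)
print(f"target throughput: {rps:.2f} requests per second")
```

Spreading the traffic evenly over the 10 hours is a simplification; if your analytics show a strong peak hour, you may prefer to size the test on that hour instead.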

III. Mimicking the customer behavior

OK, now we know how many requests need to be made, but which requests? It’s not 517,25 homepages per second.

Again, in your favorite analytics tool, you’ll need to find the most viewed pages that together cover at least 80% of your traffic.

In the chart above, you can see that with only 5 categories of pages you already cover 79% of your total traffic (Homepage / Product / Catalog / Stores / Checkout). The more pages you cover, the better your scenario will match your customers’ behavior. Once you’ve got this, you can begin to build your performance scenario, simulating a typical user path.
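If your analytics tool can export page views per page category, picking the pages to cover can be sketched in a few lines of Python. The counts below are hypothetical, chosen only so that the five categories named above add up to 79%; the function name is mine.

```python
def pages_to_cover(page_views, target_share=0.80):
    """Return the most viewed pages that together reach `target_share` of traffic."""
    total = sum(page_views.values())
    selected = []
    covered = 0
    # Walk through the pages from most viewed to least viewed.
    for page, views in sorted(page_views.items(), key=lambda item: item[1], reverse=True):
        if covered / total >= target_share:
            break
        selected.append(page)
        covered += views
    return selected


# Hypothetical export: page views per page category.
PAGE_VIEWS = {
    "homepage": 2500,
    "product": 2200,
    "catalog": 1800,
    "stores": 800,
    "checkout": 600,
    "account": 550,
    "search": 500,
    "help": 450,
    "blog": 350,
    "legal": 250,
}

selected = pages_to_cover(PAGE_VIEWS)
print(selected)
```

With these made-up numbers, the first five categories stop at 79%, so a sixth one is needed to pass the 80% mark.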

Customer scenario defined from analytics

Now, find your conversion rate and the abandonment percentage at each step of your checkout process. At each step of your conversion funnel, you’ll lose some customers. You’ll find this kind of data in analytics tools like Google Analytics, as shown in the chart below:

The data in the screenshots above is not from our example; it’s just a simulated view to help understand the concept

Likewise, to reproduce your customers’ session duration, you’ll need to know how much time they spend on each page. With this information, you’ll know how the throughput is distributed across the different pages and be able to calculate the number of virtual users you’ll need.

Time spent per page as shown in Google Analytics (image credits: Databox)

For this example, we’ll assume that your session duration is 6 minutes, spread across the pages.

Defining the number of Virtual Users

We know that we need to make 517,25 requests per second. We also know that our average session duration is 6 minutes.

To calculate the number of virtual users you need, you can do the following:

1 iteration takes 360 seconds (6 minutes). In 1 iteration, you’ll see:

1 homepage
1 browse or search
1 product page
1 store page
1 add to cart request
1 cart page
1 login page
1 sign-in or sign-up request
1 shipping
1 payment
1 confirmation

So: 11 requests / 360 s / virtual user

Meaning each virtual user makes 1 request every 32,7 s (360 s / 11 requests), or about 0,03 requests / second / virtual user.

So: virtual users needed = 517,25 / 0,03 ≈ 17 241

ProTip: If you are limited in the number of virtual users, you can halve the number of VUs while halving your think times; it will produce the same number of requests per second.
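Here is the same calculation as a Python sketch (names are mine). It keeps the unrounded per-user rate of 11 / 360 ≈ 0,0306 requests per second, so it returns slightly fewer virtual users than the 17 241 obtained with the rounded 0,03.

```python
import math


def virtual_users(target_rps, requests_per_iteration, iteration_seconds):
    """Number of virtual users needed to reach `target_rps` when each user
    sends `requests_per_iteration` requests every `iteration_seconds`."""
    per_user_rps = requests_per_iteration / iteration_seconds
    return math.ceil(target_rps / per_user_rps)


# Values from the example: 517.25 requests/s, 11 requests per 360 s iteration.
vus = virtual_users(517.25, 11, 360)
print(f"virtual users needed: {vus}")

# Same target with iterations twice as short (think times cut in half).
vus_half = virtual_users(517.25, 11, 180)
print(f"with 180 s iterations: {vus_half}")
```

Rounding 0,0306 down to 0,03 overestimates the need by about 300 users, which is a harmless safety margin at this scale.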

If I may give some advice, simulating 1 VU as 1 real user is the best scenario, because it gives you a precise idea of the load your infrastructure can handle and faithfully represents the network exchanges: with 1 thread per real user, you get one socket in use, one keep-alive per client, and so on. If you halve your virtual users and halve your think times, you’ll make the same number of requests per second, I agree, but each connection will be used at twice the hit rate. Also, in some applications, like Oracle ATG, each client request (and session cookie) generates a server-side session, which costs memory.

Using one virtual user per real user really improves the fidelity of your test

IV. Concrete example and application

Applying these concepts in load testing

In your scenarios, you’ll need to apply think times and weights inside your iterations to simulate this kind of behavior.

Think times are the time users take between actions. Humans are not as fast as machines, so our load testing bot needs to wait as humans do. We know that your average session duration is around 6 minutes, so let’s spread it across our requests.

Weights are fixed percentages that determine whether or not the virtual user sees a given page during the iteration. Each weight is tested against a randomly generated variable.

Reproducing the Flow diagram in JMeter

First, we will create our page flow before adding any intelligence.

Each R_HTTP is an HTTP Request sampler

Then we will create the decision tree (without any values yet) from the diagram above.

Now we will populate some variables with values and use them to make the IFs work. The JMeter component used to define variables by hand is User Defined Variables.

User Defined Variables with the weights inside

Then use these variables inside the IFs. To begin, create a Config Element → Random Variable on which we will base the decisions:

Creating a random variable to compare against each weight

In each of the IFs, compare the manually set variable with the globalRandom one. We generate globalRandom only once per iteration and per user to save some CPU during the load test.

Using Groovy to compare the weight and the random value

ProTip: Reset the random variable at each step of the checkout process.
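To sanity-check your weights before wiring them into JMeter, the decision logic can be prototyped in plain Python. This is only a sketch under my own assumptions: the step names and weights are hypothetical, and it draws a fresh random number at every step (as the ProTip suggests for the checkout) rather than reusing one globalRandom per iteration.

```python
import random

# Hypothetical weights: percentage of virtual users that continue to each step.
WEIGHTS = {
    "product": 80,
    "add_to_cart": 30,
    "shipping": 60,
    "payment": 75,
}


def run_iteration(rng):
    """Return the list of pages one virtual user visits in one iteration."""
    visited = ["homepage"]
    for step, weight in WEIGHTS.items():
        draw = rng.randint(1, 100)  # fresh draw for every step
        if draw > weight:
            break  # this user abandons here
        visited.append(step)
    return visited


rng = random.Random(42)  # fixed seed so the run is reproducible
iterations = [run_iteration(rng) for _ in range(10_000)]

for step in WEIGHTS:
    share = sum(step in pages for pages in iterations) / len(iterations)
    print(f"{step}: reached by {share:.1%} of iterations")
```

Comparing the printed shares with the funnel from your analytics tool tells you whether the weights reproduce the abandonment you observed.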

Setting the Think Times

For each R_HTTP you’ll need to add a timer. I like Timer → Uniform Random Timer because it adds some randomness to the think time, which keeps virtual users from being too synchronized. Set each value according to what you found in your analytics tool.

Now apply this everywhere, spreading the 6 minutes of the average session duration across the pages according to your analytics data.
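As a rough Python sketch of how the 6 minutes could be turned into timer settings: the Uniform Random Timer waits for a constant delay offset plus a random delay between 0 and a maximum, so the average wait is the offset plus half the maximum. The per-page durations below are hypothetical (they only need to add up to 360 seconds), and the 50% jitter is an arbitrary choice of mine.

```python
import random

# Hypothetical think time per page, in seconds. Total: 360 s (6 minutes).
THINK_SECONDS = {
    "homepage": 36,
    "browse": 54,
    "product": 72,
    "store": 18,
    "add_to_cart": 7,
    "cart": 29,
    "login": 18,
    "sign_in": 18,
    "shipping": 36,
    "payment": 54,
    "confirmation": 18,
}


def timer_settings(page, jitter=0.5):
    """Return (constant_delay_ms, random_delay_max_ms) whose average wait
    equals the target think time for `page`."""
    mean_ms = THINK_SECONDS[page] * 1000
    random_max_ms = mean_ms * jitter
    constant_ms = mean_ms - random_max_ms / 2
    return constant_ms, random_max_ms


def sample_think_time(page, rng):
    """Draw one think time in ms: constant offset + uniform random part."""
    constant_ms, random_max_ms = timer_settings(page)
    return constant_ms + rng.uniform(0, random_max_ms)


rng = random.Random(1)
for page in THINK_SECONDS:
    constant_ms, random_max_ms = timer_settings(page)
    print(f"{page}: offset={constant_ms:.0f} ms, random max={random_max_ms:.0f} ms")
```

The two values printed for each page are what you would type into the timer’s constant delay offset and random delay maximum fields.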

V. Conclusion

I said it would be a sustained run. Did I lie? Reproducing customer behavior in load testing, and especially in end-to-end testing, is rigorous work with plenty of data analysis. But in the end, you get a working, forward-looking, production-like scenario that will help ensure your production stability for a while. No doubt some of the data in the screenshots or in the examples looks unrealistic; that’s because all of it is fake and was invented for this article. Now you have a realistic scenario and you know how much load to inject. What about trying this Dockerized JMeter starter kit to run it?

If you have any questions, feel free to ask, and don’t hesitate to follow me to see more load testing stories!