My comment on “Introducing new type of benchmark”.
I’ve used Poisson load generators for blackbox perf testing of how the system responds to requests produced by independent irregular sources. The gist of a load generator is that it must generate the load with a given intensity regardless of the system response rate (compare to throughput-measuring benchmarks: they don’t issue another request until they get a reply from the system, so their intensity is throttled by the system response rate). This is similar to what you describe in this post (though it looks like you still throttle the intensity on the client by spinning up only N threads; my load generators would just keep generating requests with a given intensity, spinning up as many threads as needed).
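To make the open-loop idea concrete, here is a minimal sketch of such a generator. It is an illustration under assumptions, not the code I actually used: the names `poisson_arrival_times`, `run_open_loop` and `send_request` are hypothetical, and a thread per request is only the simplest way to avoid waiting for replies.

```python
import random
import threading
import time


def poisson_arrival_times(rate_per_sec, duration_sec, seed=None):
    """Return arrival offsets (in seconds) of a Poisson process with the given rate."""
    rng = random.Random(seed)
    t, arrivals = 0.0, []
    while True:
        # Gaps between Poisson arrivals are exponentially distributed.
        t += rng.expovariate(rate_per_sec)
        if t >= duration_sec:
            return arrivals
        arrivals.append(t)


def run_open_loop(send_request, rate_per_sec, duration_sec, seed=None):
    """Call send_request() at the scheduled arrival times, never waiting for replies.

    send_request is a hypothetical callable that issues one request to the
    system under test. Returns the number of requests issued.
    """
    start = time.monotonic()
    threads = []
    for offset in poisson_arrival_times(rate_per_sec, duration_sec, seed):
        delay = start + offset - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        # One thread per request: a slow reply never delays the next arrival.
        worker = threading.Thread(target=send_request)
        worker.start()
        threads.append(worker)
    for worker in threads:
        worker.join()
    return len(threads)
```

The schedule is computed from the start time rather than from the previous request, so the offered intensity stays at the requested rate even when the system slows down.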
For whitebox perf testing, I usually use two simple benchmarks and then apply a simplified model that works pretty well. The two measurements are:
1. One-request latency benchmark. Run one request of interest from one thread (usually in a loop to get an average of multiple executions) to get the time it takes to execute one request.
2. The max system throughput benchmark. Run as many parallel requests as needed to make the system throughput (TPS in the case of a database) not increase any more.
As you can see, both can be run with traditional benchmarks, and they immediately give you the properties of the system (compare to Poisson load generators: you need to have an idea of the intensity to get useful results).
Then I use the following simplified model: the system has one shared processing component with processing time P, plus the system has request delivery latency L (i.e. the time it takes for a request to reach the processing component – it’s convenient to think about it as networking latency).
The one-request latency benchmark gives you the single request response time: T = P + L. If it’s too high, you’ve got a problem that you’d generally want to address before getting to the throughput tests.
The max system throughput test gives the shared component processing time P – it’s the reciprocal of the max system throughput (1/TPS in the case of a database).
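The two relations above can be inverted to get P and L from the two measurements. A small sketch follows. The function name `model_parameters` and the example numbers (5 ms single-request latency, 1000 TPS) are made up for illustration.

```python
def model_parameters(single_request_latency, max_throughput):
    """Derive (P, L) of the simplified model from the two benchmark results.

    single_request_latency: T from the one-request benchmark, in seconds.
    max_throughput: saturated throughput, in requests per second.
    """
    P = 1.0 / max_throughput          # shared component processing time
    L = single_request_latency - P    # delivery latency, from T = P + L
    if L < 0:
        # T < P contradicts T = P + L, so the one-component model does not fit.
        raise ValueError("single-request latency is below 1/max_throughput")
    return P, L


if __name__ == "__main__":
    # Hypothetical measurements: T = 5 ms, max throughput = 1000 TPS.
    P, L = model_parameters(0.005, 1000.0)
    print(f"P = {P * 1000:.1f} ms, L = {L * 1000:.1f} ms")
```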
The response time of the system under load is R = P + L + Q, where Q is the average time a request waits to be processed by the shared processing component. A mistake I’ve seen people make when reasoning about response time is assuming that Q stays close to zero until the shared processing component is fully utilized. That’s not true with Poisson arrivals!
According to queuing theory (the single-server model with Poisson arrivals and exponentially distributed processing times, known as M/M/1), Q = P * U / (1 - U), where U is the system utilization, i.e. the request intensity divided by the system’s maximum throughput. So when the system is loaded at about 66%, Q = 2 * P, at 75% utilization Q = 3 * P, at about 83% utilization Q = 5 * P, at 90% utilization Q = 9 * P, at 99% utilization Q = 99 * P, etc.
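The formula is easy to tabulate. The sketch below assumes the simplified model described here, with P and L already known. The function names and the sample values of P and L are hypothetical.

```python
def queue_wait(P, U):
    """Average wait Q for the shared component at utilization U (0 <= U < 1)."""
    if not 0 <= U < 1:
        raise ValueError("utilization must be in [0, 1)")
    return P * U / (1 - U)


def response_time(P, L, U):
    """Response time under load: R = P + L + Q."""
    return P + L + queue_wait(P, U)


if __name__ == "__main__":
    P, L = 0.001, 0.004  # hypothetical: 1 ms processing, 4 ms delivery
    for U in (0.5, 2 / 3, 0.75, 5 / 6, 0.9, 0.99):
        Q = queue_wait(P, U)
        R = response_time(P, L, U)
        print(f"U = {U:.2f}  Q = {Q / P:5.1f} * P  R = {R * 1000:6.1f} ms")
```

Note that Q is already equal to P at 50% utilization, well before the shared component is saturated.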
This model obviously only helps with reasoning about the system response time given the system max throughput. It doesn’t help to reason about what workload mix is going to better represent the user’s workload – both the max throughput and response time are going to change together when the workload mix changes.