Designing for Load: Performance Testing in Production Systems

I started doing load testing seriously after we shipped a multi-tenant platform and discovered — under real traffic — that parts of the system weren't handling concurrent requests the way we assumed they would.

Since then I've run Locust against every significant backend before it goes anywhere near production.

Why Locust

Python. Tests are just code. No YAML, no GUI, no test-runner DSL. A scenario is a class, a task is a function. I can simulate realistic user flows — login, navigate, submit, trigger background work — not just hammer a single endpoint.

Most load testing tools make you think in terms of requests per second. Locust makes it easy to think in terms of users doing things, which is closer to how systems actually fail.

What I actually test

Gradual ramp-up. Start low, increase users steadily. Watch for the point where latency starts climbing or errors appear. That's where the system starts to struggle.

Sustained load. Hold a realistic user count for 10–20 minutes. Things that look fine under a spike often degrade over time — memory leaks, connection pool exhaustion, cache issues.

Spike tests. Jump from low to high instantly. This reveals whether the system absorbs sudden bursts or falls over and takes time to recover.

What I look at

Not RPS. RPS is easy to make look good.

I care about p95 and p99 latency, error rate under load, and what the database is doing. Slow queries that are invisible at low traffic show up immediately under concurrency.
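
A small illustration of why tail percentiles matter more than averages. It uses the nearest-rank percentile definition, which is one of several in common use, and the latency samples are made up for the example.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Made-up data: 98 fast requests and 2 very slow ones.
latencies_ms = [50] * 98 + [2000] * 2

mean_ms = sum(latencies_ms) / len(latencies_ms)
p95_ms = percentile(latencies_ms, 95)
p99_ms = percentile(latencies_ms, 99)

print(mean_ms, p95_ms, p99_ms)  # 89.0 50 2000
```

The mean and p95 both look healthy here. Only p99 shows that some users waited two seconds.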

The other thing I watch closely is connection pool saturation. In a multi-tenant system especially, it's easy to undersize pools and not notice until traffic picks up.
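
A rough way to reason about pool size is Little's Law: connections in use is approximately the request rate multiplied by how long each request holds a connection. The sketch below applies it with hypothetical numbers. It gives an average with no headroom, so treat the result as a floor, not a recommendation.

```python
import math


def connections_needed(requests_per_second, hold_ms):
    """Average connections in use, by Little's Law (rate x hold time)."""
    return math.ceil(requests_per_second * hold_ms / 1000)


# Hypothetical traffic: 200 requests per second.
print(connections_needed(200, 50))   # 10 when each request holds a connection for 50 ms
print(connections_needed(200, 500))  # 100 when a slow query stretches that to 500 ms
```

The same traffic needs ten times the connections once hold time grows tenfold, which is why a pool that looks generous at low latency can saturate as soon as queries slow down.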

What I usually find

Missing indexes. Queries that seem fast against small data turn obviously slow under load. One missing index can swing p99 from 50ms to 2000ms.
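
One way to confirm a suspected missing index is to compare the query plan before and after adding it. The sketch below uses an in-memory SQLite database purely as a stand-in for whatever database is actually in use. The table, column, and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, tenant_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (tenant_id, total) VALUES (?, ?)",
    [(i % 50, float(i)) for i in range(5000)],
)

QUERY = "SELECT * FROM orders WHERE tenant_id = ?"


def plan(connection):
    """Return SQLite's query plan for QUERY as one line of text."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY, (7,)).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 is the plan detail


before = plan(conn)  # full table scan
conn.execute("CREATE INDEX idx_orders_tenant ON orders (tenant_id)")
after = plan(conn)   # index lookup

print(before)
print(after)
```

The exact plan wording varies by SQLite version, but the first plan reports a scan of the table and the second names idx_orders_tenant. Other databases expose the same information through their own EXPLAIN output.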

N+1 queries. Works fine in development, becomes a problem the moment you have real concurrent users. Usually found by watching what the database does while Locust is running.
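
To make the pattern concrete, here is a sketch that counts queries for the same result fetched two ways. It uses in-memory SQLite and a hypothetical orders/items schema. The counting wrapper is a stand-in for watching the database's own query log.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE items (id INTEGER PRIMARY KEY, order_id INTEGER, sku TEXT);
    INSERT INTO orders (id, customer) VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO items (order_id, sku) VALUES (1, 'x'), (1, 'y'), (2, 'z'), (3, 'w');
""")

query_count = 0


def run(sql, params=()):
    """Execute a query and count it."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


def items_n_plus_one():
    # One query for the orders, then one more query per order.
    result = {}
    for (order_id,) in run("SELECT id FROM orders ORDER BY id"):
        rows = run("SELECT sku FROM items WHERE order_id = ? ORDER BY id", (order_id,))
        result[order_id] = [sku for (sku,) in rows]
    return result


def items_single_query():
    # Same data in one round trip, using a JOIN.
    result = {}
    sql = ("SELECT o.id, i.sku FROM orders o "
           "JOIN items i ON i.order_id = o.id ORDER BY o.id, i.id")
    for order_id, sku in run(sql):
        result.setdefault(order_id, []).append(sku)
    return result


query_count = 0
slow = items_n_plus_one()
n_plus_one_queries = query_count  # 4: one for orders plus one per order

query_count = 0
fast = items_single_query()
single_queries = query_count      # 1
```

With three orders the difference is 4 queries versus 1. With a page of 100 orders and many concurrent users, it is 101 round trips per request competing for the same pool.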

Synchronous calls that should be async. A service waiting on a slow external call while holding a connection is a quiet bottleneck. Under load it gets loud.
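
A toy model of that bottleneck, using asyncio and a semaphore as a stand-in for a connection pool. The pool size, delay, and handler names are all hypothetical. The only difference between the two handlers is whether the pooled connection is still held while waiting on the external call.

```python
import asyncio
import time

POOL_SIZE = 2          # hypothetical pool size
EXTERNAL_DELAY = 0.05  # hypothetical slow external call, in seconds


async def slow_external_call():
    await asyncio.sleep(EXTERNAL_DELAY)


async def handler_holding(pool):
    # Holds a pooled connection for the whole external call.
    async with pool:
        await slow_external_call()


async def handler_releasing(pool):
    # Does the quick database work, releases the connection, then waits.
    async with pool:
        pass
    await slow_external_call()


async def run(handler, requests=6):
    """Run `requests` concurrent handlers and return elapsed seconds."""
    pool = asyncio.Semaphore(POOL_SIZE)
    start = time.perf_counter()
    await asyncio.gather(*(handler(pool) for _ in range(requests)))
    return time.perf_counter() - start


holding = asyncio.run(run(handler_holding))
releasing = asyncio.run(run(handler_releasing))
print(f"holding: {holding:.3f}s  releasing: {releasing:.3f}s")
```

With six requests and two connections, the holding variant has to work through the external calls in batches, while the releasing variant waits on all of them at once. Exact timings depend on the machine.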

Wrong cache keys. Global cache keys without tenant awareness are dangerous in multi-tenant systems, because two tenants can read or overwrite the same entry. Load tests sometimes surface this before it becomes a data integrity issue.
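
A minimal illustration of the difference, using a plain dict as a stand-in for a real cache. The key formats and function names are hypothetical.

```python
cache = {}


def global_key(report_id):
    # No tenant in the key: every tenant's report 1 maps to the same entry.
    return f"report:{report_id}"


def tenant_key(tenant_id, report_id):
    # Tenant-scoped key: entries cannot collide across tenants.
    return f"tenant:{tenant_id}:report:{report_id}"


# With global keys, tenant B silently overwrites tenant A's entry.
cache[global_key(1)] = "tenant A's report"
cache[global_key(1)] = "tenant B's report"
seen_by_tenant_a = cache[global_key(1)]  # "tenant B's report"

# With tenant-scoped keys, each tenant reads back its own data.
cache[tenant_key("a", 1)] = "tenant A's report"
cache[tenant_key("b", 1)] = "tenant B's report"
```

A load test that simulates several tenants at once and checks response bodies, not just status codes, has a chance of catching the first case.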

Load testing as design feedback

The most useful thing about load testing isn't the numbers. It's what the numbers tell you about the design.

If the database is saturated, that's a design conversation. If a spike causes a cascade, that's worth understanding before production does it for you. I've used Locust results to justify adding Redis caching, moving heavy work to background workers, restructuring queries, and adjusting pool sizes. Those conversations are a lot easier when you have data.