Designing for Load: Performance Testing in Production Systems

Using controlled traffic to validate system design and uncover bottlenecks before they reach production.

  • Performance
  • Locust
  • Scalability
  • Backend

Why Performance Testing Matters

Most systems fail under load — not in development.

Features may work perfectly with one user.

But production brings:

  • Concurrency
  • Uneven traffic spikes
  • Slow downstream dependencies
  • Database contention
  • Resource exhaustion

Performance testing is not about chasing high numbers.

It is about validating architectural decisions under realistic pressure.


The Goal Is Not RPS

Many teams focus on:

  • Requests per second (RPS)
  • Peak throughput
  • “How much traffic can it handle?”

Those metrics are incomplete.

What matters more:

  • p95 / p99 latency
  • Error rate under load
  • CPU & memory stability
  • Database performance
  • Behavior during spikes

Throughput without stability is meaningless.


Why I Use Locust

I prefer Locust because:

  • It’s Python-based
  • Test scenarios read as plain code
  • It’s easy to simulate realistic user flows
  • It’s flexible enough for custom logic

Example flow:

  • Login
  • Fetch dashboard
  • Submit request
  • Trigger background processing

Instead of testing a single endpoint, I simulate real usage patterns.


Designing Realistic Load Scenarios

Load tests should reflect reality.

That means:

  • Mixed read/write traffic
  • Concurrent users
  • Gradual ramp-up
  • Sudden spike tests
  • Sustained load over time

Testing only steady traffic hides instability.

Spike tests reveal weaknesses quickly.


What I Look For During Tests

During load testing, I monitor:

  • Application latency (p95 / p99)
  • Error rates
  • CPU and memory usage
  • Database slow queries
  • Connection pool saturation
  • Cache hit/miss ratios

Performance testing is observability in action.

Without monitoring, load testing is just noise.
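To see why tail latency matters more than averages, here is a small self-contained sketch using simulated (log-normal) latency samples in place of real measurements:

```python
import random
import statistics

# Simulated latency samples in ms; in practice these come from your
# load tool or APM, not from a random generator
random.seed(42)
samples = [random.lognormvariate(3.0, 0.5) for _ in range(10_000)]


def percentile(data, p):
    """Nearest-rank percentile of the data."""
    ordered = sorted(data)
    k = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[k]


mean = statistics.mean(samples)
p95 = percentile(samples, 95)
p99 = percentile(samples, 99)
# For skewed latency distributions, p95 and p99 sit far above the mean:
# the average looks healthy while a meaningful fraction of users suffer
```

Reporting only the mean would hide exactly the requests your slowest users experience.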


Common Bottlenecks I’ve Encountered

Some recurring patterns:

1. Missing Indexes

Under load, unindexed queries become visible immediately.

Fixing indexes can reduce p95 latency significantly.
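The effect is easy to demonstrate even with SQLite. This sketch (table and column names are made up for the example) shows the query plan switching from a full scan to an index search:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"user{i}@example.com",) for i in range(10_000)],
)


def plan(sql):
    # EXPLAIN QUERY PLAN reveals whether SQLite scans the table
    # or uses an index
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


query = "SELECT id FROM users WHERE email = 'user42@example.com'"
before = plan(query)  # full table scan
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(query)   # index search
```

Under load, the scan means every lookup touches every row; the index turns it into a handful of page reads.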


2. N+1 Query Patterns

One query fetches a parent list; each row then triggers its own follow-up query. This works fine in development and fails under concurrency.
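A toy demonstration with SQLite (the schema is invented for the example): the first function issues one query per author, the second gets the same result in a single JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO posts VALUES (1, 1, 't1'), (2, 2, 't2'), (3, 1, 't3');
""")


def titles_n_plus_one():
    # N+1: one query for the list, then one query per author
    result = {}
    for author_id, name in conn.execute("SELECT id, name FROM authors"):
        rows = conn.execute(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY title",
            (author_id,),
        )
        result[name] = [t for (t,) in rows]
    return result


def titles_joined():
    # One JOIN: a single round trip regardless of author count
    result = {}
    query = """SELECT a.name, p.title FROM authors a
               LEFT JOIN posts p ON p.author_id = a.id
               ORDER BY a.name, p.title"""
    for name, title in conn.execute(query):
        result.setdefault(name, [])
        if title is not None:
            result[name].append(title)
    return result


assert titles_n_plus_one() == titles_joined()
```

With three authors the difference is invisible; with thousands of rows and hundreds of concurrent users, the per-row round trips dominate.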


3. Blocking I/O

Synchronous calls to external services become bottlenecks quickly.

Async processing or background queues often help.
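A minimal sketch of the async approach, with `asyncio.sleep` standing in for slow downstream calls (the service names are hypothetical):

```python
import asyncio
import time


async def call_service(name, delay):
    # Stand-in for a slow external call (e.g. an HTTP request)
    await asyncio.sleep(delay)
    return name


async def main():
    start = time.perf_counter()
    # Three 0.1 s calls run concurrently, finishing in ~0.1 s
    # instead of ~0.3 s sequentially
    results = await asyncio.gather(
        call_service("auth", 0.1),
        call_service("billing", 0.1),
        call_service("search", 0.1),
    )
    elapsed = time.perf_counter() - start
    return results, elapsed


results, elapsed = asyncio.run(main())
```

The same principle applies to background queues: the request handler stops waiting on work that can happen elsewhere.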


4. Cache Misuse

Global cache keys. No tenant awareness. No TTL strategy.

Caching must be deliberate.
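As a sketch of what “deliberate” means, here is a tiny in-process cache with tenant-scoped keys and a per-entry TTL (a simplified stand-in for something like Redis, not production code):

```python
import time


class TenantCache:
    """Tiny in-process cache with tenant-scoped keys and per-entry TTL."""

    def __init__(self, ttl_seconds=60.0):
        self.ttl = ttl_seconds
        self._store = {}

    def _key(self, tenant_id, key):
        # Scoping keys by tenant prevents cross-tenant data leaks
        return (tenant_id, key)

    def set(self, tenant_id, key, value):
        expires = time.monotonic() + self.ttl
        self._store[self._key(tenant_id, key)] = (value, expires)

    def get(self, tenant_id, key):
        entry = self._store.get(self._key(tenant_id, key))
        if entry is None:
            return None
        value, expires = entry
        if time.monotonic() > expires:
            # Expired entries are evicted lazily on read
            del self._store[self._key(tenant_id, key)]
            return None
        return value


cache = TenantCache(ttl_seconds=0.05)
cache.set("tenant-a", "dashboard", {"widgets": 3})
# tenant-b never sees tenant-a's entry, and entries expire after the TTL
```

A global key like `"dashboard"` without the tenant component would serve one customer's data to another the moment two tenants share a cache.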


Beyond Testing: Architectural Feedback

Performance testing often reveals architectural decisions that need revisiting.

Examples:

  • Introducing Redis caching
  • Moving heavy tasks to background workers
  • Adding composite indexes
  • Increasing partition count
  • Adjusting connection pool size

Load testing is not a final step.

It is feedback for system design.


Lessons Learned

  • Measure latency distribution, not just averages.
  • Always test under concurrency.
  • Simulate real workflows, not just single endpoints.
  • Observe database behavior under stress.
  • Performance improvements should be validated, not assumed.

Final Thoughts

Performance testing is not about proving that a system is fast.

It is about discovering where it breaks.

I treat load testing as part of architecture validation —
a way to ensure systems remain stable under real-world conditions.

Design for load.

Test for failure.

Measure what matters.