When I was asked to test the performance of a new API endpoint, I quickly realized: I had never done load or latency testing before. I knew performance mattered, but I didn’t know how to measure it, what numbers to care about, or how to even start.

This post is my attempt to write down what I learned. I’ll explain the key terms in plain English, walk through the different types of tests, and show a simple example of how to run your first load test. If you’ve ever seen terms like p95 latency or throughput and felt lost, this one is for you.

Key Terms Explained

  • Latency: simply how long a request takes. If a user clicks “Save Preference” and the response comes back in 180 ms, that’s the latency for that request.

Why not average?

Let’s say 9 requests are really fast (~100 ms), but 1 request is very slow (~900 ms).

  • Average latency comes out to ~180 ms, which hides the slow request.
  • p50 (median) means “50% of requests are faster than this” (about 100 ms here).
  • p95 means “95% of requests are faster than this.”
  • p99 means “99% of requests are faster than this” (in a sample this small, both p95 and p99 land right at the 900 ms outlier).

Users notice the slow ones (the “tail”), so we measure p95 and p99 in addition to averages. There’s a short code sketch after this list of terms that shows the math.

  • Throughput (RPS/QPS): Requests per second. How many requests your service can process in a given time.

  • Concurrency: How many requests are being processed at once.

  • Errors: Failed responses (5xx codes such as 500 or 502) or timeouts. Even a 1% error rate is a red flag in production.
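To make these terms concrete, here is a tiny JavaScript sketch (runnable with Node.js) that computes the percentiles for the 10-request example above, plus a throughput and error-rate calculation. All the numbers are made up for illustration, not measured from a real system.

```javascript
// Illustrative only: latencies, request counts, and error counts are made up
// to match the example in the post.

const latenciesMs = [95, 98, 100, 101, 103, 105, 108, 110, 112, 900]; // 9 fast, 1 slow

// Nearest-rank percentile: sort the samples, then pick the smallest value
// that at least p% of the samples are less than or equal to.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const index = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[index];
}

const avg = latenciesMs.reduce((sum, x) => sum + x, 0) / latenciesMs.length;
console.log(`average: ${avg.toFixed(0)} ms`);           // ~183 ms -- the outlier barely shows
console.log(`p50: ${percentile(latenciesMs, 50)} ms`);  // 103 ms
console.log(`p95: ${percentile(latenciesMs, 95)} ms`);  // 900 ms -- the tail is visible
console.log(`p99: ${percentile(latenciesMs, 99)} ms`);  // 900 ms

// Throughput and error rate are just counts over time.
const totalRequests = 6000;  // hypothetical 1-minute run
const failedRequests = 12;
const durationSeconds = 60;
console.log(`throughput: ${totalRequests / durationSeconds} RPS`);                   // 100 RPS
console.log(`error rate: ${((failedRequests / totalRequests) * 100).toFixed(2)}%`);  // 0.20%
```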

Types of Performance Tests

  • Smoke Test – A tiny test (1 request per second for 1 min) just to check if the endpoint works.
  • Baseline Test – Light load to capture normal latency under “calm” conditions.
  • Load Test – Run the system at expected traffic (say 100 RPS) for 10–15 minutes. Does it still meet your latency/error targets?
  • Stress Test – Push past expected traffic until it breaks. This tells you where the limits are.
  • Spike Test – Jump suddenly from low → high traffic. Can the system autoscale?
  • Soak Test – Run for hours at moderate load. Useful to find memory leaks or slow drifts.

Each one answers a different question.
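To make the shapes concrete, here is roughly what a load test and a spike test look like when expressed as k6 stages. This is only the options part of a script, not a complete test, and the durations and virtual-user counts are placeholders, not recommendations.

```javascript
// k6 `stages` ramp the number of virtual users (VUs) over time.
// A complete script also needs an exported default function that makes the
// requests -- see the full example later in the post.

// Load test: ramp up to expected traffic, hold, ramp down.
export const options = {
  stages: [
    { duration: '2m', target: 100 },  // ramp up toward expected load
    { duration: '10m', target: 100 }, // hold at expected load
    { duration: '2m', target: 0 },    // ramp down
  ],
};

// For comparison, a spike test jumps instead of ramping:
// stages: [
//   { duration: '30s', target: 5 },   // calm
//   { duration: '10s', target: 200 }, // sudden spike
//   { duration: '3m', target: 200 },  // hold the spike
//   { duration: '30s', target: 5 },   // back to calm
// ]
// A soak test has the same shape as a load test, just held for hours.
```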

Strategy: How to Run Your First Test

  1. Define success first.
  • Example: “At 100 RPS, p95 ≤ 300 ms, error rate ≤ 0.1%.” (The script sketch after this list turns these targets into k6 thresholds.)
  2. Start small.
  • Run a smoke test: 1 VU (virtual user), 1 request per second.
  3. Ramp up gradually.
  • Increase RPS step by step until you reach your target.
  4. Measure carefully.
  • Look at p50/p95/p99 latency, errors, and throughput.
  5. Observe your system.
  • Is CPU near 100%?
  • Are DB connections maxed?
  • Is an external service slow?
  6. Document the results.
  • Write down what you tested, the numbers, and what you learned.
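Here is a minimal sketch of steps 1 through 4 as a single k6 script. The URL is a placeholder for your own endpoint, and the numbers simply encode the example targets from step 1; treat them as starting points, not rules.

```javascript
// Save as e.g. first-load-test.js and run with: k6 run first-load-test.js
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  // Step 1: define success first -- k6 marks the run as failed if these break.
  thresholds: {
    http_req_duration: ['p(95)<300'], // p95 latency must stay under 300 ms
    http_req_failed: ['rate<0.001'],  // error rate must stay under 0.1%
  },
  // Steps 2-3: start small, then ramp up gradually.
  // With the sleep(1) below, each VU sends roughly 1 request per second,
  // so ~100 VUs approximates ~100 RPS.
  stages: [
    { duration: '1m', target: 1 },    // smoke: 1 VU
    { duration: '3m', target: 25 },   // gentle ramp
    { duration: '3m', target: 100 },  // ramp to target
    { duration: '10m', target: 100 }, // hold at expected load
    { duration: '2m', target: 0 },    // ramp down
  ],
};

export default function () {
  // Placeholder endpoint -- replace with the API you are testing.
  const res = http.get('https://your-api.example.com/preferences');

  // Step 4: measure -- this check and k6's built-in metrics cover latency,
  // errors, and throughput in the end-of-test summary.
  check(res, { 'status is 200': (r) => r.status === 200 });

  sleep(1); // pace each VU to ~1 request per second
}
```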

What to Look For in Results

  • Good signs
      • p95 stable across the run
      • Errors < 0.1%
      • CPU and DB usage below ~70%
  • Warning signs
      • p95/p99 climbing while p50 stays flat → system under strain
      • Errors/timeouts creeping in
      • DB or external services throttling
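One small habit that helps with step 6 (“document the results”): k6 can write its end-of-test summary to a file through a handleSummary hook, so the raw numbers sit next to your notes. A minimal version, added to the same script, looks like this (the results.json file name is arbitrary):

```javascript
// Optional: save the end-of-test summary for your write-up.
// Note: once handleSummary is defined, k6 skips its default terminal summary
// unless you also return a 'stdout' entry.
export function handleSummary(data) {
  return {
    // `data` contains every metric k6 collected (latency percentiles, error rate, ...).
    'results.json': JSON.stringify(data, null, 2),
  };
}
```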

Wrap-Up

When I started, terms like p95 and throughput felt intimidating. But once I ran my first smoke test, it clicked: latency is just “how long it takes,” and load testing is just “seeing if it still works when many requests come in.”

The important part is to:

  • Learn the basic terms (p95, RPS, errors).
  • Run small tests first.
  • Build up to realistic load.
  • Watch how your system behaves, not just the test numbers.

If you’ve never done load testing before, I encourage you to try a 5-minute k6 script on your own API. It’s eye-opening to see how your service behaves under pressure.
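If you want a starting point, here is about the smallest 5-minute script I can think of. The URL is a placeholder you’d swap for your own endpoint, and the load is deliberately gentle.

```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';

// 5 virtual users for 5 minutes -- light enough to be safe on most services.
export const options = { vus: 5, duration: '5m' };

export default function () {
  // Placeholder URL: point this at your own API.
  const res = http.get('https://your-api.example.com/health');
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1); // ~1 request per second per VU
}
```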