
Technical Analysis: Why 0.2ms Queries Can Still Result in Low QPS

It is a classic trap: you spend a week fine-tuning your cache and your database hits, only to realize your application is essentially running with the handbrake on. I saw this in my own tests: MySQL and Readyset were returning rows in 0.2 ms, but application throughput remained capped at roughly 40 queries per second (QPS).

The database isn’t the bottleneck, and the cache isn’t the bottleneck. Your application code is simply leaving performance on the table. While running some demos recently, I found that a Node.js service went from 40 QPS to over 6,000 QPS, which is a 150x improvement without changing a single database setting or query plan. Here is a post-mortem of how application-layer bottlenecks can neutralize even the most aggressive infrastructure wins.

The “Fairness” Trap: Event Loop Lag

In Node.js and similar async runtimes we are taught to “be nice” to the event loop. We use setImmediate() or yield to ensure our heavy loops do not starve I/O or HTTP handlers. The intent is noble, but the result can be catastrophic when dealing with high-performance backends.

In my demo, the worker yielded after every single query. The problem is that setImmediate() does not execute “immediately”; it schedules the callback for the next iteration of the event loop, after all I/O polling and microtask queues are processed. If your event loop is under any pressure, say from Prometheus metrics or health checks, that yield might take 3.4 ms to return.

If your query takes 0.2 ms but your yield takes 3.4 ms, you are spending roughly 95% of your time waiting for the application to wake back up. The math is simple: a 3.6 ms total iteration time (3.4 ms yield + 0.2 ms query) caps you at about 278 QPS, a theoretical ceiling that no amount of database tuning can break.
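The ceiling falls straight out of the two measured numbers; a quick sketch using the figures above:

```javascript
// Per-iteration cost is one yield plus one query (figures measured above).
const yieldMs = 3.4;                            // setImmediate() latency under load
const queryMs = 0.2;                            // database round-trip
const iterationMs = yieldMs + queryMs;          // 3.6 ms per loop iteration
const maxQps = Math.round(1000 / iterationMs);  // theoretical throughput ceiling
const waitShare = yieldMs / iterationMs;        // fraction of time spent waiting

console.log(`ceiling: ${maxQps} QPS, waiting: ${(waitShare * 100).toFixed(0)}%`);
// → ceiling: 278 QPS, waiting: 94%
```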

The Fix: Do not yield on every iteration. Batch your work instead. Yielding every 64 or 128 iterations keeps the app responsive without paying the “Event Loop Tax” on every single row.
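A minimal sketch of the batched pattern (shouldYield, yieldToEventLoop, drainQueue, and runQuery are hypothetical names for illustration, not from the demo code):

```javascript
// Yield to the event loop only every BATCH_SIZE iterations instead of
// after every query. runQuery is a stand-in for your per-row work.
const BATCH_SIZE = 64;

const shouldYield = (i) => i > 0 && i % BATCH_SIZE === 0;
const yieldToEventLoop = () => new Promise((resolve) => setImmediate(resolve));

async function drainQueue(rows, runQuery) {
  for (let i = 0; i < rows.length; i++) {
    await runQuery(rows[i]);
    // Pay the "Event Loop Tax" once per 64 rows, not once per row.
    if (shouldYield(i)) await yieldToEventLoop();
  }
}
```

The batch size is a tuning knob: larger batches mean higher throughput but longer stretches where health checks and metrics endpoints cannot run.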

The Defensive Sleep

We have all written code that says: if there is no work to do, sleep for 100 ms so we do not spin the CPU. It is a standard safety measure for pollers and background workers. But what happens when a configuration change or a specific filter makes that “empty” state common?

In my debugging session, a random 20% of iterations were hitting a “no work found” branch that triggered a 100 ms setTimeout.

  • 80% active work × 0.17 ms = 0.136 ms

  • 20% empty branch × 100 ms = 20 ms

  • Total average time per loop: ~20.14 ms, which caps throughput at ~50 QPS
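The same weighted average as a sketch, using the measured numbers above:

```javascript
// Weighted average loop time when 20% of iterations hit the 100 ms sleep.
const activeMs = 0.17;  // measured query + processing time
const sleepMs = 100;    // defensive setTimeout in the "no work" branch
const avgLoopMs = 0.8 * activeMs + 0.2 * sleepMs;  // ≈ 20.14 ms
const effectiveQps = Math.round(1000 / avgLoopMs); // ≈ 50 QPS

console.log(`avg loop: ${avgLoopMs.toFixed(2)} ms, ceiling: ${effectiveQps} QPS`);
// → avg loop: 20.14 ms, ceiling: 50 QPS
```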

This bug is a chameleon. It looks like a slow database because the throughput is low but the database is not even being called. It is a defensive “nothing to do” sleep that accidentally becomes the primary bottleneck.

The Fix: Audit your setTimeout and setInterval calls. If a worker finds no work in its primary queue, have it check a fallback or revalidate its state before it commits to a 100 ms nap.
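A sketch of that fix, assuming hypothetical fetchPrimary/fetchFallback work sources (not from the original code):

```javascript
// Check a fallback work source before committing to the idle sleep, so the
// 100 ms nap only happens when the worker is genuinely out of work.
const IDLE_SLEEP_MS = 100;
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function workerTick(fetchPrimary, fetchFallback, process) {
  // ?? short-circuits: the fallback is only consulted when primary is empty.
  const job = (await fetchPrimary()) ?? (await fetchFallback());
  if (job) {
    await process(job);
    return 0;                 // had work: no sleep, loop again immediately
  }
  await sleep(IDLE_SLEEP_MS); // genuinely idle: now the nap is justified
  return IDLE_SLEEP_MS;
}
```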

Connection Pool Friction

Connection pooling is a best practice for a reason but it is not free. Every time you call pool.getConnection() the driver has to check the pool for an available connection and potentially validate it via a hidden SELECT 1.

On a local network, this is negligible. But move your app to the cloud, where there is cross-AZ latency or TLS handshakes, and suddenly checking out a connection for a 0.2 ms query triples your total latency.

The Fix: If you have a dedicated background worker, stop checking connections in and out of the pool for every single query. Hold a single connection for the duration of the worker lifecycle to eliminate the checkout overhead.
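The pattern, sketched against a stub pool so the example is self-contained; a real worker would do the same thing against a driver pool such as mysql2/promise's pool.getConnection():

```javascript
// A long-lived worker checks a connection out once and reuses it, instead
// of paying pool checkout/validation on every query. The stub pool counts
// checkouts so the difference is visible.
function makeStubPool() {
  let checkouts = 0;
  const conn = { query: async (sql) => `result of ${sql}`, release: () => {} };
  return {
    getConnection: async () => { checkouts++; return conn; },
    checkouts: () => checkouts,
  };
}

async function runWorker(pool, queries) {
  const conn = await pool.getConnection(); // one checkout for the whole lifecycle
  try {
    for (const sql of queries) await conn.query(sql);
  } finally {
    conn.release(); // hand the connection back only on shutdown
  }
}
```

Three queries through this worker cost one checkout instead of three; the trade-off is that a held connection is invisible to the pool's idle-eviction and health checks, so add your own reconnect logic for long-lived workers.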

Post-Mortem: How to Audit Your Own Stack

If you suspect your app is the bottleneck, stop looking at the database dashboard and run these tests:

  • Benchmark the Raw Path: Run a tight loop with no yields and no sleeps against the DB from the app server.

  • Measure Event Loop Lag: Use prom-client to track nodejs_eventloop_lag_seconds. Anything above 1 ms means your yields are expensive.

  • The Sleep Audit: Search your codebase for setTimeout inside hot paths or worker loops. Ask what happens when that branch becomes the common case due to a configuration change.

  • The AI Code Review: If your logic was generated by an LLM, pay extra attention to async patterns. AI models are trained to be “safe” and often over-insert defensive yields or sleeps that prioritize system stability over maximum throughput. Always validate that AI-generated loops aren’t introducing accidental “dead time” between high-speed database calls.

Optimization is a full-stack game. You can have the fastest database or the most aggressive cache in the world, but if your application logic introduces milliseconds of “dead time” between calls, your infrastructure ROI will stay near zero.

Do not just tune the engine. Check if the brakes are rubbing.

Written by

Vinicius Grippa
