§ 039 · MySQL

Big O for MySQL: Why the Same Query Gets Slow at Scale

Big O for MySQL, explained visually: how O(1), O(log n), O(n), O(n log n), O(n²), and O(2ⁿ) decide whether your query stays fast as your data grows.

Welcome to the everyday problem Big O was invented to describe. This post is a practical, MySQL-flavoured tour of time complexity. We’ll keep the math light, the pictures clear, and we’ll finish with the exact algorithms your database uses to keep your queries fast (or slow).

What is Big O, really?

Forget seconds. Big O is not a unit of time. It’s a way of describing how the cost of an algorithm grows as the input grows. Specifically: if you double the data, does the work double? Stay the same? Quadruple? That shape is the algorithm’s complexity.

We call the input size n. A few shapes come up over and over in real systems:

  • O(1) — constant. The size of n doesn’t matter. Accessing an array by index, or a hash lookup.
  • O(log n) — logarithmic. Doubling n adds one more step. Binary search, B-tree traversal.
  • O(n) — linear. Work scales directly with size. A full table scan.
  • O(n log n) — linearithmic. A sort over n items.
  • O(n²) — quadratic. For every row, look at every other row. A naive nested-loop join.
  • O(2ⁿ) — exponential. One more row doubles the work. The ultimate danger class: combinatorial explosion, travelling salesman, brute-force subset enumeration.

The useful picture is this: as n grows, these curves diverge dramatically. Big O tells you which curve your query is riding.

Growth curves of O(1), O(log n), O(n), O(n log n), O(n²), and O(2ⁿ).
Same axes, same n — wildly different fates. The curve your query rides is the single biggest predictor of whether it’ll still be fast next year.

Little, medium, lots: the numbers that will ruin your week

Big O only matters because, at some size, the curves stop overlapping. Below is the same story with real numbers. We ignore constant factors — those matter in practice, but they don’t change the shape.

Complexity n = 100 n = 10,000 n = 10,000,000
O(1) 1 1 1
O(log n) 7 14 24
O(n) 100 10,000 10,000,000
O(n log n) 700 140,000 ≈ 240,000,000
O(n²) 10,000 100,000,000 100,000,000,000,000
O(2ⁿ) ≈ 1.3 × 1030 beyond atoms in the universe unthinkable

At n = 100 the count already dwarfs the number of atoms in your body. Exponential is not a number you compute with — it’s a number you avoid.

At n = 100 everything is fine — a nested loop over 100 × 100 rows is 10 K operations, basically free. At n = 10 million, the same nested loop is 100 trillion operations. Your query planner’s job is to never let that happen.

Notice something: O(log n) barely moves. Going from 100 rows to 10 million rows takes the count from 7 up to 24 — only 17 more operations for 100,000× more data. That’s the superpower of a B-tree index, and it’s why the first rule of MySQL tuning is “add an index.”

Why this matters specifically for MySQL

MySQL’s optimizer is, at heart, a machine for picking the algorithm with the best Big O for your actual data. Every time you run EXPLAIN, you’re looking at the optimizer’s answer to one question: “Which shape of curve should this query ride?”

Three things decide the answer:

  1. Which indexes exist — because each index unlocks an O(log n) access path.
  2. How many rows are expected — because at small n, a scan can beat an index (the constants win).
  3. How the tables join — because the difference between nested loop and hash join is the difference between O(n × m) and O(n + m).

Everything the optimizer does — index choice, join order, whether to use a temp table, whether to sort with filesort — is cost math. And cost math is just Big O with constants attached.

MySQL’s workhorse algorithm: the B+tree search

InnoDB stores every table as a B+tree on the primary key, and every secondary index as another B+tree. A B+tree of a billion rows is only about four levels deep. That’s the magic of O(log n) in action.

Here’s the search in motion. A query arrives (“find row with id = 42”), and the tree is walked from the root, one level per read, until it lands on the leaf page holding the row. With one billion rows, it’s still only four page reads.

Animated B+tree search — token descending root → internal → leaf.
A B+tree lookup visits one page per level. 10 million rows → 3–4 page reads. A billion rows → 4–5. The log is doing the heavy lifting.

This is why the single most impactful thing you can do for a slow query is usually “add an index on the column you’re filtering by.” You’re not making the query faster by a constant factor — you’re changing its complexity class from O(n) to O(log n). At 10 million rows, that’s a 400,000× difference in pages touched.

If you want to see this in motion with real MySQL semantics — clustered vs secondary index, covering vs non-covering — there’s an interactive version in the myflames project under docs/teach/index/btree.html.

The other shapes, as MySQL operators

O(n) — the full table scan

No usable index? MySQL reads every row. Double the rows, double the work. That’s type: ALL in EXPLAIN, and it’s the reason “this query was fine last year” is such a common support ticket.

EXPLAIN SELECT * FROM orders WHERE note LIKE '%refund%';
-- type: ALL
-- rows: 10,000,000
-- Extra: Using where

A LIKE '%something%' can’t use an index prefix — the wildcard at the front breaks the B-tree’s ordering. So the optimizer falls back to a scan: O(n). The fix is either FULLTEXT, a functional index, or rewriting the query so the wildcard is at the end.

O(n × m) — the nested loop join

For every row in the outer table, look up matching rows in the inner table. If the inner side has an index, each lookup is O(log m) and the whole join is O(n log m) — tolerable. If it doesn’t, each lookup is O(m) and the join is O(n × m) — catastrophic.

Nested loop join (O(n×m)) vs hash join (O(n+m)).
Same two tables. Left: every A row compares against every B row. Right: MySQL builds a hash on A, then probes it once per B row. At a million rows per side, the left panel is a trillion ops; the right panel is two million.

This is why the introduction of hash join in MySQL 8.0.18 was such a big deal. It gave the optimizer an O(n + m) option for cases where neither side had a useful index, turning a previously unshippable query into a merely slow one.

O(n log n) — the sort

When MySQL needs to return rows in a specific order and can’t get them pre-sorted from an index, it falls back to filesort. That step adds O(n log n) on top of whatever reading cost you already paid. For a million-row result set, that’s tens of millions of extra comparisons before a single byte goes back to the client.

This is where a sort-ordered covering index earns its keep. The classic “2-second query becomes 20 ms” speedup is actually two separate complexity wins stacked on top of each other:

1. “Sort-ordered” — the sort disappears entirely. A B-tree index is physically stored in key order, and InnoDB links the leaf pages together as a sorted list. If your ORDER BY col_x matches the index’s leading column, MySQL just walks the leaves in that already-sorted order — no comparison step, no temp buffer, nothing. The O(n log n) sort doesn’t get faster; it stops happening. Add LIMIT 20 and MySQL reads exactly 20 leaf entries and quits.

2. “Covering” — the per-row PK hops disappear. A secondary index’s leaves only contain the indexed columns plus the primary key. If your SELECT needs a column that isn’t in the index, MySQL has to do a second B-tree walk into the clustered tree for every matching row — an extra O(log n) per row, or O(k log n) total. A covering index has every column you asked for right there on the leaf, so that whole step vanishes.

Stacked, the complexity moves look like this:

Step Without index With sort-ordered covering index
Row access O(n) scan, or O(k log n) with PK hops O(log n + k) — seek + walk
ORDER BY sort O(n log n) eliminated (0 ops)
LIMIT k applied after the sort applied during the walk

That last row is the one that matters most for paginated queries: without the index, MySQL must sort all n rows even if you only want the top 20, because it can’t know which rows are the top 20 until it’s sorted them. With the index, LIMIT 20 stops the walk after 20 leaves. That is where the 100× speedup actually comes from.

O(1) — the dream (sort of)

Pure O(1) is rare in MySQL — even a primary-key lookup is O(log n) because it still walks the clustered tree. But amortized O(1) shows up in the InnoDB adaptive hash index, which builds an in-memory hash on top of frequently-accessed B-tree pages. When it hits, you skip the log-n tree walk entirely.

O(2ⁿ) — where exponential hides in MySQL

You almost never execute an exponential algorithm in MySQL — the optimizer works hard to keep it off your plate. But there’s one place where exponential work really does happen: query planning. Picking the best join order for N tables means weighing up to N! different orderings, which is in the same danger class as 2ⁿ. For 12 tables that’s nearly half a billion orderings to evaluate before the query starts running.

MySQL tames this with branch-and-bound pruning, on by default (optimizer_prune_level = 1). As the planner tries out partial join orders, it drops any branch whose cost already exceeds the cheapest complete plan found so far — there’s no point finishing a plan that’s already losing. That turns the theoretical N! down to thousands or millions of orderings actually costed. For most queries, it’s done in milliseconds and you never notice.

You do notice it on large joins (say, 15–20 tables) where the tables look similar enough that cost estimates can’t tell the good branches from the bad ones. Pruning loses its bite, and real planning time appears in the Optimizing / statistics / preparing stages (visible in SHOW PROCESSLIST) before a single row is touched. The exponential monster you avoided at runtime was waiting for you at plan time.

The EXPLAIN mindset: reading for complexity

Once you carry Big O around in your head, EXPLAIN reads differently. You stop asking “is this query fast?” and start asking “which complexity class is each row on?” Rough mapping:

EXPLAIN clue Complexity Feeling
type: const / eq_ref O(1) – O(log n) great
type: ref / range O(log n + k) good
type: index O(n) over index depends on n
type: ALL O(n) risky at scale
Using filesort + O(n log n) OK if n is small
Using join buffer (Block Nested Loop) ≈ O(n × m) scary
Using join buffer (hash join) O(n + m) good

Tools like myflames visualize this directly — the flame graph width is proportional to row count, so an O(n²) operator looks wider and hotter than an O(log n) one. The visual is just Big O wearing colour.

Takeaways

  • Big O describes the shape of a query’s cost, not the number of seconds. Shapes diverge brutally at large n.
  • Adding an index is usually a complexity change (O(n) → O(log n)), not just a speed tweak.
  • Joins without a usable inner-side index are O(n × m). Hash join is O(n + m). That’s often the difference between “works” and “doesn’t.”
  • Sorting is O(n log n). Pre-sorted indexes make sorts disappear entirely.
  • EXPLAIN is a complexity diagnosis, not a speed check. Read it that way.

The next time a query is slow, skip the “what changed?” reflex and ask “what shape is it riding?” first. Ninety percent of the time, the answer — and the fix — are in the curve.

Written by

Vinicius Grippa

Writes this blog. Mostly about databases. Boring on purpose.

More about me →

The floor is yours.

0 comments · Moderated · civil & on-topic

First comment appears here once approved. Questions, corrections, and counterpoints welcome — just no self-promotion.

Add a comment

Your email address is never published. * required

Subscribe · Posted when ready

A quiet, technical email about databases.

One post per send, corrections when I’m wrong, nothing else. No social-media cross-posts. No “what we learned.”

Unsubscribe with any reply