Understanding Disks, Part 1: Why Disk I/O Even Exists

Before you look at iostat, before you blame disks, before you scale hardware — you need one correct mental model.

Let’s build it.

Why does disk I/O even exist?

A program never wants disk.

A program wants data.

Disk I/O happens only in two situations:

Read: data is not in RAM
Write: data must be made durable

Everything else you observe — latency, queues, utilization — is just a consequence of this.
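To make the write side concrete, here is a minimal sketch, assuming a POSIX-like system. The helper name `write_durably` is made up for illustration. The point is the split: `os.write` normally just copies data into the page cache, and `os.fsync` is where the caller actually waits for the disk.

```python
import os
import tempfile


def write_durably(path: str, payload: bytes) -> int:
    """Write payload, then block until the kernel reports it is on stable storage.

    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        # Usually returns quickly: the data lands in the page cache.
        written = os.write(fd, payload)
        # Forces write-back; the calling thread sleeps until it completes.
        os.fsync(fd)
    finally:
        os.close(fd)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        n = write_durably(os.path.join(tmp, "demo.bin"), b"x" * 4096)
        print(f"{n} bytes written and synced")
```

Without the `fsync` call, the write would still succeed, but durability would be left to the kernel's own write-back schedule.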

So when someone says:

“The disk is slow”

They usually mean one (or more) of these:

High access latency → I/O takes time
Too many requests → IOPS pressure
Too much data → throughput limit

That’s it.

Every disk metric you’ll ever see maps back to these three forces. Nothing more.
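The second and third forces are tied together by simple arithmetic: throughput is request rate times request size. A small sketch, with made-up numbers, shows why "lots of requests" and "lots of data" are different problems:

```python
def throughput_mib_s(iops: float, request_kib: float) -> float:
    """Throughput in MiB/s for a given request rate and request size."""
    return iops * request_kib / 1024


if __name__ == "__main__":
    # Illustrative workloads, not measurements.
    small_random = throughput_mib_s(iops=10_000, request_kib=4)
    large_sequential = throughput_mib_s(iops=200, request_kib=1024)
    print(f"10,000 x 4 KiB requests/s -> {small_random:.1f} MiB/s")
    print(f"200 x 1 MiB requests/s    -> {large_sequential:.1f} MiB/s")
```

The first workload issues fifty times more requests yet moves far less data. One is IOPS pressure, the other is a throughput limit.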

What actually happens when a process needs data?

Let’s walk the real path.

When a process requests data:

The kernel checks the page cache
If the data is there → no disk I/O
If not → kernel issues a block I/O request
The request enters a queue
The disk services it
The process wakes up

Critical detail (do not skip this):

The process is sleeping, not “using disk”.

That single fact explains most production confusion.

So remember:

High disk stats ≠ high CPU usage
A waiting process ≠ a busy CPU
A sleeping process consumes zero CPU
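You can see the gap between "time passed" and "CPU used" with a rough sketch. It uses `time.sleep` as a stand-in for a blocking disk read (an assumption made so the demo behaves the same on any machine) and compares wall-clock time with the CPU time charged to the process:

```python
import time


def measure_blocked(seconds: float) -> tuple[float, float]:
    """Return (wall_seconds, cpu_seconds) spent while the process is blocked.

    time.sleep stands in for a blocking read here; in both cases the
    process is off the CPU until something wakes it.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    time.sleep(seconds)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


if __name__ == "__main__":
    wall, cpu = measure_blocked(0.5)
    print(f"wall clock: {wall:.3f}s, CPU consumed: {cpu:.3f}s")
```

Wall time tracks the wait. CPU time stays close to zero, because a blocked process is not scheduled.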

What `iostat` actually observes

iostat does not see:

Page cache hits
Application logic
Why the I/O happened

It sees only one thing:

Block I/O requests that reached the disk

That’s it.

So iostat answers:

“What happened after a cache miss?”

It does not answer:

“Why is my application slow?”

If you expect iostat to explain slowness by itself, you’re already wrong.
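On Linux, the raw counters behind this view live in `/proc/diskstats`. The sketch below parses one line of it into named fields. The sample line is invented for illustration, and the parser reads only the first 14 columns (newer kernels append more).

```python
from typing import NamedTuple


class DiskStats(NamedTuple):
    name: str
    reads: int            # reads completed
    sectors_read: int     # in 512-byte sectors
    ms_reading: int       # total ms spent on reads
    writes: int           # writes completed
    sectors_written: int  # in 512-byte sectors
    ms_writing: int       # total ms spent on writes
    in_flight: int        # requests currently in progress
    ms_busy: int          # ms with at least one request in flight
    ms_weighted: int      # ms_busy weighted by number of requests in flight


def parse_diskstats_line(line: str) -> DiskStats:
    """Parse one /proc/diskstats line (major, minor, name, then counters)."""
    f = line.split()
    return DiskStats(
        name=f[2],
        reads=int(f[3]),
        sectors_read=int(f[5]),
        ms_reading=int(f[6]),
        writes=int(f[7]),
        sectors_written=int(f[9]),
        ms_writing=int(f[10]),
        in_flight=int(f[11]),
        ms_busy=int(f[12]),
        ms_weighted=int(f[13]),
    )


if __name__ == "__main__":
    # Invented sample line, not real output.
    sample = "   8       0 sda 1200 30 96000 800 400 10 32000 1200 0 900 2000"
    print(parse_diskstats_line(sample))
```

Notice what is missing: there is no field for cache hits, no process name, no reason. Only requests that reached the block layer are counted.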

The only promise `iostat` makes

iostat promises exactly four things:

How many requests hit the disk
How big those requests are
How long they take
Whether they pile up

Nothing more. Nothing less.

Expect more, and you’ll misread it.

Stripping `iostat -x` to its bones

Forget columns. Think in questions.

1️⃣ Are requests even happening?

r/s, w/s

If these are near zero, the disk is irrelevant.

Stop looking at it.

2️⃣ How much data is moving?

rkB/s, wkB/s

This tells you whether you’re dealing with:

Lots of small I/O
Or fewer large transfers

Throughput problems live here.

3️⃣ Are requests slow?

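await

This is end-to-end latency in milliseconds, from submission to completion, so it includes time spent waiting in the queue as well as time being serviced. Newer sysstat versions split it into r_await and w_await.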

But:

High await does not automatically mean bad disk.

Hold that thought.

4️⃣ Are requests queueing?

avgqu-sz (shown as aqu-sz in newer sysstat versions)

This tells you whether the disk (or something before it) can keep up.

Queue ≠ broken disk. You’ll learn why later.

5️⃣ Is the disk busy at all?

%util

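This is the percentage of time the device had at least one request in flight. It only tells you whether the disk had work.

It does not prove saturation by itself: devices that serve requests in parallel (SSDs, RAID arrays) can show close to 100% and still have headroom.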

That’s it.

Everything else is decoration — for now.
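All five answers can be derived from two snapshots of the kernel's per-device counters taken a known interval apart. The sketch below is an approximation of that arithmetic, not sysstat's actual source. The `Snapshot` type and the numbers are made up for illustration, and it assumes the counters use 512-byte sectors and milliseconds, as `/proc/diskstats` does.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Cumulative counters for one device (hypothetical container)."""
    reads: int
    sectors_read: int
    ms_reading: int
    writes: int
    sectors_written: int
    ms_writing: int
    ms_busy: int
    ms_weighted: int


def extended_stats(prev: Snapshot, curr: Snapshot, interval_s: float) -> dict:
    """Derive iostat-style rates from the difference between two snapshots."""
    d_reads = curr.reads - prev.reads
    d_writes = curr.writes - prev.writes
    ios = d_reads + d_writes
    d_ms_io = (curr.ms_reading - prev.ms_reading) + (curr.ms_writing - prev.ms_writing)
    interval_ms = interval_s * 1000
    return {
        "r/s": d_reads / interval_s,
        "w/s": d_writes / interval_s,
        "rkB/s": (curr.sectors_read - prev.sectors_read) * 512 / 1024 / interval_s,
        "wkB/s": (curr.sectors_written - prev.sectors_written) * 512 / 1024 / interval_s,
        # Average ms per completed request, queue time included.
        "await": d_ms_io / ios if ios else 0.0,
        # Average number of requests in flight over the interval.
        "aqu-sz": (curr.ms_weighted - prev.ms_weighted) / interval_ms,
        # Share of the interval with at least one request in flight.
        "%util": min(100.0, (curr.ms_busy - prev.ms_busy) / interval_ms * 100),
    }


if __name__ == "__main__":
    before = Snapshot(0, 0, 0, 0, 0, 0, 0, 0)
    after = Snapshot(reads=1000, sectors_read=16000, ms_reading=2000,
                     writes=500, sectors_written=8000, ms_writing=4000,
                     ms_busy=5000, ms_weighted=6000)
    for name, value in extended_stats(before, after, interval_s=10).items():
        print(f"{name:>7}: {value:.2f}")
```

Every column is a difference divided by something. None of them knows why the request was issued.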

The first mental checkpoint (critical)

Say this out loud:

High await does not automatically mean the disk is bad

Sometimes the disk is fine, but:

I/O is serialized
Requests are throttled
Forced syncs (fsync, journaling) are in play

You don’t need to understand those yet — just accept that latency ≠ disk failure.
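If you want a feel for it anyway, here is a toy model, assuming a device that serves strictly one request at a time with a fixed service time. It is a simulation, not a measurement. The same healthy device produces very different average latency depending only on how requests arrive.

```python
def simulate_fifo(arrivals_ms: list[float], service_ms: float) -> list[float]:
    """Per-request latency (queue wait + service) for a one-at-a-time device."""
    free_at = 0.0
    latencies = []
    for arrival in arrivals_ms:
        start = max(arrival, free_at)    # wait if the device is still busy
        free_at = start + service_ms     # device is busy until this request completes
        latencies.append(free_at - arrival)
    return latencies


def average(values: list[float]) -> float:
    return sum(values) / len(values)


if __name__ == "__main__":
    burst = simulate_fifo([0.0] * 8, service_ms=1.0)                      # 8 requests at once
    spread = simulate_fifo([i * 2.0 for i in range(8)], service_ms=1.0)   # one every 2 ms
    print(f"burst:  average latency {average(burst):.1f} ms")
    print(f"spread: average latency {average(spread):.1f} ms")
```

The device takes 1 ms per request in both runs. In the burst, the last request waits behind seven others, so the average climbs to 4.5 ms while the disk itself is unchanged.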

Glue this model permanently

Process issues block I/O → goes to sleep → wakes on completion

This is the core mental model.

If a process is waiting on disk I/O:

CPU is not the bottleneck for that thread
CPU cannot help until I/O completes
Adding CPU does nothing

So:

I/O wait is not CPU starvation

For that thread, they are mutually exclusive.

Four common system states

Case 1: CPU high, disk low

→ CPU-bound workload
→ Disk stats irrelevant

Case 2: Disk high, CPU low

→ I/O-bound workload
→ Processes sleeping
→ System “feels slow”

Case 3: Disk high, CPU high

→ Mixed workload
→ Often bad application behavior (sync I/O, poor batching)

Case 4: Disk low, CPU low, system slow

→ Not a disk problem
→ Look at locks, memory pressure, network, or application logic
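The four cases fit in a few lines of code. This is a rough triage sketch only: the function name and the 70% thresholds are arbitrary assumptions, and "disk high" is reduced to a single percentage for simplicity.

```python
def classify(cpu_busy_pct: float, disk_busy_pct: float,
             cpu_high: float = 70.0, disk_high: float = 70.0) -> str:
    """Map CPU and disk activity to one of the four cases.

    Thresholds are illustrative defaults, not recommendations.
    """
    cpu_is_high = cpu_busy_pct >= cpu_high
    disk_is_high = disk_busy_pct >= disk_high
    if cpu_is_high and not disk_is_high:
        return "Case 1: CPU-bound workload, disk stats irrelevant"
    if disk_is_high and not cpu_is_high:
        return "Case 2: I/O-bound workload, processes sleeping"
    if cpu_is_high and disk_is_high:
        return "Case 3: mixed workload, check for sync I/O and poor batching"
    return "Case 4: if the system is slow, look at locks, memory pressure, network, or application logic"


if __name__ == "__main__":
    for cpu, disk in [(95, 5), (10, 90), (85, 90), (5, 5)]:
        print(f"cpu={cpu:>3}% disk={disk:>3}% -> {classify(cpu, disk)}")
```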

Repeat this until it sticks:

CPU executes.
Disk serves.
Queues wait.
Processes sleep.

The first real insight

Here it is:

If CPU is low and disk stats are high,
the system is slow because progress depends on I/O completion — not execution speed.

That single sentence explains most production disk incidents.

Final lock-in question

What does a sleeping process tell you about CPU usage at that moment?

Answer:

When a process issues a blocking disk read and the data is not in page cache:

The kernel submits a block I/O
The process is put to sleep
The process consumes zero CPU until completion

Two clarifications you must internalize

1️⃣ CPU is low for that process, not necessarily the system

Other processes may still run.

Correct model:

I/O wait removes the requesting process from the run queue

Incorrect model:

“The CPU becomes idle”

2️⃣ A queue does not always mean an overloaded disk

Requests may queue because of:

Block layer scheduling
Device ordering
Journaling or fsync
Enforced serialization

This will matter later when interpreting avgqu-sz.
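One relationship is worth having in hand before then. By Little's law, the average number of requests in flight equals the request rate times the average latency. A minimal sketch, with invented numbers:

```python
def average_in_flight(iops: float, await_ms: float) -> float:
    """Little's law: average requests in flight = arrival rate x time in system."""
    return iops * await_ms / 1000.0


if __name__ == "__main__":
    # Illustrative values only.
    print(average_in_flight(iops=150, await_ms=4))      # light load
    print(average_in_flight(iops=2000, await_ms=4))     # same latency, deeper queue
```

Both lines use the same 4 ms latency. The queue is deeper in the second case only because more requests are arriving, which is why queue size on its own says nothing about whether the device is struggling.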

Foundation complete

Burn this into your brain:

Sleeping on disk I/O = zero CPU consumption for that thread

If you forget this, every disk metric you read will lie to you.
