Before talking about tuning or root cause, just answer one thing:

What do these stats say is happening?
Not why. Only what.

Fictional `iostat -x` line

Device   r/s   w/s   rkB/s   wkB/s   await   avgqu-sz   %util
sda      50    10    2000    400     30.0    2.5        95

1️⃣ Are disk requests happening?

✅ Yes

~60 I/O requests per second total

2️⃣ Reads or writes?

✅ Mostly reads

50 r/s vs 10 w/s
Read traffic dominates
Reads are small (~40 kB per read)
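The arithmetic behind those numbers can be sketched in a few lines of Python. The helper name `avg_request_kb` is hypothetical; the values are copied from the fictional `sda` line above.

```python
def avg_request_kb(kb_per_s: float, ops_per_s: float) -> float:
    """Average size of one request in kB; 0.0 when there is no traffic."""
    return kb_per_s / ops_per_s if ops_per_s else 0.0

# Values from the fictional sda line.
r_s, w_s, rkb_s, wkb_s = 50, 10, 2000, 400

total_iops = r_s + w_s                   # requests per second, reads + writes
read_kb = avg_request_kb(rkb_s, r_s)     # kB moved per read request
write_kb = avg_request_kb(wkb_s, w_s)    # kB moved per write request

print(f"total IOPS: {total_iops}")
print(f"avg read size: {read_kb:.0f} kB, avg write size: {write_kb:.0f} kB")
```

Dividing the kB/s column by the matching requests-per-second column is all there is to it. The guard for zero traffic matters once you point this at an idle device.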

3️⃣ Are requests completing quickly or slowly?

⚠️ Moderately slow

await = 30 ms
This is latency, not bandwidth

4️⃣ Are requests queueing?

⚠️ Yes

avgqu-sz = 2.5
On average, more than one request is outstanding (waiting or being serviced)

5️⃣ Is disk busy?

✅ Almost always

%util = 95%

One killer insight (this matters)

Even with %util at 95%:

Throughput is tiny (~2 MB/s of reads, ~2.4 MB/s including writes)
I/O size is small
Latency exists
Queueing exists

This is not heavy I/O.

This is many small read requests keeping the disk busy.

The disk is busy handling lots of small read I/O with moderate latency, so requests queue up even though total throughput is low.

Why is `%util` high even though throughput is very low?

Because %util measures time, not data.

The disk is almost always busy servicing many small I/O requests.
Each request takes time, so the disk stays occupied even though very little data is transferred per second.

High %util ≠ high throughput.
It just means the disk rarely gets to rest.
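To make the time-versus-data distinction concrete, here is a minimal Python sketch. It assumes a simplified device that services one request at a time. The two workloads and their per-request service times are hypothetical numbers chosen for illustration, not measurements.

```python
def busy_and_throughput(iops: float, service_ms: float, size_kb: float):
    """Return (%util, MB/s) for a device that handles one request at a time.

    Simplifying assumption: requests never overlap, so busy time per second
    is just requests per second times the time each one takes.
    """
    util_pct = min(100.0, iops * service_ms / 10)   # 1000 ms busy == 100 %
    throughput_mb_s = iops * size_kb / 1000
    return util_pct, throughput_mb_s

# Hypothetical workloads: similar busy time, very different data volume.
small_random = busy_and_throughput(iops=60, service_ms=15.8, size_kb=40)
large_sequential = busy_and_throughput(iops=95, service_ms=10, size_kb=1024)

for name, (util, mb_s) in [("small random", small_random),
                           ("large sequential", large_sequential)]:
    print(f"{name:17s} util={util:5.1f}%  throughput={mb_s:6.1f} MB/s")
```

Both workloads keep the device busy about 95% of the time, but one moves roughly forty times more data. `%util` alone cannot tell them apart.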

If this feels obvious, good.
If it doesn’t, this is exactly where people start misreading disks.

Understanding Disk Wait: `await` vs `svctm`

If you want to truly understand disk behavior, nothing is more important than what your processes actually experience while waiting. That’s where await and svctm come in.

1️⃣ `await` — the real deal

Conceptual definition:

Time from IO submission → IO completion
Includes queue wait + disk service time

Why it matters:

This is the true latency your processes see.
High await = processes sleeping on IO.
Even if CPU is idle, your app can be blocked.

Key insight:

await = user-perceived latency

Visual mental model:

Process submits I/O
       |
       v
   Queue → Disk → Completion
       ^
       |
  Process sleeps

await = total wait (queue + disk)
High await = applications feel slow, system “stuck”
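As a rough sketch of where `await` comes from: the kernel keeps cumulative per-device counters (exposed in `/proc/diskstats`), and tools like `iostat` report the difference between two samples. The Python below models a simplified subset of those counters. `DiskCounters` and `derive` are hypothetical names for illustration, and the snapshot values are made up to reproduce the fictional `sda` line from earlier.

```python
from dataclasses import dataclass

@dataclass
class DiskCounters:
    """Simplified subset of the cumulative per-device counters."""
    reads: int      # reads completed
    read_ms: int    # total ms spent on reads, queue time included
    writes: int     # writes completed
    write_ms: int   # total ms spent on writes, queue time included
    busy_ms: int    # ms during which the device had I/O in progress

def derive(prev: DiskCounters, curr: DiskCounters, interval_s: float) -> dict:
    """Turn two counter snapshots into per-interval metrics."""
    ios = (curr.reads - prev.reads) + (curr.writes - prev.writes)
    wait_ms = (curr.read_ms - prev.read_ms) + (curr.write_ms - prev.write_ms)
    return {
        "iops": ios / interval_s,
        # await: total time requests spent in flight, divided by how many finished
        "await_ms": wait_ms / ios if ios else 0.0,
        # %util: share of the interval with at least one request in progress
        "util_pct": 100 * (curr.busy_ms - prev.busy_ms) / (interval_s * 1000),
    }

# Made-up snapshots taken one second apart.
before = DiskCounters(reads=0, read_ms=0, writes=0, write_ms=0, busy_ms=0)
after = DiskCounters(reads=50, read_ms=1500, writes=10, write_ms=300, busy_ms=950)

metrics = derive(before, after, interval_s=1.0)
print(metrics)
```

The point of the sketch is the numerator of `await_ms`: it counts the whole time a request was outstanding, not only the time the device spent on it. That is why it tracks what the process experiences.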

2️⃣ `svctm` — the liar

Conceptual definition:

Time disk actually spends serving a request

Why it’s misleading:

Modern Linux kernels + multi-queue devices → calculation is meaningless
Does not include queue time → drastically underestimates latency
Example:

svctm = 1 ms
await = 20 ms

People see this, panic, and blame the “disk” incorrectly.

Rule of thumb:

Ignore svctm. Always trust await.
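My understanding is that older `sysstat` releases did not measure `svctm` directly but derived it from `%util` and the request rate. Assuming that derivation, this sketch shows why the number falls apart on a device that serves requests in parallel. `derived_svctm_ms` and the workload numbers are hypothetical.

```python
def derived_svctm_ms(util_pct: float, iops: float) -> float:
    """svctm derived from utilisation: busy ms per second / requests per second."""
    busy_ms_per_s = util_pct * 10      # 100 % util == 1000 ms busy per second
    return busy_ms_per_s / iops if iops else 0.0

# Hypothetical device: 20 requests in flight at once, each taking 20 ms.
in_flight = 20
true_service_ms = 20.0
iops = in_flight * 1000 / true_service_ms   # 1000 requests per second
util_pct = 100.0                            # something is always in flight

reported = derived_svctm_ms(util_pct, iops)
print(f"true service time: {true_service_ms} ms, derived svctm: {reported} ms")
```

Under these assumptions the derived value is twenty times smaller than the time each request really takes, because the formula silently assumes one request at a time.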

Quick summary for your mental model

| Metric | What it really tells you |
| --- | --- |
| `await` | Total time process waits (queue + service) |
| `svctm` | Disk-only service time (rarely accurate) |

Takeaway:

High await = process waiting → system slow
CPU can be idle, but progress is blocked.

Exercise (answer this mentally, don’t look it up)

Look at this fake iostat:

Device    r/s   w/s   rkB/s  wkB/s  await  svctm  %util
sda       100    0    4000    0     25.0   1.5    95

Answer truthfully, no guesses:

Real latency per IO?
Is disk “fast” or “slow”?
Does svctm reflect actual process experience?

What does %util say?

 Real latency per IO: 25 ms (await)

 Disk service time: 1.5 ms (svctm, disk handling only) → taken at face value, most of the time is queueing (25.0 - 1.5 = 23.5 ms)

 Disk type likely HDD; throughput ~4 MB/s (~40 kB per read); 25 ms is slow from the process's point of view, though plausible for small random IO on an HDD
 svctm does NOT reflect process experience

 %util = 95%: the disk had at least one request in progress 95% of the time, yet throughput is low because the IOs are small

Let’s run a full production drill. No guessing. Here is a realistic multi-device snapshot and how to read every column as if you were under pressure.

Sample `iostat -x 1 1`

 Device    r/s   w/s   rkB/s   wkB/s   await   svctm   avgqu-sz   %util
 sda      150    20    6000    400     28.0    2.0      3.5       97
 sdb       40    80    1600   3200     12.5    1.0      1.2       50
 sdc        0     0       0      0      0.0    0.0      0.0        0

Step 1 — Only read the numbers

No guessing causes. Just facts.

1️⃣ Are requests happening?

sda: yes, active I/O
sdb: yes, active I/O
sdc: no activity

2️⃣ Read-heavy or write-heavy?

sda: read-heavy (150 r/s vs 20 w/s)
sdb: write-heavy (40 r/s vs 80 w/s)

3️⃣ How big are the I/Os? (rough estimate)

sda: ~40 kB per read (6000 / 150), ~20 kB per write (400 / 20)
sdb: ~40 kB per read (1600 / 40), ~40 kB per write (3200 / 80)

4️⃣ Are requests slow? (`await`)

sda: 28 ms → higher latency, noticeable to applications
sdb: 12.5 ms → moderate latency, usually fine under normal load

5️⃣ Are requests queueing? (`avgqu-sz`)

sda: 3.5 → noticeable queueing
sdb: 1.2 → light queueing

6️⃣ Is the disk busy? (`%util`)

sda: 97 % → nearly saturated
sdb: 50 % → moderately utilized
sdc: 0 % → idle

✅ Key takeaway

sda is the bottleneck: high latency, high queue, almost maxed out
sdb is active but healthy: moderate latency, some queue, half-utilized
sdc is irrelevant: idle, no requests
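The same walk-through can be automated. The sketch below parses the sample output above and prints the facts for each device. `parse_iostat` and `describe` are hypothetical helpers, and the parser assumes a single header row followed by whitespace-separated numeric columns, exactly as in the sample. Real `iostat` output has extra header lines and, depending on the version, different column names.

```python
SAMPLE = """\
Device    r/s   w/s   rkB/s   wkB/s   await   svctm   avgqu-sz   %util
sda      150    20    6000    400     28.0    2.0      3.5       97
sdb       40    80    1600   3200     12.5    1.0      1.2       50
sdc        0     0       0      0      0.0    0.0      0.0        0
"""

def parse_iostat(text: str) -> dict:
    """Map device name -> {column name: value} using the header row."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    return {row[0]: dict(zip(header[1:], map(float, row[1:]))) for row in rows}

def describe(stats: dict) -> str:
    """State what the numbers say for one device, without guessing causes."""
    reads, writes = stats["r/s"], stats["w/s"]
    if reads + writes == 0:
        return "idle"
    mix = "read-heavy" if reads > writes else "write-heavy"
    read_kb = stats["rkB/s"] / reads if reads else 0.0
    write_kb = stats["wkB/s"] / writes if writes else 0.0
    return (f"{mix}, ~{read_kb:.0f} kB/read, ~{write_kb:.0f} kB/write, "
            f"await {stats['await']} ms, queue {stats['avgqu-sz']}, "
            f"util {stats['%util']:.0f}%")

devices = parse_iostat(SAMPLE)
for name, stats in devices.items():
    print(f"{name}: {describe(stats)}")

# The device with the highest %util is the first one to look at.
busiest = max(devices, key=lambda name: devices[name]["%util"])
print(f"busiest device: {busiest}")
```

Keying the columns by the header row rather than by position keeps the sketch usable if the column order differs. It deliberately reports facts only and leaves the "why" for later.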
