- Block I/O Stack
Raw BlockDevice I/O->, ,>[Volume Manager] ,------------------, /HostBus\
[Page Cache]<-->[FS] | (if used) | Block Layer | |Adapter|
|, | | |, | ,----------+->|Driver |
[BlockDevice Interface]-' [Device Mapper]->|[Classic] [MultiQ]+->\(SCSI) /
|______Scheduler___| '------>[Disk]
- I/O types for tracing:
  R: read, W: write, M: meta, S: sync, A: read-ahead, F: flush/force-unit-access, D: discard, E: erase, N: none.
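A minimal sketch of decoding these flag letters when they show up combined in trace output (for example `WS` in an `rwbs` field). The helper name `decode_rwbs` is hypothetical, and the letter-to-name mapping is taken only from the list above.

```shell
# Decode a string of I/O type letters (e.g. "WS") into comma-separated names.
# decode_rwbs is a hypothetical helper; the mapping mirrors the list above.
decode_rwbs() {
    rwbs=$1
    out=""
    while [ -n "$rwbs" ]; do
        c=${rwbs%"${rwbs#?}"}   # first character of the remaining string
        rwbs=${rwbs#?}          # drop that character
        case $c in
            R) name=read ;;
            W) name=write ;;
            M) name=meta ;;
            S) name=sync ;;
            A) name=read-ahead ;;
            F) name=flush/FUA ;;
            D) name=discard ;;
            E) name=erase ;;
            N) name=none ;;
            *) name="unknown($c)" ;;
        esac
        out="${out:+$out,}$name"
    done
    printf '%s\n' "$out"
}

decode_rwbs WS   # write,sync
decode_rwbs RA   # read,read-ahead
```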
- Queued I/O is scheduled in the Block Layer by either Classic (NOOP, Deadline, CFQ) or multi-queue schedulers. Classic schedulers use a single request queue protected by a single lock, which is a performance bottleneck on multi-core systems. The multi-queue schedulers are:
  - None: no queueing
  - BFQ (Budget Fair Queueing): allocates bandwidth as well as I/O time; similar to CFQ (Completely Fair Queueing)
  - mq-deadline: blk-mq version of Deadline
  - Kyber: adjusts read and write dispatch queue lengths based on performance, so that target latencies can be met
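To see which scheduler a device is using, the kernel exposes `/sys/block/<dev>/queue/scheduler`, where the active entry is shown in square brackets. A small sketch follows; the function names `active_sched` and `list_active_scheds` are hypothetical.

```shell
# Extract the active scheduler (the bracketed entry) from a sysfs scheduler line,
# e.g. "mq-deadline [kyber] bfq none" -> "kyber".
active_sched() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Print "<device>: <active scheduler>" for every block device that exposes one.
list_active_scheds() {
    for f in /sys/block/*/queue/scheduler; do
        [ -r "$f" ] || continue          # skip if sysfs is absent or unreadable
        dev=${f#/sys/block/}
        dev=${dev%%/*}
        printf '%s: %s\n' "$dev" "$(active_sched "$(cat "$f")")"
    done
}
```

Call `list_active_scheds` to list all devices. Writing a scheduler name to the same file as root (for example `echo kyber > /sys/block/sda/queue/scheduler`, with `sda` as a placeholder device) switches the scheduler, provided that scheduler is available in the running kernel.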
- BPF can trace disk I/O request details, queued times, latency outliers, latency distribution, disk errors, and SCSI commands and timeouts.
- Sample strategy: check basic disk metrics (IOPS with `iostat`); trace the block I/O latency distribution and latency outliers (with `biolatency`); trace individual I/O to spot patterns such as reads queueing behind writes (with `biosnoop`).
- `iostat` for per-disk I/O stats (IOPS, throughput, I/O request times and utilization). Columns include `rrqm/s` (read requests queued and merged per second), `wrqm/s`, `r/s` (read requests completed per second), `w/s`, `rkB/s` (KB read from disk per second), `wkB/s`.
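Column positions in `iostat -dx` output differ between sysstat versions, so a script that picks out one of the columns above is safer locating it by header name. A minimal sketch, assuming the header line starts with `Device`; the function name `iostat_col` is hypothetical.

```shell
# Print "device value" for one named column of `iostat -dx` output read on stdin.
# The column is located by its header name rather than by a fixed index.
iostat_col() {
    awk -v col="$1" '
        $1 ~ /^Device/ {                  # header line: find the wanted column
            idx = 0
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            next
        }
        idx && NF >= idx { print $1, $idx }   # data line: device name and value
    '
}
```

Example use: `iostat -dx 1 2 | iostat_col r/s` prints the read completion rate per device for each report.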
- `perf` for tracing the queueing of requests (`block_rq_insert`), their issue to storage (`block_rq_issue`), and their completion (`block_rq_complete`). BPF's `biosnoop` is a more efficient alternative.
- `blktrace` for tracing block I/O events. Can cause overload.
- SCSI logging via `dmesg`; needs `sysctl -w dev.scsi.logging_level=0x1b6db6db`.
    ,--------------------------,
    |           App            |   (tools from BCC & bpftrace)
    |--------------------------|
    |    SysCall Interface     |
    |----------,---------------|
    | Rest of  | [VFS        ]-|
    |          | [FileSystems]-|    biopattern, biostacks, bioerr, seeksize,
    |          | [BlockDevice]-|<--biotop, biosnoop, biolatency, bitesize
    | Kernel   | [VolManager ]-|<--mdflush
    |          | [Block Layer] |<--iosched, blkthrot
    |          | [HBA (SCSI) ] |<--scsilatency, scsiresult
    |----------:---------------|
    |      Device Drivers      |<--nvmelatency
    '--------------------------'
`biolatency` and `biosnoop` can help when analysing cloud environments, to isolate drives that break latency SLOs.
    biolatency -Q 10 1   # include OS queued time as well
    biolatency -D        # show a separate histogram for each disk
    biolatency -Fm       # millisecond histograms, separate for each set of I/O flags
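The histograms these commands print use power-of-two buckets (0 -> 1, 2 -> 3, 4 -> 7, ...). The sketch below reproduces that bucketing in user space for a list of latency values, one per line, to show how to read the output. It is not how `biolatency` itself is implemented (that aggregates in the kernel), and the function name `log2_hist` is hypothetical.

```shell
# Build a power-of-two histogram from non-negative integers read on stdin.
log2_hist() {
    awk '
        {
            v = int($1); b = 0
            while (v > 1) { v = int(v / 2); b++ }   # b = floor(log2(value))
            count[b]++
            if (b > max) max = b
        }
        END {
            for (b = 0; b <= max; b++) {
                lo = (b == 0) ? 0 : 2 ^ b           # bucket lower bound
                hi = 2 ^ (b + 1) - 1                # bucket upper bound
                printf "%d -> %d : %d\n", lo, hi, count[b]
            }
        }
    '
}

# Five sample latencies (made-up values, in microseconds)
printf '%s\n' 1 3 5 6 100 | log2_hist
```

A long tail or a second peak in the upper buckets is the kind of outlier signal that is worth following up with `biosnoop`.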
Default read-ahead settings can ruin performance for I/O-heavy applications on fast disks; this can be analyzed with `biosnoop`.
- `biotop -C` as `top` for disks; `bitesize` tracks the size of disk I/O split by process. `seeksize` tracks how many sectors processes request the disk to seek.
- `biopattern` to identify random or sequential I/O. `biostacks` traces full I/O latency with stack traces. `bioerr` to trace error details.
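One way to picture the random versus sequential split: treat an I/O as sequential when it starts at the sector where the previous I/O ended, and as random otherwise. The sketch below applies that rule to lines of `<start sector> <sector count>`. The rule and the input format are assumptions made for illustration, and `io_pattern` is a hypothetical helper, not part of any tool.

```shell
# Classify a stream of I/Os as random or sequential.
# Input: one I/O per line, "<start sector> <number of sectors>".
io_pattern() {
    awk '
        NR > 1 { if ($1 == next_sector) seq++; else rnd++ }
        { next_sector = $1 + $2 }        # where a sequential successor would start
        END {
            total = seq + rnd
            if (total > 0)
                printf "%%RND %d %%SEQ %d COUNT %d\n",
                       rnd * 100 / total, seq * 100 / total, total
        }
    '
}

# Made-up sample: three back-to-back I/Os, one jump, then one more sequential I/O
printf '%s\n' '0 8' '8 8' '16 8' '1000 8' '1008 8' | io_pattern
```

The first I/O has no predecessor, so it is not counted in either class.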
- `mdflush` traces multiple device (md) flush events. `iosched` traces I/O scheduler queued time.
- Count block I/O tracepoints: `funccount t:block:*`.
- Block I/O errors: `trace 't:block:block_rq_complete (args->error) "dev %d type %s error %d", args->dev, args->rwbs, args->error'`.