kafka: Add log directory offline status metric #23038
piochelepiotr wants to merge 2 commits into master
Add kafka.log.directory.offline metric from the kafka.log:type=LogManager,name=LogDirectoryOffline JMX bean. Reports whether each log directory is offline (0 = healthy), tagged with log_directory path and kafka_cluster_id.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codecov Report: ✅ All modified and coverable lines are covered by tests.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5cf31a577d
  #
  - include:
      domain: 'kafka.log'
      bean_regex: 'kafka\.log:type=LogManager,name=LogDirectoryOffline,logDirectory="(.*)"'
Accept unquoted logDirectory in bean regex
The new matcher requires logDirectory="...", but this JMX key is frequently exposed as an unquoted value (for example logDirectory=/var/lib/kafka) depending on Kafka/JMX object-name serialization. In that common case this include rule never matches, so kafka.log.directory.offline is not emitted at all despite being added to metadata/tests. Please make the regex tolerate both quoted and unquoted forms to avoid silently dropping the metric.
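One way to tolerate both forms is to make the surrounding quotes optional and capture everything between them. The sketch below is only an illustration: it uses Python's `re` module as a stand-in, on the assumption that the pattern behaves the same way in whatever regex engine evaluates `bean_regex` in the real check. The pattern itself and the helper name `extract_log_directory` are hypothetical and are not taken from the PR.

```python
import re

# Hypothetical pattern (not from the PR): the quotes around the
# logDirectory value are optional, and the capture group excludes them.
BEAN_PATTERN = re.compile(
    r'kafka\.log:type=LogManager,name=LogDirectoryOffline,logDirectory="?([^"]*)"?'
)


def extract_log_directory(bean_name):
    """Return the log directory path from a bean name, or None if it does not match."""
    match = BEAN_PATTERN.fullmatch(bean_name)
    return match.group(1) if match else None


# Example bean names in the quoted and unquoted forms discussed above.
quoted = 'kafka.log:type=LogManager,name=LogDirectoryOffline,logDirectory="/var/lib/kafka"'
unquoted = 'kafka.log:type=LogManager,name=LogDirectoryOffline,logDirectory=/var/lib/kafka'

print(extract_log_directory(quoted))
print(extract_log_directory(unquoted))
```

Both calls yield the same path, so the `log_directory` tag would be identical in either case. Note that this pattern also accepts a value with only one of the two quotes present, which is a deliberate simplification.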
Summary
- Add `kafka.log.directory.offline` metric from the `kafka.log:type=LogManager,name=LogDirectoryOffline,logDirectory=<path>` JMX bean
- Tagged with `log_directory` path and `kafka_cluster_id`
Motivation
Kafka exposes per-log-directory health status via JMX. Monitoring this allows alerting when a log directory goes offline (e.g. disk failure), which can cause data loss if not addressed.
Test plan
- `kafka.log.directory.offline` is emitted with `log_directory` tag containing the path
🤖 Generated with Claude Code