Kernel metrics #77
Pull request overview
This PR introduces a unified, optionally-enabled kernel metrics surface for observing heap allocator usage and per-thread stack usage, wired through the scheduler and HAL stack abstractions.
Changes:
- Add a new `osiris::metrics` module exposing global heap, per-task heap, and per-thread stack metrics APIs.
- Extend the best-fit allocator and Cortex-M stack implementation to track/report allocator and stack usage metrics (behind the `metrics` feature / `osiris_metrics` cfg).
- Add build/config plumbing (`OSIRIS_DEBUG_METRICS`, feature flags, unexpected_cfgs allowlists) to enable metrics across crates.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/sched/thread.rs | Adds per-thread stack metrics accessor (cfg-gated). |
| src/sched/task.rs | Adds per-task heap metrics accessor (cfg-gated). |
| src/sched.rs | Adds scheduler entry points to query per-thread/per-task metrics (cfg-gated). |
| src/metrics.rs | New unified public metrics API module. |
| src/mem/vmm/nommu.rs | Exposes AddressSpace allocator metrics (cfg-gated). |
| src/mem/alloc/bestfit.rs | Adds allocator metrics struct/counters, snapshot method, and tests (cfg-gated). |
| src/mem.rs | Adds global heap metrics snapshot helper (cfg-gated). |
| src/lib.rs | Exposes pub mod metrics behind cfg. |
| presets/stm32l4r5zi_def.toml | Adds OSIRIS_DEBUG_METRICS preset env var. |
| options.toml | Adds a debug.metrics option definition. |
| machine/cortex-m/src/native/sched.rs | Adds stack peak tracking + stack metrics implementation + tests (cfg-gated). |
| machine/cortex-m/Cargo.toml | Adds metrics feature and allows cfg(osiris_metrics) in lints. |
| machine/cortex-m/build.rs | Emits cfg(osiris_metrics) when OSIRIS_DEBUG_METRICS is set. |
| machine/api/src/stack.rs | Adds StackMetrics type + default Stacklike::metrics() (cfg-gated). |
| machine/api/Cargo.toml | Adds metrics feature and allows cfg(osiris_metrics) in lints. |
| machine/api/build.rs | Emits cfg(osiris_metrics) when OSIRIS_DEBUG_METRICS is set. |
| Cargo.toml | Adds top-level metrics feature and allows cfg(osiris_metrics) in lints. |
| build.rs | Emits cfg(osiris_metrics) when OSIRIS_DEBUG_METRICS is set. |
        peak_offset: 0,
    };

    stack.push_irq_ret_fn(entry, ctx, fin)?;
        total_bytes,
        used_bytes,
        free_bytes: total_bytes - used_bytes,
        peak_used_bytes: self.peak_offset * word,
    }
    //! Enabled by the `metrics` Cargo feature. Provides:
    //! - Global heap metrics via [`kernel_metrics`] / [`global_heap_metrics`]
    //! - Per-task heap metrics via [`task_heap_metrics`]
    //! - Per-thread stack metrics via [`thread_stack_metrics`]
    //!
    //! For full stack metrics, the backend crate's `metrics` feature must also be
    //! enabled (e.g. `hal_cortex_m/metrics`). Without it, stack metrics return zeros.
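A minimal sketch of how the "returns zeros without the backend feature" behaviour could fall out of a default trait method. The field names come from the hunk above and `Stacklike::metrics()` from the file summary; `NoMetricsStack`, `WordStack`, and their fields are hypothetical stand-ins, not the PR's actual types.

```rust
/// Stack usage snapshot; field names follow the PR's `StackMetrics`.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct StackMetrics {
    pub total_bytes: usize,
    pub used_bytes: usize,
    pub free_bytes: usize,
    pub peak_used_bytes: usize,
}

pub trait Stacklike {
    /// Default implementation: a backend that tracks nothing reports zeros.
    fn metrics(&self) -> StackMetrics {
        StackMetrics::default()
    }
}

/// Hypothetical backend built without metrics support.
pub struct NoMetricsStack;
impl Stacklike for NoMetricsStack {}

/// Hypothetical backend that counts usage in machine words.
pub struct WordStack {
    pub size_words: usize,
    pub used_words: usize,
    pub peak_offset: usize,
}

impl Stacklike for WordStack {
    fn metrics(&self) -> StackMetrics {
        let word = core::mem::size_of::<usize>();
        let total_bytes = self.size_words * word;
        let used_bytes = self.used_words * word;
        StackMetrics {
            total_bytes,
            used_bytes,
            // Saturate instead of underflowing if the stack has overflowed.
            free_bytes: total_bytes.saturating_sub(used_bytes),
            peak_used_bytes: self.peak_offset * word,
        }
    }
}
```

With this shape, callers never need to know whether the backend was built with metrics; they only see an all-zero snapshot when it was not.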
    use crate::hal::mem::PhysAddr;
    let layout =
        std::alloc::Layout::from_size_align(length, align_of::<u128>()).unwrap();
    let ptr = unsafe { std::alloc::alloc(layout) };
thomasw04 left a comment:
Looks good. Left a few suggestions and comments. Also: please don't use mod.rs anymore, use the new layout like the rest of the codebase.
    /// Head of the free block list.
    head: Option<NonNull<u8>>,
    #[cfg(any(feature = "metrics", osiris_metrics))]
    total_bytes: usize,
Why not a struct AllocatorMetrics?
    /// Called on every reschedule so external readers can access metrics without
    /// acquiring the scheduler lock.
    #[cfg(any(feature = "metrics", osiris_metrics))]
    fn mirror_stats(&self) {
Updating everything here is unnecessary and very costly. On sched_enter, only one thread and one task need an update: the ones that were running. All other threads/tasks did not change. For now, we ignore multi-core/SMP systems.
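A rough sketch of what mirroring only the running thread could look like, assuming a fixed-size slot table indexed by thread id. `Mirror`, `StackSnapshot`, `Thread`, and `mirror_current` are hypothetical names for illustration and do not appear in the PR.

```rust
/// Hypothetical per-thread snapshot that external readers consult.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct StackSnapshot {
    pub used_bytes: usize,
    pub peak_used_bytes: usize,
}

/// Hypothetical stand-in for the scheduler's view of a thread.
pub struct Thread {
    pub used_bytes: usize,
    pub peak_used_bytes: usize,
}

/// Hypothetical mirror with one slot per thread id.
pub struct Mirror<const N: usize> {
    slots: [StackSnapshot; N],
}

impl<const N: usize> Mirror<N> {
    pub fn new() -> Self {
        Self { slots: [StackSnapshot::default(); N] }
    }

    /// Refresh only the slot of the thread that was running when the
    /// scheduler was entered; every other slot is still up to date.
    pub fn mirror_current(&mut self, current: usize, thread: &Thread) {
        if let Some(slot) = self.slots.get_mut(current) {
            *slot = StackSnapshot {
                used_bytes: thread.used_bytes,
                peak_used_bytes: thread.peak_used_bytes,
            };
        }
    }

    /// Read a slot; returns `None` for an out-of-range thread id.
    pub fn get(&self, id: usize) -> Option<StackSnapshot> {
        self.slots.get(id).copied()
    }
}
```

This turns the per-reschedule cost from one update per thread into a single slot write, which matches the single-core assumption stated above.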
    /// Returns the latest global kernel heap snapshot, or `None` if the scheduler
    /// has not yet run a single reschedule.
    pub fn global_heap() -> Option<HeapSnapshot> {
        store::read_global_heap()
I think store::global_heap() would be a better name, as it's more consistent with the rest of the codebase. Same for the other functions.
    println!("cargo::rerun-if-changed=build.rs");
    println!("cargo::rerun-if-env-changed=OSIRIS_DEBUG_METRICS");
    if std::env::var("OSIRIS_DEBUG_METRICS").map_or(false, |v| v == "true" || v == "1") {
        println!("cargo::rustc-cfg=osiris_metrics");
None of the other cfgs have the osiris_ prefix. I think calling it "metrics" here is fine.
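As a sketch of the rename, the build-script logic could be factored into two small helpers that emit `metrics` instead of `osiris_metrics`. `metrics_enabled` and `build_directives` are hypothetical helper names; the `rustc-check-cfg` line is an assumption added here as an optional alternative to the Cargo.toml allowlist and is not something the PR does.

```rust
/// True when the env value opts in to metrics ("true" or "1"),
/// mirroring the check in the hunk above.
fn metrics_enabled(value: Option<&str>) -> bool {
    matches!(value, Some("true") | Some("1"))
}

/// Directive lines a build script would print for the given env value.
fn build_directives(value: Option<&str>) -> Vec<String> {
    let mut out = vec![
        "cargo::rerun-if-changed=build.rs".to_string(),
        "cargo::rerun-if-env-changed=OSIRIS_DEBUG_METRICS".to_string(),
        // Assumption: declare the cfg so the unexpected_cfgs lint accepts it.
        "cargo::rustc-check-cfg=cfg(metrics)".to_string(),
    ];
    if metrics_enabled(value) {
        // Unprefixed name, consistent with the other cfgs in the codebase.
        out.push("cargo::rustc-cfg=metrics".to_string());
    }
    out
}
```

In a real `build.rs`, `main` would read `std::env::var("OSIRIS_DEBUG_METRICS").ok()`, pass it through `as_deref()`, and print each returned line.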
    /// Snapshot of allocator resource usage. Available when the `metrics` feature is enabled.
    #[cfg(any(feature = "metrics", osiris_metrics))]
    #[derive(Debug, Clone, Copy)]
    pub struct AllocatorMetrics {
I would put this struct directly in alloc.rs and name it Metrics (alloc::Metrics), as these metrics should be relevant for all types of allocators.
    }

    #[cfg(any(feature = "metrics", osiris_metrics))]
    {
I suggest creating functions on the Metrics struct, such as "record_alloc", that you then just call from all tracing points.
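One possible shape for this, as a sketch only: an allocator-agnostic `Metrics` struct whose methods own all the bookkeeping. Only `total_bytes` and `record_alloc` are named in this thread; the other fields, `record_free`, and `free_bytes` are assumptions for illustration.

```rust
/// Allocator-agnostic usage counters (a sketch of the proposed `alloc::Metrics`).
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct Metrics {
    pub total_bytes: usize,
    pub used_bytes: usize,      // assumed field
    pub peak_used_bytes: usize, // assumed field
    pub alloc_count: usize,     // assumed field
    pub free_count: usize,      // assumed field
}

impl Metrics {
    pub const fn new(total_bytes: usize) -> Self {
        Self {
            total_bytes,
            used_bytes: 0,
            peak_used_bytes: 0,
            alloc_count: 0,
            free_count: 0,
        }
    }

    /// Call from every successful allocation path.
    pub fn record_alloc(&mut self, size: usize) {
        self.used_bytes = self.used_bytes.saturating_add(size);
        self.peak_used_bytes = self.peak_used_bytes.max(self.used_bytes);
        self.alloc_count += 1;
    }

    /// Call from every free path (hypothetical counterpart to `record_alloc`).
    pub fn record_free(&mut self, size: usize) {
        self.used_bytes = self.used_bytes.saturating_sub(size);
        self.free_count += 1;
    }

    /// Bytes not currently handed out.
    pub fn free_bytes(&self) -> usize {
        self.total_bytes.saturating_sub(self.used_bytes)
    }
}
```

Each tracing point in the allocator then shrinks to a single cfg-gated call such as `self.metrics.record_alloc(size)`, and other allocators can reuse the same struct.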