
[codex] Add speculative draft Prometheus metrics #22

Draft: nycdubliner wants to merge 1 commit into AtomicBot-ai:feature/turboquant-kv-cache from nycdubliner:codex/speculative-draft-metrics


@nycdubliner


Summary

  • export speculative draft acceptance counters from the existing llama-server /metrics endpoint
  • label speculative counters by spec_type using the implementation type already tracked by common_speculative_state
  • document the metric names, example Prometheus/Grafana expressions, and how to verify the counters with curl
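A minimal sketch of how a client might turn the spec_type-labeled counters into a per-type acceptance ratio. The metric names llamacpp:spec_draft_generated_total and llamacpp:spec_draft_accepted_total and the "mtp" label value are hypothetical placeholders (the PR documents the real names, which are not reproduced here), and the parser is a simplified reading of the Prometheus text exposition format rather than a full implementation.

```python
import re

# Hypothetical metric names used for illustration only; substitute the
# names documented in the PR.
GENERATED = "llamacpp:spec_draft_generated_total"
ACCEPTED = "llamacpp:spec_draft_accepted_total"

# Simplified Prometheus text-format sample: name, optional {labels}, value.
# Label values containing "}" are not handled by this sketch.
SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{([^}]*)\})?\s+(\S+)")
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return {(metric_name, spec_type): value} for every sample line."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, labels, value = match.groups()
        label_map = dict(LABEL_RE.findall(labels or ""))
        samples[(name, label_map.get("spec_type", ""))] = float(value)
    return samples


def acceptance_by_spec_type(text):
    """Accepted / generated ratio per spec_type (None if nothing was drafted)."""
    samples = parse_samples(text)
    rates = {}
    for (name, spec_type), generated in samples.items():
        if name != GENERATED:
            continue  # only iterate over the "generated" series
        accepted = samples.get((ACCEPTED, spec_type), 0.0)
        rates[spec_type] = accepted / generated if generated > 0 else None
    return rates


if __name__ == "__main__":
    scrape = (
        "# TYPE llamacpp:spec_draft_generated_total counter\n"
        'llamacpp:spec_draft_generated_total{spec_type="mtp"} 120\n'
        'llamacpp:spec_draft_accepted_total{spec_type="mtp"} 90\n'
    )
    print(acceptance_by_spec_type(scrape))  # {'mtp': 0.75}
```

This computes the ratio over totals since server start. In Prometheus or Grafana the same idea would normally divide rate() of the accepted counter by rate() of the generated counter over one window, so the ratio tracks recent traffic instead.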

Validation

  • cmake --build build-hip-rocwmma --target llama-server -j "$(nproc)"
  • ran a constrained Gemma 4 26B-A4B MTP server on 127.0.0.1:8084
  • ran curl -s http://127.0.0.1:8084/metrics | rg "speculative|draft" before and after a single chat completion and confirmed the counters increased from zero to the generated/accepted values
  • git diff --check
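The before/after curl check in the validation list can be scripted by comparing two scrapes of /metrics (for example, two captures of curl -s http://127.0.0.1:8084/metrics taken around a chat completion). This is a hypothetical helper, not part of the PR: the metric name in the example is a placeholder, and the substring filter simply mirrors the rg pattern used during validation.

```python
import re

# Mirrors the `rg "speculative|draft"` filter used on the /metrics output.
SPEC_FILTER = re.compile(r"speculative|draft")
# Series key = metric name plus optional {labels}; the value follows whitespace.
SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*(?:\{[^}]*\})?)\s+(\S+)")


def speculative_samples(text):
    """Map each speculative/draft series (name plus labels) to its value."""
    series = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#") or not SPEC_FILTER.search(line):
            continue  # skip comments and unrelated metrics
        match = SAMPLE_RE.match(line)
        if match:
            series[match.group(1)] = float(match.group(2))
    return series


def counters_increased(before, after):
    """True if no speculative counter went backwards and at least one grew."""
    old, new = speculative_samples(before), speculative_samples(after)
    if not new:
        return False  # nothing exported, so the check cannot pass
    grew = False
    for key, value in new.items():
        previous = old.get(key, 0.0)
        if value < previous:
            return False  # within one server process, counters never decrease
        grew = grew or value > previous
    return grew


if __name__ == "__main__":
    # Hypothetical metric name and label value, for illustration only.
    before = 'llamacpp:spec_draft_generated_total{spec_type="mtp"} 0\n'
    after = 'llamacpp:spec_draft_generated_total{spec_type="mtp"} 48\n'
    print(counters_increased(before, after))  # True
```

Treating a series that is missing from the first scrape as zero matches the "increased from zero" observation, since a freshly started server has not yet drafted anything.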

Notes

The exported metrics read the same in-memory counters that feed the "statistics <type>" server log line. No log parsing is involved, and no separate metrics endpoint is added.
