feat(jans-cedarling): implement collecting and sending telemetry for the lock server #13787
Conversation
* refactored `LogWorker` to generic `AuditWorker` to support both log and telemetry audit channels
* Added `AuditKind` enum to discriminate between audit channels and carry target URL
* start collecting and logging metrics logs
* map metrics logs to telemetry logs and send to the lock server

Signed-off-by: dagregi <dagmawi.m@proton.me>
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 5 passed
✅ Snyk checks have passed. No issues have been found so far.
```rust
    policy_stats: snapshot.policy_stats,
};

self.config.log_service.log_any(metrics_entry);
```
Hmm, do we need to log metrics on each decision? It will produce a lot of logs in the console.
```rust
/// Converts a [`LockServerMetricsEntry`] into a [`TelemetryEntry`]
fn telemetry_json_to_proto(entry: LockServerMetricsEntry) -> TelemetryEntry {
    TelemetryEntry {
```
It is different from what is described in https://github.com/JanssenProject/jans/wiki/Cedarling-Telemetry-RFC#proto-definition
Signed-off-by: dagregi <dagmawi.m@proton.me>
* fix formatting issue

Signed-off-by: dagregi <dagmawi.m@proton.me>
```rust
) -> Result<(), DataError> {
    // Validate key
    if key.is_empty() {
        self.metrics.increment_error("data.invalid_key");
```
Isn't it better to add a method to `DataError` that returns the metric error key, and then catch the error in `impl DataApi for Cedarling` and add it to the metric? That approach looks simpler and cleaner.
Used `matches` for the errors mentioned in the RFC in 27f6b5d.
```rust
        self.metrics.record_jwt_validation(false);
    },
}
result
```
We can use `inspect`/`inspect_err` instead of the explicit `match`:

```diff
-result
+self.validate_single_token_inner(token_kind, jwt)
+    .inspect(|_| self.metrics.record_jwt_validation(true))
+    .inspect_err(|_| self.metrics.record_jwt_validation(false))
```
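For reference, `Result::inspect` and `Result::inspect_err` run a side effect and pass the `Result` through unchanged, which is why the suggestion above can replace the `match` without altering control flow. A minimal standalone sketch, with a hypothetical `validate` function standing in for `validate_single_token_inner` and a tuple standing in for the metrics collector:

```rust
// Hypothetical stand-in for validate_single_token_inner.
fn validate(input: i32) -> Result<i32, String> {
    if input >= 0 { Ok(input) } else { Err("negative".into()) }
}

// Hypothetical stand-in for metrics.record_jwt_validation(ok).
fn record(counts: &mut (u32, u32), ok: bool) {
    if ok { counts.0 += 1 } else { counts.1 += 1 }
}

fn main() {
    let mut counts = (0u32, 0u32); // (successes, failures)
    // inspect fires only on Ok, inspect_err only on Err; the value is unchanged.
    let ok = validate(5)
        .inspect(|_| record(&mut counts, true))
        .inspect_err(|_| record(&mut counts, false));
    let err = validate(-1)
        .inspect(|_| record(&mut counts, true))
        .inspect_err(|_| record(&mut counts, false));
    assert!(ok.is_ok() && err.is_err());
    assert_eq!(counts, (1, 1));
}
```

Both combinators are stable since Rust 1.76, so this style is available without any extra dependency.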
Signed-off-by: dagregi <dagmawi.m@proton.me>
Signed-off-by: dagregi <dagmawi.m@proton.me>
…sion Signed-off-by: dagregi <dagmawi.m@proton.me>
Actionable comments posted: 21
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (6)
jans-cedarling/cedarling/src/tests/authorize_unsigned.rs (1)
261-284: 🧹 Nitpick | 🔵 Trivial

Consider asserting the new `Metric` branch instead of just tolerating it.

Broadening the allowed `log_kind` values now silently accepts `"Metric"` entries without validating their shape. Given this PR is specifically introducing the metric log entry, the test ideally should verify at least one `Metric` log is produced for the request (or check that its payload contains expected keys such as `policy_stats`/`error_counters`). As written, the test would still pass if metric emission regressed to zero entries.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@jans-cedarling/cedarling/src/tests/authorize_unsigned.rs` around lines 261 - 284, The test currently allows "Metric" values for log_kind but doesn't assert any Metric-specific entries; update the test that iterates over logs (using the log and log_kind variables in authorize_unsigned.rs) to explicitly assert that at least one log has log_kind == "Metric" and that its payload contains expected keys (e.g., "policy_stats" and/or "error_counters"); keep the existing Decision assertions for when log_kind == "Decision", and add a branch for log_kind == "Metric" that fetches the metric payload from log (e.g., log.get("payload") or similar) and asserts the presence and basic shape of the metric fields so regressions that drop metric emission will fail the test.

jans-cedarling/cedarling/src/lock/lock_config.rs (1)
170-202: 🧹 Nitpick | 🔵 Trivial

Add a test case covering the `telemetry_endpoint` field.

The typo fix (`telemetery_endpoint` → `telemetry_endpoint`) is a behavior change that affects which JSON key deserializes into `AuditEndpoints::telemetry`. The only existing test omits `telemetry_endpoint` from the input and asserts `telemetry: None`, so it would pass with either spelling. Consider extending the test (or adding a new one) that supplies a `telemetry_endpoint` JSON key and asserts it correctly resolves to the `/bulk` URL — this guards against regressing the typo.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@jans-cedarling/cedarling/src/lock/lock_config.rs` around lines 170 - 202, The current test should_deserialize_lock_config doesn't cover the corrected telemetry JSON key; extend this test (or add a new async tokio::test) to include "telemetry_endpoint": "https://test.com/audit/telemetry" in the src JSON and assert that deserialized.audit_endpoints.telemetry is Some("https://test.com/audit/telemetry/bulk".parse().unwrap()), similarly to how log is asserted; reference the LockConfig and AuditEndpoints types and their telemetry field to ensure the deserialization maps the telemetry_endpoint key to AuditEndpoints::telemetry with the automatic "/bulk" suffix.

jans-cedarling/cedarling/src/context_data_api/store.rs (2)
107-194: 🛠️ Refactor suggestion | 🟠 Major

Centralize error-metric recording instead of repeating it at every error site.

Now that `DataError: ErrorMetricKey` (per `error.rs`), `push` has five duplicated `self.metrics.record_error(&err)` call sites (lines 116, 129, 150, 160, 188), each wrapped in a small ad-hoc closure. A single `.inspect_err(|e| self.metrics.record_error(e))` (or recording once at the `impl DataApi for Cedarling` boundary) removes the duplication and keeps `push` readable. This also matches the earlier reviewer's intent: "add method to DataError that returns metric_error_key [...] and in impl DataApi for Cedarling catch error and add to metric".

♻️ Sketch

```rust
// push() body
let result = (|| -> Result<(), DataError> {
    if key.is_empty() {
        return Err(DataError::InvalidKey);
    }
    // ... existing logic, returning DataError directly ...
    Ok(())
})();
result.inspect_err(|e| self.metrics.record_error(e))?;
self.metrics.record_data_push();
Ok(())
```

Or, simpler, record once in `Cedarling::push_data_ctx` (`lib.rs`) around `self.data.push(...)?` and drop the per-site recording here entirely.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@jans-cedarling/cedarling/src/context_data_api/store.rs` around lines 107 - 194, The push method duplicates self.metrics.record_error calls in multiple places; change push to return errors without recording and centralize the metric recording once by wrapping the existing logic in a short closure or Result variable inside push (e.g. let result = (|| -> Result<(), DataError> { ... })();) and then call result.inspect_err(|e| self.metrics.record_error(e))?; alternatively remove all per-site self.metrics.record_error calls and record once where push is called (e.g. in the impl DataApi for Cedarling / Cedarling::push_data_ctx) using DataError::ErrorMetricKey to map metric keys, ensuring any early-return Err(...) values are not individually instrumented inside DataEntry::new, serde_json::to_string error mapping, storage.set_with_ttl mapping, or key/TTL validation sites.
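A runnable, self-contained version of that centralization idea follows. The trimmed-down `DataError` and the plain `HashMap` standing in for the metrics collector are both hypothetical simplifications of the real types; the point is the single `inspect_err` recording site:

```rust
use std::collections::HashMap;

// Hypothetical trimmed-down error type; the real DataError has more variants.
#[derive(Debug)]
enum DataError {
    InvalidKey,
}

impl DataError {
    // Hypothetical helper mapping each error variant to a metric key,
    // as the review suggests for ErrorMetricKey.
    fn metric_key(&self) -> &'static str {
        match self {
            DataError::InvalidKey => "data.invalid_key",
        }
    }
}

// The HashMap stands in for the metrics collector's error counters.
fn push(key: &str, errors: &mut HashMap<&'static str, u64>) -> Result<(), DataError> {
    // Wrap the fallible body in a closure so every early return funnels
    // through one recording point instead of per-site instrumentation.
    let result = (|| -> Result<(), DataError> {
        if key.is_empty() {
            return Err(DataError::InvalidKey);
        }
        // ... existing logic would go here ...
        Ok(())
    })();
    result.inspect_err(|e| *errors.entry(e.metric_key()).or_insert(0) += 1)
}

fn main() {
    let mut errors = HashMap::new();
    assert!(push("", &mut errors).is_err());
    assert_eq!(errors.get("data.invalid_key"), Some(&1));
    assert!(push("ok", &mut errors).is_ok());
    assert_eq!(errors.len(), 1);
}
```

Because `inspect_err` returns the `Result` unchanged, callers still see the original error; only the counting is centralized.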
469-474: ⚠️ Potential issue | 🟡 Minor

Duplicate assertion, and the second one has no message.

Line 473 is a redundant copy of the preceding `matches!` check and, unlike line 469, omits the descriptive message. Drop it — it adds no coverage and violates the repo rule that every assertion carry a message.

🔧 Fix

```diff
 let result = store.push("", json!("value"), None);
 assert!(
     matches!(result, Err(DataError::InvalidKey)),
     "push with empty key should return DataError::InvalidKey"
 );
-assert!(matches!(result, Err(DataError::InvalidKey)));
 }
```

As per coding guidelines: "All assertions must include a descriptive message explaining what is being tested".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@jans-cedarling/cedarling/src/context_data_api/store.rs` around lines 469 - 474, Remove the duplicate assertion that repeats the matches!(result, Err(DataError::InvalidKey)) check without a message; keep the first assert! that includes the descriptive message ("push with empty key should return DataError::InvalidKey") so the test still verifies that result is Err(DataError::InvalidKey) while complying with the rule that all assertions include a message (reference the existing `result` value and `DataError::InvalidKey` in the retained assertion).

jans-cedarling/cedarling/src/authz/mod.rs (1)
356-362: ⚠️ Potential issue | 🟠 Major

Missing metrics recording on `execute_authorize` failure in the unsigned flow.

In `authorize_multi_issuer` the `execute_authorize(...)?` call is wrapped (`.map_err(AuthorizeError::RequestValidation).inspect_err(|e| { record_error(e); record_authz_error(); })?`). Here in `authorize_unsigned` the equivalent call at lines 356-362 uses a bare `?` and so the `RequestValidationError` failure path is silently skipped by the metrics collector. Given the rest of the function records every other failure site, this looks unintentional.

🐛 Proposed fix

```diff
-        let response = self.execute_authorize(ExecuteAuthorizeParameters {
-            entities: &entities,
-            principal: principal_uid.clone(),
-            action: action.clone(),
-            resource: resource_uid.clone(),
-            context,
-        })?;
+        let response = self
+            .execute_authorize(ExecuteAuthorizeParameters {
+                entities: &entities,
+                principal: principal_uid.clone(),
+                action: action.clone(),
+                resource: resource_uid.clone(),
+                context,
+            })
+            .inspect_err(|e| {
+                self.config.metrics.record_error(e);
+                self.config.metrics.record_authz_error();
+            })?;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@jans-cedarling/cedarling/src/authz/mod.rs` around lines 356 - 362, The call to execute_authorize inside authorize_unsigned currently uses a bare ? so RequestValidation failures are not recorded; update the execute_authorize invocation in authorize_unsigned to mirror authorize_multi_issuer by mapping errors to AuthorizeError::RequestValidation and calling .inspect_err(|e| { record_error(e); record_authz_error(); }) before the final ? so that execute_authorize failures are captured by the metrics recorder (reference the execute_authorize(...) call and the authorize_unsigned function and reuse the same map_err/inspect_err wrapper and the record_error/record_authz_error helpers).

jans-cedarling/cedarling/src/lock/transport/grpc.rs (1)
81-136: 🧹 Nitpick | 🔵 Trivial

LGTM with one small polish opportunity.

The `AuditKind` branching, shared `deserialize_entries` path, and per-RPC metadata injection all look correct. The only minor nit: in the `Log` branch you `token.clone()` for no reason — since the branches are mutually exclusive, you can move `token` directly:

♻️ Optional tweak

```diff
-let mut request = Request::new(BulkLogRequest { entries });
-request
-    .metadata_mut()
-    .insert("authorization", token.clone());
+let mut request = Request::new(BulkLogRequest { entries });
+request.metadata_mut().insert("authorization", token);
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@jans-cedarling/cedarling/src/lock/transport/grpc.rs` around lines 81 - 136, Remove the needless token.clone() in send(): use the token value directly when inserting the "authorization" metadata in the Log branch (replace request.metadata_mut().insert("authorization", token.clone()) with inserting token), keeping the Telemetry branch as-is (it already moves token into request.metadata_mut().insert("authorization", token)). This removes an unnecessary clone of the MetadataValue while still moving token into the chosen RPC; update only the code in the send function around the Log and Telemetry branches (references: send, token, client.process_bulk_log, client.process_bulk_telemetry).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@jans-cedarling/cedarling/src/authz/errors.rs`:
- Around line 165-169: The metric_key implementation on BuildContextError
collapses all variants into one bucket; change the ErrorMetricKey::metric_key
for BuildContextError to match on self (match self {
BuildContextError::KeyConflict(_) => "...",
BuildContextError::DeserializeFromJson(_) => "...",
BuildContextError::UnknownAction(_) => "...",
BuildContextError::MissingEntityId(_) => "...",
BuildContextError::InvalidKind(_) => "...",
BuildContextError::ParseEntityName(_) => "...",
BuildContextError::ContextCreation(_) => "...", }) and return a distinct
&'static str for each variant (for example "authz.context_build.key_conflict",
"authz.context_build.deserialize", "authz.context_build.unknown_action", etc.),
keeping a catch-all if needed; update the metric_key function only and preserve
the ErrorMetricKey trait and enum variants.
In `@jans-cedarling/cedarling/src/authz/metrics.rs`:
- Around line 1-153: Add unit tests for MetricsCollector and related helpers
covering positive and negative cases: create tests that exercise
MetricsCollector::record_evaluation to assert authz counters and per-policy
PolicyStats update, MetricsCollector::record_error to ensure error_counters
aggregate by metric_key, and snapshot_and_reset to verify counters are zeroed
while gauges remain (policy_count, uptime_secs computed from init_time,
last_eval_time_us preserved). Add tests for compute_percentiles with empty and
single-element vectors and for saturating_usize_to_i64 using usize::MAX to
verify saturation behavior. Place tests in this module's #[cfg(test)] mod tests
and reference the types/functions MetricsCollector, PolicyStats,
compute_percentiles, and saturating_usize_to_i64 to locate code.
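The saturation behavior that the prompt above asks tests to cover can be sketched as a standalone helper. The name `saturating_usize_to_i64` is taken from the prompt; the actual signature in `metrics.rs` may differ:

```rust
// Hedged sketch of a saturating usize -> i64 conversion: values that fit
// convert losslessly, anything larger clamps to i64::MAX instead of wrapping.
fn saturating_usize_to_i64(v: usize) -> i64 {
    i64::try_from(v).unwrap_or(i64::MAX)
}

fn main() {
    assert_eq!(saturating_usize_to_i64(0), 0);
    assert_eq!(saturating_usize_to_i64(42), 42);
    // On 64-bit targets usize::MAX exceeds i64::MAX, so this saturates.
    assert_eq!(saturating_usize_to_i64(usize::MAX), i64::MAX);
}
```

A test like the last assertion is exactly the `usize::MAX` case the prompt asks for.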
- Line 313: The preallocation for the ops HashMap is too large: change the
HashMap::with_capacity(36) used when creating ops to a smaller, accurate size
(e.g., HashMap::with_capacity(22)) or remove the explicit capacity and use
HashMap::new() so it doesn't over-allocate; update the allocation where ops is
declared to match the actual number of inserted keys (the declaration using
HashMap::with_capacity(36) in metrics.rs).
- Line 98: The eval_times_us RwLock<Vec<i64>> is unbounded and can grow without
bound; change it to a bounded structure and update snapshot_and_reset to drain
it safely: replace Vec<i64> with a capped buffer (e.g., a VecDeque<i64> or a
small ring buffer) with a constant MAX_EVAL_SAMPLES and on insert into
eval_times_us push_back and pop_front when len==MAX; alternatively implement
reservoir sampling (replace a random slot when the cap is exceeded) or use an
online histogram type (e.g., hdrhistogram) instead of raw Vec; apply the same
bounded change to the other unbounded vectors referenced around
snapshot_and_reset (the fields noted at lines ~181-186) and ensure
snapshot_and_reset collects and clears the bounded buffer in one locked section
to avoid races.
- Line 98: record_evaluation is taking write locks on eval_times_us and
policy_stats on the hot path; change eval_times_us: RwLock<Vec<i64>> to a
concurrent, lock-free buffer (e.g., crossbeam::SegQueue<i64> or a thread_local
Vec drained at snapshot time) and modify record_evaluation to push the duration
into that queue instead of acquiring a write lock, then drain the queue in your
metrics snapshot/collect function to compute percentiles. For policy_stats, stop
mutating a shared Vec/HashMap under a hot write lock: use a concurrent map
(DashMap) or store Arc<AtomicU64> counters per policy id and only take a write
RwLock when inserting a new policy id (reads/increments on the hot path should
use the atomic increment or DashMap entry); update record_evaluation and the
snapshot/collect functions to read/aggregate from these lock-free structures.
- Around line 263-380: The snapshot_and_reset implementation performs per-atomic
swaps (insert_swap on authz_requests_total, authz_decision_allow/deny, etc.)
leading to non-atomic, inconsistent snapshots relative to concurrent
record_evaluation calls; fix by adding a global RwLock (e.g., metrics_rw:
RwLock<()>) to the Metrics struct, acquire a read lock in record_evaluation-like
writers and acquire a write lock at the start of snapshot_and_reset (around
interval_start, policy_stats, error_counters, eval_times_us and the
insert_swap/insert_load/insert_val sequence) so the entire snapshot is
point-in-time consistent, or alternatively add a short note in the docs near
snapshot_and_reset and the atomic fields (authz_requests_total,
authz_decision_allow/deny, token_cache.*, jwt.*) explaining the
eventual-consistency semantics if you choose not to change locking.
- Line 184: Multiple Mutex lock unwraps use inconsistent expect messages;
standardize them to "<field> lock should not be poisoned" for easier searching.
Replace the expect strings on the Mutex locks for the fields referenced (e.g.,
eval_times_us, policy_stats, eval_time and the other lock uses around lines
mentioned) so each call to .expect(...) uses the pattern "<field> lock should
not be poisoned" where <field> is the exact field/variable name (like
eval_times_us, policy_stats, eval_time) used in that expression; keep the rest
of the logic untouched.
- Around line 281-294: The snapshot key construction uses plain string suffixes
"{id}_allow" and "{id}_deny" which can collide with policy IDs that already end
with those suffixes; update the key naming in the block that builds snapshot
(the closure calling stats.snapshot() inside map.iter().flat_map and producing
the array of tuples) to use a delimiter that cannot appear in policy IDs (e.g.
"{id}.allow" and "{id}.deny" to match the authz.decision_allow convention) or,
better, stop flattening into string keys and instead return a nested structure
(e.g. a per-policy struct or tuple) so keys are not mixed; make the change where
snapshot is created and ensure downstream consumers that expect the previous
flat keys are updated to handle the new names or nested shape before calling
map.clear().
- Around line 400-414: The percentile index calculation in compute_percentiles
is using a redundant .min(len - 1) and doesn't implement the nearest‑rank method
described; update the idx closure to compute nearest‑rank indices using
something like ((len * pct + 99) / 100).saturating_sub(1) (and remove the .min),
ensure idx takes pct as usize and returns a valid index for times, and keep
returns times[idx(50)], times[idx(95)], times[idx(99)], times[len - 1]
unchanged; adjust any doc comment on compute_percentiles to reflect the
nearest‑rank method if present.
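The nearest-rank index formula suggested in that prompt can be checked with a small standalone helper. This is a sketch, assuming the sample slice is already sorted ascending (as the real `compute_percentiles` presumably requires):

```rust
// Nearest-rank percentile: index = ceil(len * pct / 100) - 1, computed with
// integer arithmetic as (len * pct + 99) / 100, saturating at 0.
fn nearest_rank(times: &[i64], pct: usize) -> i64 {
    let len = times.len();
    assert!(len > 0, "nearest_rank requires a non-empty sorted slice");
    let idx = ((len * pct + 99) / 100).saturating_sub(1);
    times[idx]
}

fn main() {
    let times: Vec<i64> = (1..=10).collect(); // sorted samples 1..=10
    // For len = 10: p50 -> index 4, p95 and p99 -> index 9 (the max).
    assert_eq!(nearest_rank(&times, 50), 5);
    assert_eq!(nearest_rank(&times, 95), 10);
    assert_eq!(nearest_rank(&times, 99), 10);
    // Single-element input: every percentile is that element.
    assert_eq!(nearest_rank(&[7], 50), 7);
}
```

Note how `((len * pct + 99) / 100)` implements the ceiling division, so no trailing `.min(len - 1)` clamp is needed for `pct` in `1..=100`.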
In `@jans-cedarling/cedarling/src/entity_builder/build_multi_issuer_entity.rs`:
- Around line 48-52: The metric_key implementation for MultiIssuerEntityError
currently collapses all error variants into "authz.entity_build"; update the
impl of MultiIssuerEntityError::metric_key to match the per-variant mapping
approach used by ValidateJwtError::metric_key by returning distinct &'static str
values for each enum variant (e.g., "authz.entity_build.missing_issuer",
"authz.entity_build.missing_exp", "authz.entity_build.invalid_entity_uid",
"authz.entity_build.no_valid_tokens",
"authz.entity_build.build_trusted_issuer_uid", "authz.entity_build.build_attrs",
"authz.entity_build.entity_creation_failed" or similar names). Locate the impl
block for crate::authz::metrics::ErrorMetricKey for MultiIssuerEntityError and
implement a match on self that returns a unique metric key string per variant so
operational counters preserve useful signal.
In `@jans-cedarling/cedarling/src/jwt/validation/trusted_issuer_validator.rs`:
- Around line 60-62: The match arm currently groups Self::EmptyEntityTypeName
with Self::MissingRequiredClaim under "jwt.missing_required_claim"; update the
mapping so EmptyEntityTypeName uses a distinct config-specific metric key (e.g.
"config.empty_entity_type_name") instead of the JWT claim key. Locate the match
on the error enum in trusted_issuer_validator.rs (the arm referencing
Self::MissingRequiredClaim and Self::EmptyEntityTypeName) and split them into
two arms: keep MissingRequiredClaim returning "jwt.missing_required_claim" and
make EmptyEntityTypeName return the new "config.empty_entity_type_name" metric
name.
In `@jans-cedarling/cedarling/src/lock/mod.rs`:
- Around line 325-333: The bootstrap uses the same
LockServiceConfig.log_channel_capacity for both audit and telemetry channels,
which couples unrelated capacities; update the config and usage so telemetry has
an explicit capacity: add a new LockServiceConfig.telemetry_channel_capacity (or
rename log_channel_capacity to make intent explicit) and change create_worker to
use telemetry_channel_capacity when constructing the telemetry mpsc channel
(keep audit channel using log_channel_capacity and update any config
construction/initialization sites to set the new field).
- Around line 243-267: The current match on (bootstrap_conf.telemetry_interval,
lock_config.audit_endpoints.telemetry) silently drops telemetry when only one
side is set; update the non-matching arm(s) to emit a warn-level log via logger
explaining which piece is missing (e.g., "telemetry_interval set but no
telemetry endpoint advertised" or "telemetry endpoint configured but no
telemetry_interval set") so operators can detect the misconfiguration; keep the
existing behavior of returning (None, None) for
telemetry_worker/telemetry_ticker when incomplete, and reference the symbols
telemetry_interval, lock_config.audit_endpoints.telemetry, telemetry_worker,
telemetry_ticker, create_worker, TelemetryTicker::spawn, and logger to locate
where to add the logger.warn calls.
- Around line 278-289: The shutdown races because shut_down cancels the shared
cancel_tkn then awaits log_worker → telemetry_worker → telemetry_ticker so the
telemetry worker can drop its Receiver before the ticker emits its final
snapshot; fix by introducing a dedicated cancellation token for the ticker
(e.g., telemetry_ticker_cancel) that the ticker observes instead of the shared
cancel_tkn, call telemetry_ticker_cancel.cancel() and await the telemetry_ticker
handle first (so emit_snapshot() can push into the worker buffer), then cancel
the shared cancel_tkn and await telemetry_worker and log_worker; ensure whoever
constructs the telemetry_ticker is updated to receive and use this new dedicated
token.
- Around line 396-405: The match in LockService::log_any uses a wildcard arm
that will silently ignore future LogType variants; update the match in the
log_any method (impl LogWriter for LockService) to be exhaustive by listing
explicit arms: Some(LogType::Decision) -> dispatch to self.log_worker,
Some(LogType::Metric) -> dispatch to self.telemetry_worker,
Some(LogType::System) -> do nothing (explicit empty action), and None -> do
nothing; remove the `_ => {}` wildcard so the compiler will flag any new LogType
variants.
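The exhaustive dispatch that prompt describes looks roughly like this. The `LogType` variant set is taken from the prompt; the string returns are hypothetical stand-ins for routing to the real worker handles:

```rust
// Hypothetical mirror of the LogType enum named in the prompt above.
enum LogType {
    Decision,
    Metric,
    System,
}

// No wildcard arm: adding a new LogType variant becomes a compile error here,
// which is exactly the safety property the review asks for.
fn route(kind: Option<LogType>) -> &'static str {
    match kind {
        Some(LogType::Decision) => "log_worker",
        Some(LogType::Metric) => "telemetry_worker",
        Some(LogType::System) => "dropped", // explicit no-op
        None => "dropped",
    }
}

fn main() {
    assert_eq!(route(Some(LogType::Decision)), "log_worker");
    assert_eq!(route(Some(LogType::Metric)), "telemetry_worker");
    assert_eq!(route(Some(LogType::System)), "dropped");
    assert_eq!(route(None), "dropped");
}
```

The trade-off is deliberate: a `_ => {}` arm keeps the code compiling when variants are added, but silently swallows them; the exhaustive form forces the author of a new variant to decide where it goes.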
In `@jans-cedarling/cedarling/src/lock/proto/audit.proto`:
- Around line 56-67: In message TelemetryEntry, add reserved declarations to
prevent re-use of removed fields: reserve the numeric tag 2 and the numeric
range 6 to 12 (e.g. reserved 2; reserved 6 to 12;), and reserve the removed
field name "event_time" (e.g. reserved "event_time";) — if you have the original
names for fields 6–12, add them as reserved string names as well; place these
reserved lines at the top of the TelemetryEntry message before other field
declarations.
In `@jans-cedarling/cedarling/src/lock/telemetry_ticker.rs`:
- Around line 39-53: The final snapshot can be dropped because
telemetry_ticker::run uses the same cancel token as the consumer worker; ensure
the ticker finishes before the telemetry worker by giving telemetry_ticker::run
its own CancellationToken (or a derived child token) and exposing a way to
cancel+await it from LockService::shut_down before canceling the AuditWorker;
specifically, add a private ticker_cancel token used only by
telemetry_ticker::run, replace the shared cancel_tkn usage in
telemetry_ticker::run/emit_snapshot, and in LockService::shut_down cancel and
.cancelled().await (or await the join handle) for that ticker token/handle
before cancelling the worker so dispatch_entry/telemetry_worker.tx.try_send
cannot race with the receiver being dropped.
- Around line 55-65: The code calls self.metrics.snapshot_and_reset()
unconditionally which discards metrics even when self.logger is absent; change
emit_snapshot to first check whether a logger is available (e.g. try to upgrade
the Weak or inspect self.logger) and return early if there is no consumer, or
alternatively obtain a snapshot without resetting and only call reset after a
successful self.logger.log_any(...) dispatch; specifically modify emit_snapshot
to guard the call to snapshot_and_reset (or split into snapshot + conditional
reset) so MetricsLogEntry creation and logger.log_any(...) only proceed when the
logger can be upgraded/exists (use the same MetricsLogEntry, emit_snapshot,
snapshot_and_reset, and logger.log_any identifiers to locate the code).
- Line 40: The interval created in telemetry_ticker (the let mut ticker =
interval(self.interval) in telemetry_ticker.rs) should use
MissedTickBehavior::Delay to avoid bursty catch-up ticks after stalls; update
the code to construct the interval and then call
set_missed_tick_behavior(MissedTickBehavior::Delay) on the ticker, and add the
needed import (tokio::time::MissedTickBehavior) so the ticker produces delayed
ticks instead of bursty ones.
In `@jans-cedarling/cedarling/src/lock/transport/mod.rs`:
- Around line 65-95: In deserialize_entries, handle the empty-input case by
returning Ok(Vec::new()) immediately when entries.is_empty() instead of treating
it as all-malformed; also stop collapsing JSON parse failures and conversion
failures into one silent skipped count by distinguishing
serde_json::from_str::<S>(...) errors from T::try_from(...) errors — when
iterating entries, attempt from_str and on Err call log_warn with a descriptive
message (include the entry index/label and parse error), otherwise attempt
T::try_from and on Err call log_warn with a distinct message for conversion
failure; finally compute parsed from successful conversion results and only
return Err(TransportError::Serialization(...)) when parsed.is_empty() after
these separate logs.
In `@jans-cedarling/cedarling/src/lock/transport/rest.rs`:
- Around line 42-83: The send method duplicates the post/send/error_for_status
sequence in both AuditKind::Log and AuditKind::Telemetry; refactor by extracting
the HTTP POST into a small helper/closure inside send that takes a &str url and
a serde-serializable entries value and performs
self.client.post(url).json(&entries).send().await?.error_for_status()?; keep the
type-specific deserialize_entries calls for LockServerLogEntry/CedarlingLogEntry
and LockServerMetricsEntry/CedarlingMetricsEntry, then pass the resulting
entries and url into the helper to eliminate duplication while preserving the
existing generics and error handling.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 664213e1-b45b-4d54-8ebe-d9d59887239c
📒 Files selected for processing (25)
- jans-cedarling/cedarling/src/authz/errors.rs
- jans-cedarling/cedarling/src/authz/metrics.rs
- jans-cedarling/cedarling/src/authz/mod.rs
- jans-cedarling/cedarling/src/context_data_api/error.rs
- jans-cedarling/cedarling/src/context_data_api/store.rs
- jans-cedarling/cedarling/src/entity_builder/build_multi_issuer_entity.rs
- jans-cedarling/cedarling/src/entity_builder/error.rs
- jans-cedarling/cedarling/src/init/service_factory.rs
- jans-cedarling/cedarling/src/jwt/mod.rs
- jans-cedarling/cedarling/src/jwt/token_cache.rs
- jans-cedarling/cedarling/src/jwt/validation/trusted_issuer_validator.rs
- jans-cedarling/cedarling/src/jwt/validation/validator.rs
- jans-cedarling/cedarling/src/lib.rs
- jans-cedarling/cedarling/src/lock/lock_config.rs
- jans-cedarling/cedarling/src/lock/log_worker.rs
- jans-cedarling/cedarling/src/lock/mod.rs
- jans-cedarling/cedarling/src/lock/proto/audit.proto
- jans-cedarling/cedarling/src/lock/telemetry_ticker.rs
- jans-cedarling/cedarling/src/lock/transport/grpc.rs
- jans-cedarling/cedarling/src/lock/transport/mapping.rs
- jans-cedarling/cedarling/src/lock/transport/mod.rs
- jans-cedarling/cedarling/src/lock/transport/rest.rs
- jans-cedarling/cedarling/src/log/log_entry.rs
- jans-cedarling/cedarling/src/log/mod.rs
- jans-cedarling/cedarling/src/tests/authorize_unsigned.rs
```rust
authz_decision_deny: AtomicI64,
authz_errors_total: AtomicI64,

eval_times_us: RwLock<Vec<i64>>,
```
eval_times_us is unbounded between snapshots.
The buffer grows by one i64 per authz request until snapshot_and_reset drains it. With a large CEDARLING_AUDIT_TELEMETRY_INTERVAL and/or high request rates this is a memory‑growth risk (and a potential DoS vector for adversarial load). Consider a bounded ring buffer or reservoir sampling for percentile estimation (e.g. cap the vec length and drop/sample once full), or switch to an online sketch such as t-digest/HDR histogram.
Also applies to: 181-186
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@jans-cedarling/cedarling/src/authz/metrics.rs` at line 98, The eval_times_us
RwLock<Vec<i64>> is unbounded and can grow without bound; change it to a bounded
structure and update snapshot_and_reset to drain it safely: replace Vec<i64>
with a capped buffer (e.g., a VecDeque<i64> or a small ring buffer) with a
constant MAX_EVAL_SAMPLES and on insert into eval_times_us push_back and
pop_front when len==MAX; alternatively implement reservoir sampling (replace a
random slot when the cap is exceeded) or use an online histogram type (e.g.,
hdrhistogram) instead of raw Vec; apply the same bounded change to the other
unbounded vectors referenced around snapshot_and_reset (the fields noted at
lines ~181-186) and ensure snapshot_and_reset collects and clears the bounded
buffer in one locked section to avoid races.
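The bounded-buffer suggestion above can be sketched with Algorithm R reservoir sampling, which keeps memory constant while percentiles are still computed over a uniform sample of the interval's observations. Everything below is illustrative, not code from this PR: `EvalReservoir`, `MAX_EVAL_SAMPLES`, and the inline xorshift RNG (used only to stay dependency-free) are assumed names.

```rust
use std::sync::Mutex;

const MAX_EVAL_SAMPLES: usize = 1024;

/// Keeps a uniform random sample of at most MAX_EVAL_SAMPLES observations
/// between snapshots, so memory no longer grows with request rate.
pub struct EvalReservoir {
    inner: Mutex<State>,
}

struct State {
    samples: Vec<i64>,
    seen: u64, // observations since the last drain
    rng: u64,  // xorshift64 state; avoids pulling in an RNG crate
}

impl EvalReservoir {
    pub fn new() -> Self {
        Self {
            inner: Mutex::new(State {
                samples: Vec::with_capacity(MAX_EVAL_SAMPLES),
                seen: 0,
                rng: 0x9E37_79B9_7F4A_7C15,
            }),
        }
    }

    /// Record one evaluation time in microseconds (Algorithm R).
    pub fn record(&self, value_us: i64) {
        let mut st = self.inner.lock().expect("reservoir lock poisoned");
        st.seen += 1;
        if st.samples.len() < MAX_EVAL_SAMPLES {
            st.samples.push(value_us);
        } else {
            // Replace a random slot with probability MAX_EVAL_SAMPLES / seen.
            st.rng ^= st.rng << 13;
            st.rng ^= st.rng >> 7;
            st.rng ^= st.rng << 17;
            let slot = (st.rng % st.seen) as usize;
            if slot < MAX_EVAL_SAMPLES {
                st.samples[slot] = value_us;
            }
        }
    }

    /// Drain the sample for percentile computation and reset the counter.
    pub fn drain(&self) -> Vec<i64> {
        let mut st = self.inner.lock().expect("reservoir lock poisoned");
        st.seen = 0;
        std::mem::take(&mut st.samples)
    }
}
```

An HDR histogram or t-digest would give tighter tail-percentile accuracy at similar memory cost; the reservoir has the advantage of dropping in behind the existing `compute_percentiles` call unchanged.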
Hot-path write-lock contention on every authorization call.
record_evaluation takes write locks on both eval_times_us and policy_stats for every authz request. Under concurrency this serializes authorizations at the collector and negates the benefit of the atomics above. Consider a lock-free queue / crossbeam::SegQueue, thread-local buffers drained at snapshot time, or per-policy atomics behind a rarely-written RwLock initialized once policies are known (read lock on the hot path, write only when a new policy ID is observed).
Also applies to: 180-200
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@jans-cedarling/cedarling/src/authz/metrics.rs` at line 98, record_evaluation
is taking write locks on eval_times_us and policy_stats on the hot path; change
eval_times_us: RwLock<Vec<i64>> to a concurrent, lock-free buffer (e.g.,
crossbeam::SegQueue<i64> or a thread_local Vec drained at snapshot time) and
modify record_evaluation to push the duration into that queue instead of
acquiring a write lock, then drain the queue in your metrics snapshot/collect
function to compute percentiles. For policy_stats, stop mutating a shared
Vec/HashMap under a hot write lock: use a concurrent map (DashMap) or store
Arc<AtomicU64> counters per policy id and only take a write RwLock when
inserting a new policy id (reads/increments on the hot path should use the
atomic increment or DashMap entry); update record_evaluation and the
snapshot/collect functions to read/aggregate from these lock-free structures.
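The "per-policy atomics behind a rarely-written RwLock" option mentioned above can be sketched as follows. `PolicyCounters` is a hypothetical stand-in for the PR's `policy_stats` map; only the first observation of a policy ID takes the write lock, so the hot path is a shared read lock plus one atomic increment.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicI64, Ordering};
use std::sync::{Arc, RwLock};

/// Per-policy counters: the hot path takes only a read lock and an atomic
/// increment; the write lock is taken once per previously-unseen policy ID.
#[derive(Default)]
pub struct PolicyCounters {
    map: RwLock<HashMap<String, Arc<AtomicI64>>>,
}

impl PolicyCounters {
    pub fn increment(&self, policy_id: &str) {
        // Fast path: shared read lock, contended only with rare inserts.
        if let Some(counter) = self.map.read().expect("lock poisoned").get(policy_id) {
            counter.fetch_add(1, Ordering::Relaxed);
            return;
        }
        // Slow path: first time this policy ID is observed.
        self.map
            .write()
            .expect("lock poisoned")
            .entry(policy_id.to_string())
            .or_insert_with(|| Arc::new(AtomicI64::new(0)))
            .fetch_add(1, Ordering::Relaxed);
    }

    pub fn get(&self, policy_id: &str) -> i64 {
        self.map
            .read()
            .expect("lock poisoned")
            .get(policy_id)
            .map_or(0, |c| c.load(Ordering::Relaxed))
    }
}
```

Since Cedarling's policy set is fixed between policy-store reloads, the write-lock path fires only a handful of times per process lifetime, which is exactly the access pattern RwLock favors.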
```rust
pub(crate) fn snapshot_and_reset(&self) -> MetricsSnapshot {
    // Compute interval duration and reset start time
    let now = Utc::now();
    let interval_secs = {
        let mut start = self
            .interval_start
            .lock()
            .expect("interval_start lock should not be poisoned");
        let duration = now.signed_duration_since(*start);
        *start = now;
        duration.num_seconds()
    };

    let policy_stats = {
        let mut map = self
            .policy_stats
            .write()
            .expect("policy_stats lock should not be poisoned");
        let snapshot = map
            .iter()
            .flat_map(|(id, stats)| {
                let snap = stats.snapshot();
                [
                    (id.clone(), snap.evaluations),
                    (format!("{id}_allow"), snap.allow_count),
                    (format!("{id}_deny"), snap.deny_count),
                ]
            })
            .collect();
        map.clear();
        snapshot
    };

    // Take and reset error_counters
    let error_counters = {
        let mut counters = self
            .error_counters
            .write()
            .expect("error_counters lock should not be poisoned");
        std::mem::take(&mut *counters)
    };

    // Compute percentiles from eval times, then clear buffer
    let times_snapshot = {
        let mut times = self.eval_times_us.write().expect("eval_time lock poisoned");
        std::mem::take(&mut *times)
    };
    let (p50, p95, p99, max_time) = compute_percentiles(times_snapshot);

    // Build operational_stats map from atomic fields
    let mut ops = HashMap::with_capacity(36);

    insert_swap(&mut ops, "authz.requests_total", &self.authz_requests_total);
    insert_swap(
        &mut ops,
        "authz.requests_unsigned",
        &self.authz_requests_unsigned,
    );
    insert_swap(
        &mut ops,
        "authz.requests_multi_issuer",
        &self.authz_requests_multi_issuer,
    );
    insert_swap(&mut ops, "authz.decision_allow", &self.authz_decision_allow);
    insert_swap(&mut ops, "authz.decision_deny", &self.authz_decision_deny);
    insert_swap(&mut ops, "authz.errors_total", &self.authz_errors_total);

    // Authorization latency (gauges, swap)
    insert_load(&mut ops, "authz.last_eval_time_us", &self.last_eval_time_us);
    insert_val(&mut ops, "authz.eval_time_p50_us", p50);
    insert_val(&mut ops, "authz.eval_time_p95_us", p95);
    insert_val(&mut ops, "authz.eval_time_p99_us", p99);
    insert_val(&mut ops, "authz.eval_time_max_us", max_time);

    // Token cache (counters swap, gauge load)
    insert_swap(&mut ops, "token_cache.hits", &self.token_cache_hits);
    insert_swap(&mut ops, "token_cache.misses", &self.token_cache_misses);
    insert_swap(
        &mut ops,
        "token_cache.evictions",
        &self.token_cache_evictions,
    );

    // JWT validation (counters swap)
    insert_swap(
        &mut ops,
        "jwt.validations_total",
        &self.jwt_validations_total,
    );
    insert_swap(
        &mut ops,
        "jwt.validations_success",
        &self.jwt_validations_success,
    );
    insert_swap(
        &mut ops,
        "jwt.validations_failed",
        &self.jwt_validations_failed,
    );

    insert_swap(&mut ops, "data.push_ops", &self.data_push_ops);
    insert_swap(&mut ops, "data.get_ops", &self.data_get_ops);
    insert_swap(&mut ops, "data.remove_ops", &self.data_remove_ops);

    insert_val(
        &mut ops,
        "instance.uptime_secs",
        now.signed_duration_since(self.init_time).num_seconds(),
    );
    insert_load(&mut ops, "instance.policy_count", &self.policy_count);

    MetricsSnapshot {
        policy_stats,
        error_counters,
        operational_stats: ops,
        interval_secs,
    }
}
```
Snapshot is not point-in-time consistent across counters.
Atomics are swapped one at a time, so a concurrent record_evaluation between e.g. swap(authz_requests_total) and swap(authz_decision_allow) can cause requests_total != unsigned + multi_issuer and requests_total != allow + deny + errors within the same snapshot. This is usually acceptable for telemetry but is worth documenting so downstream dashboards/alerts don't treat these as hard invariants — or if strict consistency is desired, guard the whole record/snapshot with a RwLock (readers for record, writer for snapshot).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@jans-cedarling/cedarling/src/authz/metrics.rs` around lines 263 - 380, The
snapshot_and_reset implementation performs per-atomic swaps (insert_swap on
authz_requests_total, authz_decision_allow/deny, etc.) leading to non-atomic,
inconsistent snapshots relative to concurrent record_evaluation calls; fix by
adding a global RwLock (e.g., metrics_rw: RwLock<()>) to the Metrics struct,
acquire a read lock in record_evaluation-like writers and acquire a write lock
at the start of snapshot_and_reset (around interval_start, policy_stats,
error_counters, eval_times_us and the insert_swap/insert_load/insert_val
sequence) so the entire snapshot is point-in-time consistent, or alternatively
add a short note in the docs near snapshot_and_reset and the atomic fields
(authz_requests_total, authz_decision_allow/deny, token_cache.*, jwt.*)
explaining the eventual-consistency semantics if you choose not to change
locking.
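The strict-consistency variant mentioned above (readers for record, writer for snapshot) can be sketched with a zero-sized RwLock gate. `Collector` and its three counters are illustrative stand-ins, not the PR's actual struct: recorders share the read lock and still run in parallel, while the snapshot's write lock briefly excludes them all, so the cross-counter invariants hold within one snapshot.

```rust
use std::sync::atomic::{AtomicI64, Ordering};
use std::sync::RwLock;

/// Recorders hold the read side of `gate`, so they proceed concurrently;
/// the snapshot takes the write side, making its swaps point-in-time consistent.
#[derive(Default)]
pub struct Collector {
    gate: RwLock<()>,
    requests_total: AtomicI64,
    decision_allow: AtomicI64,
    decision_deny: AtomicI64,
}

impl Collector {
    pub fn record(&self, allow: bool) {
        let _shared = self.gate.read().expect("gate poisoned");
        self.requests_total.fetch_add(1, Ordering::Relaxed);
        if allow {
            self.decision_allow.fetch_add(1, Ordering::Relaxed);
        } else {
            self.decision_deny.fetch_add(1, Ordering::Relaxed);
        }
    }

    /// Returns (total, allow, deny); with the gate held exclusively,
    /// total == allow + deny always holds within one snapshot.
    pub fn snapshot_and_reset(&self) -> (i64, i64, i64) {
        let _exclusive = self.gate.write().expect("gate poisoned");
        (
            self.requests_total.swap(0, Ordering::Relaxed),
            self.decision_allow.swap(0, Ordering::Relaxed),
            self.decision_deny.swap(0, Ordering::Relaxed),
        )
    }
}
```

The read lock adds a small constant cost to every record; for pure telemetry the documentation-only option is cheaper, which is why the comment above offers both.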
```rust
}

pub(crate) async fn run(self, cancel_tkn: CancellationToken) {
    let mut ticker = interval(self.interval);
```
🧹 Nitpick | 🔵 Trivial
Consider MissedTickBehavior::Delay for the interval.
tokio::time::interval defaults to MissedTickBehavior::Burst, so if the task is ever stalled (e.g., long GC pause, WASM scheduler hiccup, backpressure behind log_any), multiple catch-up ticks will fire back-to-back and produce near-zero-interval snapshots. For a metrics aggregator this distorts rates and interval_secs. Delay (or Skip) yields more meaningful telemetry intervals.
♻️ Proposed change

```diff
 pub(crate) async fn run(self, cancel_tkn: CancellationToken) {
     let mut ticker = interval(self.interval);
+    ticker.set_missed_tick_behavior(tokio::time::MissedTickBehavior::Delay);
     ticker.tick().await;
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```rust
let mut ticker = interval(self.interval);
ticker.set_missed_tick_behavior(tokio::time::MissedTickBehavior::Delay);
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@jans-cedarling/cedarling/src/lock/telemetry_ticker.rs` at line 40, The
interval created in telemetry_ticker (the let mut ticker =
interval(self.interval) in telemetry_ticker.rs) should use
MissedTickBehavior::Delay to avoid bursty catch-up ticks after stalls; update
the code to construct the interval and then call
set_missed_tick_behavior(MissedTickBehavior::Delay) on the ticker, and add the
needed import (tokio::time::MissedTickBehavior) so the ticker produces delayed
ticks instead of bursty ones.
```rust
fn emit_snapshot(&self) {
    let snapshot = self.metrics.snapshot_and_reset();
    let entry = MetricsLogEntry {
        base: BaseLogEntry::new_metric_opt_request_id(None),
        policy_stats: snapshot.policy_stats,
        error_counters: snapshot.error_counters,
        operational_stats: snapshot.operational_stats,
        interval_secs: snapshot.interval_secs,
    };
    self.logger.log_any(entry);
}
```
snapshot_and_reset() is called unconditionally — metrics are silently discarded when the logger is absent.
If self.logger is None (or the Weak fails to upgrade downstream), log_any is a no-op but snapshot_and_reset() has already zeroed the collector. Consider short-circuiting when no logger is present, or at minimum gate the reset on a successful dispatch, so that misconfiguration does not cause silent data loss between intervals.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@jans-cedarling/cedarling/src/lock/telemetry_ticker.rs` around lines 55 - 65,
The code calls self.metrics.snapshot_and_reset() unconditionally which discards
metrics even when self.logger is absent; change emit_snapshot to first check
whether a logger is available (e.g. try to upgrade the Weak or inspect
self.logger) and return early if there is no consumer, or alternatively obtain a
snapshot without resetting and only call reset after a successful
self.logger.log_any(...) dispatch; specifically modify emit_snapshot to guard
the call to snapshot_and_reset (or split into snapshot + conditional reset) so
MetricsLogEntry creation and logger.log_any(...) only proceed when the logger
can be upgraded/exists (use the same MetricsLogEntry, emit_snapshot,
snapshot_and_reset, and logger.log_any identifiers to locate the code).
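One shape for the guard suggested above, assuming the logger ends up held as a `Weak` reference: upgrade first, and only run the destructive `snapshot_and_reset` when a consumer exists. The `Logger` trait, `MetricsTicker` struct, and the closure-based reset below are all hypothetical simplifications for illustration.

```rust
use std::sync::Weak;

/// Hypothetical minimal logger interface; the real one takes typed entries.
pub trait Logger {
    fn log_any(&self, entry: String);
}

pub struct MetricsTicker<L: Logger> {
    logger: Weak<L>,
}

impl<L: Logger> MetricsTicker<L> {
    /// Only reset the collector once a consumer is known to exist;
    /// otherwise the interval's metrics keep accumulating instead of
    /// being silently discarded.
    pub fn emit_snapshot(&self, snapshot_and_reset: impl FnOnce() -> String) {
        let Some(logger) = self.logger.upgrade() else {
            return; // no consumer: skip the destructive reset
        };
        logger.log_any(snapshot_and_reset());
    }
}
```

Passing the reset as a closure makes the "no logger → no reset" property structural: the destructive call simply cannot run on the early-return path.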
```rust
pub(super) fn deserialize_entries<T, S>(
    entries: &[SerializedAuditEntry],
    label: &str,
    log_warn: impl Fn(String),
) -> Result<Vec<T>, TransportError>
where
    T: TryFrom<S>,
    S: serde::de::DeserializeOwned,
{
    let parsed: Vec<T> = entries
        .iter()
        .filter_map(|v| {
            serde_json::from_str::<S>(v)
                .ok()
                .and_then(|s| T::try_from(s).ok())
        })
        .collect();

    let skipped = entries.len() - parsed.len();
    if skipped > 0 {
        log_warn(format!("skipped {skipped} malformed {label} entries"));
    }

    if parsed.is_empty() {
        return Err(TransportError::Serialization(format!(
            "all {skipped} {label} entries were malformed, nothing to send"
        )));
    }

    Ok(parsed)
}
```
Misleading error when called with an empty input; silent conflation of deserialization vs. conversion failures.

Two concerns in `deserialize_entries`:

1. If `entries.is_empty()`, then `skipped == 0` and `parsed.is_empty()` both hold, producing the error `"all 0 {label} entries were malformed, nothing to send"`. Today both REST/gRPC transports guard with an `is_empty()` check upstream, but the helper is public-to-the-module and easy to misuse. Return `Ok(Vec::new())` on empty input, or assert the precondition.
2. `.ok().and_then(|s| T::try_from(s).ok())` silently collapses two distinct failure modes (malformed JSON vs. schema/conversion failure) into the same "skipped" bucket. The warning can't distinguish them, which hurts operator diagnostics and masks breaking changes in the lock-server schema. Consider logging the individual error (`debug!`/`warn!`) per skipped entry, or at least counting them separately.
♻️ Proposed tweak for (1)

```diff
     T: TryFrom<S>,
     S: serde::de::DeserializeOwned,
 {
+    if entries.is_empty() {
+        return Ok(Vec::new());
+    }
     let parsed: Vec<T> = entries
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@jans-cedarling/cedarling/src/lock/transport/mod.rs` around lines 65 - 95, In
deserialize_entries, handle the empty-input case by returning Ok(Vec::new())
immediately when entries.is_empty() instead of treating it as all-malformed;
also stop collapsing JSON parse failures and conversion failures into one silent
skipped count by distinguishing serde_json::from_str::<S>(...) errors from
T::try_from(...) errors — when iterating entries, attempt from_str and on Err
call log_warn with a descriptive message (include the entry index/label and
parse error), otherwise attempt T::try_from and on Err call log_warn with a
distinct message for conversion failure; finally compute parsed from successful
conversion results and only return Err(TransportError::Serialization(...)) when
parsed.is_empty() after these separate logs.
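The two-bucket counting idea from concern (2) can be sketched dependency-free as follows. Here `parse_fn` and `convert_fn` are hypothetical stand-ins for `serde_json::from_str::<S>` and `T::try_from` (reduced to `Option` results so the example needs no serde), and the signature is simplified relative to the real helper.

```rust
/// Sketch for concern (2): count JSON-parse failures and conversion failures
/// separately so the skipped-entry warning can distinguish the two modes.
pub fn deserialize_entries<S, T>(
    entries: &[&str],
    label: &str,
    parse_fn: impl Fn(&str) -> Option<S>,
    convert_fn: impl Fn(S) -> Option<T>,
    log_warn: impl Fn(String),
) -> Vec<T> {
    let mut parse_failures = 0usize;
    let mut convert_failures = 0usize;

    let parsed: Vec<T> = entries
        .iter()
        .copied()
        .filter_map(|raw| match parse_fn(raw) {
            None => {
                parse_failures += 1; // malformed JSON bucket
                None
            }
            Some(s) => match convert_fn(s) {
                None => {
                    convert_failures += 1; // schema/conversion bucket
                    None
                }
                Some(t) => Some(t),
            },
        })
        .collect();

    if parse_failures + convert_failures > 0 {
        log_warn(format!(
            "skipped {parse_failures} malformed and {convert_failures} \
             unconvertible {label} entries"
        ));
    }
    parsed
}
```

A sudden jump in the "unconvertible" count with a flat "malformed" count would then point directly at a lock-server schema change rather than corrupted payloads.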
haileyesus2433
left a comment
LGTM, but make sure to fix the failing ci
Signed-off-by: dagregi <dagmawi.m@proton.me>
…icker Signed-off-by: dagregi <dagmawi.m@proton.me>
Signed-off-by: dagregi <dagmawi.m@proton.me>
Actionable comments posted: 3
♻️ Duplicate comments (1)
jans-cedarling/cedarling/src/lock/mod.rs (1)
275-287: ⚠️ Potential issue | 🟠 Major

Shutdown ordering still races the final telemetry snapshot.
The separate ticker cancellation token was added, but
`self.cancel_tkn.cancel()` on line 276 is still invoked before the ticker is cancelled and awaited. Since `self.cancel_tkn` is the token observed by both `log_worker` and `telemetry_worker`, both workers start their shutdown/drain concurrently with the ticker on lines 277‑280. If the `telemetry_worker` exits and drops its `Receiver` before the ticker's cancel branch pushes the final `snapshot_and_reset()` entry through `log_any → dispatch_entry → try_send`, that last interval's metrics are dropped on a closed channel.

Cancel and await the ticker first, then cancel
`self.cancel_tkn` and await the workers, so the ticker's final snapshot is guaranteed to be enqueued before the telemetry worker stops reading.

🔒️ Proposed fix
```diff
 pub(crate) async fn shut_down(&mut self) {
-    self.cancel_tkn.cancel();
-    if let Some((cancel_tkn, handle)) = self.telemetry_ticker.take() {
-        cancel_tkn.cancel();
-        () = handle.await_result().await;
-    }
-    if let Some(worker) = self.log_worker.take() {
-        () = worker.handle.await_result().await;
-    }
-    if let Some(worker) = self.telemetry_worker.take() {
-        () = worker.handle.await_result().await;
-    }
+    // Stop the ticker first so its final snapshot reaches the telemetry worker
+    // before the worker observes cancellation and exits.
+    if let Some((cancel_tkn, handle)) = self.telemetry_ticker.take() {
+        cancel_tkn.cancel();
+        () = handle.await_result().await;
+    }
+    self.cancel_tkn.cancel();
+    if let Some(worker) = self.log_worker.take() {
+        () = worker.handle.await_result().await;
+    }
+    if let Some(worker) = self.telemetry_worker.take() {
+        () = worker.handle.await_result().await;
+    }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@jans-cedarling/cedarling/src/lock/mod.rs` around lines 275 - 287, The shutdown ordering in shut_down currently cancels self.cancel_tkn before cancelling and awaiting the telemetry_ticker, which allows telemetry_worker to exit and drop its Receiver before the ticker can enqueue the final snapshot via snapshot_and_reset → log_any → dispatch_entry → try_send; modify shut_down to first take, cancel, and await the telemetry_ticker (the (cancel_tkn, handle) from telemetry_ticker.take()), then call self.cancel_tkn.cancel(), and only after that await log_worker.handle.await_result() and telemetry_worker.handle.await_result() so the ticker's final snapshot is guaranteed to be sent before workers stop reading.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@jans-cedarling/cedarling/src/authz/metrics.rs`:
- Around line 617-622: The test saturating_usize_to_i64_normal_value uses bare
assert_eq! macros; update the three assertions to include descriptive messages
describing the expectation for saturating_usize_to_i64: e.g., that 0 converts to
0, that a small value like 100 converts unchanged, and that usize::MAX saturates
to i64::MAX. Edit the assertions in the saturating_usize_to_i64_normal_value
function to pass a final &str message to each assert_eq! reflecting those
expectations.
In `@jans-cedarling/cedarling/src/context_data_api/entry.rs`:
- Line 348: The project uses Duration::from_mins()/from_hours() (e.g.,
StdDuration::from_mins(1) in entry.rs and multiple occurrences in
benches/context_data_store_benchmark.rs, src/context_data_api/config.rs, and
src/tests/context_data_api.rs) which require Rust 1.91.0+. Fix by either
updating rust-toolchain.toml channel to "1.91.0" or higher, or replace each
Duration::from_mins(x) with Duration::from_secs(x * 60) and
Duration::from_hours(x) with Duration::from_secs(x * 3600) across the mentioned
files (search for StdDuration::from_mins and StdDuration::from_hours to locate
all occurrences).
In `@jans-cedarling/cedarling/src/context_data_api/store.rs`:
- Around line 200-206: The get telemetry is only recorded in DataStore::get but
not in DataStore::get_entry, so reads via get_entry (used by
DataApi::get_data_entry_ctx) are not counted; update DataStore::get_entry to
call self.metrics.record_data_get() when it returns Some(...) (and remove the
duplicate recording from DataStore::get to avoid double-counting) or
alternatively record in both but ensure no duplicates; locate the functions
named get_entry and get in store.rs and add the metrics.record_data_get() call
in the successful-lookup path of get_entry (or mirror it consistently in both
paths).
---
Duplicate comments:
In `@jans-cedarling/cedarling/src/lock/mod.rs`:
- Around line 275-287: The shutdown ordering in shut_down currently cancels
self.cancel_tkn before cancelling and awaiting the telemetry_ticker, which
allows telemetry_worker to exit and drop its Receiver before the ticker can
enqueue the final snapshot via snapshot_and_reset → log_any → dispatch_entry →
try_send; modify shut_down to first take, cancel, and await the telemetry_ticker
(the (cancel_tkn, handle) from telemetry_ticker.take()), then call
self.cancel_tkn.cancel(), and only after that await
log_worker.handle.await_result() and telemetry_worker.handle.await_result() so
the ticker's final snapshot is guaranteed to be sent before workers stop
reading.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: a11794ec-798e-473c-8095-62d74d2788ae
📒 Files selected for processing (11)
- jans-cedarling/cedarling/benches/context_data_store_benchmark.rs
- jans-cedarling/cedarling/src/authz/metrics.rs
- jans-cedarling/cedarling/src/common/policy_store/vfs_adapter.rs
- jans-cedarling/cedarling/src/context_data_api/config.rs
- jans-cedarling/cedarling/src/context_data_api/entry.rs
- jans-cedarling/cedarling/src/context_data_api/store.rs
- jans-cedarling/cedarling/src/context_data_api/validation.rs
- jans-cedarling/cedarling/src/lib.rs
- jans-cedarling/cedarling/src/lock/mod.rs
- jans-cedarling/cedarling/src/lock/telemetry_ticker.rs
- jans-cedarling/cedarling/src/tests/context_data_api.rs
```rust
#[test]
fn saturating_usize_to_i64_normal_value() {
    assert_eq!(saturating_usize_to_i64(0), 0);
    assert_eq!(saturating_usize_to_i64(100), 100);
    assert_eq!(saturating_usize_to_i64(usize::MAX), i64::MAX);
}
```
Add descriptive assertion messages to saturating_usize_to_i64_normal_value.
The three assert_eq! calls here are bare. As per coding guidelines, "All assertions must include a descriptive message explaining what is being tested".
✏️ Proposed fix

```diff
 #[test]
 fn saturating_usize_to_i64_normal_value() {
-    assert_eq!(saturating_usize_to_i64(0), 0);
-    assert_eq!(saturating_usize_to_i64(100), 100);
-    assert_eq!(saturating_usize_to_i64(usize::MAX), i64::MAX);
+    assert_eq!(saturating_usize_to_i64(0), 0, "zero must map to 0");
+    assert_eq!(saturating_usize_to_i64(100), 100, "small values must pass through unchanged");
+    assert_eq!(
+        saturating_usize_to_i64(usize::MAX),
+        i64::MAX,
+        "usize::MAX must saturate to i64::MAX",
+    );
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@jans-cedarling/cedarling/src/authz/metrics.rs` around lines 617 - 622, The
test saturating_usize_to_i64_normal_value uses bare assert_eq! macros; update
the three assertions to include descriptive messages describing the expectation
for saturating_usize_to_i64: e.g., that 0 converts to 0, that a small value like
100 converts unchanged, and that usize::MAX saturates to i64::MAX. Edit the
assertions in the saturating_usize_to_i64_normal_value function to pass a final
&str message to each assert_eq! reflecting those expectations.
```diff
 "key1".to_string(),
 json!("value1"),
-Some(StdDuration::from_secs(60)),
+Some(StdDuration::from_mins(1)),
```
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
# Check declared MSRV for jans-cedarling
fd -t f -e toml | xargs rg -nP --no-messages -C1 '(rust-version|channel|toolchain)\s*='
fd -a 'rust-toolchain*' 2>/dev/null | xargs -I{} cat {}
# Ensure no other call sites still use seconds-based equivalents we might have missed
rg -nP '\bDuration::from_(mins|hours|days|weeks)\s*\(' --type=rust
```

Repository: JanssenProject/jans
Length of output: 3864
🌐 Web query:
Duration::from_mins from_hours Rust 1.91.0 stabilization duration_constructors_lite
💡 Result:
Duration::from_mins and Duration::from_hours were stabilized in Rust 1.91.0 via the duration_constructors_lite feature. They are now stable methods on std::time::Duration. These constructors create a Duration from the specified number of minutes or hours, respectively, panicking if the value overflows Duration's representable range. Example usage:

```rust
use std::time::Duration;

let ten_minutes = Duration::from_mins(10);
assert_eq!(600, ten_minutes.as_secs());

let six_hours = Duration::from_hours(6);
assert_eq!(21_600, six_hours.as_secs());
```

Note: Duration::from_mins was marked stable since 1.53.0 in some docs, but the full duration_constructors_lite stabilization including both methods occurred in 1.91.0. Larger constructors like from_days and from_weeks remain unstable behind duration_constructors due to semantic concerns around leap seconds and calendar days.
Citations:
- 1: https://doc.rust-lang.org/beta/std/time/struct.Duration.html
- 2: Tracking Issue for duration_constructors_lite rust-lang/rust#140881
- 3: https://doc.rust-lang.org/1.91.0/std/time/struct.Duration.html
- 4: https://blog.rust-lang.org/2025/10/30/Rust-1.91.0/
- 5: https://doc.rust-lang.org/beta/src/core/time.rs.html
- 6: https://doc.rust-lang.org/stable/std/time/struct.Duration.html
- 7: https://doc.rust-lang.org/std/time/struct.Duration.html
🏁 Script executed:

```shell
# Check if there's an explicit MSRV in Cargo.toml or other configuration
fd -t f -e toml | xargs rg -A2 'package\s*\{' | head -40
# Check jans-cedarling/Cargo.toml specifically
cat jans-cedarling/cedarling/Cargo.toml | head -30
# Verify rust-toolchain.toml is the source of truth
cat jans-cedarling/rust-toolchain.toml
```

Repository: JanssenProject/jans
Length of output: 877
Update Rust toolchain to 1.91.0 or later, or replace Duration::from_mins() and Duration::from_hours() calls with Duration::from_secs() equivalents.
Duration::from_mins() and Duration::from_hours() were stabilized in Rust 1.91.0. The project declares channel = "1.90.0" in rust-toolchain.toml, which is earlier than the required version. The following files will fail to compile with the declared toolchain:
- benches/context_data_store_benchmark.rs (8 occurrences)
- src/context_data_api/config.rs (6 occurrences)
- src/context_data_api/entry.rs (line 348)
- src/tests/context_data_api.rs (13 occurrences)
Either update the toolchain to 1.91.0+ or replace these calls (e.g., Duration::from_mins(1) → Duration::from_secs(60)).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@jans-cedarling/cedarling/src/context_data_api/entry.rs` at line 348, The
project uses Duration::from_mins()/from_hours() (e.g., StdDuration::from_mins(1)
in entry.rs and multiple occurrences in benches/context_data_store_benchmark.rs,
src/context_data_api/config.rs, and src/tests/context_data_api.rs) which require
Rust 1.91.0+. Fix by either updating rust-toolchain.toml channel to "1.91.0" or
higher, or replace each Duration::from_mins(x) with Duration::from_secs(x * 60)
and Duration::from_hours(x) with Duration::from_secs(x * 3600) across the
mentioned files (search for StdDuration::from_mins and StdDuration::from_hours
to locate all occurrences).
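The seconds-based replacement is mechanical; a minimal illustration of the equivalences, using only APIs available on pre-1.91 stable toolchains (the helper names are illustrative):

```rust
use std::time::Duration;

/// Equivalent of Duration::from_mins(1) on a pre-1.91 toolchain.
pub fn one_minute() -> Duration {
    Duration::from_secs(60)
}

/// Equivalent of Duration::from_hours(6) on a pre-1.91 toolchain.
pub fn six_hours() -> Duration {
    Duration::from_secs(6 * 3600)
}
```

Bumping `rust-toolchain.toml` to 1.91.0 is the smaller diff, but the `from_secs` form keeps the crate buildable for downstream consumers pinned to older toolchains.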
```diff
 pub(crate) fn get(&self, key: &str) -> Option<Value> {
-    self.get_entry(key).map(|entry| entry.value)
+    let result = self.get_entry(key).map(|entry| entry.value);
+    if result.is_some() {
+        self.metrics.record_data_get();
+    }
+    result
 }
```
Telemetry gap: get_entry() does not record record_data_get().
DataStore::get() records record_data_get() on Some, but DataStore::get_entry() does not. Since DataApi::get_data_entry_ctx in lib.rs delegates to self.data.get_entry(key), reads that go through the metadata API are silently excluded from the data_get_ops counter, producing inconsistent get-operation telemetry depending on which public method the caller uses.
Move the metric into get_entry (or record it in both paths) so the operational counter reflects all successful reads.
🔧 Suggested placement

```diff
 pub(crate) fn get_entry(&self, key: &str) -> Option<DataEntry> {
     if self.config.enable_metrics {
         // ... existing write-lock path ...
+        self.metrics.record_data_get();
         Some(entry)
     } else {
         // ... existing read-lock path ...
+        self.metrics.record_data_get();
         Some(entry)
     }
 }

 pub(crate) fn get(&self, key: &str) -> Option<Value> {
-    let result = self.get_entry(key).map(|entry| entry.value);
-    if result.is_some() {
-        self.metrics.record_data_get();
-    }
-    result
+    self.get_entry(key).map(|entry| entry.value)
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@jans-cedarling/cedarling/src/context_data_api/store.rs` around lines 200 -
206, The get telemetry is only recorded in DataStore::get but not in
DataStore::get_entry, so reads via get_entry (used by
DataApi::get_data_entry_ctx) are not counted; update DataStore::get_entry to
call self.metrics.record_data_get() when it returns Some(...) (and remove the
duplicate recording from DataStore::get to avoid double-counting) or
alternatively record in both but ensure no duplicates; locate the functions
named get_entry and get in store.rs and add the metrics.record_data_get() call
in the successful-lookup path of get_entry (or mirror it consistently in both
paths).
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
jans-cedarling/cedarling/examples/lock_integration.rs (1)
21-22: ⚠️ Potential issue | 🟡 Minor
`SSA_JWT` constant is now unused.

With
`ssa_jwt: None`, the `SSA_JWT` constant (and its `// NOTE: make sure you replace this with your own SSA` comment) at lines 21–22 is no longer referenced and will trigger a `dead_code` warning when building the example. Either re-enable the SSA path (e.g., keep `Some(SSA_JWT.to_string())` and document that users must substitute their own) or remove the constant and its note. Keeping it as a documented option (commented-out alternative) would better preserve the example's instructional value.

Also applies to: 33-33
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@jans-cedarling/cedarling/examples/lock_integration.rs` around lines 21 - 22, The SSA_JWT constant is unused because ssa_jwt is set to None; either restore the SSA path or remove the dead constant. Fix by either (A) changing the example to use ssa_jwt: Some(SSA_JWT.to_string()) and keep the SSA_JWT constant and its comment (document that users must replace the sample SSA), or (B) delete the SSA_JWT constant and its NOTE comment so there is no dead_code warning; locate SSA_JWT and the ssa_jwt assignment in lock_integration.rs to apply one of these two options.
♻️ Duplicate comments (1)
jans-cedarling/cedarling/src/context_data_api/store.rs (1)
200-206: ⚠️ Potential issue | 🟡 Minor

Telemetry gap in `get_entry` still not fixed — `record_data_get()` only fires for callers going through `get()`.

`DataApi::get_data_entry_ctx` in `lib.rs` delegates to `self.data.get_entry(...)` directly, so reads that retrieve the full `DataEntry` (metadata path) bypass `record_data_get()` entirely. The `data_get_ops` counter therefore under-counts and depends on which public entry point the caller uses, which contradicts the “collect telemetry data within Cedarling” objective of the linked issue.

Moving the metric into `get_entry` (on the successful-lookup, non-expired path in both the write-lock and read-lock branches) and deleting the record in `get()` gives a single, consistent counter.

🔧 Suggested placement

```diff
 pub(crate) fn get(&self, key: &str) -> Option<Value> {
-    let result = self.get_entry(key).map(|entry| entry.value);
-    if result.is_some() {
-        self.metrics.record_data_get();
-    }
-    result
+    self.get_entry(key).map(|entry| entry.value)
 }

 pub(crate) fn get_entry(&self, key: &str) -> Option<DataEntry> {
     if self.config.enable_metrics {
         // ... existing write-lock path, after expiration check & re-store ...
+        self.metrics.record_data_get();
         Some(entry)
     } else {
         // ... existing read-lock path, after expiration check ...
+        self.metrics.record_data_get();
         Some(entry)
     }
 }
```

🤖 Prompt for AI Agents
pub(crate) fn get(&self, key: &str) -> Option<Value> { - let result = self.get_entry(key).map(|entry| entry.value); - if result.is_some() { - self.metrics.record_data_get(); - } - result + self.get_entry(key).map(|entry| entry.value) } pub(crate) fn get_entry(&self, key: &str) -> Option<DataEntry> { if self.config.enable_metrics { // ... existing write-lock path, after expiration check & re-store ... + self.metrics.record_data_get(); Some(entry) } else { // ... existing read-lock path, after expiration check ... + self.metrics.record_data_get(); Some(entry) } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@jans-cedarling/cedarling/src/context_data_api/store.rs` around lines 200 - 206, The telemetry counter is only incremented in get(), so callers that use get_entry (e.g., DataApi::get_data_entry_ctx) bypass record_data_get(), causing under-counting; move the metrics call into get_entry so successful, non-expired lookups increment the counter regardless of caller. Modify get_entry (the method that returns Option<DataEntry/Value>) to call self.metrics.record_data_get() on the successful path(s) inside both the write-lock and read-lock branches (after confirming the entry exists and is not expired), and remove the record_data_get() call from get() so all reads funnel through the single metric increment. Ensure you reference the same metrics instance used by get() (self.metrics) and preserve existing expiration/locking logic around get_entry.
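The single-counting-point pattern the comment describes can be sketched with toy types (the `Store`/`Metrics` names and fields here are illustrative stand-ins, not Cedarling's actual `DataStore` API):

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};

// Toy metrics sink; field name echoes the `data_get_ops` counter from the review.
#[derive(Default)]
struct Metrics {
    data_get_ops: AtomicU64,
}

struct Store {
    map: HashMap<String, String>,
    metrics: Metrics,
}

impl Store {
    // Single counting point: every successful lookup funnels through here.
    fn get_entry(&self, key: &str) -> Option<&String> {
        let entry = self.map.get(key);
        if entry.is_some() {
            self.metrics.data_get_ops.fetch_add(1, Ordering::Relaxed);
        }
        entry
    }

    // Value-only accessor delegates; it must NOT increment again,
    // otherwise reads through `get` would be double-counted.
    fn get(&self, key: &str) -> Option<&String> {
        self.get_entry(key)
    }
}

fn main() {
    let mut map = HashMap::new();
    map.insert("a".to_string(), "1".to_string());
    let store = Store { map, metrics: Metrics::default() };

    let _ = store.get("a");       // counted once, via get_entry
    let _ = store.get_entry("a"); // counted once
    let _ = store.get("missing"); // not counted: lookup failed

    assert_eq!(store.metrics.data_get_ops.load(Ordering::Relaxed), 2);
}
```

Whichever public entry point a caller uses, the counter moves exactly once per successful read.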
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@jans-cedarling/cedarling/examples/lock_integration.rs`:
- Line 36: The inline comment next to telemetry_interval is stale and
contradicts the new value; update the comment on the telemetry_interval:
Some(Duration::from_secs(3)) line in lock_integration.rs (referencing
telemetry_interval and Duration::from_secs(3)) to accurately describe that
telemetry will be sent every 3 seconds (or remove the misleading "don't send
telemetry" text), so the comment matches the code's behavior.
- Around line 29-31: The example uses a personal host in the config_uri string;
update the literal assigned to config_uri in lock_integration.rs to use a stable
placeholder (e.g.,
"https://<your-lock-server>/.well-known/lock-server-configuration" or
"https://demoexample.jans.io/...") and add a brief inline comment explaining it
must be replaced by the user's lock server URL; locate the config_uri assignment
and replace the hard-coded dagregi-suitable-iguana.gluu.info host accordingly.
In `@jans-cedarling/cedarling/src/context_data_api/config.rs`:
- Around line 69-71: Update the cryptic comment above the max_ttl field to state
that the clippy suppression is required because clippy's duration lint
(clippy::duration_suboptimal_units, introduced in Rust 1.91) would suggest using
Duration::from_hours(1) or similar, but this project pins the MSRV to Rust 1.90
in rust-toolchain.toml, so we must keep Duration::from_secs(3600); reference the
lint name, the reason (MSRV 1.90), and the exact expression
(Duration::from_secs(3600)) so future maintainers don't replace it with
from_hours/from_mins when linters change.
In `@jans-cedarling/cedarling/src/context_data_api/store.rs`:
- Around line 180-186: The SparKVError::TTLTooLong arm currently reports
requested: ttl.unwrap_or_default(), which yields Duration::ZERO when callers
passed None; change it to use the already-computed effective_ttl_std so the
DataError::TTLExceeded reflects the actual TTL submitted to SparkV (i.e.,
replace requested: ttl.unwrap_or_default() with requested: effective_ttl_std in
the SparKVError::TTLTooLong match arm), and consider adding a debug-assert or
comment noting that effective_ttl_chrono/config.max_ttl should normally prevent
this branch from being hit.
In `@jans-cedarling/cedarling/src/lock/register_client.rs`:
- Around line 83-86: Replace the hard-coded basic_auth credentials in the call
to .basic_auth(...) with the dynamically registered client_id and client_secret
obtained from the earlier registration flow (the variables populated in the
registration_endpoint handling around lines 58–69) so the token request uses the
just-registered credentials; remove the literal strings
("2200...","DssY38lwQQjD") in register_client.rs and use the variables (e.g.,
client_id, client_secret or whatever names are used in register_client or the
surrounding function), and then update the mock_token_endpoint test to assert on
the Authorization header (or accept the dynamic credentials) so tests verify
auth correctness.
---
Outside diff comments:
In `@jans-cedarling/cedarling/examples/lock_integration.rs`:
- Around line 21-22: The SSA_JWT constant is unused because ssa_jwt is set to
None; either restore the SSA path or remove the dead constant. Fix by either (A)
changing the example to use ssa_jwt: Some(SSA_JWT.to_string()) and keep the
SSA_JWT constant and its comment (document that users must replace the sample
SSA), or (B) delete the SSA_JWT constant and its NOTE comment so there is no
dead_code warning; locate SSA_JWT and the ssa_jwt assignment in
lock_integration.rs to apply one of these two options.
---
Duplicate comments:
In `@jans-cedarling/cedarling/src/context_data_api/store.rs`:
- Around line 200-206: The telemetry counter is only incremented in get(), so
callers that use get_entry (e.g., DataApi::get_data_entry_ctx) bypass
record_data_get(), causing under-counting; move the metrics call into get_entry
so successful, non-expired lookups increment the counter regardless of caller.
Modify get_entry (the method that returns Option<DataEntry/Value>) to call
self.metrics.record_data_get() on the successful path(s) inside both the
write-lock and read-lock branches (after confirming the entry exists and is not
expired), and remove the record_data_get() call from get() so all reads funnel
through the single metric increment. Ensure you reference the same metrics
instance used by get() (self.metrics) and preserve existing expiration/locking
logic around get_entry.
📒 Files selected for processing (8)
- jans-cedarling/cedarling/benches/context_data_store_benchmark.rs
- jans-cedarling/cedarling/examples/lock_integration.rs
- jans-cedarling/cedarling/src/context_data_api/config.rs
- jans-cedarling/cedarling/src/context_data_api/entry.rs
- jans-cedarling/cedarling/src/context_data_api/store.rs
- jans-cedarling/cedarling/src/context_data_api/validation.rs
- jans-cedarling/cedarling/src/lock/register_client.rs
- jans-cedarling/cedarling/src/tests/context_data_api.rs
```rust
        SparKVError::TTLTooLong => DataError::TTLExceeded {
            requested: ttl.unwrap_or_default(),
            max: self
                .config
                .max_ttl
                .unwrap_or(StdDuration::from_secs(INFINITE_TTL_SECS as u64)),
        }
    },
},
```
TTLTooLong branch reports Duration::ZERO as requested when no explicit TTL was provided.
In the SparKVError::TTLTooLong arm, ttl.unwrap_or_default() resolves to Duration::ZERO for callers that passed None and relied on default_ttl. The returned DataError::TTLExceeded { requested, max } is then misleading for both logs and telemetry (and the unrelated data.ttl_exceeded metric key conflates two distinct causes: caller-supplied TTL too long vs. default-TTL path tripping sparkv's internal cap).
Use the already-computed effective_ttl_std instead so the error reflects what was actually submitted to sparkv:
🔧 Proposed fix
```diff
-SparKVError::TTLTooLong => DataError::TTLExceeded {
-    requested: ttl.unwrap_or_default(),
-    max: self
-        .config
-        .max_ttl
-        .unwrap_or(StdDuration::from_secs(INFINITE_TTL_SECS as u64)),
-},
+SparKVError::TTLTooLong => DataError::TTLExceeded {
+    requested: effective_ttl_std,
+    max: self
+        .config
+        .max_ttl
+        .unwrap_or(StdDuration::from_secs(INFINITE_TTL_SECS as u64)),
+},
```

Side note: since `effective_ttl_chrono` is already capped against `config.max_ttl` upstream and the sparkv config `max_ttl` mirrors `config.max_ttl` (or `INFINITE_TTL_SECS` when unset), this branch should in practice be unreachable. Worth confirming and/or downgrading to a debug-assert so real `TTLTooLong` failures surface instead of being silently coerced.
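The `Duration::ZERO` claim is easy to check in isolation: `Duration::default()` is the zero duration, so `unwrap_or_default()` on a `None` TTL erases what was actually submitted. A minimal sketch (the `default_ttl`/`max_ttl` values are illustrative, not Cedarling's configuration):

```rust
use std::time::Duration;

fn main() {
    // What the current error arm reports when the caller passed no TTL:
    let ttl: Option<Duration> = None;
    assert_eq!(ttl.unwrap_or_default(), Duration::ZERO); // misleading "requested" value

    // What was actually submitted to the backing store: the default TTL,
    // capped by max_ttl (values here are made up for the sketch).
    let default_ttl = Duration::from_secs(60);
    let max_ttl = Duration::from_secs(3600);
    let effective_ttl = ttl.unwrap_or(default_ttl).min(max_ttl);
    assert_eq!(effective_ttl, Duration::from_secs(60));

    println!("reported: {:?}, actual: {:?}", ttl.unwrap_or_default(), effective_ttl);
}
```

Reporting the already-computed effective TTL keeps the error (and any metric derived from it) truthful for both the explicit-TTL and default-TTL paths.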
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-SparKVError::TTLTooLong => DataError::TTLExceeded {
-    requested: ttl.unwrap_or_default(),
-    max: self
-        .config
-        .max_ttl
-        .unwrap_or(StdDuration::from_secs(INFINITE_TTL_SECS as u64)),
-}
-},
-},
+SparKVError::TTLTooLong => DataError::TTLExceeded {
+    requested: effective_ttl_std,
+    max: self
+        .config
+        .max_ttl
+        .unwrap_or(StdDuration::from_secs(INFINITE_TTL_SECS as u64)),
+},
```
Force-pushed from d0a3259 to 2917dcc
- Add #![allow(clippy::duration_suboptimal_units)] in test/benchmark files since the clippy job in CI uses the latest toolchain while the rest use 1.90.0 Signed-off-by: dagregi <dagmawi.m@proton.me>
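The commit's MSRV workaround follows a common shape: suppress a lint that only newer toolchains know about so the code stays valid on the pinned toolchain. A sketch (whether the lint actually fires depends on the installed clippy version; the lint name is the one cited in the review):

```rust
// Per the review, clippy on newer toolchains (1.91+) would suggest spelling
// this as Duration::from_hours(1), which the pinned 1.90 toolchain doesn't
// offer. The crate-level allow keeps `cargo clippy` green on both.
#![allow(clippy::duration_suboptimal_units)]

use std::time::Duration;

fn main() {
    let max_ttl = Duration::from_secs(3600); // one hour, spelled MSRV-compatibly
    assert_eq!(max_ttl.as_secs(), 3600);
    println!("max_ttl = {}s", max_ttl.as_secs());
}
```

Tool-scoped lints like `clippy::*` are ignored by plain `rustc`, so the attribute is harmless outside clippy runs.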
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
jans-cedarling/cedarling/src/context_data_api/store.rs (2)
470-476: ⚠️ Potential issue | 🟡 Minor

Redundant assertion without a descriptive message.

Line 475 duplicates the matcher already covered on lines 471–474 and omits the required descriptive message. Drop the duplicate.

🔧 Proposed fix

```diff
 let result = store.push("", json!("value"), None);
 assert!(
     matches!(result, Err(DataError::InvalidKey)),
     "push with empty key should return DataError::InvalidKey"
 );
-assert!(matches!(result, Err(DataError::InvalidKey)));
```

As per coding guidelines: "All assertions must include a descriptive message explaining what is being tested".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@jans-cedarling/cedarling/src/context_data_api/store.rs` around lines 470 - 476, The test contains a duplicated assertion for store.push("", json!("value"), None) checking Err(DataError::InvalidKey); remove the second redundant assert and keep the original assert!(matches!(result, Err(DataError::InvalidKey)), "push with empty key should return DataError::InvalidKey") so the remaining assertion includes the required descriptive message referencing store.push and DataError::InvalidKey.
107-194: 🧹 Nitpick | 🔵 Trivial

Consider consolidating the per-error `metrics.record_error` calls into a single exit point.

The `push` body now interleaves 5 `self.metrics.record_error(&err); return Err(err);` pairs with the business logic, which hurts readability and is easy to drift out of sync when new error paths are added (e.g., if a new `SparKVError` variant is added, forgetting to record it would silently miss telemetry).

Options to consider (not blocking):

- Extract the body into an inner `fn push_inner(...) -> Result<(), DataError>` and record errors once at the outer call site:

♻️ Sketch

```rust
pub(crate) fn push(&self, key: &str, value: Value, ttl: Option<StdDuration>) -> Result<(), DataError> {
    let result = self.push_inner(key, value, ttl);
    match &result {
        Ok(()) => self.metrics.record_data_push(),
        Err(e) => self.metrics.record_error(e),
    }
    result
}
```

This keeps telemetry recording orthogonal to error construction and prevents future drift.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@jans-cedarling/cedarling/src/context_data_api/store.rs` around lines 107 - 194, The push method interleaves multiple self.metrics.record_error(&err) calls; extract the existing body into an inner function (e.g., fn push_inner(&self, key: &str, value: Value, ttl: Option<StdDuration>) -> Result<(), DataError>) that performs validation, builds DataEntry, checks size, and calls storage.set_with_ttl (handling SparKVError -> DataError conversion) but does not call metrics; then change push to call push_inner, match on the Result and call self.metrics.record_data_push() on Ok or self.metrics.record_error(e) once on Err before returning the original Result; ensure DataError variants (including those created from SparKVError) are returned unchanged so the outer match can record them.
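A runnable miniature of the suggested refactor, with hypothetical types standing in for `DataStore`, `DataError`, and the metrics sink:

```rust
use std::cell::Cell;

#[derive(Debug, PartialEq)]
enum DataError {
    InvalidKey,
}

// Toy metrics sink; the real Cedarling metrics type differs.
#[derive(Default)]
struct Metrics {
    pushes: Cell<u64>,
    errors: Cell<u64>,
}

struct Store {
    metrics: Metrics,
}

impl Store {
    // Inner function: pure business logic, no telemetry.
    fn push_inner(&self, key: &str) -> Result<(), DataError> {
        if key.is_empty() {
            return Err(DataError::InvalidKey);
        }
        // ... validation, size checks, storage write would live here ...
        Ok(())
    }

    // Outer function: a single exit point records telemetry exactly once,
    // so new error paths added to push_inner can't be forgotten.
    fn push(&self, key: &str) -> Result<(), DataError> {
        let result = self.push_inner(key);
        match &result {
            Ok(()) => self.metrics.pushes.set(self.metrics.pushes.get() + 1),
            Err(_e) => self.metrics.errors.set(self.metrics.errors.get() + 1),
        }
        result
    }
}

fn main() {
    let store = Store { metrics: Metrics::default() };
    assert!(store.push("ok").is_ok());
    assert_eq!(store.push(""), Err(DataError::InvalidKey));
    assert_eq!((store.metrics.pushes.get(), store.metrics.errors.get()), (1, 1));
}
```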
♻️ Duplicate comments (2)
jans-cedarling/cedarling/src/context_data_api/store.rs (2)
180-190: ⚠️ Potential issue | 🟡 Minor

`TTLTooLong` branch still reports `Duration::ZERO` when caller passed `None`.

`ttl.unwrap_or_default()` yields `Duration::ZERO` for callers relying on `default_ttl`, which now also feeds the `record_error` metric and the user-visible `DataError::TTLExceeded { requested }`. Use `effective_ttl_std` so both the error and the emitted metric reflect the TTL actually submitted to sparkv.

🔧 Proposed fix

```diff
-SparKVError::TTLTooLong => DataError::TTLExceeded {
-    requested: ttl.unwrap_or_default(),
-    max: self
-        .config
-        .max_ttl
-        .unwrap_or(StdDuration::from_secs(INFINITE_TTL_SECS as u64)),
-},
+SparKVError::TTLTooLong => DataError::TTLExceeded {
+    requested: effective_ttl_std,
+    max: self
+        .config
+        .max_ttl
+        .unwrap_or(StdDuration::from_secs(INFINITE_TTL_SECS as u64)),
+},
```

Note: this branch should be unreachable in practice because `effective_ttl_chrono` is already capped against `config.max_ttl` before `set_with_ttl`. Consider downgrading to `debug_assert` or adding a comment so real sparkv-internal TTL failures surface instead of being silently coerced.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@jans-cedarling/cedarling/src/context_data_api/store.rs` around lines 180 - 190, The TTLTooLong branch constructs DataError::TTLExceeded using ttl.unwrap_or_default(), which yields Duration::ZERO for callers who passed None and causes self.metrics.record_error(&err) to report the wrong requested TTL; change the branch to use the already-computed effective_ttl_std (the actual TTL submitted to sparkv) when building DataError::TTLExceeded and when recording the metric so both the emitted error and metrics reflect the real TTL; update references in the TTLTooLong branch where DataError::TTLExceeded and self.metrics.record_error(&err) are created to use effective_ttl_std instead of ttl.unwrap_or_default(), and consider adding a debug_assert/comment noting this branch should be unreachable because effective_ttl_chrono is capped before set_with_ttl.
200-206: ⚠️ Potential issue | 🟡 Minor

Telemetry gap persists: `get_entry()` still does not record `record_data_get()`.

`DataApi::get_data_entry_ctx` (in `lib.rs`) delegates directly to `DataStore::get_entry`, which bypasses the `record_data_get()` call that only fires in `get()`. Reads through the metadata-returning public API are silently excluded from the `data_get_ops` counter, producing inconsistent telemetry depending on which API the caller uses.

Move the increment into `get_entry` so both paths are counted once:

🔧 Proposed fix

```diff
 pub(crate) fn get(&self, key: &str) -> Option<Value> {
-    let result = self.get_entry(key).map(|entry| entry.value);
-    if result.is_some() {
-        self.metrics.record_data_get();
-    }
-    result
+    self.get_entry(key).map(|entry| entry.value)
 }

 pub(crate) fn get_entry(&self, key: &str) -> Option<DataEntry> {
     if self.config.enable_metrics {
         // ... existing write-lock path that returns Some(entry) ...
+        self.metrics.record_data_get();
         Some(entry)
     } else {
         // ... existing read-lock path that returns Some(entry) ...
+        self.metrics.record_data_get();
         Some(entry)
     }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@jans-cedarling/cedarling/src/context_data_api/store.rs` around lines 200 - 206, The telemetry increment is only in DataStore::get (metrics.record_data_get()) so calls that go through DataStore::get_entry (e.g., DataApi::get_data_entry_ctx) are not counted; move the record_data_get() call into the DataStore::get_entry implementation (call metrics.record_data_get() exactly once at the start or just before returning a successful entry) and remove the duplicate increment from DataStore::get so both get() and callers of get_entry() are counted uniformly; reference functions: DataStore::get, DataStore::get_entry, DataApi::get_data_entry_ctx, and metrics.record_data_get.
📒 Files selected for processing (6)
- jans-cedarling/cedarling/benches/context_data_store_benchmark.rs
- jans-cedarling/cedarling/src/context_data_api/config.rs
- jans-cedarling/cedarling/src/context_data_api/entry.rs
- jans-cedarling/cedarling/src/context_data_api/store.rs
- jans-cedarling/cedarling/src/context_data_api/validation.rs
- jans-cedarling/cedarling/src/tests/context_data_api.rs
Prepare
Description
Target issue
closes #10849
Implementation Details
Test and Document the changes
Please check the below before submitting your PR. The PR will not be merged if there are no commits that start with `docs:` to indicate documentation changes or if the below checklist is not selected.

Summary by CodeRabbit
New Features
Bug Fixes