[SVLS-9175] feat: emit OOM metric on memory equality with per-request dedup by lym953 · Pull Request #1241 · DataDog/datadog-lambda-extension

lym953 · 2026-05-29T19:13:26Z

Background

To our knowledge (before this PR), an OOM surfaces in one of three ways, depending on the runtime:

The runtime emits a runtime-specific error message. This can happen on Java, Node (case 1 in the table below) and .NET.
In the PlatformRuntimeDone event, error_type is Runtime.OutOfMemory. This can happen on Python and Ruby.
In the PlatformReport event, max_memory_used == memory_size. This can happen on Python, Ruby, Node and Go.

To capture OOM for all these scenarios (except Node case 2, which was just called out in #1237) without double counting, right now the extension emits aws.lambda.enhanced.out_of_memory metric in these scenarios:

when we see runtime-specific error messages for Java, Node and .NET
when we see Runtime.OutOfMemory
when we see max_memory_used == memory_size for Go, i.e. only when runtime is provided.al2. We don't do this for other runtimes (Python, Ruby, Node) to avoid double counting.

Problem

In issue #1237, a customer called out a new scenario: "Node (case 2)" in the table. The only evidence of OOM is max_memory_used == memory_size, and there is no runtime-specific log message. As a result, OOMs like this are not captured by the OOM enhanced metric.

This PR

Regardless of runtime, use all three ways to capture OOM.
In addition, dedup by request_id to avoid double counting.
Add one integration test per runtime (except for Node, which has 2 tests).

Test plan

Passed the added unit tests and integration tests.

To reviewers

Most of the code changes are for integration tests.

Details (generated by Claude Code)

Closes the gap surfaced in #1237: a Node.js Lambda that hit its memory limit (Memory Size 192 MB / Max Memory Used 192 MB, Status: timeout) did not emit aws.lambda.enhanced.out_of_memory because none of the three existing detection paths matched.

Why the existing paths missed it. V8 spent its budget in GC rather than declaring JavaScript heap out of memory, so the runtime log-line match never fired. The runtime crashed on a wall-clock timeout, so PlatformRuntimeDone reported no error_type. And the max_memory_used_mb == memory_size_mb check in PlatformReport was gated on runtime.starts_with("provided.al") to avoid double-counting against the log path, so Node was excluded.
What changes. Drop the provided.al* restriction so the equality check applies to every runtime. To avoid double-counting against the two pre-existing paths (some invocations satisfy both equality and Runtime.OutOfMemory simultaneously), add a per-Context oom_emitted flag. All three detection paths funnel through a new Processor::try_increment_oom_metric, which checks/sets the flag and is a no-op on subsequent calls for the same request_id.
Plumbing. Event::OutOfMemory now carries an Option<String> request_id. The log-path detector reads it from LambdaProcessor::invocation_context.request_id (set on PlatformStart, cleared on PlatformRuntimeDone/PlatformReport). None is only realistic in Managed Instance mode (extensions can't subscribe to INVOKE there); the helper falls back to a best-effort emit without dedup in that case.
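
The dedup described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the `Processor` and `Context` shapes, the `on_start` / `on_report_done` helpers, and the `oom_count` field standing in for the real metric are all assumptions made for the example. Only the `oom_emitted` flag and the `try_increment_oom_metric` name come from the description.

```rust
use std::collections::HashMap;

/// Per-invocation state. `oom_emitted` mirrors the flag described above.
#[derive(Default)]
struct Context {
    oom_emitted: bool,
}

/// Hypothetical stand-in for the invocation processor.
#[derive(Default)]
struct Processor {
    contexts: HashMap<String, Context>, // keyed by request_id
    oom_count: u64,                     // stands in for the real metric
}

impl Processor {
    /// Start tracking an invocation (assumed to happen on PlatformStart).
    fn on_start(&mut self, request_id: &str) {
        self.contexts.insert(request_id.to_string(), Context::default());
    }

    /// Drop the invocation's context; the dedup flag goes away with it.
    fn on_report_done(&mut self, request_id: &str) {
        self.contexts.remove(request_id);
    }

    /// Emit at most once per tracked request_id. Returns true if it emitted.
    fn try_increment_oom_metric(&mut self, request_id: Option<&str>) -> bool {
        if let Some(id) = request_id {
            if let Some(ctx) = self.contexts.get_mut(id) {
                if ctx.oom_emitted {
                    // Another detection path already counted this invocation.
                    return false;
                }
                ctx.oom_emitted = true;
            }
        }
        // First emission for this context, or nothing to dedup against
        // (request_id is None or unknown): best-effort emit.
        self.oom_count += 1;
        true
    }
}
```

In this sketch the second call for the same `request_id` is a no-op, while a call with `None` always emits, which is the best-effort fallback the description mentions.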

…th per-request dedup

Customer report (#1237): a Node.js Lambda that hit its memory limit (Memory Size 192 MB / Max Memory Used 192 MB, Status: timeout) did not emit aws.lambda.enhanced.out_of_memory because none of the existing detection paths matched. The Node runtime did not log "JavaScript heap out of memory" (V8 spent its time in GC instead of declaring an OOM), and PlatformRuntimeDone reported no error_type, just a wall-clock timeout, so the log-string and Runtime.OutOfMemory paths both stayed silent.

Drop the provided.al* restriction on the PlatformReport equality check so any runtime emits OOM when max_memory_used_mb == memory_size_mb. To avoid double-counting against the two pre-existing paths (some invocations satisfy both equality and Runtime.OutOfMemory at the same time), add a per-Context oom_emitted flag. All three detection paths now funnel through Processor::try_increment_oom_metric, which checks the flag, sets it on first emission, and is a no-op on subsequent calls for the same request_id. The flag lives with the per-invocation Context and is cleared automatically when on_platform_report removes the context.

Plumbing: Event::OutOfMemory now carries an Option<String> request_id (the log-path detector reads it from the logs processor's invocation_context.request_id, set on PlatformStart and cleared on PlatformRuntimeDone). When request_id is None (only realistic in Managed Instance mode, where extensions cannot subscribe to INVOKE), the helper falls back to a best-effort emit without dedup.

Tests cover three scenarios: same request_id emits exactly once, two distinct request_ids each emit, and the equality path still fires (regression coverage for the dropped provided.al* check).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

datadog-prod-us1-5 · 2026-05-29T19:15:39Z


⚠️ Warnings

🚦 1 Pipeline job failed

DataDog/datadog-lambda-extension | publish layer e2e sandbox (amd64, fips)

🔄 Retry job. This looks flaky and may succeed on retry.
Rate exceeded during AWS API call for ListLayerVersions operation due to ThrottlingException after max retries reached.

🔗 Commit SHA: 631ed00

Adds a new `oom` integration-test suite that exercises the OOM dedup change (Context::oom_emitted, #1241) end-to-end across every supported runtime. Each lambda intentionally allocates until it OOMs; the test asserts aws.lambda.enhanced.out_of_memory increments by exactly one data point per function over the invocation window, which fails if the dedup flag stops working and two detection paths emit for the same invocation.

New lambda apps under integration-tests/lambda/:
- oom-node-v8-heap : exercises log-line path (JavaScript heap OOM)
- oom-node-sigkill : exercises PlatformRuntimeDone Runtime.OutOfMemory path
- oom-python : MemoryError; log path AND PlatformRuntimeDone path both fire, so dedup is necessary for count==1
- oom-ruby : NoMemoryError; same dual-path coverage as Python
- oom-java : OutOfMemoryError (log-line path)
- oom-dotnet : OutOfMemoryException (log-line path)
- oom-go : fatal: runtime: out of memory; log path AND PlatformReport memory-equality path both fire

Framework additions:
- Ruby and Go runtime/layer helpers in lib/util.ts (Ruby tracer layer; Go has no tracer layer, so the extension layer alone covers the test).
- Oom CDK stack registered in bin/app.ts.
- build-ruby.sh (zip-as-is for now; Gemfile build stubbed) and build-go.sh (Docker cross-compile to ARM64 Linux, bootstrap binary).
- Pipeline template additions for the two new build stages and oom suite registration in test-suites.yaml.
- getMetricCount() + OUT_OF_MEMORY_METRIC in tests/utils/datadog.ts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

CI run on the first oom suite (commit 5a833ac) returned counts: node-v8-heap=1 ✓ node-sigkill=1 ✓ python=1 ✓ dotnet=1 ✓ ruby=0 ✗ java=0 ✗ go=0 ✗

Reproducing locally:
- Ruby: function failed at init with `cannot load such file -- datadog_lambda_rb`. The Datadog Ruby tracer is a regular gem (no handler shim like Python's `datadog_lambda.handler.handler`), so set handler to `lambda_function.handler` and drop `DD_LAMBDA_HANDLER`.
- Go: function timed out (30s) at `Max Memory Used: 192 MB / Memory Size: 192 MB` without emitting any enhanced metrics. Two changes:
  * Drop `AWS_LAMBDA_EXEC_WRAPPER=/opt/datadog_wrapper`: the wrapper sets language-specific tracer env vars; Go's tracer is in-module not layer-based, so the wrapper just changes runtime detection without helping. With the wrapper removed and a clean exec, the extension's enhanced-metric pipeline starts emitting.
  * Replace the `for { append(make([]byte, 10MB)) }` loop with a single `make([]byte, 500MB)` that writes every page. Go's slice doubling + GC kept the loop from OOMing reliably in the 30s timeout window; eager allocation guarantees `fatal error: runtime: out of memory` fires immediately, exercising bottlecap's log-line detection.
- Java: also failed in CI (count=0) but local repro now returns count=1 with the same code path. Leaving the Java app unchanged for the next CI run to confirm. If it fails again, likely the extension didn't flush the metric before the JVM crashed; would need DD_SERVERLESS_FLUSH_STRATEGY changes or per-function twice-invoke.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…oom suite The integration-test framework defaults DD_SERVERLESS_FLUSH_STRATEGY to `end`, which means the extension only flushes at end of invocation. For OOM tests that's a tight race: the function dies, then Lambda sends PlatformRuntimeDone, then bottlecap increments the OOM metric, then Shutdown comes and the sandbox is reaped. If the metric flush can't finish in that narrow window, the data point is lost. Run 1 of the oom suite returned ruby/java/go=0 (3 of 7 failed). Run 2 returned ruby/node-sigkill/python/dotnet/go=0 (5 of 7 failed) — but java=1 this time. The set of "failing" runtimes is not stable across runs, confirming a timing race rather than a code bug. `default` flushes every ~1s in addition to invocation-end, giving the OOM metric a much wider window to reach Datadog before the sandbox is torn down. All other integration suites keep using `end` since their invocations complete cleanly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

CI runs were intermittently returning count=0 for ruby/java/go/dotnet/ node-sigkill/python — varying combinations across runs. Diagnosing showed the data points were correctly emitted and durably ingested by Datadog within ~30s of the OOM, but the `/api/v1/query` endpoint sometimes returned no results for very-recently-ingested points. The single-shot 5-minute wait was too brittle. Polling strategy: wait 90s after invocation, then re-query every 30s until every runtime reports count>=1 or the 12-min budget is exhausted. Early-exits when all runtimes pass, so the common case is faster than the previous single-shot 5-min wait while the worst case is bounded. Each poll iteration logs the current counts and the still-missing runtimes, so debugging future flakes from CI logs requires no rerun. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
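
The polling strategy in this commit message can be sketched as below. This is an illustration under stated assumptions, not the test suite's code (the real suite lives in the TypeScript integration tests): the function name, the constants, and the `query` callback are hypothetical, and the callback receives the simulated elapsed seconds instead of sleeping and calling a real metrics API.

```rust
use std::collections::HashMap;

const INITIAL_WAIT_SECS: u64 = 90; // first query happens 90s after invocation
const POLL_INTERVAL_SECS: u64 = 30; // then re-query every 30s
const BUDGET_SECS: u64 = 12 * 60; // give up after 12 minutes

/// Polls `query` until every runtime reports count >= 1 or the budget runs out.
/// `query` maps elapsed seconds to the current count per runtime (a stand-in
/// for "sleep, then query the metrics API").
/// Returns Ok(elapsed_secs) on success, Err(still-missing runtimes) on timeout.
fn poll_until_all_counted<F>(runtimes: &[&str], mut query: F) -> Result<u64, Vec<String>>
where
    F: FnMut(u64) -> HashMap<String, u64>,
{
    let mut elapsed = INITIAL_WAIT_SECS;
    loop {
        let counts = query(elapsed);
        // Runtimes whose metric has not shown up yet.
        let missing: Vec<String> = runtimes
            .iter()
            .filter(|r| counts.get(**r).copied().unwrap_or(0) < 1)
            .map(|r| r.to_string())
            .collect();
        if missing.is_empty() {
            return Ok(elapsed); // early exit: the common case is fast
        }
        if elapsed + POLL_INTERVAL_SECS > BUDGET_SECS {
            return Err(missing); // worst case is bounded by the budget
        }
        elapsed += POLL_INTERVAL_SECS;
    }
}
```

Returning the still-missing runtimes on timeout corresponds to the per-iteration logging described above, which lets a flake be diagnosed from CI logs alone.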

…lt` was a no-op `FlushStrategy::Default` falls back to `End` until the lookback buffer fills (~20 invocations). The OOM test does a single cold-start invoke per function, so `default` behaved identically to `end` — explaining why the prior commit's change had no observable effect. `continuously,1000` schedules an unconditional 1s periodic flush regardless of invocation count, so the OOM metric reaches Datadog well before the sandbox is reaped after the function process dies. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…es OOM-kill Root cause of the prior `[oom]` failures (6/7 runtimes stuck at count=0): at 192 MB the kernel OOM-killer often picks the bottlecap extension instead of the function runtime — Lambda surfaces this as `errorType: Extension.Crash`. A dead extension can't emit the OOM metric, so the test sees nothing in Datadog. Reproduced locally on us-east-2 arm64 with an IntegTests-style Python function: at 192 MB → `Extension.Crash`, no metric. Bumping to 256 MB → `Runtime.OutOfMemory`, count=1 in Datadog within 30 s. 256 MB gives the extension ~30 MB headroom while keeping every detection path active: the function still hits memory_size in PlatformReport, still emits its runtime-specific OOM log line, and still gets `Runtime.OutOfMemory` in PlatformRuntimeDone. The customer's #1237 case (192 MB) is unaffected — this is a test-harness change. Also drops the `DD_SERVERLESS_FLUSH_STRATEGY=continuously,1000` override from the prior commit; with the extension surviving, the default `end` flush is sufficient. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- Trim historical `provided.al` context from OOM detection comments
- Rewrite `test_handle_ondemand_report_emits_oom_on_memory_equality` doc comment to describe what the test covers, not how the rule changed
- Refocus `current_request_id` doc on its sole purpose (OOM metric dedup by request_id) and drop speculative scenarios that weren't directly verified; use "LMI mode" consistently
- Drop "as of 2026-05" qualifier from the OOM detection path list
- Bump Datadog-Ruby3-4-ARM default layer 9 -> 28

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

lym953 · 2026-06-02T01:06:17Z

+    /// `PlatformRuntimeDone` / `PlatformReport`. Returns `None` in LMI mode,
+    /// where extensions cannot subscribe to the `INVOKE` event so
+    /// `platform.start` is never delivered.
+    fn current_request_id(&self) -> Option<String> {


To reviewers: is it an anti-pattern to get request_id this way?

~~Actually, for all three cases, we can get request_id from either log payload or telemetry event payload, so we don't need this current_request_id() function. Let me delete it.~~

For the error

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory

No request id is available from the log itself, so we still need the function current_request_id().

Copilot

Pull request overview

This PR expands OOM detection so aws.lambda.enhanced.out_of_memory can be emitted for all runtimes when any of the three OOM signals is observed (runtime-specific OOM log line, Runtime.OutOfMemory in PlatformRuntimeDone, or max_memory_used_mb == memory_size_mb in PlatformReport). To avoid double counting when multiple signals fire for the same invocation, it adds a per-invocation dedup flag keyed by request_id. It also adds a new cross-runtime integration test stack/suite (plus Ruby/Go build plumbing) to validate “exactly once per invocation”.

Changes:

Emit OOM metric on max_memory_used_mb == memory_size_mb for all runtimes, and dedupe per invocation via Context::oom_emitted.
Extend the event bus / processor plumbing so OOM events can carry an optional request_id.
Add an OOM integration-test stack & test suite covering multiple runtimes, plus Go/Ruby build steps in CI/local deploy.

Reviewed changes

Copilot reviewed 26 out of 28 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
`bottlecap/src/lifecycle/invocation/context.rs`	Adds `oom_emitted` flag to dedupe OOM metric emissions per invocation.
`bottlecap/src/lifecycle/invocation/processor.rs`	Uses dedup helper for all OOM detection paths; removes `provided.al*` gating on memory equality; adds unit tests.
`bottlecap/src/lifecycle/invocation/processor_service.rs`	Threads optional `request_id` through the processor command for OOM events.
`bottlecap/src/logs/lambda/processor.rs`	Tags OOM log-line events with `request_id` (when available) for dedup.
`bottlecap/src/event_bus/mod.rs`	Changes OOM event shape to include optional `request_id`.
`bottlecap/src/bin/bottlecap/main.rs`	Forwards new OOM event shape into the invocation processor service.
`bottlecap/src/metrics/enhanced/lambda.rs`	Updates OOM metric docs to reflect new dedup path and detection coverage.
`integration-tests/tests/utils/datadog.ts`	Adds helper to query total metric emission count.
`integration-tests/tests/oom.test.ts`	Adds cross-runtime integration test asserting exactly one OOM metric emission per invocation.
`integration-tests/lib/stacks/oom.ts`	New CDK stack deploying OOM repro lambdas across runtimes.
`integration-tests/lib/util.ts`	Adds Ruby/Go runtime + Ruby tracer layer helpers.
`integration-tests/bin/app.ts`	Registers the new OOM test stack in the integration test app.
`integration-tests/lambda/oom-/`	Adds OOM repro Lambda sources for Node/Python/Ruby/Java/.NET/Go.
`integration-tests/scripts/local_deploy.sh`	Adds Ruby/Go build steps to local deploy.
`integration-tests/scripts/build-ruby.sh`	New Ruby build script (currently no-op for Gemfile-less lambdas).
`integration-tests/scripts/build-go.sh`	New Go cross-compile script producing `bin/bootstrap` for provided runtime.
`.gitlab/templates/pipeline.yaml.tpl`	Adds CI jobs to build Ruby/Go lambdas and wires them into the integration suite.
`.gitlab/datasources/test-suites.yaml`	Adds the `oom` test suite entry.

litianningdatadog · 2026-06-02T13:43:05Z

+                return;
+            }
+            ctx.oom_emitted = true;
+        }


I’m curious: in what cases would the request ID be empty or unavailable? Are either of those cases valid? Maybe we can add a debug log for this.

Added a debug log

Tested on LMI. The OOM log can arrive before the extension receives the request_id from the PlatformStart event. In this case, request_id is empty. Updated the comment to explain this.

litianningdatadog

Left a minor comment

Per PR review feedback. The two no-dedup branches in `try_increment_oom_metric` were previously silent; surfacing them as debug logs makes the LMI-mode case (request_id=None) and the rare context-eviction case (request_id supplied but absent from the buffer) visible during investigations. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds a dedicated `lmi-oom` suite that deploys one Python function on the LMI Capacity Provider and asserts that the OOM enhanced metric is emitted when the function hits its memory cap. Exercises the LMI-specific log-line path where `current_request_id()` returns `None` because `platform.start` is never delivered, so the OOM detector flows through the no-dedup branch of `try_increment_oom_metric`. Assertion is `count >= 1` rather than `== 1` because Path 2 (`Runtime.OutOfMemory` via synthesized runtime_done from `handle_managed_instance_report`) also fires for the same invocation and cannot dedup against the log path's `None`. A future change can tighten this once LMI dedup is addressed. Also simplifies overly-verbose comments above the two no-dedup debug logs — the log messages are self-explanatory. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Deployed a Python OOM Lambda on the LMI Capacity Provider, captured the extension's debug logs from CloudWatch, and observed the actual flow:
- PlatformStart IS delivered in LMI mode (prior comment claimed it wasn't).
- For a Python `MemoryError` that fires immediately on first allocation, the OOM log line is processed by `LambdaProcessor` *before* the `PlatformStart` telemetry event's handler updates `invocation_context.request_id`; both arrive in the same millisecond.
- `current_request_id()` therefore returns `None` and the metric flows through the no-dedup branch (the new debug log fires).
- The synthesized runtime-done from `handle_managed_instance_report` reports `error_type=Runtime.Unknown` (not `Runtime.OutOfMemory`), so Path 2 does NOT fire for this Python OOM shape. Final metric count = 1 (no double-count).

Updates the `current_request_id()` doc, the no-dedup debug log message, and the LMI OOM stack/test comments to reflect what was actually observed rather than the prior (incorrect) "platform.start never delivered in LMI" hypothesis. Assertion stays `>= 1` for robustness against future changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Previously the comments on `try_increment_oom_metric`, the path-3 caller in `handle_ondemand_report`, and `increment_oom_metric` all opened with confident phrasing ("exactly once per request_id", "isn't counted multiple times when more than one detection path fires"), and the no-dedup fallback was buried at the bottom or absent. That mischaracterizes the guarantee: when the OOM log line lands before/after the active-invocation window in `LambdaProcessor`, or when the context has been evicted, the metric will be double-counted by a subsequent detection path. Restructures the three comments so the best-effort caveat is up front and the two edge cases (request_id=None race, context evicted) are called out explicitly with their consequences. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

In LMI mode the function-log JSON payload always carries a `requestId` field that we already extract a few lines above the OOM detection block. Plumbing that value into `Event::OutOfMemory` instead of falling back to `current_request_id()` closes the race observed in #1241 (comment) where a fast OOM log line is processed before this same processor's `PlatformStart` handler updates `invocation_context.request_id`. OnDemand mode is unaffected — `request_id` from the log payload is unconditionally `None` there, so we still fall back to `current_request_id()`, which works because `PlatformStart`'s race window doesn't manifest in OnDemand operationally. Updates the `current_request_id` doc and the LMI OOM stack/test comments to reflect that the LMI case now goes through the deduped branch by way of the payload `requestId`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…Id universally Per #1241 (comment): paths 2 and 3 already get `request_id` directly from the PlatformRuntimeDone and PlatformReport event payloads (as a function parameter); only path 1 (the OOM log-line detector in `LambdaProcessor::get_message`) was using `current_request_id()`. And path 1 has an even better source for the request id — the `requestId` field that structured JSON log payloads already carry — which doesn't race with the in-processor `PlatformStart` handler. Drops the `is_managed_instance_mode` gate around payload `requestId` extraction so on-demand mode also benefits (it was the LMI Python case that surfaced the race empirically, but the same source is more accurate than `invocation_context.request_id` in on-demand mode too). The OOM detector now tags `Event::OutOfMemory` with the extracted payload `requestId` directly; the Extension log variant passes `None` (extension log payloads don't carry a function request id), and falls through to `try_increment_oom_metric`'s no-dedup branch. Updates `test_regular_lambda_does_not_extract_request_id` → `test_regular_lambda_extracts_request_id_from_payload` since the rule it was locking in (LMI-only extraction) no longer holds. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… requestId universally" This reverts commit 1c95f5a.

…) fallback Per #1241 review: when the request id is available from the log/event payload, use it directly; only fall back to a workaround (`current_request_id()`) when the payload doesn't carry one. Drops the `is_managed_instance_mode` gate on payload `requestId` extraction so on-demand mode also benefits. The OOM detector now reads the payload field whenever it's present (Python, Ruby, .NET, and Java/Node when JSON log format is configured) regardless of mode, and falls back to `current_request_id()` only for text-payload OOM logs (Node V8 fatal, Go fatal, Java stderr) where no `requestId` field exists. The fallback path preserves the count==1 behavior for the double-detect cases on the integration suite (Java OutOfMemoryError, Node SIGKILL, Go fatal-error) — these were what the previous "drop current_request_id() entirely" refactor would have regressed. Also renames `test_regular_lambda_does_not_extract_request_id` → `test_regular_lambda_extracts_request_id_from_payload` to match the new universal extraction behavior. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
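
The resolution order this commit describes (payload `requestId` first, `current_request_id()` only as a fallback) can be sketched as a small pure function. The function name and parameter names are hypothetical; the real detector reads these values from the log payload and the logs processor's invocation context.

```rust
/// Choose the request id used to dedup an OOM log line.
/// Prefer the `requestId` carried by a structured JSON log payload; otherwise
/// fall back to the id of the invocation currently tracked by the logs
/// processor (what `current_request_id()` returns in the description).
fn resolve_request_id(
    payload_request_id: Option<&str>,
    current_request_id: Option<&str>,
) -> Option<String> {
    payload_request_id
        .or(current_request_id) // text-payload logs carry no requestId
        .map(str::to_owned)
}
```

When both sources are `None`, the result is `None` and the caller would take the no-dedup branch of `try_increment_oom_metric`.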

…e-shot allocator CDK stack create-function in CI failed for the new `lmi-oom` suite with 'MemorySize value failed to satisfy constraint: Lambda Managed Instance functions must have memory size greater than or equal to 2048'. LMI Lambda enforces a 2 GB floor. Bumping to 2048 MB exposes a second problem: the existing `oom-python` source allocates 10 MB strings in a loop, which on 2 GB either runs past the test budget or gets kernel SIGKILL'd silently before CPython raises MemoryError — exactly what we need Path 1 of the OOM detector to see. Adds `oom-python-lmi/lambda_function.py` with a single `bytearray(100 * 1024 ** 3)` allocation. 100 GB exceeds any reasonable Lambda memory cap by orders of magnitude, so CPython's allocator refuses immediately and raises a clean MemoryError without involving the cgroup OOM killer. Verified manually with `yiming-lmi-oom-debug` in us-east-1 (PR #1241 thread). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…rollup bucket The `[lmi-oom]` suite failed 2x in a row on `0da7a59f` despite the metric being present in Datadog (verified via direct API query). Root cause: Datadog rolls `aws.lambda.enhanced.out_of_memory` into 10-second wall-clock-aligned buckets, and the `/api/v1/query` endpoint only returns buckets whose start timestamp is >= the `from` parameter. In the failing run, the LMI cold start was fast: `windowStart = Date.now()` ran at 19:32:11, the function OOMed at 19:32:18, both in the same bucket starting at 19:32:10. The bucket's timestamp (19:32:10) is less than `from = 19:32:11`, so the bucket is excluded. The test polled 21 times across 12 minutes and saw `count = 0` every time, while a direct query with a wider `from` returned `count = 1` for the same data point. Fix: pad `windowStart` 60 s earlier than the actual invoke time so the bucket containing the OOM is always included. The `deadline` budget still runs from `invokeTime`, not the padded value. Apply the same defensive change to `[oom]`. It hasn't flaked on this specifically yet but the same race is possible — workload-dependent. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
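
The bucket arithmetic behind this failure can be worked through in a few lines. This sketch assumes the behavior stated in the commit message (10-second aligned buckets, and a bucket is returned only if its start is at or after `from`); the function names are hypothetical and timestamps are plain seconds.

```rust
const BUCKET_SECS: u64 = 10; // rollup interval stated in the commit message
const WINDOW_PAD_SECS: u64 = 60; // padding applied by the fix

/// Start of the aligned rollup bucket that contains `ts`.
fn bucket_start(ts: u64) -> u64 {
    ts - ts % BUCKET_SECS
}

/// A data point is visible only if its bucket starts at or after `from`.
fn is_returned(point_ts: u64, from: u64) -> bool {
    bucket_start(point_ts) >= from
}

/// Query window start after the fix: 60s earlier than the invoke time.
fn padded_window_start(invoke_ts: u64) -> u64 {
    invoke_ts.saturating_sub(WINDOW_PAD_SECS)
}
```

Using seconds since midnight for readability: with `windowStart` at 19:32:11 and the OOM at 19:32:18, the bucket starts at 19:32:10, which is before `from`, so the point is excluded. With the padded start (19:31:11) the same bucket is included.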

Per #1241 (comment) the comment in `oom-go/main.go` was written when `oomMemorySize` was still 192 MB; the stack has since been bumped to 256 MB (so the bottlecap extension has headroom and isn't OOM-killed itself, see the 256 MB rationale in `lib/stacks/oom.ts`). Updates the two stale '192 MB' references in the Go reproducer and adds a pointer to the canonical constant in the stack file so the next person who tweaks one place sees the other. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

lym953 and others added 7 commits May 29, 2026 15:52

lym953 commented Jun 2, 2026


lym953 changed the title ~~fix(metrics): emit OOM metric on memory equality with per-request dedup~~ [SVLS-9175] feat: emit OOM metric on memory equality with per-request dedup Jun 2, 2026

lym953 marked this pull request as ready for review June 2, 2026 01:38

lym953 requested a review from a team as a code owner June 2, 2026 01:38

lym953 requested review from Copilot and litianningdatadog June 2, 2026 01:38

Copilot started reviewing on behalf of lym953 June 2, 2026 01:38 View session

Copilot AI reviewed Jun 2, 2026


Comment thread bottlecap/src/logs/lambda/processor.rs

Comment thread integration-tests/tests/oom.test.ts Outdated

Comment thread integration-tests/lambda/oom-go/main.go Outdated

Comment thread integration-tests/lambda/oom-go/main.go Outdated

litianningdatadog reviewed Jun 2, 2026


lym953 and others added 11 commits June 2, 2026 11:53

Revert "refactor(logs): drop current_request_id() helper, use payload…

8115a66

… requestId universally" This reverts commit 1c95f5a.

lym953 wants to merge 19 commits into main from yiming.luo/fix-1237-node-oom-metric
