Add native-backed transport#5699
Conversation
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bec06a7347
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
bec06a7 to
c775e50
Compare
d8e8638 to
6a597db
Compare
b50d2bd to
f9675b4
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f9675b4249
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| # Each trace segment becomes one inner array (one trace chunk). | ||
| chunks = traces.map(&:spans) | ||
|
|
||
| responses = @exporter._native_send_traces(chunks) |
There was a problem hiding this comment.
Preserve rich span fields in native transport
When native_transport is enabled, any trace containing OpenTelemetry span links/events or metastruct data (for example AI Guard payloads) loses those fields here because the transport hands bare Span objects to _native_send_traces; the native converter only copies scalar fields, meta, and metrics (repo-wide rg finds no span_events, span_links, or metastruct handling under ext/libdatadog_api). The existing HTTP serializer writes span_events, meta_struct, and span_links, so this is not a drop-in replacement for those traces and silently corrupts what reaches the agent.
Useful? React with 👍 / 👎.
ivoanjo
left a comment
There was a problem hiding this comment.
Gave a pass, haven't finished yet!
| if native_transport | ||
| x.report("send_traces - Native") do | ||
| native_transport.send_traces(@traces) | ||
| end | ||
| end |
There was a problem hiding this comment.
Minor: I might be worth failing explicitly here (or at least making this optional only with an environment variable) so that if someone sets this up wrong, by default they get an error, and they need to opt-in to "yes I'm ok with skipping this", otherwise CI might be broken and we won't notice.
| queue = Queue.new | ||
|
|
||
| 4.times do | ||
| Thread.new do | ||
| loop do | ||
| client = queue.pop |
There was a problem hiding this comment.
We discussed this can be simplified since every report is blocking so we only need to attend to one report at a time.
| agent_settings = Struct.new(:url, :adapter, :ssl, :hostname, :port, :uds_path, :timeout_seconds) | ||
| .new(agent_url, :net_http, false, '127.0.0.1', @mock_agent.port, nil, 5) | ||
| Datadog::Tracing::Transport::HTTP.default( | ||
| agent_settings: agent_settings, | ||
| logger: Logger.new('/dev/null'), | ||
| ) | ||
| when :native | ||
| require 'datadog/tracing/transport/native' | ||
| agent_settings = Struct.new(:url).new(agent_url) | ||
| Datadog::Tracing::Transport::Native::Transport.new( | ||
| agent_settings: agent_settings, |
There was a problem hiding this comment.
Minor: We discussed this and weren't sure why we need a different agent_settings on each branch, since this should probably match in most cases?
| queue = Queue.new | ||
|
|
||
| 4.times do | ||
| Thread.new do |
There was a problem hiding this comment.
Minor: We can probably simplify this similarly to tracing_transport.rb
| # Use the native trace transport (Rust via C FFI) instead of | ||
| # the default pure-Ruby HTTP transport. | ||
| # | ||
| # The native transport delegates serialization, stats | ||
| # computation, and HTTP sending to the Rust data pipeline. | ||
| # | ||
| # This option is recommended for internal use only. | ||
| # | ||
| # @default `false` | ||
| # @return [Boolean] | ||
| option :native_transport do |o| | ||
| o.default false | ||
| o.type :bool | ||
| end |
There was a problem hiding this comment.
Minor: We discussed here that in the future we might add an environment variable for this, but not for now, we're still missing a few other things
| # Update statistics from the response | ||
| responses.each { |response| update_stats_from_response!(response) } |
There was a problem hiding this comment.
Minor: Noticed that in the original transport we also have something refreshing the Datadog.health_metrics -- might be worth checking if we still care about updating that or not.
| rescue => e | ||
| logger.debug { "Native transport error: #{e.class.name} #{e.message}" } | ||
| update_stats_from_exception!(e) | ||
| [InternalErrorResponse.new(e)] | ||
| end |
There was a problem hiding this comment.
So it's not clear from the original code what it does if there's an error WITH multiple traces.
It looks like it might return a mix of responses and errors, not just the one error.
Also, it's not clear if exceptions are the only things that should go in InternalErrorResponse, or if this also mixes in actual errors from the server, because I guess the old transport might be turning actual errors from server/connection into exceptions... So the semantics here might not match what we had before.
| def tracer_version_string | ||
| defined?(Datadog::VERSION::STRING) ? Datadog::VERSION::STRING : 'unknown' | ||
| end |
There was a problem hiding this comment.
I'm pretty sure many other things would fail if this didn't exist, so I'm not sure we should handle this -- if it's broken, let if fail with a bang and we'll spot it very quickly in CI.
| # Response for internal errors (exceptions raised before reaching | ||
| # the native transport). | ||
| class InternalErrorResponse | ||
| attr_reader :error | ||
|
|
||
| def initialize(error) | ||
| @error = error | ||
| end | ||
|
|
||
| def ok?; false; end | ||
| def internal_error?; true; end | ||
| def server_error?; false; end | ||
| def client_error?; false; end | ||
| def not_found?; false; end | ||
| def unsupported?; false; end | ||
| def payload; nil; end | ||
| def trace_count; 0; end | ||
| def service_rates; nil; end | ||
|
|
||
| def inspect | ||
| "#<#{self.class} error=#{error.inspect}>" | ||
| end |
There was a problem hiding this comment.
Maybe we should unify this with the existing InternalErrorResponse? Unless there's a really good reason to use this one.
| def build_writer(settings, agent_settings, options = settings.tracing.writer_options) | ||
| if (writer = settings.tracing.writer) | ||
| return writer | ||
| end | ||
|
|
||
| if settings.tracing.native_transport | ||
| transport = build_native_transport(agent_settings) | ||
| if transport | ||
| options = options.merge(transport: transport) | ||
| end | ||
| end | ||
|
|
||
| Tracing::Writer.new(agent_settings: agent_settings, **options) | ||
| end | ||
|
|
||
| def build_native_transport(agent_settings) | ||
| require_relative 'transport/native' | ||
|
|
||
| unless Transport::Native.supported? | ||
| Datadog.logger.warn( | ||
| "Native transport requested but not available: #{Transport::Native::UNSUPPORTED_REASON}. " \ | ||
| 'Falling back to default HTTP transport.' | ||
| ) | ||
| return nil | ||
| end | ||
|
|
||
| Transport::Native::Transport.new( | ||
| agent_settings: agent_settings, | ||
| logger: Datadog.logger | ||
| ) | ||
| end |
There was a problem hiding this comment.
We discussed maybe we can extract the building of the old transport here so this code is "symmetric" we have if native create native else create the ruby one.
d5fa393 to
bd141e3
Compare
f9675b4 to
cb933d2
Compare
|
Introduce `Datadog::Tracing::Transport::Native::Transport`, a drop-in replacement for the HTTP transport that delegates to the native trace exporter (Rust via C FFI). - Include `Transport::Statistics` module for stat tracking, calling `update_stats_from_response!` on each response and `update_stats_from_exception!` on rescue - Use `Core::Environment::Ext` constants for language metadata instead of hardcoded strings - `InternalErrorResponse` covers Ruby-side exceptions before reaching the native layer Also adds RBS signature and Steepfile ignore entry.
Add a `native_transport` boolean setting under `tracing` that selects the native trace transport (Rust via C FFI) instead of the default pure-Ruby HTTP transport. When enabled, `build_writer` injects a `Transport::Native::Transport` instance into the `Writer` via the `:transport` option. If the native extension is unavailable, it logs a warning and falls back to the default HTTP transport gracefully. The setting defaults to `false` and is marked as internal-only for now.
Standalone benchmark using `benchmark-ips` that sends 10 traces
(5 spans each, with meta and metrics) to a mock TCP agent via both
the native (Rust) and HTTP (Ruby) transports.
Supports `VALIDATE_BENCHMARK=true` for quick validation in test
suites, and full timed runs for proper measurement.
Usage:
bundle exec ruby benchmarks/tracing_transport.rb
Verify that span data survives the full native transport path:
Ruby Span -> C extension -> Rust serialization -> msgpack -> HTTP -> mock agent
A `CapturingMockAgent` runs in a forked process with threaded
request handling, captures trace payloads via a pipe, and the test
deserializes the msgpack to verify field values.
Tests cover:
- Scalar fields (name, service, resource, type, span_id, parent_id, error)
- String tags (meta) round-trip
- Numeric metrics round-trip
- 64-bit and 128-bit trace IDs
- Multiple spans in one trace chunk
- Multiple trace chunks in one payload
Compare native vs HTTP transport through the full tracing pipeline:
`Datadog::Tracing.trace` -> span creation -> trace flush -> transport.
Uses `SyncWriter` (via `test_mode.async = false`) so each iteration
completes a full round-trip synchronously. Configures the tracer
once per transport mode, not per iteration.
Usage:
bundle exec ruby benchmarks/tracing_transport_e2e.rb
The `Writer` after_send callback calls `response.service_rates` to update the priority sampler with agent-provided sampling rates. Without this method, every successful flush would raise a `NoMethodError` silently swallowed by the event publisher. - Add `service_rates` to the pure-Ruby `Response` class, which parses the JSON `payload` and extracts `rate_by_service` - Add `service_rates` returning `nil` to `InternalErrorResponse` - Add RBS signatures for both
cb933d2 to
66f6fa8
Compare
What does this PR do?
FUP to #5690 (stacked PR).
Wire the native trace exporter C extension from #5690 into the tracer as a usable transport:
Transport::Native::Transport: Ruby class implementing#send_traces/#stats, delegating to the C extension. IncludesTransport::Statisticsfor stat tracking.tracing.native_transportsetting: opt-in boolean (defaultfalse, internal-only) that injects the native transport intoWriter; falls back to HTTP with a warning when the native extension is unavailable.send_tracesisolation) and end-to-end (Datadog::Tracing.tracewithSyncWriter).Motivation:
#5690 provides the C extension but nothing calls it. This PR makes it usable by plugging it into the existing
Writermachinery, with a configuration knob to opt in.Change log entry
None (internal-only setting, off by default).
Additional Notes:
Mock agents in benchmarks and conformance tests run in forked processes with threaded request handling to avoid GVL contention bias.
AI was used to accelerate implementation; all code was reviewed and understood.
How to test the change?
33 additional RSpec examples under
spec/datadog/tracing/transport/native/:Benchmarks: