Skip to content

Add native-backed transport#5699

Open
lloeki wants to merge 6 commits into
lloeki/native-trace-exporter-improvedfrom
lloeki/native-transport
Open

Add native-backed transport#5699
lloeki wants to merge 6 commits into
lloeki/native-trace-exporter-improvedfrom
lloeki/native-transport

Conversation

@lloeki
Copy link
Copy Markdown
Member

@lloeki lloeki commented May 7, 2026

What does this PR do?

FUP to #5690 (stacked PR).

Wire the native trace exporter C extension from #5690 into the tracer as a usable transport:

  • Transport::Native::Transport: Ruby class implementing #send_traces / #stats, delegating to the C extension. Includes Transport::Statistics for stat tracking.
  • tracing.native_transport setting: opt-in boolean (default false, internal-only) that injects the native transport into Writer; falls back to HTTP with a warning when the native extension is unavailable.
  • Wire-level conformance tests: verify span data round-trips correctly through the full path (Ruby Span -> C -> Rust -> msgpack -> HTTP -> mock agent -> msgpack decode).
  • Benchmarks: transport-level (send_traces isolation) and end-to-end (Datadog::Tracing.trace with SyncWriter).

Motivation:

#5690 provides the C extension but nothing calls it. This PR makes it usable by plugging it into the existing Writer machinery, with a configuration knob to opt in.

Change log entry

None (internal-only setting, off by default).

Additional Notes:

Mock agents in benchmarks and conformance tests run in forked processes with threaded request handling to avoid GVL contention bias.

AI was used to accelerate implementation; all code was reviewed and understood.

How to test the change?

33 additional RSpec examples under spec/datadog/tracing/transport/native/:

bundle exec rake clean compile
bundle exec rspec spec/datadog/tracing/transport/native/

Benchmarks:

bundle exec ruby benchmarks/tracing_transport.rb
bundle exec ruby benchmarks/tracing_transport_e2e.rb

@dd-octo-sts dd-octo-sts Bot added the tracing label May 7, 2026
@lloeki
Copy link
Copy Markdown
Member Author

lloeki commented May 7, 2026

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bec06a7347

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread lib/datadog/tracing/transport/native.rb
@lloeki lloeki force-pushed the lloeki/native-transport branch from bec06a7 to c775e50 Compare May 7, 2026 09:28
@lloeki lloeki force-pushed the lloeki/native-trace-exporter branch from d8e8638 to 6a597db Compare May 7, 2026 09:28
@lloeki lloeki force-pushed the lloeki/native-transport branch 3 times, most recently from b50d2bd to f9675b4 Compare May 13, 2026 14:43
@lloeki lloeki marked this pull request as ready for review May 26, 2026 09:27
@lloeki lloeki requested review from a team as code owners May 26, 2026 09:27
@lloeki lloeki requested review from mabdinur and marcotc and removed request for a team May 26, 2026 09:27
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f9675b4249

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

# Each trace segment becomes one inner array (one trace chunk).
chunks = traces.map(&:spans)

responses = @exporter._native_send_traces(chunks)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve rich span fields in native transport

When native_transport is enabled, any trace containing OpenTelemetry span links/events or metastruct data (for example AI Guard payloads) loses those fields here because the transport hands bare Span objects to _native_send_traces; the native converter only copies scalar fields, meta, and metrics (repo-wide rg finds no span_events, span_links, or metastruct handling under ext/libdatadog_api). The existing HTTP serializer writes span_events, meta_struct, and span_links, so this is not a drop-in replacement for those traces and silently corrupts what reaches the agent.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gave a pass, haven't finished yet!

Comment on lines +37 to +41
if native_transport
x.report("send_traces - Native") do
native_transport.send_traces(@traces)
end
end
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: I might be worth failing explicitly here (or at least making this optional only with an environment variable) so that if someone sets this up wrong, by default they get an error, and they need to opt-in to "yes I'm ok with skipping this", otherwise CI might be broken and we won't notice.

Comment on lines +124 to +129
queue = Queue.new

4.times do
Thread.new do
loop do
client = queue.pop
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this can be simplified since every report is blocking so we only need to attend to one report at a time.

Comment on lines +83 to +93
agent_settings = Struct.new(:url, :adapter, :ssl, :hostname, :port, :uds_path, :timeout_seconds)
.new(agent_url, :net_http, false, '127.0.0.1', @mock_agent.port, nil, 5)
Datadog::Tracing::Transport::HTTP.default(
agent_settings: agent_settings,
logger: Logger.new('/dev/null'),
)
when :native
require 'datadog/tracing/transport/native'
agent_settings = Struct.new(:url).new(agent_url)
Datadog::Tracing::Transport::Native::Transport.new(
agent_settings: agent_settings,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: We discussed this and weren't sure why we need a different agent_settings on each branch, since this should probably match in most cases?

Comment on lines +111 to +114
queue = Queue.new

4.times do
Thread.new do
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: We can probably simplify this similarly to tracing_transport.rb

Comment on lines +468 to +481
# Use the native trace transport (Rust via C FFI) instead of
# the default pure-Ruby HTTP transport.
#
# The native transport delegates serialization, stats
# computation, and HTTP sending to the Rust data pipeline.
#
# This option is recommended for internal use only.
#
# @default `false`
# @return [Boolean]
option :native_transport do |o|
o.default false
o.type :bool
end
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: We discussed here that in the future we might add an environment variable for this, but not for now, we're still missing a few other things

Comment on lines +101 to +102
# Update statistics from the response
responses.each { |response| update_stats_from_response!(response) }
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Noticed that in the original transport we also have something refreshing the Datadog.health_metrics -- might be worth checking if we still care about updating that or not.

Comment on lines +105 to +109
rescue => e
logger.debug { "Native transport error: #{e.class.name} #{e.message}" }
update_stats_from_exception!(e)
[InternalErrorResponse.new(e)]
end
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it's not clear from the original code what it does if there's an error WITH multiple traces.

It looks like it might return a mix of responses and errors, not just the one error.

Also, it's not clear if exceptions are the only things that should go in InternalErrorResponse, or if this also mixes in actual errors from the server, because I guess the old transport might be turning actual errors from server/connection into exceptions... So the semantics here might not match what we had before.

Comment on lines +113 to +115
def tracer_version_string
defined?(Datadog::VERSION::STRING) ? Datadog::VERSION::STRING : 'unknown'
end
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure many other things would fail if this didn't exist, so I'm not sure we should handle this -- if it's broken, let if fail with a bang and we'll spot it very quickly in CI.

Comment on lines +118 to +139
# Response for internal errors (exceptions raised before reaching
# the native transport).
class InternalErrorResponse
attr_reader :error

def initialize(error)
@error = error
end

def ok?; false; end
def internal_error?; true; end
def server_error?; false; end
def client_error?; false; end
def not_found?; false; end
def unsupported?; false; end
def payload; nil; end
def trace_count; 0; end
def service_rates; nil; end

def inspect
"#<#{self.class} error=#{error.inspect}>"
end
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should unify this with the existing InternalErrorResponse? Unless there's a really good reason to use this one.

Comment on lines 102 to +132
def build_writer(settings, agent_settings, options = settings.tracing.writer_options)
if (writer = settings.tracing.writer)
return writer
end

if settings.tracing.native_transport
transport = build_native_transport(agent_settings)
if transport
options = options.merge(transport: transport)
end
end

Tracing::Writer.new(agent_settings: agent_settings, **options)
end

def build_native_transport(agent_settings)
require_relative 'transport/native'

unless Transport::Native.supported?
Datadog.logger.warn(
"Native transport requested but not available: #{Transport::Native::UNSUPPORTED_REASON}. " \
'Falling back to default HTTP transport.'
)
return nil
end

Transport::Native::Transport.new(
agent_settings: agent_settings,
logger: Datadog.logger
)
end
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed maybe we can extract the building of the old transport here so this code is "symmetric" we have if native create native else create the ruby one.

@lloeki lloeki force-pushed the lloeki/native-trace-exporter branch from d5fa393 to bd141e3 Compare May 28, 2026 11:51
@lloeki lloeki force-pushed the lloeki/native-transport branch from f9675b4 to cb933d2 Compare May 28, 2026 11:51
@datadog-official
Copy link
Copy Markdown

datadog-official Bot commented May 28, 2026

Pipelines

Fix all issues with BitsAI

⚠️ Warnings

🚦 16 Pipeline jobs failed

DataDog/apm-reliability/dd-trace-rb | install-dependencies: [2.6, amd64]   View in Datadog   GitLab

🔧 Fix in code (Fix with Cursor). Compilation error in trace_exporter.c: unknown type name ‘ddog_TracerSpan’ and implicit declaration of function ‘ddog_tracer_span_free’.

DataDog/apm-reliability/dd-trace-rb | install-dependencies: [2.7, amd64]   View in Datadog   GitLab

🔧 Fix in code (Fix with Cursor). Compilation error in trace_exporter.c: unknown type name 'ddog_TracerSpan'

DataDog/apm-reliability/dd-trace-rb | install-dependencies: [3.0, amd64]   View in Datadog   GitLab

🔧 Fix in code (Fix with Cursor). Compilation errors during gem installation. Error: unknown type name ‘ddog_TracerSpan’ and implicit declaration of function ‘ddog_tracer_span_free’.

View all 16 failed jobs.

Useful? React with 👍 / 👎

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 66f6fa8 | Docs | Datadog PR Page | Give us feedback!

lloeki added 6 commits May 29, 2026 13:51
Introduce `Datadog::Tracing::Transport::Native::Transport`, a drop-in
replacement for the HTTP transport that delegates to the native trace
exporter (Rust via C FFI).

- Include `Transport::Statistics` module for stat tracking, calling
  `update_stats_from_response!` on each response and
  `update_stats_from_exception!` on rescue
- Use `Core::Environment::Ext` constants for language metadata instead
  of hardcoded strings
- `InternalErrorResponse` covers Ruby-side exceptions before reaching
  the native layer

Also adds RBS signature and Steepfile ignore entry.
Add a `native_transport` boolean setting under `tracing` that selects
the native trace transport (Rust via C FFI) instead of the default
pure-Ruby HTTP transport.

When enabled, `build_writer` injects a `Transport::Native::Transport`
instance into the `Writer` via the `:transport` option. If the native
extension is unavailable, it logs a warning and falls back to the
default HTTP transport gracefully.

The setting defaults to `false` and is marked as internal-only for
now.
Standalone benchmark using `benchmark-ips` that sends 10 traces
(5 spans each, with meta and metrics) to a mock TCP agent via both
the native (Rust) and HTTP (Ruby) transports.

Supports `VALIDATE_BENCHMARK=true` for quick validation in test
suites, and full timed runs for proper measurement.

Usage:

    bundle exec ruby benchmarks/tracing_transport.rb
Verify that span data survives the full native transport path:

    Ruby Span -> C extension -> Rust serialization -> msgpack -> HTTP -> mock agent

A `CapturingMockAgent` runs in a forked process with threaded
request handling, captures trace payloads via a pipe, and the test
deserializes the msgpack to verify field values.

Tests cover:
- Scalar fields (name, service, resource, type, span_id, parent_id, error)
- String tags (meta) round-trip
- Numeric metrics round-trip
- 64-bit and 128-bit trace IDs
- Multiple spans in one trace chunk
- Multiple trace chunks in one payload
Compare native vs HTTP transport through the full tracing pipeline:
`Datadog::Tracing.trace` -> span creation -> trace flush -> transport.

Uses `SyncWriter` (via `test_mode.async = false`) so each iteration
completes a full round-trip synchronously. Configures the tracer
once per transport mode, not per iteration.

Usage:

    bundle exec ruby benchmarks/tracing_transport_e2e.rb
The `Writer` after_send callback calls `response.service_rates` to
update the priority sampler with agent-provided sampling rates.
Without this method, every successful flush would raise a
`NoMethodError` silently swallowed by the event publisher.

- Add `service_rates` to the pure-Ruby `Response` class, which
  parses the JSON `payload` and extracts `rate_by_service`
- Add `service_rates` returning `nil` to `InternalErrorResponse`
- Add RBS signatures for both
@lloeki lloeki force-pushed the lloeki/native-transport branch from cb933d2 to 66f6fa8 Compare May 29, 2026 12:02
@lloeki lloeki requested a review from a team as a code owner May 29, 2026 12:02
@lloeki lloeki changed the base branch from lloeki/native-trace-exporter to lloeki/native-trace-exporter-improved May 29, 2026 12:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants