Add distributed tracing instrumentation across all services #670

Description

@Smartdevs17

Context


SubTrackr spans mobile, backend, ML service, smart contracts, and SDK layers. Diagnosing performance issues and errors across this distributed architecture currently requires manually correlating logs from each service—there is no unified trace context propagating across service boundaries.

Current Limitations

  • No trace context propagation across service calls
  • Request latency breakdown by service is unavailable
  • Error correlation across services requires manual log scraping
  • Mobile-to-backend trace context is not propagated
  • ML model inference latency cannot be attributed to specific user requests

Expected Outcome


End-to-end distributed tracing with W3C Trace Context propagation across mobile app, backend API, ML service, and webhook callbacks, enabling flame graph visualization of request flows and latency attribution to individual service hops.
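To make the propagation requirement concrete, here is a minimal, dependency-free TypeScript sketch of what each hop does with the W3C `traceparent` header (`version-traceid-parentid-traceflags`): parse the incoming value, keep the trace ID and sampled flag, mint a new span ID, and forward the result on outgoing HTTP calls and webhook deliveries. It handles version `00` only and ignores `tracestate`. The helper names (`parseTraceparent`, `childContext`, `outgoingHeaders`) are hypothetical; in the actual services the OpenTelemetry SDKs' W3C propagator (`W3CTraceContextPropagator` in the JS SDK) would be expected to do this work.

```typescript
// Sketch only: minimal W3C traceparent handling, version 00, no tracestate.

interface TraceContext {
  traceId: string;  // 32 lowercase hex chars, not all zeros
  spanId: string;   // 16 lowercase hex chars, not all zeros (the "parent-id" field)
  sampled: boolean; // bit 0 of trace-flags
}

const TRACEPARENT_RE = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Returns null for missing or malformed headers so the caller starts a new trace.
function parseTraceparent(header: string | undefined): TraceContext | null {
  if (!header) return null;
  const match = TRACEPARENT_RE.exec(header.trim());
  if (!match) return null;
  const [, traceId, spanId, flags] = match;
  // All-zero trace IDs and parent IDs are invalid.
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 0x01) === 1 };
}

function formatTraceparent(ctx: TraceContext): string {
  return `00-${ctx.traceId}-${ctx.spanId}-${ctx.sampled ? "01" : "00"}`;
}

// Random lowercase hex of `bytes` bytes. Math.random keeps the sketch
// dependency-free; real code should use a CSPRNG.
function randomHex(bytes: number): string {
  let out = "";
  for (let i = 0; i < bytes; i++) {
    out += Math.floor(Math.random() * 256).toString(16).padStart(2, "0");
  }
  return /^0+$/.test(out) ? randomHex(bytes) : out; // retry on all-zero IDs
}

// Continue an incoming trace (or start a root one) for an outgoing call:
// the trace ID and sampling decision are inherited; only the span ID is new.
function childContext(parent: TraceContext | null, sampledIfRoot: boolean): TraceContext {
  return {
    traceId: parent ? parent.traceId : randomHex(16),
    spanId: randomHex(8),
    sampled: parent ? parent.sampled : sampledIfRoot,
  };
}

// Headers to attach to an outgoing HTTP/RPC call or webhook delivery.
function outgoingHeaders(incoming: string | undefined): Record<string, string> {
  const ctx = childContext(parseTraceparent(incoming), true);
  return { traceparent: formatTraceparent(ctx) };
}
```

Inheriting the sampled flag from the parent is what lets downstream services (ML inference, webhook receivers) honor the caller's sampling decision instead of making their own.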

Acceptance Criteria

  • W3C Trace Context headers (traceparent, tracestate) propagated across all HTTP/RPC calls
  • Backend API services emit spans for database queries, external calls, and business logic
  • ML service emits spans for model loading, inference, and feature computation
  • Mobile app emits traces for API calls and local operations
  • Webhook delivery includes trace context for end-to-end correlation
  • Configurable trace sampling strategy (rate-based, error-based, endpoint-based)
  • Traces exported through the OpenTelemetry Collector
  • Flame graph visualization in Grafana or the chosen observability dashboard
  • Instrumentation overhead keeps p95 API latency within 2% of the uninstrumented baseline
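One possible shape for the configurable sampling criterion, sketched in TypeScript under the assumption that rate- and endpoint-based rules are applied at the head of the trace while error-based rules are applied after the trace completes. An error is not known when the first span starts, so error-based sampling usually means tail sampling, for example in the OpenTelemetry Collector's tail sampling processor. The `SamplingConfig` fields and all function names are hypothetical. The rate decision is derived from the trace ID (similar in spirit to OpenTelemetry's trace-ID-ratio sampler, though not bit-compatible with it), which addresses the "sampling decision consistency across services" edge case: every service that sees the same trace reaches the same verdict.

```typescript
// Hypothetical sampling configuration; field names are illustrative only.
interface SamplingConfig {
  defaultRate: number;                   // fraction of traces kept, 0..1
  endpointRates: Record<string, number>; // per-route overrides, e.g. "/health": 0
  alwaysKeepErrors: boolean;             // error-based: keep any trace containing an error span
}

// Deterministic rate-based decision computed from the trace ID alone,
// so independent services agree without coordinating.
function traceIdRatioSample(traceId: string, rate: number): boolean {
  if (rate >= 1) return true;
  if (rate <= 0) return false;
  // Treat the last 8 hex digits (32 bits) of the trace ID as a uniform value.
  const value = parseInt(traceId.slice(-8), 16);
  return value < rate * 0x100000000;
}

// Head decision: endpoint override if configured, otherwise the default rate.
function headSample(cfg: SamplingConfig, traceId: string, route: string): boolean {
  const rate = cfg.endpointRates[route] ?? cfg.defaultRate;
  return traceIdRatioSample(traceId, rate);
}

interface FinishedSpan {
  traceId: string;
  route: string;
  isRoot: boolean;
  isError: boolean;
}

// Tail decision once a trace's spans have been collected: errors win,
// otherwise fall back to the same head rule applied to the root span.
function tailSample(cfg: SamplingConfig, spans: FinishedSpan[]): boolean {
  if (spans.length === 0) return false;
  if (cfg.alwaysKeepErrors && spans.some((s) => s.isError)) return true;
  const root = spans.find((s) => s.isRoot) ?? spans[0];
  return headSample(cfg, root.traceId, root.route);
}
```

With this split, a route such as a health check can be sampled at 0 on the head path while a failing request on that same route is still kept by the tail rule.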

Technical Scope

  • Files: backend/services/shared/monitoring.ts, ml-service/main.py, src/services/network/apiClient.ts, backend/services/notification/webhook.ts, infra/ (otel-collector config)
  • APIs: OpenTelemetry JS SDK, OpenTelemetry Python SDK, W3C Trace Context, Zipkin/Jaeger exporters
  • Edge cases: Trace context header size limits, sampling decision consistency across services, dropped trace contexts on retry, privacy (PII in span attributes)
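Two of these edge cases, sketched in TypeScript under stated assumptions. For header size limits, `truncateTracestate` roughly follows the W3C Trace Context guidance as summarized here: at most 32 `tracestate` list members, at least 512 characters propagated, and when truncation is needed, members longer than 128 characters are dropped before members are removed from the end. For PII, `redactAttributes` scrubs span attributes before they are recorded; the sensitive-key list and email pattern are placeholders rather than a vetted policy, and all names are hypothetical.

```typescript
// 1. tracestate size limits (sketch; limits taken from the W3C Trace Context guidance).
const MAX_MEMBERS = 32;     // maximum list members
const MAX_MEMBER_LEN = 128; // long members are the first to go when truncating

function truncateTracestate(header: string, maxLen = 512): string {
  // Simplification: keep only the first 32 non-empty members.
  let members = header
    .split(",")
    .map((m) => m.trim())
    .filter((m) => m.length > 0)
    .slice(0, MAX_MEMBERS);
  const joined = (ms: string[]): string => ms.join(",");
  // Step 1: if too long, drop every member longer than 128 characters.
  if (joined(members).length > maxLen) {
    members = members.filter((m) => m.length <= MAX_MEMBER_LEN);
  }
  // Step 2: if still too long, drop members from the end until it fits.
  while (members.length > 0 && joined(members).length > maxLen) {
    members.pop();
  }
  return joined(members);
}

// 2. PII in span attributes (hypothetical key list and pattern).
type AttrValue = string | number | boolean;

const SENSITIVE_KEYS = new Set([
  "user.email",
  "user.phone",
  "payment.card_number",
  "http.request.header.authorization",
]);
const EMAIL_RE = /[^\s@]+@[^\s@]+\.[^\s@]+/g;

// Returns a copy with sensitive keys masked and email-like substrings replaced.
function redactAttributes(attrs: Record<string, AttrValue>): Record<string, AttrValue> {
  const out: Record<string, AttrValue> = {};
  for (const [key, value] of Object.entries(attrs)) {
    if (SENSITIVE_KEYS.has(key)) {
      out[key] = "[REDACTED]";
    } else if (typeof value === "string") {
      out[key] = value.replace(EMAIL_RE, "[REDACTED_EMAIL]");
    } else {
      out[key] = value;
    }
  }
  return out;
}
```

Redacting before attributes reach the span keeps PII out of every exporter and backend, rather than relying on each destination to scrub it.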

Metadata

Labels

  • 200-points: 200 point issue
  • Stellar Wave: Issues in the Stellar wave program
  • drips-wave: Issues in the Drips Wave program
  • high: High complexity issue

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
