Skip to content

[PECOBLR-1383] Add statement execution hooks for telemetry collection#321

Merged
samikshya-db merged 7 commits intomainfrom
stack/PECOBLR-1383-telemetry-execution-hooks
Apr 6, 2026
Merged

[PECOBLR-1383] Add statement execution hooks for telemetry collection#321
samikshya-db merged 7 commits intomainfrom
stack/PECOBLR-1383-telemetry-execution-hooks

Conversation

@samikshya-db
Copy link
Copy Markdown
Collaborator

Summary

This stacked PR builds on #320 and adds statement execution hooks to complete end-to-end telemetry collection.

Stack: Part 3 of 3


Changes

Exported Methods for Driver Integration

telemetry/interceptor.go

  • ✅ Exported BeforeExecute() - starts metric tracking for a statement
  • ✅ Exported AfterExecute() - records metric with timing and error info
  • ✅ Exported AddTag() - adds tags to current metric context
  • ✅ Exported CompleteStatement() - marks statement complete and flushes

Statement Execution Hooks

connection.go

  • ✅ Added hooks to QueryContext():

    • Calls BeforeExecute() with statement ID from operation handle GUID
    • Uses defer to call AfterExecute() and CompleteStatement()
  • ✅ Added hooks to ExecContext():

    • Calls BeforeExecute() with statement ID
    • Proper error handling (includes stagingErr)
    • Uses defer to call AfterExecute() and CompleteStatement()

Documentation

telemetry/DESIGN.md

  • ✅ Updated Phase 6 to mark as completed
  • ✅ Added statement execution hooks to Phase 7 checklist

Integration Flow

Connection.QueryContext()
    ↓
BeforeExecute(statementID) → creates metricContext with startTime
    ↓
[Statement Execution]
    ↓
AfterExecute(err) → records metric with latency and error
    ↓
CompleteStatement(statementID, failed) → flushes aggregated metrics

Testing

All tests passing

  • ✅ 99 telemetry tests (2.018s)
  • ✅ All driver tests (58.576s)
  • ✅ No breaking changes
  • ✅ Telemetry properly disabled when not configured

End-to-End Telemetry

With this PR, the telemetry system is fully functional end-to-end:

  1. Collection - Metrics collected from QueryContext/ExecContext
  2. Aggregation - Statement-level aggregation with batching
  3. Circuit Breaker - Protection against failing endpoints
  4. Export - HTTP POST with retry and exponential backoff
  5. Feature Flags - Server-side control with 5-level priority
  6. Resource Management - Per-host clients with reference counting

Related Issues


Checklist

  • Export interceptor methods for driver use
  • Add hooks to QueryContext
  • Add hooks to ExecContext
  • Update DESIGN.md checklist
  • All tests passing
  • No breaking changes

@samikshya-db samikshya-db force-pushed the stack/PECOBLR-1381-1382-telemetry-phase6-7 branch from db32fa3 to f388244 Compare February 5, 2026 07:33
@samikshya-db samikshya-db force-pushed the stack/PECOBLR-1383-telemetry-execution-hooks branch from f1c4641 to a5ed499 Compare February 5, 2026 07:33
@samikshya-db samikshya-db force-pushed the stack/PECOBLR-1383-telemetry-execution-hooks branch from a5ed499 to 36e8f99 Compare March 18, 2026 13:03
@samikshya-db samikshya-db force-pushed the stack/PECOBLR-1383-telemetry-execution-hooks branch 2 times, most recently from ef71fd9 to 3b9923d Compare April 2, 2026 11:13
Base automatically changed from stack/PECOBLR-1381-1382-telemetry-phase6-7 to main April 2, 2026 11:19
This commit completes the telemetry implementation by adding hooks to
QueryContext and ExecContext methods to collect actual metrics.

Changes:
- Export BeforeExecute(), AfterExecute(), CompleteStatement() methods
  in telemetry.Interceptor for use by driver package
- Add telemetry hooks to connection.QueryContext():
  - Call BeforeExecute() with statement ID from operation handle GUID
  - Use defer to call AfterExecute() and CompleteStatement()
- Add telemetry hooks to connection.ExecContext():
  - Call BeforeExecute() with statement ID from operation handle GUID
  - Use defer to call AfterExecute() and CompleteStatement()
  - Handle both err and stagingErr for proper error reporting
- Update DESIGN.md:
  - Mark Phase 6 as completed (all checklist items)
  - Add statement execution hooks to Phase 7 checklist

Testing:
- All 99 telemetry tests passing
- All driver tests passing (58.576s)
- No breaking changes to existing functionality

This enables end-to-end telemetry collection from statement execution
through aggregation and export to the Databricks telemetry service.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@samikshya-db samikshya-db force-pushed the stack/PECOBLR-1383-telemetry-execution-hooks branch from 3b9923d to 8fe174d Compare April 2, 2026 11:36
These functions/types are now used by the exported BeforeExecute,
AfterExecute, and CompleteStatement methods wired into connection.go,
so the unused suppression directives are no longer needed.

Co-authored-by: samikshya-chand_data
Copy link
Copy Markdown
Collaborator

@vikrantpuppala vikrantpuppala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add some tests, thanks

connection.go Outdated
var statementID string
if c.telemetry != nil && exStmtResp != nil && exStmtResp.OperationHandle != nil && exStmtResp.OperationHandle.OperationId != nil {
statementID = client.SprintGuid(exStmtResp.OperationHandle.OperationId.GUID)
ctx = c.telemetry.BeforeExecute(ctx, statementID)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be before runQuery?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should write some correctness tests. how do we know the values telemetry is sending are even correct (so that we avoid bugs like these)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review @vikrantpuppala

The follow up PR #322 already covers a lot of mock based tests on telemetry. And, it also addresses the runQuery issue in QueryContext, I will add more of these correctness tests in #322 PR.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we fix the fact that ctx = c.telemetry.BeforeExecute(ctx, statementID) needs to be called before runQuery?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accidentally deleted my previous comment inplace of editing it :

Both ExecContext and QueryContext now capture the complete query execution time in telemetry
for both PRs.

}
c.telemetry.AfterExecute(ctx, finalErr)
c.telemetry.CompleteStatement(ctx, statementID, finalErr != nil)
}()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the CloseOperation call fails below, that error is logged but never reflected in telemetry so we're missing capturing some errors in telemetry. Not sure if that's intended?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, fixing this.


case "error":
// Check if terminal error
if metric.errorType != "" && isTerminalError(&simpleError{msg: metric.errorType}) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we using simpleError here?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the most light-weight string mapping of the error in this (potentially) hot-path. Let me know if you have any specific suggestion though.

samikshya-db added a commit that referenced this pull request Apr 6, 2026
… CloseOperation error tracking

This commit addresses two key review comments from @vikrantpuppala on PR #321:

1. **ExecContext timing fix**: Capture execution start time BEFORE running query
   - Now captures `executeStart := time.Now()` before `runQuery()` call
   - Uses `BeforeExecuteWithTime()` with pre-captured timestamp
   - Matches the pattern already implemented in QueryContext
   - Ensures telemetry accurately measures actual query execution time

2. **CloseOperation error tracking**: Capture cleanup errors in telemetry
   - Added `closeOpErr` variable to track CloseOperation failures
   - Includes CloseOperation errors in telemetry's deferred function
   - Provides observability for resource cleanup issues
   - Operation still returns success to caller (cleanup is best-effort)

These changes ensure telemetry captures the complete statement lifecycle,
including both execution timing and cleanup operations, without impacting
the caller's error handling semantics.

Co-authored-by: Isaac
This addresses @vikrantpuppala's review comment:
"if the CloseOperation call fails below, that error is logged but never
reflected in telemetry so we're missing capturing some errors in telemetry"

Changes:
- Added closeOpErr variable to capture CloseOperation failures
- Include CloseOperation errors in telemetry's deferred function
- Provides observability for resource cleanup issues
- Operation still returns success to caller (cleanup is best-effort)

Note: The timing fix ("shouldn't this be before runQuery?") will be addressed
in the follow-up PR once BeforeExecuteWithTime infrastructure is available.
samikshya-db added a commit that referenced this pull request Apr 6, 2026
… CloseOperation error tracking

This commit addresses two key review comments from @vikrantpuppala on PR #321:

1. **ExecContext timing fix**: Capture execution start time BEFORE running query
   - Now captures `executeStart := time.Now()` before `runQuery()` call
   - Uses `BeforeExecuteWithTime()` with pre-captured timestamp
   - Matches the pattern already implemented in QueryContext
   - Ensures telemetry accurately measures actual query execution time

2. **CloseOperation error tracking**: Capture cleanup errors in telemetry
   - Added `closeOpErr` variable to track CloseOperation failures
   - Includes CloseOperation errors in telemetry's deferred function
   - Provides observability for resource cleanup issues
   - Operation still returns success to caller (cleanup is best-effort)

These changes ensure telemetry captures the complete statement lifecycle,
including both execution timing and cleanup operations, without impacting
the caller's error handling semantics.

Co-authored-by: Isaac
…cution

This addresses the second part of @vikrantpuppala's review comment:
"shouldn't this be before runQuery?"

Changes:
- Capture executeStart = time.Now() BEFORE calling runQuery()
- Use BeforeExecuteWithTime() with the pre-captured timestamp
- Ensures telemetry measures actual query execution time accurately

Without this fix, telemetry would miss ~100-1000μs of execution time
(the time between query start and getting the operation handle).

Now ExecContext matches the pattern already implemented in QueryContext.
- Add sessionID field to metricContext struct
- Update BeforeExecute to accept sessionID parameter
- Add BeforeExecuteWithTime method for custom start times
- Update connection.go to pass sessionID in BeforeExecute call

This enables proper session tracking in telemetry and allows
capturing accurate execution times by providing a custom start time.
Capture execution start time before runQuery and use BeforeExecuteWithTime
to ensure telemetry accurately reflects actual query execution time.

This completes the timing fix for both ExecContext and QueryContext.
@samikshya-db samikshya-db merged commit 6dd935f into main Apr 6, 2026
2 of 3 checks passed
@samikshya-db samikshya-db deleted the stack/PECOBLR-1383-telemetry-execution-hooks branch April 6, 2026 12:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants