chore(sentry): lower polling rates #271
Because we're sending a high volume of spans to Sentry, this PR lowers the traces sampling rate from 1.0 to 0.1 in production and keeps it at 1.0 on staging. The primary goal is to reduce the number of traces sent. We keep sampling at 100% on staging to catch bugs, and because traffic there is lower.
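A minimal sketch of a per-environment traces sample rate using the values described in this PR. It assumes the environment is already known as a string. The `DeployEnvironment` type, the `TRACES_SAMPLE_RATE` table, and the `tracesSampleRateFor` helper are hypothetical names for illustration; the PR does not show its actual config file or how the environment is detected.

```typescript
// Hypothetical environment names; the real detection mechanism is not shown in the PR.
type DeployEnvironment = "production" | "staging";

// Fraction of traces to keep per environment (values taken from this PR).
const TRACES_SAMPLE_RATE: Record<DeployEnvironment, number> = {
  production: 0.1, // keep roughly 10% of traces to cut span volume
  staging: 1.0, // keep every trace: traffic is low and full traces help catch bugs
};

// Hypothetical helper returning the rate to hand to the SDK.
function tracesSampleRateFor(env: DeployEnvironment): number {
  return TRACES_SAMPLE_RATE[env];
}

// The result would then be passed to the SDK, for example:
// Sentry.init({ dsn, environment: env, tracesSampleRate: tracesSampleRateFor(env) });
```

Keeping the rates in one lookup table makes the production/staging difference explicit in a single place rather than scattering conditionals through the init code.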
Coverage Report
File coverage: no changed files found.
Jipperism left a comment
I wonder if this is permanently necessary. Once those 429s are resolved, we could go back to the old situation where we have a trace for every error. Alternatively, could we adjust the sampling rate based on the error type and apply it only to 429s, which are a very typical "once-it's-there-you-get-many" error?
This would happen again on the next 429, and the sampling rate was at a level where we almost hit our monthly budget in one week. As far as I can see, Sentry currently samples 100% of the calls to and from our API and Indexer, and this change would mitigate those "once-it's-there-you-get-many" issues.
@bitbeckers I'm having a look, but it seems to be a uniform sample rate. Doesn't that mean it will randomly select 10% of the errors to send to the Sentry backend? So if we get 10,000 429s, other errors would have a pretty high chance of getting lost in the noise. I think we'd like to send 10% of 429s and 100% of all other errors.
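The per-error-type idea can be sketched as a filter that keeps about 10% of 429 errors and every other error. This applies to error events, which, as the next reply points out, are sampled separately from traces. It mirrors the contract of Sentry's `beforeSend` hook (return the event to send it, `null` to drop it), but the `ReportedError` shape, the `statusCode` field, and the `sampleByErrorType` name are hypothetical stand-ins; the real Sentry event type differs and where a status code would live on it is an assumption.

```typescript
// Hypothetical minimal error shape; the real Sentry event type is richer.
interface ReportedError {
  message: string;
  statusCode?: number; // assumption: HTTP status attached to the error
}

const RATE_LIMIT_KEEP_RATE = 0.1; // keep roughly 10% of 429 errors

// Returns the error to send it, or null to drop it (same contract as beforeSend).
// `random` is injectable so the behaviour can be tested deterministically.
function sampleByErrorType(
  error: ReportedError,
  random: () => number = Math.random,
): ReportedError | null {
  if (error.statusCode === 429) {
    return random() < RATE_LIMIT_KEEP_RATE ? error : null;
  }
  return error; // every non-429 error is always kept
}
```

Because only 429s go through the random check, a burst of rate-limit errors can no longer crowd out rarer errors the way a single uniform rate would.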
@Jipperism this is tracing, not error monitoring. As far as I can see, the tracing sample rate determines what fraction of events is randomly selected (ref: tracing). Error monitoring is not configured here because it defaults to sampling 100% of errors (ref: error monitoring). For reference, in the last 14 days we've had 10.6M spans and 36K errors.
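The distinction can be made concrete with Sentry's two independent options: `sampleRate` for error events (default 1.0) and `tracesSampleRate` for traces. The sketch below estimates the 14-day volume after this change from the figures quoted above. It assumes those figures were collected at a traces rate of 1.0 and that span volume scales linearly with the trace rate (whole traces sampled uniformly at random, average spans per trace unchanged); `SamplingOptions` and `expectedVolume` are hypothetical names, not SDK APIs.

```typescript
// Mirrors two real Sentry options; the interface itself is a hypothetical stand-in.
interface SamplingOptions {
  sampleRate: number; // error events (Sentry default: 1.0)
  tracesSampleRate: number; // performance traces and their spans
}

interface Volume {
  spans: number;
  errors: number;
}

// Production after this PR: errors untouched, traces cut to 10%.
const production: SamplingOptions = { sampleRate: 1.0, tracesSampleRate: 0.1 };

// Rough estimate only: assumes span count scales linearly with the trace rate.
function expectedVolume(observed: Volume, opts: SamplingOptions): Volume {
  return {
    spans: Math.round(observed.spans * opts.tracesSampleRate),
    errors: Math.round(observed.errors * opts.sampleRate),
  };
}

const last14Days: Volume = { spans: 10600000, errors: 36000 };
const estimate = expectedVolume(last14Days, production); // { spans: 1060000, errors: 36000 }
```

Under these assumptions the span count drops by an order of magnitude while the error count is unaffected, which matches the reviewer's conclusion that lowering the traces rate does not hide errors.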
Jipperism left a comment
Taking the differences between traces and error logs into account, I think it's good to go.
Closing #270