Skip to content

Fix: StaleEntryError at ChannelBroker SMS timeout#2403

Open
matiasgarciaisaia wants to merge 1 commit into
mainfrom
fix/channelbroker-timeout-stale-entry
Open

Fix: StaleEntryError at ChannelBroker SMS timeout#2403
matiasgarciaisaia wants to merge 1 commit into
mainfrom
fix/channelbroker-timeout-stale-entry

Conversation

@matiasgarciaisaia

Copy link
Copy Markdown
Member

When the ChannelBroker tries to timeout an expired contact attempt, updating the respondent stats made the Respondent model itself stale, so the ChannelBroker was crashing.

The fix uses the updated model to properly update the respondent.

Related to the changes in #2380

@matiasgarciaisaia

Copy link
Copy Markdown
Member Author

Sample log (redacted for privacy):

2026-06-08T22:15:37.604343648Z 2026-06-08T22:15:37.604Z request_id=GLc65homU260bXoAAADE [info] GET /
2026-06-08T22:15:37.604969885Z 2026-06-08T22:15:37.604Z request_id=GLc65homU260bXoAAADE [debug] Processing with AskWeb.PageController.index/2
2026-06-08T22:15:37.605106574Z   Parameters: %{"path" => []}
2026-06-08T22:15:37.605116775Z   Pipelines: [:browser]
2026-06-08T22:15:37.605926417Z 2026-06-08T22:15:37.605Z request_id=GLc65homU260bXoAAADE [info] Sent 200 in 969µs
2026-06-08T22:15:48.232746690Z 2026-06-08T22:15:48.232Z [debug] QUERY OK source="respondent_stats" db=13.4ms idle=1795.8ms
2026-06-08T22:15:48.232824727Z SELECT r0.`survey_id`, r0.`questionnaire_id`, r0.`state`, r0.`disposition`, r0.`quota_bucket_id`, r0.`mode`, r0.`count` FROM `respondent_stats` AS r0 []
2026-06-08T22:15:48.500052012Z 2026-06-08T22:15:48.499Z [info] ChannelBroker.handle_info[collect_garbage]: channel_type="sms" config=%{gc_active_idle_minutes: 5, gc_interval_minutes: 2, shut_down_minutes: 30}
2026-06-08T22:15:48.518546070Z 2026-06-08T22:15:48.518Z [debug] QUERY OK source="channel_broker_queue" db=17.0ms queue=1.0ms idle=1076.9ms
2026-06-08T22:15:48.518603933Z SELECT c0.`respondent_id` FROM `channel_broker_queue` AS c0 WHERE ((c0.`channel_id` = ?) AND NOT (c0.`last_contact` IS NULL)) [39]
2026-06-08T22:15:48.520435664Z 2026-06-08T22:15:48.520Z [debug] QUERY OK source="respondents" db=0.8ms queue=0.7ms idle=1095.3ms
2026-06-08T22:15:48.520470697Z SELECT r0.`id` FROM `respondents` AS r0 WHERE (r0.`id` IN (?,?,?,?,?,?,?) AND (r0.`state` = 'active')) [4398655, 4505733, 4525077, 4525714, 4526670, 4528500, 4530064]
2026-06-08T22:15:48.566046053Z 2026-06-08T22:15:48.565Z [debug] QUERY OK source="channel_broker_queue" db=44.8ms queue=0.3ms idle=1097.2ms
2026-06-08T22:15:48.566087977Z DELETE c0.* FROM `channel_broker_queue` AS c0 WHERE (((c0.`channel_id` = ?) AND NOT (c0.`respondent_id` IN (?,?,?,?,?,?,?))) AND NOT (c0.`last_contact` IS NULL)) [39, 4398655, 4505733, 4525077, 4525714, 4526670, 4528500, 4530064]
2026-06-08T22:15:48.568016552Z 2026-06-08T22:15:48.567Z [debug] QUERY OK source="channels" db=1.3ms idle=1143.0ms
2026-06-08T22:15:48.568043264Z SELECT c0.`id`, c0.`name`, c0.`type`, c0.`provider`, c0.`base_url`, c0.`settings`, c0.`patterns`, c0.`paused`, c0.`user_id`, c0.`inserted_at`, c0.`updated_at` FROM `channels` AS c0 WHERE (c0.`id` = ?) [39]
2026-06-08T22:15:48.569345370Z 2026-06-08T22:15:48.569Z [debug] QUERY OK source="oauth_tokens" db=0.8ms idle=1144.9ms
2026-06-08T22:15:48.569363874Z SELECT o0.`id`, o0.`provider`, o0.`base_url`, o0.`access_token`, o0.`expires_at`, o0.`user_id`, o0.`inserted_at`, o0.`updated_at` FROM `oauth_tokens` AS o0 WHERE (((o0.`provider` = ?) AND (o0.`base_url` = ?)) AND (o0.`user_id` = ?)) ["nuntium", "https://nuntium.example.com", 5]
2026-06-08T22:15:48.653061151Z 2026-06-08T22:15:48.652Z [debug] QUERY OK db=6.5ms queue=0.4ms idle=1222.3ms
2026-06-08T22:15:48.653111394Z UPDATE `oauth_tokens` SET `access_token` = ?, `expires_at` = ?, `updated_at` = ? WHERE `id` = ? [%{"access_token" => "DDDJJJJJEEEEFFFAAAAAA", "expires_at" => 1780957847, "other_params" => %{}, "refresh_token" => "ABCDEFGHIJKL", "token_type" => "Bearer"}, ~U[2026-06-08 22:30:47Z], ~U[2026-06-08 22:15:48Z], 20]
2026-06-08T22:15:48.683682368Z 2026-06-08T22:15:48.683Z [debug] QUERY OK source="channel_broker_queue" db=29.0ms queue=0.8ms idle=1230.0ms
2026-06-08T22:15:48.683732786Z SELECT c0.`channel_id`, c0.`respondent_id`, c0.`channel_state` FROM `channel_broker_queue` AS c0 WHERE ((c0.`channel_id` = ?) AND (c0.`last_contact` < ?)) [39, ~U[2026-06-08 22:10:48Z]]
2026-06-08T22:15:48.876450456Z 2026-06-08T22:15:48.876Z [debug] QUERY OK source="channel_broker_queue" db=0.6ms queue=0.4ms idle=1451.8ms
2026-06-08T22:15:48.876499236Z SELECT TRUE FROM `channel_broker_queue` AS c0 WHERE (((c0.`channel_id` = ?) AND (c0.`last_contact` IS NULL)) AND ((c0.`not_before` IS NULL) OR (c0.`not_before` <= ?))) LIMIT 1 [39, ~U[2026-06-08 22:16:48Z]]
2026-06-08T22:15:48.894897495Z 2026-06-08T22:15:48.894Z [debug] QUERY OK source="channel_broker_queue" db=17.8ms idle=1453.1ms
2026-06-08T22:15:48.894943386Z SELECT CAST(coalesce(sum(coalesce(c0.`contacts`, 0)), 0) AS unsigned) FROM `channel_broker_queue` AS c0 WHERE ((c0.`channel_id` = ?) AND NOT (c0.`last_contact` IS NULL)) [39]
2026-06-08T22:15:48.944845665Z 2026-06-08T22:15:48.944Z [debug] QUERY OK source="channel_broker_queue" db=48.7ms queue=0.7ms idle=1471.6ms
2026-06-08T22:15:48.944888348Z SELECT c0.`channel_id`, c0.`respondent_id`, c0.`queued_at`, c0.`priority`, c0.`size`, c0.`token`, c0.`not_before`, c0.`not_after`, c0.`reply`, c0.`last_contact`, c0.`contacts`, c0.`channel_state` FROM `channel_broker_queue` AS c0 WHERE (((c0.`channel_id` = ?) AND (c0.`last_contact` IS NULL)) AND ((c0.`not_before` <= ?) OR (c0.`not_before` IS NULL))) ORDER BY c0.`priority`, c0.`queued_at` LIMIT 1 [39, ~U[2026-06-08 22:16:48Z]]
2026-06-08T22:15:48.952243361Z 2026-06-08T22:15:48.951Z [debug] QUERY OK db=6.5ms queue=0.4ms idle=1521.6ms
2026-06-08T22:15:48.952295630Z UPDATE `channel_broker_queue` SET `contacts` = ?, `last_contact` = ? WHERE `channel_id` = ? AND `respondent_id` = ? [1, ~U[2026-06-08 22:15:48Z], 39, 4505454]
2026-06-08T22:15:48.953714593Z 2026-06-08T22:15:48.953Z [debug] QUERY OK source="respondents" db=0.5ms queue=0.5ms idle=720.0ms
2026-06-08T22:15:48.953732593Z SELECT r0.`id`, r0.`phone_number`, r0.`sanitized_phone_number`, r0.`canonical_phone_number`, r0.`hashed_number`, r0.`section_order`, r0.`state`, r0.`disposition`, r0.`completed_at`, r0.`timeout_at`, r0.`session`, r0.`mode`, r0.`effective_modes`, r0.`language`, r0.`questionnaire_id`, r0.`survey_id`, r0.`respondent_group_id`, r0.`quota_bucket_id`, r0.`retry_stat_id`, r0.`lock_version`, r0.`stats`, r0.`user_stopped`, r0.`inserted_at`, r0.`updated_at` FROM `respondents` AS r0 WHERE (r0.`id` = ?) [4505454]
2026-06-08T22:15:48.961580664Z 2026-06-08T22:15:48.961Z [debug] QUERY OK source="channel_broker_queue" db=6.8ms queue=0.4ms idle=531.0ms
2026-06-08T22:15:48.961605933Z DELETE c0.* FROM `channel_broker_queue` AS c0 WHERE ((c0.`channel_id` = ?) AND (c0.`respondent_id` = ?)) [39, 4505454]
2026-06-08T22:15:48.963814070Z 2026-06-08T22:15:48.962Z [debug] QUERY OK source="channels" db=0.8ms idle=538.4ms
2026-06-08T22:15:48.963832966Z SELECT c0.`id`, c0.`name`, c0.`type`, c0.`provider`, c0.`base_url`, c0.`settings`, c0.`patterns`, c0.`paused`, c0.`user_id`, c0.`inserted_at`, c0.`updated_at` FROM `channels` AS c0 WHERE (c0.`id` = ?) [39]
2026-06-08T22:15:48.963839507Z 2026-06-08T22:15:48.963Z [debug] QUERY OK source="channels" db=0.3ms idle=539.6ms
2026-06-08T22:15:48.963844489Z SELECT c0.`id`, c0.`name`, c0.`type`, c0.`provider`, c0.`base_url`, c0.`settings`, c0.`patterns`, c0.`paused`, c0.`user_id`, c0.`inserted_at`, c0.`updated_at` FROM `channels` AS c0 WHERE (c0.`id` = ?) [39]
2026-06-08T22:15:48.966404569Z 2026-06-08T22:15:48.966Z [debug] QUERY OK source="questionnaires" db=2.3ms queue=0.4ms idle=540.3ms
2026-06-08T22:15:48.966423507Z SELECT q0.`id`, q0.`name`, q0.`description`, q0.`modes`, q0.`steps`, q0.`quota_completed_steps`, q0.`settings`, q0.`languages`, q0.`default_language`, q0.`valid`, q0.`deleted`, q0.`partial_relevant_config`, q0.`archived`, q0.`snapshot_of`, q0.`project_id`, q0.`inserted_at`, q0.`updated_at` FROM `questionnaires` AS q0 WHERE (q0.`id` = ?) [550]
2026-06-08T22:15:48.978190537Z 2026-06-08T22:15:48.977Z [debug] QUERY OK source="respondents" db=0.8ms idle=553.1ms
2026-06-08T22:15:48.978228828Z SELECT r0.`id`, r0.`phone_number`, r0.`sanitized_phone_number`, r0.`canonical_phone_number`, r0.`hashed_number`, r0.`section_order`, r0.`state`, r0.`disposition`, r0.`completed_at`, r0.`timeout_at`, r0.`session`, r0.`mode`, r0.`effective_modes`, r0.`language`, r0.`questionnaire_id`, r0.`survey_id`, r0.`respondent_group_id`, r0.`quota_bucket_id`, r0.`retry_stat_id`, r0.`lock_version`, r0.`stats`, r0.`user_stopped`, r0.`inserted_at`, r0.`updated_at` FROM `respondents` AS r0 WHERE (r0.`id` = ?) [4505454]
2026-06-08T22:15:48.980900076Z 2026-06-08T22:15:48.980Z [debug] QUERY OK source="respondents" db=2.3ms idle=554.9ms
2026-06-08T22:15:48.980927526Z SELECT r0.`id`, r0.`phone_number`, r0.`sanitized_phone_number`, r0.`canonical_phone_number`, r0.`hashed_number`, r0.`section_order`, r0.`state`, r0.`disposition`, r0.`completed_at`, r0.`timeout_at`, r0.`session`, r0.`mode`, r0.`effective_modes`, r0.`language`, r0.`questionnaire_id`, r0.`survey_id`, r0.`respondent_group_id`, r0.`quota_bucket_id`, r0.`retry_stat_id`, r0.`lock_version`, r0.`stats`, r0.`user_stopped`, r0.`inserted_at`, r0.`updated_at` FROM `respondents` AS r0 WHERE (r0.`id` = ?) [4505454]
2026-06-08T22:15:48.981368978Z 2026-06-08T22:15:48.981Z [debug] QUERY OK db=0.2ms idle=557.6ms
2026-06-08T22:15:48.981429497Z begin []
2026-06-08T22:15:48.990429492Z 2026-06-08T22:15:48.989Z [debug] QUERY OK db=1.2ms
2026-06-08T22:15:48.990458127Z UPDATE `respondents` SET `lock_version` = ?, `stats` = ?, `updated_at` = ? WHERE `id` = ? AND `lock_version` = ? [17, %Ask.Stats{attempts: %{"sms" => 2}, call_durations: %{}, pending_call: false, total_call_time: nil, total_call_time_seconds: nil, total_received_sms: 0, total_sent_sms: 3}, ~U[2026-06-08 22:15:48Z], 4505454, 16]
2026-06-08T22:15:48.998085161Z 2026-06-08T22:15:48.995Z [debug] QUERY OK db=6.2ms
2026-06-08T22:15:48.998105162Z commit []
2026-06-08T22:15:48.998238131Z 2026-06-08T22:15:48.998Z [debug] QUERY OK source="surveys" db=0.6ms queue=0.4ms idle=573.7ms
2026-06-08T22:15:48.998248729Z SELECT s0.`id`, s0.`name`, s0.`description`, s0.`mode`, s0.`state`, s0.`locked`, s0.`exit_code`, s0.`exit_message`, s0.`cutoff`, s0.`count_partial_results`, s0.`schedule`, s0.`started_at`, s0.`first_window_started_at`, s0.`ended_at`, s0.`last_window_ends_at`, s0.`sms_retry_configuration`, s0.`ivr_retry_configuration`, s0.`mobileweb_retry_configuration`, s0.`fallback_delay`, s0.`quota_vars`, s0.`comparisons`, s0.`simulation`, s0.`floip_package_id`, s0.`incentives_enabled`, s0.`generates_panel_survey`, s0.`project_id`, s0.`folder_id`, s0.`panel_survey_id`, s0.`inserted_at`, s0.`updated_at`, s0.`id` FROM `surveys` AS s0 WHERE (s0.`id` = ?) [494]
2026-06-08T22:15:49.004585254Z 2026-06-08T22:15:49.004Z [debug] QUERY OK db=0.2ms idle=486.0ms
2026-06-08T22:15:49.004647252Z begin []
2026-06-08T22:15:49.006068858Z 2026-06-08T22:15:49.005Z [debug] QUERY OK db=0.4ms
2026-06-08T22:15:49.006096839Z UPDATE `respondents` SET `lock_version` = ?, `session` = ?, `timeout_at` = ?, `updated_at` = ? WHERE `id` = ? AND `lock_version` = ? [17, %{count_partial_results: true, current_delay: 1560, current_mode: %{channel_id: 39, mode: "sms", retries: []}, fallback_delay: 1560, fallback_mode: %{channel_id: 39, mode: "mobileweb", retries: [1560]}, flow: %{current_step: [0, 0], in_quota_completed_steps: false, language: "fil", mode: "sms", questionnaire_id: 550, retries: 0, section_order: [0, 1, 5, 3, 6, 2, 4, 7]}, respondent_id: 4505454, schedule: "{\"timezone\":\"Asia/Manila\",\"start_time\":\"08:00:00\",\"start_date\":null,\"end_time\":\"20:00:00\",\"end_date\":null,\"day_of_week\":[\"sun\",\"mon\",\"tue\",\"wed\",\"thu\",\"fri\",\"sat\"],\"blocked_days\":[]}", token: "9a01e828-b09b-41cd-89f1-74a799c11308"}, ~U[2026-06-10 01:59:48Z], ~U[2026-06-08 22:15:49Z], 4505454, 16]
2026-06-08T22:15:49.007144296Z 2026-06-08T22:15:49.006Z [debug] QUERY OK db=0.1ms
2026-06-08T22:15:49.007159723Z rollback []
2026-06-08T22:15:49.010010794Z 2026-06-08T22:15:49.007Z [error] GenServer {:channel_broker_registry, 39} terminating
2026-06-08T22:15:49.010031449Z ** (Ecto.StaleEntryError) attempted to update a stale struct:
2026-06-08T22:15:49.010037471Z
2026-06-08T22:15:49.010043929Z %Ask.Respondent{questionnaire_id: 550, session: %{"count_partial_results" => true, "current_delay" => 1560, "current_mode" => %{"channel_id" => 39, "mode" => "sms", "retries" => []}, "fallback_delay" => 1560, "fallback_mode" => %{"channel_id" => 39, "mode" => "mobileweb", "retries" => [1560]}, "flow" => %{"current_step" => [0, 0], "in_quota_completed_steps" => false, "language" => "fil", "mode" => "sms", "questionnaire_id" => 550, "retries" => 0, "section_order" => [0, 1, 5, 3, 6, 2, 4, 7]}, "respondent_id" => 4505454, "schedule" => "{\"timezone\":\"Asia/Manila\",\"start_time\":\"08:00:00\",\"start_date\":null,\"end_time\":\"20:00:00\",\"end_date\":null,\"day_of_week\":[\"sun\",\"mon\",\"tue\",\"wed\",\"thu\",\"fri\",\"sat\"],\"blocked_days\":[]}", "token" => "d73a699c-131a-4ec2-82d0-2d1fef6f54fa"}, retry_stat: #Ecto.Association.NotLoaded<association :retry_stat is not loaded>, effective_modes: ["sms"], state: :active, sanitized_phone_number: "123456789012", user_stopped: false, inserted_at: ~U[2026-06-03 20:52:16Z], disposition: :contacted, survey_log_entries: #Ecto.Association.NotLoaded<association :survey_log_entries is not loaded>, quota_bucket_id: nil, completed_at: nil, survey: #Ecto.Association.NotLoaded<association :survey is not loaded>, survey_id: 494, id: 4505454, section_order: [0, 1, 5, 3, 6, 2, 4, 7], phone_number: "00123456789012", experiment_name: nil, mode: ["sms", "mobileweb"], questionnaire: #Ecto.Association.NotLoaded<association :questionnaire is not loaded>, responses: #Ecto.Association.NotLoaded<association :responses is not loaded>, language: "fil", timeout_at: ~U[2026-06-10 00:00:00Z], lock_version: 16, respondent_group: #Ecto.Association.NotLoaded<association :respondent_group is not loaded>, updated_at: ~U[2026-06-08 12:03:26Z], canonical_phone_number: "00123456789012", __meta__: #Ecto.Schema.Metadata<:loaded, "respondents">, quota_bucket: #Ecto.Association.NotLoaded<association :quota_bucket is not loaded>, stats: %Ask.Stats{attempts: %{"sms" => 2}, call_durations: %{}, pending_call: false, total_call_time: nil, total_call_time_seconds: nil, total_received_sms: 0, total_sent_sms: 2}, retry_stat_id: 6165699, hashed_number: "r1234567890ab", respondent_group_id: 501}
2026-06-08T22:15:49.010095203Z
2026-06-08T22:15:49.010099219Z     (ecto 3.9.4) lib/ecto/repo/schema.ex:773: Ecto.Repo.Schema.apply/4
2026-06-08T22:15:49.010103976Z     (ecto 3.9.4) lib/ecto/repo/schema.ex:459: anonymous fn/15 in Ecto.Repo.Schema.do_update/4
2026-06-08T22:15:49.010108379Z     (ecto 3.9.4) lib/ecto/repo/schema.ex:995: anonymous fn/3 in Ecto.Repo.Schema.wrap_in_transaction/6
2026-06-08T22:15:49.010113308Z     (ecto_sql 3.9.2) lib/ecto/adapters/sql.ex:1203: anonymous fn/3 in Ecto.Adapters.SQL.checkout_or_transaction/4
2026-06-08T22:15:49.010117856Z     (db_connection 2.4.3) lib/db_connection.ex:1611: DBConnection.run_transaction/4
2026-06-08T22:15:49.010122766Z     (ecto 3.9.4) lib/ecto/repo/schema.ex:278: Ecto.Repo.Schema.update!/4
2026-06-08T22:15:49.010127084Z     (ask 0.31.0) lib/ask/runtime/survey.ex:245: Ask.Runtime.Survey.contact_attempt_expired/2
2026-06-08T22:15:49.010130835Z     (ask 0.31.0) lib/ask/runtime/channel_broker.ex:425: anonymous fn/4 in Ask.Runtime.ChannelBroker.contact_or_mark_as_expired/4
2026-06-08T22:15:49.010135094Z Last message: {:collect_garbage}
2026-06-08T22:15:49.010139405Z State: %Ask.Runtime.ChannelBrokerState{capacity: 10, channel_id: 39, channel_type: "sms", config: %{gc_active_idle_minutes: 5, gc_interval_minutes: 2, shut_down_minutes: 30}, runtime_channel: %Ask.Runtime.NuntiumChannel{base_url: "https://nuntium.example.com", name: "nuntium-sample-channel", oauth_token: %OAuth2.AccessToken{access_token: "AAAAAAAAAADDDDDDDDDDDDDDD", expires_at: 1780956964, other_params: %{}, refresh_token: "ABCDEFGHIJKL", token_type: "Bearer"}, settings: %{"capacity" => 10, "nuntium_account" => "nuntium-sample-account", "nuntium_channel" => "sample-channel"}}}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant