message_not_in_streaming_state after undocumented timeouts hit #1859

@beaugunderson

Reproducible in:

$ pip freeze | grep slack
slack-sdk==3.41.0

$ python --version
Python 3.14.2

Reproduces in production on Python 3.14 on Linux.

The Slack SDK version

slack-sdk==3.41.0

Python runtime version

Python 3.14.2

OS info

macOS 26.4.1 (Darwin 25.4.0) for local repro; Linux (Aptible) in production. Behavior is server-side, so the host OS shouldn't matter.

Steps to reproduce:

Any long-running AI-agent bot that calls chat.appendStream over a window of more than a few minutes reliably reproduces this. Minimal case:

import time

from slack_sdk import WebClient

# token, CHANNEL, THREAD_TS, TEAM_ID, and USER_ID are placeholders; substitute real values.
client = WebClient(token=...)
stream = client.chat_stream(
    channel=CHANNEL,
    thread_ts=THREAD_TS,
    recipient_team_id=TEAM_ID,
    recipient_user_id=USER_ID,
)
for i in range(60):  # 60 appends x 15 s = 15 minutes total
    stream.append(markdown_text=f"tick {i}\n")
    time.sleep(15)
stream.stop()

  1. Call chat_stream(...) and capture the ChatStream helper.
  2. Call stream.append(...) periodically (even with active traffic — this is not just an idle issue).
  3. At some point before the loop finishes, stream.append() raises SlackApiError with {'ok': False, 'error': 'message_not_in_streaming_state'}. A subsequent stream.stop() raises the same error, leaving the message frozen in Slack's UI as a gray "Something went wrong" pill indefinitely.

Expected result:

One or more of the following, ranked by usefulness:

  1. Document the empirical TTL / idle behavior on the chat.startStream / chat.appendStream / chat.stopStream reference pages and/or the ChatStream module docstring. Even a "streams may be closed by the server after an undocumented window; recommend rotating every N minutes" note would save teams days of debugging.
  2. Expose ChatStream.started_at / ChatStream.age (or equivalent) so callers can implement proactive rotation without subclassing.
  3. Add an optional auto_rotate_after: timedelta | None = None parameter that stops the current stream and starts a fresh one under the hood before the server kills it.
  4. Classify message_not_in_streaming_state as a distinct, typed exception (e.g. StreamExpiredError) rather than a generic SlackApiError, so apps can branch on it without string-matching the error code.
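
To make items 2 and 3 concrete, here is a minimal sketch of what proactive rotation could look like on the caller's side today. RotatingStream, start_stream, rotate_after, and clock are all hypothetical names, not part of slack_sdk; the only assumption about the real helper is that it exposes append(...) and stop(...), as in the repro above. The 4-minute default is taken from one of the workarounds reported below, not from any documented limit. Age is measured client-side from when the helper is created.

```python
import time
from datetime import timedelta
from typing import Any, Callable


class RotatingStream:
    """Hypothetical wrapper (not in slack_sdk) that restarts a stream by age."""

    def __init__(
        self,
        start_stream: Callable[[], Any],  # e.g. lambda: client.chat_stream(...)
        rotate_after: timedelta = timedelta(minutes=4),  # assumed, not documented
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._start_stream = start_stream
        self._rotate_after = rotate_after.total_seconds()
        self._clock = clock
        self._stream = start_stream()
        self._started_at = clock()
        self.rotations = 0

    @property
    def age(self) -> float:
        """Seconds since the current underlying stream helper was created."""
        return self._clock() - self._started_at

    def append(self, **kwargs: Any) -> Any:
        # Rotate first, so the append lands on a fresh stream.
        if self.age >= self._rotate_after:
            self._rotate()
        return self._stream.append(**kwargs)

    def stop(self, **kwargs: Any) -> Any:
        return self._stream.stop(**kwargs)

    def _rotate(self) -> None:
        # Close the old stream while it is (hopefully) still alive, then start anew.
        self._stream.stop()
        self._stream = self._start_stream()
        self._started_at = self._clock()
        self.rotations += 1
```

With the repro above, this would be used as `stream = RotatingStream(lambda: client.chat_stream(channel=CHANNEL, ...))`. Each rotation presumably produces a new message in the thread, which is one reason a built-in, documented option would be preferable to every app rolling its own.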

Actual result:

chat.appendStream raises:

slack_sdk.errors.SlackApiError: The request to the Slack API failed.
(url: https://slack.com/api/chat.appendStream)
The server responded with: {'ok': False, 'error': 'message_not_in_streaming_state'}

chat.stopStream called on the same message afterward returns the same error, so the orphaned streaming message cannot be cleanly closed — Slack's UI then renders it as a permanent "Something went wrong" pill.

Real traceback from our production logs (Slack bot investigating a user question):

Traceback (most recent call last):
  File "/app/main.py", line 920, in _send_keepalive
    self._append_with_retry(chunks)
  File "/app/main.py", line 943, in _append_with_retry
    self._client.chat_appendStream(
        channel=self._channel,
        ts=ts,
        chunks=chunks,
    )
  File "/app/.venv/lib/python3.14/site-packages/slack_sdk/web/client.py", line 2654, in chat_appendStream
    return self.api_call("chat.appendStream", json=kwargs)
  ...
slack_sdk.errors.SlackApiError: The request to the Slack API failed.
(url: https://slack.com/api/chat.appendStream)
The server responded with: {'ok': False, 'error': 'message_not_in_streaming_state'}
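
For reference, this is roughly the string-matching that apps have to do today to branch on the error above. It is a sketch, not SDK code: StreamExpiredError, is_stream_expired, and call_stream_api are hypothetical names. It relies only on SlackApiError carrying the server payload in its response attribute and on that payload supporting .get("error"); it is written with duck typing so it does not need to import slack_sdk.

```python
from typing import Any, Callable

STREAM_EXPIRED_CODE = "message_not_in_streaming_state"


class StreamExpiredError(Exception):
    """Hypothetical typed error; slack_sdk does not define this today."""

    def __init__(self, original: Exception) -> None:
        super().__init__(STREAM_EXPIRED_CODE)
        self.original = original


def is_stream_expired(exc: Exception) -> bool:
    """True if exc looks like a Slack API error with the stream-expired code."""
    response = getattr(exc, "response", None)
    if response is None:
        return False
    try:
        return response.get("error") == STREAM_EXPIRED_CODE
    except AttributeError:
        # The response object does not behave like a mapping.
        return False


def call_stream_api(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Call func and re-raise stream-expired failures as StreamExpiredError."""
    try:
        return func(*args, **kwargs)
    except Exception as exc:
        if is_stream_expired(exc):
            raise StreamExpiredError(exc) from exc
        raise
```

A caller would wrap the failing call, for example `call_stream_api(client.chat_appendStream, channel=..., ts=..., chunks=...)`, and catch StreamExpiredError separately from other API failures.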

This is a widespread production issue. Several AI-agent bots are independently working around it in production:

  • AuraHQ-ai/aura#421 — observed "undocumented ~30s idle timeout", 38 occurrences in 6 days. Workaround: 20s keepalive during tool calls.
  • AuraHQ-ai/aura#177 — unhandled message_not_in_streaming_state crashed response pipeline.
  • AuraHQ-ai/aura#702 — ~96 errors in 2 weeks, 67% silent data loss with a naïve post-message fallback. Workaround: capture the frozen stream's ts and chat.update it with the finalized content.
  • hrygo/hotplex-legacy#237 — observed ~5 minute wallclock TTL even with active traffic. Workaround: proactive rotation at ~4 min.
  • hrygo/hotplex-legacy#336 — P1 report after a 70+ turn session.
  • getsentry/junior#202 — Sentry's internal Slack bot tracking the same error.
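
As an illustration of the chat.update workaround described in AuraHQ-ai/aura#702, a recovery path might look like the sketch below. finalize_message is a hypothetical helper, and it assumes the caller has already captured the streaming message's ts; client.chat_update(channel=..., ts=..., text=...) is the existing WebClient method. Whether chat.update reliably repairs a frozen streaming message is taken from that report, not from any documentation.

```python
from typing import Any

STREAM_EXPIRED_CODE = "message_not_in_streaming_state"


def finalize_message(client: Any, stream: Any, channel: str, ts: str, final_text: str) -> str:
    """Stop the stream; if the server already closed it, overwrite the message.

    Returns "stopped" when stream.stop() succeeded and "updated" when the
    fallback chat.update call was used instead.
    """
    try:
        stream.stop()
        return "stopped"
    except Exception as exc:
        response = getattr(exc, "response", None)
        code = response.get("error") if response is not None else None
        if code != STREAM_EXPIRED_CODE:
            raise  # unrelated failure: let the caller handle it

    # The stream is gone server-side, so replace the frozen message in place
    # rather than posting a new one (the reports above cite data loss with that).
    client.chat_update(channel=channel, ts=ts, text=final_text)
    return "updated"
```

The caller is responsible for keeping the full finalized text around, since anything appended after the stream died was rejected.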

The timeouts observed across these reports range from roughly 30 seconds of idle time to about 5 minutes of wall-clock time. They diverge enough that both an idle timer and a hard lifetime appear to be in play on the server side.
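
If both timers really exist, a client has to track two clocks at once. The sketch below shows one way to express that decision as a pure function. Action and next_action are hypothetical names, and the 20-second and 240-second defaults are simply the workaround values from the reports above, not documented server limits.

```python
from enum import Enum


class Action(Enum):
    WAIT = "wait"            # nothing to do yet
    KEEPALIVE = "keepalive"  # send a small append to reset the suspected idle timer
    ROTATE = "rotate"        # stop this stream and start a fresh one


def next_action(
    now: float,
    started_at: float,
    last_append_at: float,
    keepalive_after: float = 20.0,  # assumed margin under the ~30 s idle reports
    rotate_after: float = 240.0,    # assumed margin under the ~5 min lifetime reports
) -> Action:
    """Decide what to do with a stream, given timestamps in seconds."""
    # The hard lifetime wins: a keepalive cannot extend it.
    if now - started_at >= rotate_after:
        return Action.ROTATE
    if now - last_append_at >= keepalive_after:
        return Action.KEEPALIVE
    return Action.WAIT
```

A bot loop would call this on a short interval, for example during long tool calls, and act on the result.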

I've checked the chat.appendStream, chat.startStream, and chat.stopStream reference pages and the chat_stream module docs, plus the changelog from Oct 2025 through Apr 2026. None of them documents a stream lifetime or an error recovery path.

I understand the underlying timeout is a platform/server-side behavior; I'm filing here because (a) the SDK is where the ChatStream helper lives, (b) this is the right place to at least document the behavior and surface a rotation primitive, and (c) slackapi/python-slack-sdk is the repo teams actually find when debugging message_not_in_streaming_state. If this needs to go to platform feedback instead, happy to cross-post — let me know.
