
Expose Answering Machine / Voicemail Detection (AMD) signal for outbound SIP calls (create_sip_participant / wait_until_answered) #6125

@areebkhan-tech

Description


Feature Type

Would make my life easier

Feature Description

For outbound SIP calls, create_sip_participant(..., wait_until_answered=True) returns successfully as soon as the callee answers. However, a carrier voicemail system answers with a SIP 200 OK exactly as a human does, so the agent treats a voicemail pickup as a successful human answer: it plays the welcome message, starts billing/recording, and "talks" to the voicemail greeting.

I captured the trunk-side SIP for both a human-answered call and a voicemail call (Twilio Elastic SIP Trunking). They are structurally identical:

  • Both: INVITE → 100 → 183 Session Progress (early media) → 200 OK
  • No Diversion / History-Info / Reason headers on the voicemail call (the
    carrier/Twilio flattens them)
  • Server: Twilio on both
  • The only SIP-level difference is time-to-answer (human ≈ 6s here), which is
    an unreliable heuristic.

So there is no signaling-level way to distinguish them; the distinction only exists in the media (voicemail = long continuous greeting immediately on answer; human = short utterance + pause).
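To make the media-level distinction concrete, here is a minimal sketch of the "long greeting vs. short utterance + pause" rule, written against per-frame speech/no-speech decisions from a VAD, counted from the moment of the 200 OK. The frame size and every threshold (max_onset_ms, machine_min_speech_ms, human_pause_ms) are illustrative assumptions, not tuned values, and classify_answer, AmdConfig and AnswerType are hypothetical names, not part of any LiveKit API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class AnswerType(Enum):
    HUMAN = "human"
    MACHINE = "machine"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class AmdConfig:
    # All values are illustrative assumptions, not tuned thresholds.
    frame_ms: int = 20                 # duration of one VAD frame
    max_onset_ms: int = 1500           # speech must start this soon to count as "immediate"
    machine_min_speech_ms: int = 2500  # an utterance at least this long is treated as a greeting
    human_pause_ms: int = 600          # silence this long ends the first utterance


def classify_answer(frames: Sequence[bool], cfg: AmdConfig = AmdConfig()) -> AnswerType:
    """Classify the callee from VAD decisions (True = speech), starting at the 200 OK."""
    onset = next((i for i, speech in enumerate(frames) if speech), None)
    if onset is None or onset * cfg.frame_ms > cfg.max_onset_ms:
        # No speech yet, or speech did not start "immediately": leave it to the caller.
        return AnswerType.UNKNOWN

    last_speech = onset
    silence_frames = 0
    for i in range(onset, len(frames)):
        if frames[i]:
            last_speech = i
            silence_frames = 0
            # Gaps shorter than human_pause_ms are bridged, so a greeting with
            # natural pauses between words still counts as one long utterance.
            if (last_speech - onset + 1) * cfg.frame_ms >= cfg.machine_min_speech_ms:
                return AnswerType.MACHINE
        else:
            silence_frames += 1
            if silence_frames * cfg.frame_ms >= cfg.human_pause_ms:
                # The first utterance ended before reaching greeting length.
                return AnswerType.HUMAN

    # Not enough audio observed to decide either way.
    return AnswerType.UNKNOWN
```

Late or missing speech returns UNKNOWN rather than guessing, so the fallback policy (greet anyway, wait longer, or hang up) stays with the agent.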

Workarounds / Alternatives

Currently doing audio-level AMD inside the agent: after the SIP participant's
track is subscribed, we inspect the first few seconds of their audio using the
existing VAD/STT — a long continuous greeting starting immediately on answer is
treated as voicemail, a short utterance + pause as human. The welcome message
and egress recording are gated on that classification. It works but is heuristic
and every developer has to re-implement it.
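The gating described above can be sketched framework-agnostically with asyncio: hold back the welcome message and the recording until a classifier returns a verdict or a time budget runs out. This is a sketch under assumptions: frames stands in for per-frame VAD decisions from the SIP participant's subscribed track, classify for whatever heuristic the agent uses, and on_human / on_machine for starting the welcome message plus egress recording or handling the voicemail path. gate_on_amd and all of its parameters are hypothetical; no LiveKit API is called.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Optional

# A classifier sees all frames so far (True = speech) and returns
# "human", "machine", or None while it is still undecided.
Classifier = Callable[[list[bool]], Optional[str]]


async def gate_on_amd(
    frames: AsyncIterator[bool],
    classify: Classifier,
    on_human: Callable[[], Awaitable[None]],
    on_machine: Callable[[], Awaitable[None]],
    timeout_s: float = 4.0,
) -> str:
    """Delay greeting/recording until the callee is classified or time runs out."""
    seen: list[bool] = []

    async def decide() -> Optional[str]:
        async for speech in frames:
            seen.append(speech)
            # Re-classifying the whole prefix is quadratic, but cheap for the
            # few seconds of audio an AMD decision looks at.
            verdict = classify(seen)
            if verdict is not None:
                return verdict
        return None  # the stream ended before a decision

    try:
        verdict = await asyncio.wait_for(decide(), timeout=timeout_s)
    except asyncio.TimeoutError:
        verdict = None

    # Undecided: fall back to "human" so a real person is not ignored.
    verdict = verdict or "human"
    if verdict == "machine":
        await on_machine()
    else:
        await on_human()
    return verdict
```

The fallback to "human" when no verdict arrives is one possible policy: it risks greeting an occasional voicemail rather than ignoring a real person; an agent that prefers the opposite trade-off would flip the default.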

Other alternatives considered and ruled out:

  • Time-to-answer heuristic (INVITE → 200 OK): unreliable — voicemail can pick up
    as fast as a human.
  • Carrier SIP Diversion / History-Info headers: not exposed through Twilio
    Elastic SIP Trunking.
  • Twilio Programmable Voice MachineDetection (AnsweredBy): not available for
    SIP-trunk calls.

Additional Context

Carrier diversion headers are unavailable through Twilio Elastic SIP Trunking, and Twilio's own MachineDetection is a Programmable Voice feature that doesn't apply to SIP-trunk calls — which is why an SDK-side option would be valuable.

Labels: enhancement (New feature or request)
