Feature Type
Would make my life easier
Feature Description
For outbound SIP calls, create_sip_participant(..., wait_until_answered=True) returns successfully when the callee answers — but a carrier voicemail answers the call with a SIP 200 OK exactly like a human does. So the agent treats voicemail pickup as a successful human answer: it plays the welcome message, starts billing/recording, and "talks" to the voicemail greeting.
I captured the trunk-side SIP for both a human-answered call and a voicemail call (Twilio Elastic SIP Trunking). They are structurally identical:
- Both: INVITE → 100 → 183 Session Progress (early media) → 200 OK
- No Diversion / History-Info / Reason headers on the voicemail call (the carrier/Twilio flattens them)
- Server: Twilio on both
- The only SIP-level difference is time-to-answer (human ≈ 6s here), which is an unreliable heuristic
So there is no signaling-level way to distinguish them; the distinction only exists in the media (voicemail = long continuous greeting immediately on answer; human = short utterance + pause).
Workarounds / Alternatives
We currently do audio-level answering machine detection (AMD) inside the agent:
after the SIP participant's track is subscribed, we inspect the first few seconds
of its audio using the existing VAD/STT. A long continuous greeting starting
immediately on answer is treated as voicemail, a short utterance followed by a
pause as human. The welcome message and egress recording are gated on that
classification. It works, but it is heuristic, and every developer has to
re-implement it.
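For reference, here is a minimal sketch of the kind of classifier described above. It is not our production code and not an SDK API: the function name `classify_answer`, the `(start, end)` segment format, and all thresholds are assumptions chosen for illustration. It assumes the VAD has already produced speech intervals, in seconds relative to the moment of answer.

```python
from typing import List, Tuple

# (start_s, end_s) of detected speech, relative to the moment of answer.
# Hypothetical input format: assumed to come from whatever VAD the agent runs.
Segment = Tuple[float, float]


def classify_answer(
    segments: List[Segment],
    window_s: float = 4.0,                    # how much post-answer audio to inspect
    max_human_utterance_s: float = 1.5,       # e.g. a short "Hello?"
    min_pause_s: float = 0.7,                 # silence expected after a human greeting
    min_voicemail_speech_ratio: float = 0.75, # share of the window filled with speech
) -> str:
    """Return "human", "voicemail", or "unknown". All thresholds are illustrative."""
    # Keep only speech overlapping the inspection window, clipped to its bounds.
    clipped = [
        (max(0.0, start), min(window_s, end))
        for start, end in sorted(segments)
        if start < window_s and end > 0.0
    ]
    if not clipped:
        return "unknown"

    first_start, first_end = clipped[0]
    # Silence after the first utterance lasts until the next segment or window end.
    next_start = clipped[1][0] if len(clipped) > 1 else window_s
    pause_after_first = next_start - first_end

    # Human: short utterance followed by a pause.
    if (first_end - first_start) <= max_human_utterance_s and pause_after_first >= min_pause_s:
        return "human"

    # Voicemail: speech fills most of the window.
    speech_total = sum(end - start for start, end in clipped)
    if speech_total / window_s >= min_voicemail_speech_ratio:
        return "voicemail"

    return "unknown"
```

In our agent, the welcome message and egress recording would only start once a call like `classify_answer(segments)` returns "human"; the "unknown" case needs its own policy (for example, waiting for more audio).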
Other alternatives considered and ruled out:
- Time-to-answer heuristic (INVITE → 200 OK): unreliable — voicemail can pick up
as fast as a human.
- Carrier SIP Diversion / History-Info headers: not exposed through Twilio
Elastic SIP Trunking.
- Twilio Programmable Voice MachineDetection (AnsweredBy): not available for
SIP-trunk calls.
Additional Context
Carrier diversion headers are unavailable through Twilio Elastic SIP Trunking, and Twilio's own MachineDetection is a Programmable Voice feature that doesn't apply to SIP-trunk calls — which is why an SDK-side option would be valuable.