Warning
This project is intended as a sample for learning purposes only.
This repository showcases an Interactive Voice Response (IVR) system built with the Azure OpenAI Realtime API. It offers directory services through natural language voice interactions and integrates with Twilio for real-world phone call capabilities.
Callers can request phone numbers for businesses in natural language. The AI Assistant (nicknamed "Tellie") responds with spoken answers in real time.
Please view the Quick Start Guide for guidance on setting up this project locally.
- Python 3.11 - Core application runtime
- FastAPI - Modern web framework for API endpoints
- Semantic Kernel - Microsoft's AI orchestration framework
- Twilio WebSocket - Real-time telephony integration
- Docker - Containerization and deployment
- HTML5/CSS3 - Web interface for testing and diagnostics
- JavaScript (ES6 Modules) - Client-side audio processing and WebSocket handling
- Web Audio API - Browser-based audio capture and playback
- MediaRecorder API - Audio recording capabilities
- NumPy - Numerical computing for audio manipulation
- Resampy - High-quality audio resampling
- audioop - Audio format conversion (μ-law ↔ linear PCM)
- Real-time Voice Interaction: Bidirectional audio streaming between callers and AI
- Directory Services: Look up phone numbers for Australian businesses
- Multi-format Audio Support: Handles both Twilio (8kHz μ-law) and browser (24kHz PCM) formats
- Diagnostic Monitoring: Real-time logging and debugging capabilities
- Session Management: Unique session tracking for each call
- TwiML Integration: Seamless Twilio webhook handling
- Telemetry: OpenTelemetry integration for observability
%%{init: {"theme": "base", "themeVariables": {"primaryColor": "#4a90e2", "primaryTextColor": "#333", "primaryBorderColor": "#333", "lineColor": "#666", "clusterBkg": "#ffffff", "clusterBorder": "#333"}, "flowchart": {"curve": "basis"}, "themeCSS": ".edge-thickness-normal {stroke-width: 3px;} .edge-pattern-solid {stroke-width: 3px;}"}}%%
graph TB
subgraph "External Services"
A[Twilio Phone Network]
B[Azure OpenAI Realtime API]
end
subgraph "Application Layer"
D[FastAPI Server<br/>main.py]
E[WebSocket Handlers<br/>websocket_handlers.py]
F[Plugins<br/>business_assistant_plugin.py<br/>call_operations_plugin.py<br/>directory_search_plugin.py]
G[Directory Data<br/>directory_data.py]
T[Telemetry<br/>telemetry.py]
end
subgraph "Client Interfaces"
H[Twilio WebSocket<br/>8kHz μ-law Audio]
I[Browser Client<br/>24kHz PCM Audio]
end
subgraph "Audio Processing Pipeline"
K[Audio Processing<br/>Resampling & Format Conversion]
M[Semantic Kernel<br/>AI Orchestration]
end
A -->|TwiML Webhook| D
A <-->|WebSocket Stream| H
H <-->|Audio Data| E
I <-->|WebSocket| E
E <-->|Processed Audio| K
K <-->|AI Processing| M
M <-->|Realtime API| B
F -->|Directory Lookup| G
M -->|Function Calls| F
D -->|Telemetry| T
E -->|Telemetry| T
F -->|Telemetry| T
%% Simplified styling with fewer colors
classDef external fill:#e3f2fd,stroke:#1976d2,stroke-width:2px,color:#333
classDef application fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px,color:#333
classDef client fill:#e8f5e8,stroke:#388e3c,stroke-width:2px,color:#333
classDef processing fill:#fff3e0,stroke:#f57c00,stroke-width:2px,color:#333
%% Apply classes to nodes
class A,B external
class D,E,F,G,T application
class H,I client
class K,M processing
- Incoming Call: Twilio receives a phone call and sends a webhook to /twilio-webhook
- TwiML Response: Server responds with TwiML instructions to connect the call to a WebSocket stream
- WebSocket Connection: Twilio establishes a WebSocket connection to the /ws endpoint
- Session Creation: Server generates unique session ID and initializes diagnostic logging
- Audio Streaming: Bidirectional audio streaming begins between caller and Azure OpenAI
- AI Processing: Azure OpenAI processes speech and generates responses using Semantic Kernel plugins
- Function Execution: AI can call directory lookup or call operation functions as needed
- Response Delivery: AI-generated speech is converted and streamed back to the caller
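The TwiML Response step above can be sketched as follows. This is a minimal illustration using only the standard library; the function name `build_stream_twiml` and the example host are assumptions and are not taken from the repository's main.py. `<Connect>` and `<Stream>` are the TwiML elements Twilio documents for bidirectional media streams.

```python
import xml.etree.ElementTree as ET


def build_stream_twiml(stream_url: str) -> str:
    """Return TwiML asking Twilio to open a media stream to stream_url."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    # <Connect><Stream> starts a bidirectional audio stream over WebSocket.
    ET.SubElement(connect, "Stream", url=stream_url)
    return ET.tostring(response, encoding="unicode")


# Hypothetical host; replace it with the public address of your server.
print(build_stream_twiml("wss://example.invalid/ws"))
```

A webhook handler would return this string with an XML content type so that Twilio can parse it.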
- Inbound Audio: Twilio sends 8kHz μ-law encoded audio chunks
- Format Conversion: μ-law → 16-bit linear PCM
- Resampling: 8kHz → 24kHz for Azure OpenAI compatibility
- AI Processing: Azure OpenAI processes audio and generates responses
- Response Processing: 24kHz PCM from Azure → 8kHz μ-law for Twilio
- Delivery: Converted audio sent back to caller via Twilio WebSocket
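The conversion steps above can be sketched in pure Python. This is an illustration only, not the project's code: the project lists audioop and Resampy for these steps, whereas this sketch swaps in a hand-written G.711 μ-law codec, linear interpolation for upsampling, and plain decimation for downsampling (no anti-aliasing filter). All function names are hypothetical.

```python
import struct

BIAS = 0x84   # G.711 mu-law bias (132)
CLIP = 32635  # largest magnitude that still fits once the bias is added


def ulaw_to_pcm16(byte: int) -> int:
    """Decode one mu-law byte to a signed 16-bit sample."""
    byte = ~byte & 0xFF  # mu-law bytes are stored bit-inverted
    exponent = (byte >> 4) & 0x07
    mantissa = byte & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if byte & 0x80 else magnitude


def pcm16_to_ulaw(sample: int) -> int:
    """Encode one signed 16-bit sample as a mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    exponent = 7
    while exponent > 0 and not magnitude & (0x80 << exponent):
        exponent -= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def twilio_to_model(ulaw: bytes) -> bytes:
    """8 kHz mu-law -> 24 kHz little-endian 16-bit PCM (3x linear interpolation)."""
    samples = [ulaw_to_pcm16(b) for b in ulaw]
    out = []
    for i, cur in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else cur
        step = nxt - cur
        out.extend((cur, cur + step // 3, cur + 2 * step // 3))
    return struct.pack(f"<{len(out)}h", *out)


def model_to_twilio(pcm: bytes) -> bytes:
    """24 kHz 16-bit PCM -> 8 kHz mu-law by keeping every third sample."""
    samples = struct.unpack(f"<{len(pcm) // 2}h", pcm)
    return bytes(pcm16_to_ulaw(s) for s in samples[::3])
```

A production resampler applies a low-pass filter before decimating; the simple slicing here is only meant to make the sample-rate ratio (8 kHz to 24 kHz is a factor of 3) visible.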
The AI assistant has access to three main plugins:
- BusinessAssistantPlugin: Handles user requests and orchestrates other plugins.
- DirectorySearchPlugin: Searches the directory for business phone numbers.
- CallOperationsPlugin: Handles call termination and transfer.
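As a rough sketch of what a directory lookup plugin might look like, the example below exposes one function through Semantic Kernel's `kernel_function` decorator. The class name, function name, and directory entries are all made up for illustration and do not reflect the contents of directory_search_plugin.py or directory_data.py. A small fallback decorator is included so the sketch also runs without Semantic Kernel installed.

```python
try:
    from semantic_kernel.functions import kernel_function
except ImportError:
    # Fallback so the sketch runs without Semantic Kernel: a no-op decorator.
    def kernel_function(**_metadata):
        return lambda func: func


# Made-up entries for illustration only.
SAMPLE_DIRECTORY = {
    "example bakery": "+61 2 5550 0101",
    "sample plumbing": "+61 3 5550 0102",
}


class DirectorySearchSketch:
    """Hypothetical plugin: one function the model can call by name."""

    @kernel_function(
        name="search_directory",
        description="Look up a business phone number by name.",
    )
    def search_directory(self, business_name: str) -> str:
        # Normalize the spoken name before the lookup.
        number = SAMPLE_DIRECTORY.get(business_name.strip().lower())
        if number is None:
            return f"No listing found for {business_name}."
        return f"{business_name}: {number}"
```

Returning a short sentence rather than raw data gives the model text it can read back to the caller directly.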
- GET / - Serves the web testing interface
- POST /twilio-webhook - Handles Twilio call webhooks
- GET /ext-svc-sample/directory - Directory service API
- GET /health - Health check endpoint
- WebSocket /ws - Main audio streaming endpoint for clients using query parameter authentication.
- WebSocket /ws/auth/{token} - WebSocket endpoint for clients providing token via URL path (e.g., Twilio).
- WebSocket /ws/diagnostic - Real-time diagnostic logging.
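The two authenticated WebSocket endpoints differ only in where the token travels. The sketch below builds both URL forms with the standard library; the helper names and the query parameter name `token` are assumptions, since the source does not state what the parameter is called.

```python
from urllib.parse import quote, urlencode


def ws_url_with_query(base: str, token: str, param: str = "token") -> str:
    """URL for /ws, carrying the token as a query parameter (name assumed)."""
    return f"{base}/ws?{urlencode({param: token})}"


def ws_url_with_path(base: str, token: str) -> str:
    """URL for /ws/auth/{token}, carrying the token as a path segment."""
    # safe="" also escapes "/" so the token stays a single path segment.
    return f"{base}/ws/auth/{quote(token, safe='')}"


print(ws_url_with_query("ws://localhost:8000", "abc123"))
print(ws_url_with_path("ws://localhost:8000", "abc123"))
```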
Once deployed:
- Open http://localhost:8000 in your browser
- Click "Start Call" to simulate a Twilio WebSocket connection
- Use your microphone to interact with Tellie
- Monitor the diagnostic log for real-time system information
twilio-aoai-realtime/
├── main.py # FastAPI application and routing
├── config.py # Configuration and environment variables
├── websocket_handlers.py # WebSocket audio processing logic
├── telemetry.py # OpenTelemetry integration
├── directory_data.py # Business directory data
├── prompts/ # System prompts for the AI assistant
│ ├── system_prompt_backend_agent.txt
│ └── system_prompt_realtime.txt
├── plugins/ # Semantic Kernel plugins
│ ├── business_assistant_plugin.py
│ ├── call_operations_plugin.py
│ └── directory_search_plugin.py
├── models/ # Data models
│ └── data_models.py
├── requirements.txt # Python dependencies
├── Dockerfile # Container configuration
├── docker-compose.yml # Multi-container deployment
├── index.html # Web testing interface
├── css/ # Styling for the web interface
│ ├── styles.css
│ └── dialog.css
├── images/ # Images for the web interface
│ ├── favicon.ico
│ ├── microsoft-logo.svg
│ └── ms-favicon.ico
└── js/ # Frontend JavaScript modules
    ├── main.js # Main application logic
    ├── audio.js # Audio processing utilities
    ├── websocket.js # WebSocket communication
    ├── diagnostics.js # Diagnostic logging
    ├── ui.js # User interface management
    ├── utils.js # Utility functions
    ├── config.js # Frontend configuration
    ├── badge.js # Badge component for UI
    └── version.js # Version information