This repository was archived by the owner on Feb 11, 2026. It is now read-only.

Azure-Samples/agentic-ivr


Agentic IVR - Realtime Conversations with Azure OpenAI

Warning

This project is intended as a sample for learning purposes only.

This repository showcases an Interactive Voice Response (IVR) system built with the Azure OpenAI Realtime API. It offers directory services through natural language voice interactions and integrates with Twilio for real-world phone call capabilities.

Callers can ask for business phone numbers in natural language. The AI assistant (nicknamed "Tellie") responds with spoken answers in real time.

Quickstart

Please view the Quick Start Guide for guidance on setting up this project locally.

Technologies Used

Backend Technologies

  • Python 3.11 - Core application runtime
  • FastAPI - Modern web framework for API endpoints
  • Semantic Kernel - Microsoft's AI orchestration framework
  • Twilio WebSocket - Real-time telephony integration
  • Docker - Containerization and deployment

Frontend Technologies

  • HTML5/CSS3 - Web interface for testing and diagnostics
  • JavaScript (ES6 Modules) - Client-side audio processing and WebSocket handling
  • Web Audio API - Browser-based audio capture and playback
  • MediaRecorder API - Audio recording capabilities

Audio Processing Libraries

  • NumPy - Numerical computing for audio manipulation
  • Resampy - High-quality audio resampling
  • audioop - Audio format conversion (μ-law ↔ linear PCM)

Key Features

  • Real-time Voice Interaction: Bidirectional audio streaming between callers and AI
  • Directory Services: Lookup phone numbers for Australian businesses
  • Multi-format Audio Support: Handles both Twilio (8kHz μ-law) and browser (24kHz PCM) formats
  • Diagnostic Monitoring: Real-time logging and debugging capabilities
  • Session Management: Unique session tracking for each call
  • TwiML Integration: Seamless Twilio webhook handling
  • Telemetry: OpenTelemetry integration for observability

System Architecture

%%{init: {"theme": "base", "themeVariables": {"primaryColor": "#4a90e2", "primaryTextColor": "#333", "primaryBorderColor": "#333", "lineColor": "#666", "clusterBkg": "#ffffff", "clusterBorder": "#333"}, "flowchart": {"curve": "basis"}, "themeCSS": ".edge-thickness-normal {stroke-width: 3px;} .edge-pattern-solid {stroke-width: 3px;}"}}%%
graph TB
    subgraph "External Services"
        A[Twilio Phone Network]
        B[Azure OpenAI Realtime API]
    end

    subgraph "Application Layer"
        D[FastAPI Server<br/>main.py]
        E[WebSocket Handlers<br/>websocket_handlers.py]
        F[Plugins<br/>business_assistant_plugin.py<br/>call_operations_plugin.py<br/>directory_search_plugin.py]
        G[Directory Data<br/>directory_data.py]
        T[Telemetry<br/>telemetry.py]
    end

    subgraph "Client Interfaces"
        H[Twilio WebSocket<br/>8kHz μ-law Audio]
        I[Browser Client<br/>24kHz PCM Audio]
    end

    subgraph "Audio Processing Pipeline"
        K[Audio Processing<br/>Resampling & Format Conversion]
        M[Semantic Kernel<br/>AI Orchestration]
    end

    A -->|TwiML Webhook| D
    A <-->|WebSocket Stream| H
    H <-->|Audio Data| E
    I <-->|WebSocket| E

    E <-->|Processed Audio| K
    K <-->|AI Processing| M
    M <-->|Realtime API| B

    F -->|Directory Lookup| G
    M -->|Function Calls| F
    D -->|Telemetry| T
    E -->|Telemetry| T
    F -->|Telemetry| T

    %% Simplified styling with fewer colors
    classDef external fill:#e3f2fd,stroke:#1976d2,stroke-width:2px,color:#333
    classDef application fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px,color:#333
    classDef client fill:#e8f5e8,stroke:#388e3c,stroke-width:2px,color:#333
    classDef processing fill:#fff3e0,stroke:#f57c00,stroke-width:2px,color:#333

    %% Apply classes to nodes
    class A,B external
    class D,E,F,G,T application
    class H,I client
    class K,M processing

How It Works

Call Flow

  1. Incoming Call: Twilio receives a phone call and sends a webhook request to /twilio-webhook
  2. TwiML Response: The server responds with TwiML instructions that connect the call to a WebSocket stream
  3. WebSocket Connection: Twilio establishes a WebSocket connection to the /ws endpoint
  4. Session Creation: The server generates a unique session ID and initializes diagnostic logging
  5. Audio Streaming: Bidirectional audio streaming begins between the caller and Azure OpenAI
  6. AI Processing: Azure OpenAI processes the caller's speech and generates responses using Semantic Kernel plugins
  7. Function Execution: The AI can call directory lookup or call operation functions as needed
  8. Response Delivery: AI-generated speech is converted and streamed back to the caller
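
Steps 2 and 4 of the call flow can be sketched as follows. This is a minimal illustration using only the standard library, not the repository's actual handler: the function names `build_twiml` and `new_session_id` and the example URL are assumptions. The `<Connect><Stream>` structure is the standard TwiML form for handing a call to a WebSocket media stream.

```python
import uuid
import xml.etree.ElementTree as ET


def build_twiml(stream_url: str) -> str:
    """Return a TwiML document that connects the call to a WebSocket stream."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    # Twilio opens a WebSocket to this URL and streams the call audio over it.
    ET.SubElement(connect, "Stream", url=stream_url)
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>' + body


def new_session_id() -> str:
    """Generate a unique session ID (hypothetical format: 32 hex characters)."""
    return uuid.uuid4().hex


if __name__ == "__main__":
    # The host name below is a placeholder, not a real deployment.
    print(build_twiml("wss://example.com/ws"))
    print(new_session_id())
```

In a FastAPI application, a string like this would typically be returned from the webhook route with an XML media type.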

Audio Processing Pipeline

  1. Inbound Audio: Twilio sends 8kHz μ-law encoded audio chunks
  2. Format Conversion: μ-law → 16-bit linear PCM
  3. Resampling: 8kHz → 24kHz for Azure OpenAI compatibility
  4. AI Processing: Azure OpenAI processes audio and generates responses
  5. Response Processing: 24kHz PCM from Azure → 8kHz μ-law for Twilio
  6. Delivery: Converted audio sent back to caller via Twilio WebSocket
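
The conversion steps above can be illustrated with a dependency-free sketch. The project lists audioop and Resampy for this work; the code below deliberately swaps in a hand-written G.711 μ-law codec, linear interpolation, and plain decimation so that it runs with the standard library alone. Those stand-ins are far cruder than a real resampler (there is no anti-aliasing filter), and all function names are hypothetical. The JSON shape follows Twilio's Media Streams `media` event, where `payload` is base64-encoded μ-law audio.

```python
import base64
import json

BIAS = 0x84   # standard G.711 mu-law bias (132)
CLIP = 32635  # largest magnitude that still fits after adding the bias


def ulaw_to_linear(byte: int) -> int:
    """Decode one mu-law byte to a signed 16-bit linear PCM sample."""
    byte = ~byte & 0xFF
    exponent = (byte >> 4) & 0x07
    mantissa = byte & 0x0F
    sample = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -sample if byte & 0x80 else sample


def linear_to_ulaw(sample: int) -> int:
    """Encode one signed 16-bit linear PCM sample as a mu-law byte."""
    sign = 0x80 if sample < 0 else 0
    magnitude = min(abs(sample), CLIP) + BIAS
    exponent, mask = 7, 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def upsample_3x(samples: list[int]) -> list[int]:
    """8 kHz -> 24 kHz by linear interpolation (illustrative only)."""
    out = []
    for i, cur in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else cur
        out.extend(cur + (nxt - cur) * k // 3 for k in range(3))
    return out


def downsample_3x(samples: list[int]) -> list[int]:
    """24 kHz -> 8 kHz by keeping every third sample (no low-pass filter)."""
    return samples[::3]


def twilio_media_to_pcm24k(message: str) -> list[int]:
    """Inbound path: Twilio media event -> 24 kHz linear PCM samples."""
    event = json.loads(message)
    if event.get("event") != "media":
        return []  # "connected", "start", "stop", etc. carry no audio
    ulaw = base64.b64decode(event["media"]["payload"])
    return upsample_3x([ulaw_to_linear(b) for b in ulaw])


def pcm24k_to_twilio_media(samples: list[int], stream_sid: str) -> str:
    """Outbound path: 24 kHz linear PCM samples -> Twilio media event."""
    ulaw = bytes(linear_to_ulaw(s) for s in downsample_3x(samples))
    payload = base64.b64encode(ulaw).decode("ascii")
    return json.dumps(
        {"event": "media", "streamSid": stream_sid, "media": {"payload": payload}}
    )
```

Because 24 kHz is exactly three times 8 kHz, both directions reduce to an integer ratio, which is what keeps this sketch short. A production pipeline would apply a proper band-limited resampler in both directions.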

Plugin System

The AI assistant has access to three main plugins:

  • BusinessAssistantPlugin: Handles user requests and orchestrates other plugins.
  • DirectorySearchPlugin: Searches the directory for business phone numbers.
  • CallOperationsPlugin: Handles call termination and transfer.
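
To make the directory lookup concrete, here is a minimal plain-Python sketch of the kind of search a plugin such as DirectorySearchPlugin might perform. It is not the repository's implementation: the class name, the matching rule, and the sample entries (including the phone numbers) are all made up for illustration. In a Semantic Kernel application, the `search` method would normally be exposed to the model as a kernel function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Business:
    name: str
    phone: str


# Hypothetical sample data; these are not real businesses or numbers.
SAMPLE_DIRECTORY = [
    Business("Harbour Cafe", "+61 2 5550 0101"),
    Business("Harbour Hardware", "+61 2 5550 0102"),
    Business("Outback Tyres", "+61 2 5550 0103"),
]


class DirectorySearchSketch:
    """Case-insensitive substring search over business names."""

    def __init__(self, entries):
        self._entries = list(entries)

    def search(self, query: str) -> list[Business]:
        needle = query.casefold().strip()
        if not needle:
            return []  # an empty query should not match everything
        return [b for b in self._entries if needle in b.name.casefold()]


if __name__ == "__main__":
    directory = DirectorySearchSketch(SAMPLE_DIRECTORY)
    for match in directory.search("harbour"):
        print(match.name, match.phone)
```

Returning a list rather than a single match leaves the assistant free to ask the caller a clarifying question when several businesses share a name.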

API Endpoints

  • GET / - Serves the web testing interface
  • POST /twilio-webhook - Handles Twilio call webhooks
  • GET /ext-svc-sample/directory - Directory service API
  • GET /health - Health check endpoint
  • WebSocket /ws - Main audio streaming endpoint for clients that authenticate with a query parameter
  • WebSocket /ws/auth/{token} - Audio streaming endpoint for clients that pass the token in the URL path (e.g., Twilio)
  • WebSocket /ws/diagnostic - Real-time diagnostic logging
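
The two authenticated WebSocket endpoints differ only in where the token travels. The sketch below builds both URL forms with the standard library. The helper name `build_ws_url` and the query parameter name `token` are assumptions, since this document does not name the parameter; check the server code for the actual one.

```python
from urllib.parse import quote, urlencode


def build_ws_url(host: str, token: str, token_in_path: bool) -> str:
    """Build a WebSocket URL for /ws/auth/{token} or /ws?token=..."""
    if token_in_path:
        # Path style: percent-encode everything, including "/".
        return f"wss://{host}/ws/auth/{quote(token, safe='')}"
    # Query style: "token" is an assumed parameter name.
    return f"wss://{host}/ws?{urlencode({'token': token})}"


if __name__ == "__main__":
    print(build_ws_url("example.com", "abc123", token_in_path=True))
    print(build_ws_url("example.com", "abc123", token_in_path=False))
```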

Testing

Once deployed:

  1. Open http://localhost:8000 in your browser
  2. Click "Start Call" to simulate a Twilio WebSocket connection
  3. Use your microphone to interact with Tellie
  4. Monitor the diagnostic log for real-time system information

Project Structure

twilio-aoai-realtime/
├── main.py                 # FastAPI application and routing
├── config.py              # Configuration and environment variables
├── websocket_handlers.py  # WebSocket audio processing logic
├── telemetry.py           # OpenTelemetry integration
├── directory_data.py      # Business directory data
├── prompts/               # System prompts for the AI assistant
│   ├── system_prompt_backend_agent.txt
│   └── system_prompt_realtime.txt
├── plugins/               # Semantic Kernel plugins
│   ├── business_assistant_plugin.py
│   ├── call_operations_plugin.py
│   └── directory_search_plugin.py
├── models/                # Data models
│   └── data_models.py
├── requirements.txt       # Python dependencies
├── Dockerfile             # Container configuration
├── docker-compose.yml     # Multi-container deployment
├── index.html             # Web testing interface
├── css/                   # Styling for the web interface
│   ├── styles.css
│   └── dialog.css
├── images/                # Images for the web interface
│   ├── favicon.ico
│   ├── microsoft-logo.svg
│   └── ms-favicon.ico
└── js/                    # Frontend JavaScript modules
    ├── main.js            # Main application logic
    ├── audio.js           # Audio processing utilities
    ├── websocket.js       # WebSocket communication
    ├── diagnostics.js     # Diagnostic logging
    ├── ui.js              # User interface management
    ├── utils.js           # Utility functions
    ├── config.js          # Frontend configuration
    ├── badge.js           # Badge component for UI
    └── version.js         # Version information
