OPAA is designed for organizations that need data sovereignty and control. Whether deploying on-premises in a data center, in private cloud infrastructure, or as a managed service, the same OPAA system adapts to different deployment models.
This feature describes how OPAA is deployed, configured, scaled, and operated across different infrastructure environments.
OPAA supports three deployment models:
- On-Premises: Complete control, on your infrastructure
- Private Cloud: Your cloud account, your control (AWS, Azure, GCP)
- Managed Service: Hosted by the OPAA team on shared infrastructure (optional, future)
All three use the same codebase; the model is chosen at deployment time.
┌────────────────────────────────────────┐
│     Organization Firewall / Proxy      │
└───────────────────┬────────────────────┘
                    │
       ┌────────────▼───────────────┐
       │  OPAA Kubernetes Cluster   │
       │    (or Docker Compose)     │
       │                            │
       │  ┌──────────────────────┐  │
       │  │  Web UI Service      │  │
       │  │  Chat Bot Services   │  │
       │  │  API Server          │  │
       │  └──────────┬───────────┘  │
       │             │              │
       │  ┌──────────▼───────────┐  │
       │  │  Orchestration       │  │
       │  │  Service             │  │
       │  └──────────┬───────────┘  │
       │             │              │
       │  ┌──────────▼───────────┐  │
       │  │  RAG Engine          │  │
       │  │  LLM Integrations    │  │
       │  └──────────┬───────────┘  │
       │             │              │
       │  ┌──────────▼───────────┐  │
       │  │  Vector Database     │  │
       │  │  Cache Layer         │  │
       │  │  Storage             │  │
       │  └──────────────────────┘  │
       └─────────────┬──────────────┘
                     │
┌────────────────────▼───────────────────┐
│  Data Sources (Confluence, Email, FS)  │
│  (may be outside firewall or inside)   │
└────────────────────────────────────────┘
Production-grade on-premises deployment.
Infrastructure:
- Load balancer for ingress
- Persistent volumes (local, NFS, block storage)
- Secrets management (Kubernetes Secrets, HashiCorp Vault)
- Network policies for security
- Monitoring (Prometheus, ELK stack)
Deployment:
- Helm charts provided for easy installation
- Health checks, resource limits, auto-scaling configured
- Log aggregation pre-configured
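As a sketch of the kind of settings such Helm charts typically expose for resource limits and health checks (all key names here are hypothetical, not an actual chart schema):

```yaml
# Hypothetical values.yaml fragment -- illustrative key names only.
apiServer:
  replicas: 3
  resources:
    requests: {cpu: 500m, memory: 1Gi}
    limits:   {cpu: "2",  memory: 4Gi}
  livenessProbe:
    httpGet: {path: /healthz, port: 8080}
    initialDelaySeconds: 10
    periodSeconds: 15
```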
Simpler deployment for smaller teams.
Services:
- opaa-app (main application: REST API, chat server, web UI)
- postgres (database for metadata)
- postgres-pgvector (vector storage with pgvector)
- redis (caching)
Example:
version: '3.8'

services:
  postgres:
    image: pgvector/pgvector:pg15   # PostgreSQL 15 with the pgvector extension
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data

  redis:
    image: redis:7-alpine

  opaa-app:
    image: opaa:latest
    ports:
      - "8080:8080"
    environment:
      DATABASE_URL: postgres://...
      REDIS_URL: redis://redis:6379
      LLM_PROVIDER: ${LLM_PROVIDER}
      LLM_API_KEY: ${LLM_API_KEY}
    depends_on:
      - postgres
      - redis

volumes:
  postgres_data:

Deployment on VMs or physical servers:
- System packages (Python 3.11+, PostgreSQL, Redis)
- Systemd services for process management
- Manual health checks and restart logic
- More complex, but full control
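The manual health-check-and-restart logic can be as simple as a watchdog loop run from cron or a timer; a minimal sketch (the health endpoint URL and systemd unit name are assumptions):

```python
import subprocess
import urllib.request


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True if the health endpoint answers HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False


def restart_if_down(url: str, unit: str = "opaa-app.service") -> bool:
    """Restart the systemd unit when the health check fails.

    Returns True if a restart was triggered.
    """
    if is_healthy(url):
        return False
    subprocess.run(["systemctl", "restart", unit], check=True)
    return True
```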
All configuration via environment variables (12-factor app):
# Database
DATABASE_URL=postgresql://user:pass@localhost/opaa
REDIS_URL=redis://localhost:6379
# LLM Configuration
LLM_PROVIDER=openai
LLM_API_KEY=${OPENAI_API_KEY}
LLM_API_BASE=https://api.openai.com/v1
LLM_MODEL=gpt-4
LLM_EMBEDDING_MODEL=text-embedding-3-small
# Vector Database
VECTOR_DB=pgvector # or elasticsearch, milvus
ELASTICSEARCH_HOST=localhost:9200 # if using ES
# Indexing
INDEXING_SCHEDULE=daily-2am
CONFLUENCE_URL=https://wiki.company.com
CONFLUENCE_TOKEN=${CONFLUENCE_API_TOKEN}
EMAIL_IMAP_HOST=imap.gmail.com
EMAIL_IMAP_PASSWORD=${EMAIL_PASSWORD}
# Security & Auth
SECRET_KEY=${SECRET_KEY_32_BYTES}
OAUTH_CLIENT_ID=${AUTH0_CLIENT_ID}
OAUTH_CLIENT_SECRET=${AUTH0_CLIENT_SECRET}
CORS_ORIGINS=https://company.intranet
# Features
ENABLE_API=true
ENABLE_WEB_UI=true
ENABLE_CHAT_INTEGRATIONS=true
MAX_CONCURRENT_INDEXING_JOBS=4
LOG_LEVEL=info

For complex setups, a YAML config file is supported:
# config.yaml
llm:
  provider: openai
  api_key: ${LLM_API_KEY}
  models:
    qa_generation: gpt-4
    summarization: gpt-3.5-turbo
    embeddings: text-embedding-3-small
  temperature: 0.3

vector_db:
  type: elasticsearch
  hosts:
    - elasticsearch.company.com:9200
  index_prefix: opaa

data_sources:
  confluence:
    enabled: true
    url: https://wiki.company.com
    auth_token: ${CONFLUENCE_TOKEN}
    schedule: "0 2 * * *"      # 2 AM daily
  email:
    enabled: true
    imap_host: imap.gmail.com
    email: archive@company.com
    password: ${EMAIL_PASSWORD}
    schedule: "0 */6 * * *"    # every 6 hours
  file_system:
    enabled: true
    paths:
      - /mnt/shared-docs
      - /mnt/team-wikis
    schedule: "*/30 * * * *"   # every 30 minutes

security:
  enable_auth: true
  auth_type: oauth2
  oauth_provider: auth0
  api_key_enabled: true

performance:
  max_concurrent_indexing: 4
  embedding_batch_size: 100
  vector_search_top_k: 20

OPAA is designed to scale from small teams to large enterprises. Concrete hardware requirements and sizing recommendations will be defined once the technology stack is established. The architecture supports:
- Small deployments: Single-server Docker Compose setup for teams and small organizations
- Medium deployments: Multi-node Kubernetes cluster for mid-size organizations
- Large deployments: Distributed infrastructure with horizontal scaling for enterprises
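Wherever config.yaml is loaded, its ${VAR} placeholders must be expanded from the environment; a minimal stdlib sketch of that step, applied to the parsed config tree (parsing itself would use a YAML library):

```python
from string import Template


def expand_env(node, env: dict[str, str]):
    """Recursively substitute ${VAR} placeholders in a parsed config tree.

    Raises KeyError if a referenced variable is undefined, so the service
    fails fast instead of running with a bad credential.
    """
    if isinstance(node, dict):
        return {k: expand_env(v, env) for k, v in node.items()}
    if isinstance(node, list):
        return [expand_env(v, env) for v in node]
    if isinstance(node, str):
        return Template(node).substitute(env)
    return node  # numbers, booleans, None pass through unchanged
```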
Cost-efficiency levers:
- Use a lightweight vector database (e.g. PostgreSQL + pgvector) for smaller deployments
- Use cost-effective embedding models
- Route simple queries to faster/cheaper LLM providers
- Cache frequent answers to reduce LLM API costs
- Batch indexing during off-peak hours
Typical AWS architecture:
Application Load Balancer
            ↓
ECS Cluster (OPAA services)
            ↓
RDS PostgreSQL (metadata)
            ↓
OpenSearch (vector DB)
            ↓
S3 (document storage)
            ↓
Data sources (S3, Confluence, etc.)
Services used:
- ECS or EKS for container orchestration
- RDS for PostgreSQL
- OpenSearch for vector database
- S3 for data storage and backups
- Lambda for scheduled indexing jobs
- CloudWatch for monitoring
- VPC for network isolation
Advantages:
- Managed services reduce operational burden
- Auto-scaling built-in
- Backup and disaster recovery easy
- IAM for access control
- Same data privacy as on-premises (within AWS)
Azure follows a similar pattern to AWS:
- Azure Container Instances or AKS
- Azure Database for PostgreSQL
- Azure AI Search (vector search)
- Azure Blob Storage
- Same patterns, different provider
GCP follows a similar pattern:
- GKE for Kubernetes
- Cloud SQL for PostgreSQL
- Vertex AI Vector Search
- Cloud Storage
High availability for production deployments:
Multiple Replicas:
- API servers: 3+ replicas
- Vector DB: Replicated/sharded
- PostgreSQL: Primary + standby replicas
- Redis: Sentinel mode or Cluster
Load Balancing:
- Load balancer distributes traffic
- Health checks enable automatic failover
- Circuit breakers for service degradation
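The circuit breaker mentioned above stops calling a degraded downstream service after repeated failures and rejects further calls until a cool-down passes; a minimal sketch (thresholds are assumptions):

```python
import time


class CircuitBreaker:
    """Open after `max_failures` consecutive errors; retry after `reset_after` seconds."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at: float | None = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: downstream degraded")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failures = 0  # success resets the failure count
        return result
```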
Database Replication:
- PostgreSQL streaming replication
- Vector DB replication (varies by backend)
- Regular backup validation
Backup Strategy:
- Daily full backups of PostgreSQL
- Incremental backups of vector embeddings
- Document backups (source systems remain the source of truth, so not critical)
- Backup stored in separate region/account
Recovery:
- RTO (Recovery Time Objective): 1 hour
- RPO (Recovery Point Objective): 1 day
- Regular DR drills (quarterly)
- Runbooks for common failures
Failover:
- Automatic failover for k8s services
- Manual failover for databases (< 30 minutes)
- Documented procedures for all services
Firewall Rules:
- OPAA cluster only accessible from internal network
- Outbound: Only to configured data sources and LLM APIs
- VPN/SSH access for administration
- DDoS protection at perimeter
Data Encryption:
- TLS 1.3 for all network traffic (internal and external)
- Encrypted secrets management (Vault, k8s Secrets)
- Database encryption at rest
- Disk encryption on servers
Authentication:
- SSO integration (OIDC, SAML)
- API tokens with scopes
- Service accounts for automation
Authorization:
- RBAC for admin functions
- Document-level permissions
- Workspace isolation
- Audit logging of all access
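Document-level permissions and workspace isolation combine into a single authorization check; a minimal sketch (the role names and Document/User shapes are assumptions):

```python
from dataclasses import dataclass, field


@dataclass
class User:
    id: str
    workspace_id: str
    roles: set[str] = field(default_factory=set)


@dataclass
class Document:
    id: str
    workspace_id: str
    allowed_user_ids: set[str] = field(default_factory=set)  # empty = workspace-wide


def can_read(user: User, doc: Document) -> bool:
    """Workspace isolation first, then role or per-document grant."""
    if user.workspace_id != doc.workspace_id:
        return False  # never cross workspace boundaries
    if "admin" in user.roles:
        return True
    return not doc.allowed_user_ids or user.id in doc.allowed_user_ids
```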
OPAA is designed to support common compliance frameworks:
- GDPR: Data retention policies, data deletion
- HIPAA: Encryption, audit trails
- SOC 2: Access controls, monitoring
- ISO 27001: Security controls framework
Key performance indicators:
Performance:
- API response time (p50, p95, p99)
- Vector search latency
- LLM generation time
- Page load time (web UI)
Reliability:
- API uptime %
- Error rates (5xx, 4xx)
- Failed indexing jobs
- Queue lengths
Cost:
- LLM tokens/day
- Embedding cost
- Infrastructure cost
- Cost per query
Usage:
- Queries per day
- Active users
- Top questions
- Most-used documents
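The latency percentiles above (p50, p95, p99) are computed from recorded request durations; a minimal nearest-rank sketch:

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample covering p% of the data."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[max(rank, 1) - 1]
```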
Alert on:
- API error rate > 1%
- Response time P95 > 2 seconds
- Vector DB disk > 80% full
- Indexing job failures > 3 in a row
- LLM API failures
- High costs (exceeding budget)
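The thresholds above can be evaluated against a periodic metrics snapshot; a minimal sketch (the metric field names are assumptions):

```python
def evaluate_alerts(m: dict[str, float]) -> list[str]:
    """Return the alert names that fire for one metrics snapshot."""
    fired = []
    if m["error_rate"] > 0.01:                     # API error rate > 1%
        fired.append("high-error-rate")
    if m["p95_latency_seconds"] > 2.0:             # response time P95 > 2 s
        fired.append("slow-responses")
    if m["vector_db_disk_used_fraction"] > 0.80:   # vector DB disk > 80% full
        fired.append("vector-db-disk")
    if m["consecutive_indexing_failures"] > 3:     # > 3 failures in a row
        fired.append("indexing-failures")
    return fired
```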
Central logging of:
- API requests and responses (no sensitive data)
- Errors and exceptions
- Indexing progress and failures
- Admin actions
- User feedback
Standard log format: JSON with timestamps, service, severity
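That JSON log format can be produced with a small logging.Formatter; a sketch (the exact field names are assumptions):

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line: timestamp, service, severity, message."""

    def __init__(self, service: str):
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc).isoformat(),
            "service": self.service,
            "severity": record.levelname,
            "message": record.getMessage(),
        })
```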
Backup priorities:
Critical:
- PostgreSQL database (metadata, user settings)
- Vector embeddings (regeneratable but expensive)
Important:
- Configuration files
- Custom integrations/plugins
- Admin settings
Not needed:
- Source documents (can be re-indexed from source)
- Cached embeddings (can be regenerated)
Backup schedule:
- PostgreSQL: Daily full backup + hourly incremental
- Vector DB: Daily after each indexing run
- Config: Versioned in Git (separate repo)
Restore testing:
- Monthly restore testing (to ensure backups work)
- Documented restore procedures
- Estimated restore time: < 4 hours for full restore
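With daily fulls plus hourly incrementals, a restore replays the latest full backup before the target time and every incremental between it and the target; a minimal selection sketch (timestamps as Unix seconds):

```python
def restore_chain(fulls: list[float], incrementals: list[float],
                  target: float) -> list[float]:
    """Pick the backup sequence needed to restore to `target` time."""
    base_candidates = [t for t in fulls if t <= target]
    if not base_candidates:
        raise ValueError("no full backup before target time")
    base = max(base_candidates)  # newest full backup at or before target
    deltas = sorted(t for t in incrementals if base < t <= target)
    return [base] + deltas
```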
Blue-green deployment for zero-downtime upgrades:
- Deploy new version to "green" environment
- Run tests on green
- Switch load balancer from blue to green
- Keep blue as rollback
Downtime: zero for users; the full switchover takes a few minutes.
Alternative: Gradual rollout
- Stop 1 pod, start 1 new version
- Wait for health checks
- Repeat until all pods updated
- Automatic rollback if health checks fail
Backward compatibility:
- API versions maintained (v1, v2, etc.)
- Database schema migrations non-breaking
- Old features deprecated gradually, not removed abruptly
For serving multiple organizations:
Isolation Levels:
- Separate instances (simplest, most isolation)
- Shared infrastructure, separate databases (medium isolation)
- Shared database, row-level security (maximum density)
OPAA is designed for option 3:
- Workspace IDs in all data
- Row-level security policies
- Separate vector embeddings per workspace (optional)
- Cost allocation per tenant
Integration points:
- Data Sources: Pull documents, handle credentials
- Authentication: SSO provider integration
- Monitoring: Send metrics to observability stack
- LLM Providers: API access for generation and embeddings
Open questions:
- Should we provide managed OPAA as a service?
- Should deployments auto-update?
- Should we support GitOps (configuration in Git)?
- Should we provide Terraform/CloudFormation templates?
- Should we support multi-region deployments?
- Should we provide Helm charts for community use?
Success criteria:
- Availability: 99.9% uptime
- Performance: P95 response time < 2 seconds
- Deployment Time: New version deployed in < 15 minutes
- Scaling: Can handle 10x query volume with 3x infrastructure cost
- Recovery: Restore from backup in < 4 hours