Azure AI Search Simulator

A local simulator for Azure AI Search that allows developers to learn, experiment, and test Azure AI Search concepts without requiring an actual Azure subscription.


Overview

The Azure AI Search Simulator provides a local implementation of the Azure AI Search REST API, enabling you to:

  • πŸ” Learn Azure AI Search concepts in a safe, cost-free environment
  • πŸ§ͺ Test your search configurations before deploying to Azure
  • πŸš€ Develop search-powered applications without Azure dependencies
  • πŸ“š Experiment with indexing pipelines and skillsets

Features

✅ Implemented

  • Index Management: Create, update, delete, and list search indexes
  • Document Operations: Upload, merge, mergeOrUpload, and delete documents (Push model)
  • Full-Text Search: Simple and Lucene query syntax
  • Filtering: Basic OData filter expressions (eq, ne, gt, lt, ge, le, search.in)
  • Sorting & Paging: OrderBy, top, skip support
  • Field Selection: $select parameter support
  • Highlighting: Search result highlighting
  • Faceted Navigation: Value facets and interval/range facets
  • Autocomplete: Term-based autocomplete
  • Suggestions: Prefix-based suggestions
  • Vector Search: Cosine similarity with Collection(Edm.Single) fields
  • Hybrid Search: Combined text and vector search scoring
  • Authentication: API keys, simulated JWT tokens, and Entra ID (Azure AD) support
  • Role-Based Access Control: Full RBAC with 6 Azure Search roles
  • Managed Identity: Resource-level identity for data sources, indexers, and skills
  • Storage: LiteDB for index metadata, Lucene.NET for document indexing
  • Data Sources: Azure Blob Storage, ADLS Gen2, and file system connectors
  • Indexers: Automated document ingestion with field mappings and status tracking (Pull model)
  • Document Cracking: Extract text/metadata from PDF, Word, Excel, HTML, JSON, CSV, TXT
  • Skillsets: Skill pipeline with text transformation and embedding skills
  • Document Extraction Skill: Extract content from base64/URL file_data with content-type detection and parsingMode support
  • Index Projections: One-to-many indexing that fans document chunks out into a secondary index
  • Azure OpenAI Embedding Skill: Generate vector embeddings via Azure OpenAI API
  • Local Embedding Models: Generate embeddings locally via ONNX Runtime (no Azure OpenAI required)
  • Custom Web API Skill: Call external REST APIs for custom processing
  • Error Handling: OData-compliant error responses
  • Docker Support: Containerized deployment with docker-compose
  • Azure SDK Compatibility: Works with official Azure.Search.Documents SDK
  • Search Debug: Query diagnostics with subscore breakdown for hybrid/vector searches (debug parameter)
  • Synonym Maps: CRUD management, Solr format, query-time synonym expansion
  • Scoring Profiles: Text weights, freshness, magnitude, distance, and tag functions with interpolation and aggregation modes
  • Similarity Algorithms: Configurable BM25 (k1/b parameters) and ClassicSimilarity (TF-IDF). Per-index similarity with @search.features support
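As a reference for the vector-search feature above, cosine similarity over Collection(Edm.Single) fields is just a normalized dot product. A minimal sketch in plain Python, independent of the simulator; the score mapping shown is Azure AI Search's documented convention for cosine, and whether the simulator matches it exactly is worth verifying against a real query:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity = dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def search_score(similarity):
    # Azure AI Search reports cosine scores as 1 / (1 + d) with d = 1 - similarity,
    # i.e. 1 / (2 - similarity); identical vectors score 1.0.
    return 1.0 / (2.0 - similarity)
```

For example, `cosine_similarity([1.0, 0.0], [1.0, 0.0])` is `1.0` and orthogonal vectors score `0.0`.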

🔜 Planned (Future Phases)

  • Azure SQL / Cosmos DB connectors
  • Admin UI dashboard

Quick Start

Prerequisites

  • .NET SDK (to build and run from source)
  • Docker (optional, for container-based deployment)

Installation

# Clone the repository
git clone https://github.com/Ellerbach/azure-ai-search-simulator.git
cd azure-ai-search-simulator

# Build the solution
dotnet build

# Run the simulator (HTTPS - recommended for Azure SDK compatibility)
dotnet run --project src/AzureAISearchSimulator.Api --urls "https://localhost:7250"

# API available at https://localhost:7250

Running with Docker

Using the Pre-built Image (Recommended)

A pre-built Docker image is published to GitHub Container Registry on every release:

# Pull the latest image
docker pull ghcr.io/ellerbach/azure-ai-search-simulator:latest

# Or pull a specific version
docker pull ghcr.io/ellerbach/azure-ai-search-simulator:1.0.0

# Run the container
docker run -d --name azure-ai-search-simulator \
  -p 7250:8443 -p 5250:8080 \
  -v search-data:/app/data \
  -v lucene-indexes:/app/lucene-indexes \
  -v "$(pwd)/logs":/app/logs \
  -v "$(pwd)/files":/app/files \
  -v "$(pwd)/src/AzureAISearchSimulator.Api/data/models":/app/models:ro \
  ghcr.io/ellerbach/azure-ai-search-simulator:latest

# API available at https://localhost:7250 (HTTPS) or http://localhost:5250 (HTTP)

You can also use it in a docker-compose.yml:

services:
  search-simulator:
    image: ghcr.io/ellerbach/azure-ai-search-simulator:latest
    ports:
      - "7250:8443"
      - "5250:8080"
    environment:
      - SimulatorSettings__AdminApiKey=admin-key-12345
      - SimulatorSettings__QueryApiKey=query-key-67890
    volumes:
      - search-data:/app/data
      - lucene-indexes:/app/lucene-indexes
      - ./logs:/app/logs
      - ./files:/app/files
      # Mount ONNX models for local embedding (download first with scripts/Download-EmbeddingModel.ps1)
      - ./src/AzureAISearchSimulator.Api/data/models:/app/models:ro

volumes:
  search-data:
  lucene-indexes:

Available image tags:

| Tag | Description |
|-----|-------------|
| latest | Latest release |
| x.y.z (e.g. 1.0.0) | Specific version |
| x.y (e.g. 1.0) | Latest patch for a minor version |
| x (e.g. 1) | Latest minor/patch for a major version |

Building from Source

If you prefer to build the image yourself:

# Build and run with docker-compose
docker-compose up -d

# Or build the image manually
docker build -t azure-ai-search-simulator .
docker run -p 7250:8443 -p 5250:8080 \
  -v search-data:/app/data \
  -v lucene-indexes:/app/lucene-indexes \
  -v ./logs:/app/logs \
  -v ./files:/app/files \
  -v ./src/AzureAISearchSimulator.Api/data/models:/app/models:ro \
  azure-ai-search-simulator

# API available at https://localhost:7250 (HTTPS) or http://localhost:5250 (HTTP)

Docker Volume Mapping

The container exposes five mount points for data persistence and file access:

| Mount Point | Purpose | Recommended Mount |
|-------------|---------|-------------------|
| /app/data | LiteDB database (index metadata, data sources, indexer state) | Named volume |
| /app/lucene-indexes | Lucene search index files | Named volume |
| /app/logs | Serilog log files (simulator-{date}.log) | Bind mount for easy host access |
| /app/files | Documents for indexer file processing (pull mode) | Bind mount to your documents folder |
| /app/models | ONNX embedding models for the local:// embedding skill mode | Bind mount (read-only) |

Example: Mount a local documents folder for indexer processing

# Mount your documents folder so indexers can access them inside the container
docker run -p 7250:8443 -p 5250:8080 \
  -v search-data:/app/data \
  -v "$(pwd)/your-documents":/app/files \
  azure-ai-search-simulator

Example: Mount ONNX models for local embedding

# Download a model first, then mount the models directory
.\scripts\Download-EmbeddingModel.ps1 -ModelName all-MiniLM-L6-v2

docker run -p 7250:8443 -p 5250:8080 \
  -v search-data:/app/data \
  -v "$(pwd)/src/AzureAISearchSimulator.Api/data/models":/app/models:ro \
  azure-ai-search-simulator

Then create a filesystem data source pointing to /app/files:

PUT https://localhost:7250/datasources/my-files?api-version=2024-07-01
Content-Type: application/json
api-key: admin-key-12345

{
  "name": "my-files",
  "type": "filesystem",
  "credentials": { "connectionString": "/app/files" },
  "container": { "name": "subfolder" }
}

Note: The Docker image generates a self-signed certificate for HTTPS. You'll need to skip certificate validation in your client (see Azure SDK example below).

Using with Azure SDK

The simulator is compatible with the official Azure.Search.Documents SDK. Note that the SDK requires HTTPS.

using Azure;
using Azure.Search.Documents;
using Azure.Search.Documents.Indexes;

// Point to local simulator (HTTPS required)
var endpoint = new Uri("https://localhost:7250");
var credential = new AzureKeyCredential("admin-key-12345");

// Skip certificate validation for local development
var handler = new HttpClientHandler
{
    ServerCertificateCustomValidationCallback = HttpClientHandler.DangerousAcceptAnyServerCertificateValidator
};
var options = new SearchClientOptions
{
    Transport = new Azure.Core.Pipeline.HttpClientTransport(handler)
};

// Create clients
var indexClient = new SearchIndexClient(endpoint, credential, options);
var searchClient = new SearchClient(endpoint, "my-index", credential, options);

// Create an index
var index = new SearchIndex("my-index")
{
    Fields = new[]
    {
        new SimpleField("id", SearchFieldDataType.String) { IsKey = true },
        new SearchableField("title") { IsFilterable = true },
        new SearchableField("content"),
        new SimpleField("rating", SearchFieldDataType.Double) { IsFilterable = true, IsSortable = true }
    }
};

await indexClient.CreateIndexAsync(index);

// Upload documents
var documents = new[]
{
    new { id = "1", title = "Document One", content = "This is the first document", rating = 4.5 },
    new { id = "2", title = "Document Two", content = "This is the second document", rating = 3.8 }
};

await searchClient.UploadDocumentsAsync(documents);

// Search
var results = await searchClient.SearchAsync<SearchDocument>("first");
await foreach (var result in results.Value.GetResultsAsync())
{
    Console.WriteLine($"Found: {result.Document["title"]} (Score: {result.Score})");
}

Using with Azure SDK (Python)

The simulator also works with the official azure-search-documents Python SDK:

import requests
from azure.core.credentials import AzureKeyCredential
from azure.core.pipeline.transport import RequestsTransport
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchIndex, SimpleField, SearchableField, SearchFieldDataType
)

# Point to local simulator (HTTPS required)
endpoint = "https://localhost:7250"
credential = AzureKeyCredential("admin-key-12345")

# Skip certificate validation for local development (self-signed cert)
session = requests.Session()
session.verify = False
transport = RequestsTransport(session=session, connection_verify=False)

# Create clients
index_client = SearchIndexClient(endpoint, credential, transport=transport, connection_verify=False)
search_client = SearchClient(endpoint, "my-index", credential, transport=transport, connection_verify=False)

# Create an index
index = SearchIndex(
    name="my-index",
    fields=[
        SimpleField(name="id", type=SearchFieldDataType.String, key=True),
        SearchableField(name="title", filterable=True),
        SearchableField(name="content"),
        SimpleField(name="rating", type=SearchFieldDataType.Double, filterable=True, sortable=True),
    ],
)
index_client.create_index(index)

# Upload documents
documents = [
    {"id": "1", "title": "Document One", "content": "This is the first document", "rating": 4.5},
    {"id": "2", "title": "Document Two", "content": "This is the second document", "rating": 3.8},
]
search_client.upload_documents(documents)

# Search
results = search_client.search("first")
for result in results:
    print(f"Found: {result['title']} (Score: {result['@search.score']})")

Using REST API

### Create an index
POST https://localhost:7250/indexes?api-version=2024-07-01
Content-Type: application/json
api-key: admin-key-12345

{
  "name": "hotels",
  "fields": [
    { "name": "hotelId", "type": "Edm.String", "key": true },
    { "name": "hotelName", "type": "Edm.String", "searchable": true },
    { "name": "description", "type": "Edm.String", "searchable": true },
    { "name": "rating", "type": "Edm.Double", "filterable": true, "sortable": true }
  ]
}

### Upload documents
POST https://localhost:7250/indexes/hotels/docs/index?api-version=2024-07-01
Content-Type: application/json
api-key: admin-key-12345

{
  "value": [
    {
      "@search.action": "upload",
      "hotelId": "1",
      "hotelName": "Fancy Hotel",
      "description": "A luxury hotel with great amenities",
      "rating": 4.8
    }
  ]
}

### Search
POST https://localhost:7250/indexes/hotels/docs/search?api-version=2024-07-01
Content-Type: application/json
api-key: query-key-67890

{
  "search": "luxury",
  "filter": "rating ge 4",
  "orderby": "rating desc",
  "top": 10
}
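The filter, orderby, and top parameters in the request above behave like a filter-sort-slice pipeline. An illustrative sketch in plain Python (the simulator actually evaluates OData expressions against Lucene; this only shows the semantics):

```python
hotels = [
    {"hotelName": "Fancy Hotel", "rating": 4.8},
    {"hotelName": "Budget Inn", "rating": 3.2},
    {"hotelName": "Grand Palace", "rating": 4.5},
]

# "filter": "rating ge 4"   -> keep documents with rating >= 4
results = [h for h in hotels if h["rating"] >= 4]
# "orderby": "rating desc"  -> sort descending by rating
results.sort(key=lambda h: h["rating"], reverse=True)
# "top": 10                 -> return at most the first 10 hits
page = results[:10]
```

Here `page` contains "Fancy Hotel" then "Grand Palace"; "Budget Inn" is filtered out.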

### Vector Search
POST https://localhost:7250/indexes/hotels/docs/search?api-version=2024-07-01
Content-Type: application/json
api-key: query-key-67890

{
  "vectorQueries": [
    {
      "kind": "vector",
      "vector": [0.01, 0.02, ...],
      "fields": "descriptionVector",
      "k": 10
    }
  ]
}

### Hybrid Search (Text + Vector)
POST https://localhost:7250/indexes/hotels/docs/search?api-version=2024-07-01
Content-Type: application/json
api-key: query-key-67890

{
  "search": "luxury hotel",
  "vectorQueries": [
    {
      "kind": "vector",
      "vector": [0.01, 0.02, ...],
      "fields": "descriptionVector",
      "k": 10
    }
  ]
}
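Hybrid search merges the text ranking and the vector ranking into a single result list. Azure AI Search uses Reciprocal Rank Fusion (RRF) for this step, and the simulator's samples reference RRF fusion as well; a minimal sketch with the conventional k = 60 constant:

```python
def rrf_fuse(rankings, k=60):
    # rankings: ranked doc-id lists (e.g. one from text search, one from
    # vector search). Each list contributes 1 / (k + rank) per document.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

text_ranked = ["h1", "h2", "h3"]    # hypothetical text-search order
vector_ranked = ["h2", "h3", "h1"]  # hypothetical vector-search order
fused = rrf_fuse([text_ranked, vector_ranked])
```

Documents ranked consistently well in both lists float to the top; here "h2" wins because it places first and second, while "h1" mixes first and last place.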

### Create Data Source (Pull Model)
PUT https://localhost:7250/datasources/my-files?api-version=2024-07-01
Content-Type: application/json
api-key: admin-key-12345

{
  "name": "my-files",
  "type": "filesystem",
  "credentials": {
    "connectionString": "c:\\data\\documents"
  },
  "container": {
    "name": "pdfs"
  }
}

### Create Indexer
PUT https://localhost:7250/indexers/my-indexer?api-version=2024-07-01
Content-Type: application/json
api-key: admin-key-12345

{
  "name": "my-indexer",
  "dataSourceName": "my-files",
  "targetIndexName": "documents",
  "fieldMappings": [
    {
      "sourceFieldName": "metadata_storage_path",
      "targetFieldName": "id"
    }
  ]
}
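Field mappings copy values from a field produced by document cracking (like metadata_storage_path above) onto a target index field. A simplified sketch of that step; note that real Azure AI Search also supports mappingFunctions such as base64Encode, which are not shown here:

```python
def apply_field_mappings(doc, mappings):
    # Copy each sourceFieldName value onto targetFieldName;
    # unmapped fields pass through unchanged.
    out = dict(doc)
    for m in mappings:
        if m["sourceFieldName"] in doc:
            out[m["targetFieldName"]] = doc[m["sourceFieldName"]]
    return out

# Hypothetical cracked document from the filesystem connector
cracked = {"metadata_storage_path": "/app/files/pdfs/report.pdf", "content": "..."}
mapped = apply_field_mappings(cracked, [
    {"sourceFieldName": "metadata_storage_path", "targetFieldName": "id"},
])
```

After mapping, the document carries an `id` field holding the storage path, alongside the untouched `content` field.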

### Run Indexer
POST https://localhost:7250/indexers/my-indexer/run?api-version=2024-07-01
api-key: admin-key-12345

Authentication

The simulator supports three authentication modes that can be enabled simultaneously:

API Key (Default)

api-key: admin-key-12345

Simulated Tokens (Local Development)

Generate JWT tokens locally for testing RBAC without Azure:

### Get a token with Search Index Data Contributor role
GET https://localhost:7250/admin/token/quick/data-contributor
api-key: admin-key-12345

### Use the token
GET https://localhost:7250/indexes
Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
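Since the simulated tokens are ordinary JWTs, you can inspect their claims locally by base64url-decoding the payload segment. A small sketch; the exact claim names the simulator emits (e.g. a `roles` claim) are an assumption to verify against a real token:

```python
import base64
import json

def jwt_claims(token):
    # A JWT is "header.payload.signature"; the payload is base64url-encoded
    # JSON with padding stripped, so restore the padding before decoding.
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))
```

Calling `jwt_claims(token)` on a token from the /admin/token endpoint returns a dict of its claims, which is handy for checking which role a token actually carries before using it.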

Entra ID (Real Azure AD)

Validate real Azure AD tokens for production-like testing:

var credential = new DefaultAzureCredential();
var searchClient = new SearchClient(endpoint, "my-index", credential);

Role-Based Access Control

The simulator enforces Azure AI Search RBAC:

| Role | Permissions |
|------|-------------|
| Search Service Contributor | Manage indexes, indexers, data sources, skillsets |
| Search Index Data Contributor | Upload, merge, delete documents |
| Search Index Data Reader | Search, suggest, autocomplete |

📚 See docs/AUTHENTICATION.md for the complete authentication guide.

Configuration

Edit appsettings.json to customize the simulator:

{
  "SimulatorSettings": {
    "ServiceName": "local-search-simulator",
    "DataDirectory": "./data",
    "AdminApiKey": "admin-key-12345",
    "QueryApiKey": "query-key-67890",
    "MaxIndexes": 50,
    "MaxDocumentsPerIndex": 100000
  }
}
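Any setting in appsettings.json can also be supplied as an environment variable, as the docker-compose example earlier does: the standard ASP.NET Core convention maps a double underscore in the variable name to the `:` configuration-section separator. A small illustration of the mapping:

```python
def env_to_config_key(env_name):
    # ASP.NET Core configuration convention: "__" in an environment-variable
    # name separates configuration sections, i.e. maps to ":".
    return env_name.replace("__", ":")

key = env_to_config_key("SimulatorSettings__AdminApiKey")
```

So `SimulatorSettings__AdminApiKey` overrides the `AdminApiKey` value inside the `SimulatorSettings` section shown above.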

Diagnostic Logging

Enable verbose logging to debug indexer and skill pipeline execution:

{
  "DiagnosticLogging": {
    "Enabled": true,
    "LogDocumentDetails": true,
    "LogSkillExecution": true,
    "LogSkillInputPayloads": true,
    "LogSkillOutputPayloads": true,
    "LogEnrichedDocumentState": false,
    "LogFieldMappings": true,
    "MaxStringLogLength": 500
  }
}

Logs are written to logs/simulator-{date}.log and console. Look for [DIAGNOSTIC] prefixed entries.
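A quick way to pull just the diagnostic entries out of a log file (sketch; the surrounding log-line shapes are illustrative, only the [DIAGNOSTIC] marker itself comes from the docs above):

```python
def diagnostic_entries(lines):
    # Keep only lines carrying the [DIAGNOSTIC] prefix described above.
    return [line for line in lines if "[DIAGNOSTIC]" in line]

# Hypothetical excerpt of logs/simulator-{date}.log
sample_log = [
    "[INF] Indexer my-indexer started",
    "[INF] [DIAGNOSTIC] Skill input payload: {...}",
    "[INF] Indexer my-indexer finished",
]
```

`diagnostic_entries(sample_log)` keeps only the middle line; point the same filter at the real log file to follow skill execution.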

Documentation

Samples

All .http sample files use environment variables via $dotenv. To get started:

  1. Copy .env.example to .env in the workspace root
  2. Fill in your values (API keys, storage credentials, etc.)
  3. The .env file is gitignored and will not be committed

| Sample | Description |
|--------|-------------|
| AzureSdkSample | C# console app demonstrating Azure.Search.Documents SDK compatibility |
| AzureSearchNotebook | Python Jupyter notebook with comprehensive search demos and skillset integration |
| IndexerTestNotebook | Python notebook for testing indexers with JSON metadata files |
| EmbeddingSkillNotebook | Python notebook demonstrating the Azure OpenAI Embedding skill, vector search, and hybrid search with RRF fusion |
| CustomSkillSample | ASP.NET Core API implementing custom Web API skills (text stats, keywords, sentiment, summarization) |
| sample-requests.http | REST Client file with comprehensive API examples |
| compare-requests.http | REST Client file that sends identical requests to the simulator and real Azure AI Search side by side |
| synonym-map-sample.http | REST Client file demonstrating synonym maps (CRUD + search expansion), with [SIM] / [AZURE] pairs |
| Compare-Results.ps1 | PowerShell script that automates comparison and shows a color-coded diff of responses |
| pull-mode-test.http | REST Client file for testing the indexer pull-mode workflow |
| local-embedding-sample.http | REST Client file demonstrating the local ONNX embedding skill (no Azure OpenAI required) |
| index-projection-sample.http | REST Client file demonstrating index projections (one-to-many chunking into a secondary index) |
| scoring-profile-sample.http | REST Client file demonstrating scoring profiles (text weights, magnitude, freshness, tag, combined) |
| similarity-sample.http | REST Client file demonstrating similarity algorithms (default BM25, custom BM25 k1/b, ClassicSimilarity TF-IDF) |
| Download-EmbeddingModel.ps1 | PowerShell script to download ONNX embedding models from HuggingFace |

Comparing Simulator vs Real Azure AI Search

You can run the same queries against both the local simulator and a real Azure AI Search service to verify behavioral parity:

  1. In .env, set BASE_URL / ADMIN_KEY / QUERY_KEY (simulator) and AZURE_BASE_URL / AZURE_ADMIN_KEY / AZURE_QUERY_KEY (real service).
  2. Interactive: open compare-requests.http and click "Send Request" on matched [SIM] / [AZURE] pairs. Use VS Code split tabs to view both responses.
  3. Automated: run the PowerShell comparison script:
# Run all scenarios (create index, upload docs, search, cleanup)
.\scripts\Compare-Results.ps1

# Run a single scenario
.\scripts\Compare-Results.ps1 -Scenario SimpleSearch

# Show full JSON even when responses match
.\scripts\Compare-Results.ps1 -ShowFullResponse

The script highlights MATCH (green) or DIFFERENCES (red) for each scenario, ignoring dynamic fields like @odata.etag and @odata.context.

Project Structure

AzureAISearchSimulator/
├── src/
│   ├── AzureAISearchSimulator.Api/        # REST API layer
│   ├── AzureAISearchSimulator.Core/       # Business logic
│   ├── AzureAISearchSimulator.Search/     # Lucene.NET search engine & skills
│   ├── AzureAISearchSimulator.DataSources/# Data source connectors
│   └── AzureAISearchSimulator.Storage/    # Persistence layer
├── tests/
├── samples/
└── docs/

Supported vs Azure AI Search

| Feature | Azure AI Search | Simulator |
|---------|-----------------|-----------|
| Full-text search | ✅ | ✅ |
| Filtering & facets | ✅ | ✅ |
| Vector search | ✅ | ✅ (cosine similarity) |
| Hybrid search | ✅ | ✅ |
| Highlighting | ✅ | ✅ |
| Autocomplete | ✅ | ✅ |
| Suggestions | ✅ | ✅ |
| Indexers | ✅ | ✅ (Blob, ADLS, filesystem) |
| Skillsets (utility) | ✅ | ✅ |
| Custom Web API Skill | ✅ | ✅ |
| Azure OpenAI Embedding | ✅ | ✅ |
| Document Cracking | ✅ | ✅ |
| Semantic search | ✅ | ❌ |
| AI skills (OCR, etc.) | ✅ | ❌ |
| Managed Identity | ✅ | ✅ (simulated) |
| Entra ID Authentication | ✅ | ✅ |
| Scoring Profiles | ✅ | ✅ |
| Similarity Algorithms | ✅ (BM25, Classic) | ✅ (BM25, Classic) |
| Scale (millions of docs) | ✅ | Limited |
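On the similarity row above: BM25 scores each query term with an IDF weight times a saturating term-frequency factor, where k1 controls how quickly repeated terms stop adding score and b controls the length-normalization penalty. A single-term sketch of the textbook formula, not Lucene.NET's exact implementation:

```python
import math

def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    # IDF: rarer terms (low doc_freq) weigh more.
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Saturated TF: higher k1 -> slower saturation; higher b -> stronger
    # penalty for documents longer than the average.
    norm_tf = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return idf * norm_tf

score = bm25_term_score(tf=3, doc_len=100, avg_doc_len=120, n_docs=1000, doc_freq=50)
```

Tuning k1 and b per index (as the similarity configuration feature allows) changes how aggressively term repetition and document length shape the ranking.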

Skills Support

| Skill Category | Azure AI Search | Simulator |
|----------------|-----------------|-----------|
| Utility Skills (Split, Merge, Shaper, Conditional) | ✅ | ✅ |
| Document Extraction Skill | ✅ | ✅ |
| Custom Web API Skill | ✅ | ✅ |
| Azure OpenAI Embedding Skill | ✅ | ✅ |
| AI Vision Skills (OCR, Image Analysis) | ✅ | ❌ |
| AI Language Skills (Entity Recognition, Sentiment, PII, etc.) | ✅ | ❌ |
| Translation Skill | ✅ | ❌ |
| GenAI Prompt Skill | ✅ | ❌ |

Tip: Use the Custom Web API Skill to implement your own versions of missing AI skills. See samples/CustomSkillSample for examples.

Contributing

Contributions are welcome! Please read our Contributing Guide for details.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

About

Azure AI Search Simulator provides a lightweight environment to emulate Azure AI Search in pull or push modes. It offers a compatible API surface and works seamlessly with the official SDK, enabling local development, testing, and debugging without requiring a live search service.
