Azure AI Search Simulator

A local simulator for Azure AI Search that allows developers to learn, experiment, and test Azure AI Search concepts without requiring an actual Azure subscription.


Overview

The Azure AI Search Simulator provides a local implementation of the Azure AI Search REST API, enabling you to:

  • πŸ” Learn Azure AI Search concepts in a safe, cost-free environment
  • πŸ§ͺ Test your search configurations before deploying to Azure
  • πŸš€ Develop search-powered applications without Azure dependencies
  • πŸ“š Experiment with indexing pipelines and skillsets

Features

✅ Implemented

  • Index Management: Create, update, delete, and list search indexes
  • Document Operations: Upload, merge, mergeOrUpload, and delete documents (Push model)
  • Full-Text Search: Simple and Lucene query syntax
  • Filtering: Basic OData filter expressions (eq, ne, gt, lt, ge, le, search.in)
  • Sorting & Paging: OrderBy, top, skip support
  • Field Selection: $select parameter support
  • Highlighting: Search result highlighting
  • Faceted Navigation: Value facets and interval/range facets
  • Autocomplete: Term-based autocomplete
  • Suggestions: Prefix-based suggestions
  • Vector Search: Cosine similarity with Collection(Edm.Single) fields
  • Hybrid Search: Combined text and vector search scoring
  • Authentication: API keys, simulated JWT tokens, and Entra ID (Azure AD) support
  • Role-Based Access Control: Full RBAC with 6 Azure Search roles
  • Managed Identity: Resource-level identity for data sources, indexers, and skills
  • Storage: LiteDB for index metadata, Lucene.NET for document indexing
  • Data Sources: Azure Blob Storage, ADLS Gen2, and file system connectors
  • Indexers: Automated document ingestion with field mappings and status tracking (Pull model)
  • Document Cracking: Extract text/metadata from PDF, Word, Excel, HTML, JSON, CSV, TXT
  • Skillsets: Skill pipeline with text transformation and embedding skills
  • Document Extraction Skill: Extract content from base64/URL file_data with content-type detection and parsingMode support
  • Index Projections: One-to-many indexing that fans document chunks out into a secondary index
  • Azure OpenAI Embedding Skill: Generate vector embeddings via Azure OpenAI API
  • Local Embedding Models: Generate embeddings locally via ONNX Runtime (no Azure OpenAI required)
  • Custom Web API Skill: Call external REST APIs for custom processing
  • Error Handling: OData-compliant error responses
  • Docker Support: Containerized deployment with docker-compose
  • Azure SDK Compatibility: Works with official Azure.Search.Documents SDK
  • Search Debug: Query diagnostics with subscore breakdown for hybrid/vector searches (debug parameter)
  • Synonym Maps: CRUD management, Solr format, query-time synonym expansion
  • Scoring Profiles: Text weights, freshness, magnitude, distance, and tag functions with interpolation and aggregation modes
  • Similarity Algorithms: Configurable BM25 (k1/b parameters) and ClassicSimilarity (TF-IDF). Per-index similarity with @search.features support
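As a reference for the vector-search feature above, cosine similarity over Collection(Edm.Single) fields is just a normalized dot product. A minimal sketch in plain Python, independent of the simulator; the score mapping shown is Azure AI Search's documented convention for cosine, and whether the simulator matches it exactly is worth verifying against a real query:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity = dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def search_score(similarity):
    # Azure AI Search reports cosine scores as 1 / (1 + d) with d = 1 - similarity,
    # i.e. 1 / (2 - similarity); identical vectors score 1.0.
    return 1.0 / (2.0 - similarity)
```

For example, `cosine_similarity([1.0, 0.0], [1.0, 0.0])` is `1.0` and orthogonal vectors score `0.0`.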

🔜 Planned (Future Phases)

  • Azure SQL / Cosmos DB connectors
  • Admin UI dashboard

Quick Start

Prerequisites

  • .NET SDK (to build and run from source)
  • Docker (optional, for container-based deployment)

Installation

# Clone the repository
git clone https://github.com/Ellerbach/azure-ai-search-simulator.git
cd azure-ai-search-simulator

# Build the solution
dotnet build

# Run the simulator (HTTPS - recommended for Azure SDK compatibility)
dotnet run --project src/AzureAISearchSimulator.Api --urls "https://localhost:7250"

# API available at https://localhost:7250

Running with Docker

Using the Pre-built Image (Recommended)

A pre-built Docker image is published to GitHub Container Registry on every release:

# Pull the latest image
docker pull ghcr.io/ellerbach/azure-ai-search-simulator:latest

# Or pull a specific version
docker pull ghcr.io/ellerbach/azure-ai-search-simulator:1.0.0

# Run the container
docker run -d --name azure-ai-search-simulator \
  -p 7250:8443 -p 5250:8080 \
  -v search-data:/app/data \
  -v lucene-indexes:/app/lucene-indexes \
  -v "$(pwd)/logs":/app/logs \
  -v "$(pwd)/files":/app/files \
  -v "$(pwd)/src/AzureAISearchSimulator.Api/data/models":/app/models:ro \
  ghcr.io/ellerbach/azure-ai-search-simulator:latest

# API available at https://localhost:7250 (HTTPS) or http://localhost:5250 (HTTP)

You can also use it in a docker-compose.yml:

services:
  search-simulator:
    image: ghcr.io/ellerbach/azure-ai-search-simulator:latest
    ports:
      - "7250:8443"
      - "5250:8080"
    environment:
      - SimulatorSettings__AdminApiKey=admin-key-12345
      - SimulatorSettings__QueryApiKey=query-key-67890
    volumes:
      - search-data:/app/data
      - lucene-indexes:/app/lucene-indexes
      - ./logs:/app/logs
      - ./files:/app/files
      # Mount ONNX models for local embedding (download first with scripts/Download-EmbeddingModel.ps1)
      - ./src/AzureAISearchSimulator.Api/data/models:/app/models:ro

volumes:
  search-data:
  lucene-indexes:

Available image tags:

| Tag | Description |
|-----|-------------|
| latest | Latest release |
| x.y.z (e.g. 1.0.0) | Specific version |
| x.y (e.g. 1.0) | Latest patch for a minor version |
| x (e.g. 1) | Latest minor/patch for a major version |

Building from Source

If you prefer to build the image yourself:

# Build and run with docker-compose
docker-compose up -d

# Or build the image manually
docker build -t azure-ai-search-simulator .
docker run -p 7250:8443 -p 5250:8080 \
  -v search-data:/app/data \
  -v lucene-indexes:/app/lucene-indexes \
  -v ./logs:/app/logs \
  -v ./files:/app/files \
  -v ./src/AzureAISearchSimulator.Api/data/models:/app/models:ro \
  azure-ai-search-simulator

# API available at https://localhost:7250 (HTTPS) or http://localhost:5250 (HTTP)

Docker Volume Mapping

The container exposes five mount points for data persistence and file access:

| Mount Point | Purpose | Recommended Mount |
|-------------|---------|-------------------|
| /app/data | LiteDB database (index metadata, data sources, indexer state) | Named volume |
| /app/lucene-indexes | Lucene search index files | Named volume |
| /app/logs | Serilog log files (simulator-{date}.log) | Bind mount for easy host access |
| /app/files | Documents for indexer file processing (pull mode) | Bind mount to your documents folder |
| /app/models | ONNX embedding models for the local:// embedding skill mode | Bind mount (read-only) |

Example: Mount a local documents folder for indexer processing

# Mount your documents folder so indexers can access them inside the container
docker run -p 7250:8443 -p 5250:8080 \
  -v search-data:/app/data \
  -v "$(pwd)/your-documents":/app/files \
  azure-ai-search-simulator

Example: Mount ONNX models for local embedding

# Download a model first, then mount the models directory
.\scripts\Download-EmbeddingModel.ps1 -ModelName all-MiniLM-L6-v2

docker run -p 7250:8443 -p 5250:8080 \
  -v search-data:/app/data \
  -v "$(pwd)/src/AzureAISearchSimulator.Api/data/models":/app/models:ro \
  azure-ai-search-simulator

Then create a filesystem data source pointing to /app/files:

PUT https://localhost:7250/datasources/my-files?api-version=2024-07-01
Content-Type: application/json
api-key: admin-key-12345

{
  "name": "my-files",
  "type": "filesystem",
  "credentials": { "connectionString": "/app/files" },
  "container": { "name": "subfolder" }
}

Note: The Docker image generates a self-signed certificate for HTTPS. You'll need to skip certificate validation in your client (see Azure SDK example below).

Using with Azure SDK

The simulator is compatible with the official Azure.Search.Documents SDK. Note that the SDK requires HTTPS.

using Azure;
using Azure.Search.Documents;
using Azure.Search.Documents.Indexes;

// Point to local simulator (HTTPS required)
var endpoint = new Uri("https://localhost:7250");
var credential = new AzureKeyCredential("admin-key-12345");

// Skip certificate validation for local development
var handler = new HttpClientHandler
{
    ServerCertificateCustomValidationCallback = HttpClientHandler.DangerousAcceptAnyServerCertificateValidator
};
var options = new SearchClientOptions
{
    Transport = new Azure.Core.Pipeline.HttpClientTransport(handler)
};

// Create clients
var indexClient = new SearchIndexClient(endpoint, credential, options);
var searchClient = new SearchClient(endpoint, "my-index", credential, options);

// Create an index
var index = new SearchIndex("my-index")
{
    Fields = new[]
    {
        new SimpleField("id", SearchFieldDataType.String) { IsKey = true },
        new SearchableField("title") { IsFilterable = true },
        new SearchableField("content"),
        new SimpleField("rating", SearchFieldDataType.Double) { IsFilterable = true, IsSortable = true }
    }
};

await indexClient.CreateIndexAsync(index);

// Upload documents
var documents = new[]
{
    new { id = "1", title = "Document One", content = "This is the first document", rating = 4.5 },
    new { id = "2", title = "Document Two", content = "This is the second document", rating = 3.8 }
};

await searchClient.UploadDocumentsAsync(documents);

// Search
var results = await searchClient.SearchAsync<SearchDocument>("first");
await foreach (var result in results.Value.GetResultsAsync())
{
    Console.WriteLine($"Found: {result.Document["title"]} (Score: {result.Score})");
}

Using with Azure SDK (Python)

The simulator also works with the official azure-search-documents Python SDK:

import requests
from azure.core.credentials import AzureKeyCredential
from azure.core.pipeline.transport import RequestsTransport
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchIndex, SimpleField, SearchableField, SearchFieldDataType
)

# Point to local simulator (HTTPS required)
endpoint = "https://localhost:7250"
credential = AzureKeyCredential("admin-key-12345")

# Skip certificate validation for local development (self-signed cert)
session = requests.Session()
session.verify = False
transport = RequestsTransport(session=session, connection_verify=False)

# Create clients
index_client = SearchIndexClient(endpoint, credential, transport=transport, connection_verify=False)
search_client = SearchClient(endpoint, "my-index", credential, transport=transport, connection_verify=False)

# Create an index
index = SearchIndex(
    name="my-index",
    fields=[
        SimpleField(name="id", type=SearchFieldDataType.String, key=True),
        SearchableField(name="title", filterable=True),
        SearchableField(name="content"),
        SimpleField(name="rating", type=SearchFieldDataType.Double, filterable=True, sortable=True),
    ],
)
index_client.create_index(index)

# Upload documents
documents = [
    {"id": "1", "title": "Document One", "content": "This is the first document", "rating": 4.5},
    {"id": "2", "title": "Document Two", "content": "This is the second document", "rating": 3.8},
]
search_client.upload_documents(documents)

# Search
results = search_client.search("first")
for result in results:
    print(f"Found: {result['title']} (Score: {result['@search.score']})")

Using REST API

### Create an index
POST https://localhost:7250/indexes?api-version=2024-07-01
Content-Type: application/json
api-key: admin-key-12345

{
  "name": "hotels",
  "fields": [
    { "name": "hotelId", "type": "Edm.String", "key": true },
    { "name": "hotelName", "type": "Edm.String", "searchable": true },
    { "name": "description", "type": "Edm.String", "searchable": true },
    { "name": "rating", "type": "Edm.Double", "filterable": true, "sortable": true }
  ]
}

### Upload documents
POST https://localhost:7250/indexes/hotels/docs/index?api-version=2024-07-01
Content-Type: application/json
api-key: admin-key-12345

{
  "value": [
    {
      "@search.action": "upload",
      "hotelId": "1",
      "hotelName": "Fancy Hotel",
      "description": "A luxury hotel with great amenities",
      "rating": 4.8
    }
  ]
}

### Search
POST https://localhost:7250/indexes/hotels/docs/search?api-version=2024-07-01
Content-Type: application/json
api-key: query-key-67890

{
  "search": "luxury",
  "filter": "rating ge 4",
  "orderby": "rating desc",
  "top": 10
}
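The filter, orderby, and top parameters in the request above behave like a filter-sort-slice pipeline. An illustrative sketch in plain Python (the simulator actually evaluates OData expressions against Lucene; this only shows the semantics):

```python
hotels = [
    {"hotelName": "Fancy Hotel", "rating": 4.8},
    {"hotelName": "Budget Inn", "rating": 3.2},
    {"hotelName": "Grand Palace", "rating": 4.5},
]

# "filter": "rating ge 4"   -> keep documents with rating >= 4
results = [h for h in hotels if h["rating"] >= 4]
# "orderby": "rating desc"  -> sort descending by rating
results.sort(key=lambda h: h["rating"], reverse=True)
# "top": 10                 -> return at most the first 10 hits
page = results[:10]
```

Here `page` contains "Fancy Hotel" then "Grand Palace"; "Budget Inn" is filtered out.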

### Vector Search
POST https://localhost:7250/indexes/hotels/docs/search?api-version=2024-07-01
Content-Type: application/json
api-key: query-key-67890

{
  "vectorQueries": [
    {
      "kind": "vector",
      "vector": [0.01, 0.02, ...],
      "fields": "descriptionVector",
      "k": 10
    }
  ]
}

### Hybrid Search (Text + Vector)
POST https://localhost:7250/indexes/hotels/docs/search?api-version=2024-07-01
Content-Type: application/json
api-key: query-key-67890

{
  "search": "luxury hotel",
  "vectorQueries": [
    {
      "kind": "vector",
      "vector": [0.01, 0.02, ...],
      "fields": "descriptionVector",
      "k": 10
    }
  ]
}
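Hybrid search merges the text ranking and the vector ranking into a single result list. Azure AI Search uses Reciprocal Rank Fusion (RRF) for this step, and the simulator's samples reference RRF fusion as well; a minimal sketch with the conventional k = 60 constant:

```python
def rrf_fuse(rankings, k=60):
    # rankings: ranked doc-id lists (e.g. one from text search, one from
    # vector search). Each list contributes 1 / (k + rank) per document.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

text_ranked = ["h1", "h2", "h3"]    # hypothetical text-search order
vector_ranked = ["h2", "h3", "h1"]  # hypothetical vector-search order
fused = rrf_fuse([text_ranked, vector_ranked])
```

Documents ranked consistently well in both lists float to the top; here "h2" wins because it places first and second, while "h1" mixes first and last place.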

### Create Data Source (Pull Model)
PUT https://localhost:7250/datasources/my-files?api-version=2024-07-01
Content-Type: application/json
api-key: admin-key-12345

{
  "name": "my-files",
  "type": "filesystem",
  "credentials": {
    "connectionString": "c:\\data\\documents"
  },
  "container": {
    "name": "pdfs"
  }
}

### Create Indexer
PUT https://localhost:7250/indexers/my-indexer?api-version=2024-07-01
Content-Type: application/json
api-key: admin-key-12345

{
  "name": "my-indexer",
  "dataSourceName": "my-files",
  "targetIndexName": "documents",
  "fieldMappings": [
    {
      "sourceFieldName": "metadata_storage_path",
      "targetFieldName": "id"
    }
  ]
}
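Field mappings copy values from a field produced by document cracking (like metadata_storage_path above) onto a target index field. A simplified sketch of that step; note that real Azure AI Search also supports mappingFunctions such as base64Encode, which are not shown here:

```python
def apply_field_mappings(doc, mappings):
    # Copy each sourceFieldName value onto targetFieldName;
    # unmapped fields pass through unchanged.
    out = dict(doc)
    for m in mappings:
        if m["sourceFieldName"] in doc:
            out[m["targetFieldName"]] = doc[m["sourceFieldName"]]
    return out

# Hypothetical cracked document from the filesystem connector
cracked = {"metadata_storage_path": "/app/files/pdfs/report.pdf", "content": "..."}
mapped = apply_field_mappings(cracked, [
    {"sourceFieldName": "metadata_storage_path", "targetFieldName": "id"},
])
```

After mapping, the document carries an `id` field holding the storage path, alongside the untouched `content` field.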

### Run Indexer
POST https://localhost:7250/indexers/my-indexer/run?api-version=2024-07-01
api-key: admin-key-12345

Authentication

The simulator supports three authentication modes that can be enabled simultaneously:

API Key (Default)

api-key: admin-key-12345

Simulated Tokens (Local Development)

Generate JWT tokens locally for testing RBAC without Azure:

### Get a token with Search Index Data Contributor role
GET https://localhost:7250/admin/token/quick/data-contributor
api-key: admin-key-12345

### Use the token
GET https://localhost:7250/indexes
Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
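Since the simulated tokens are ordinary JWTs, you can inspect their claims locally by base64url-decoding the payload segment. A small sketch; the exact claim names the simulator emits (e.g. a `roles` claim) are an assumption to verify against a real token:

```python
import base64
import json

def jwt_claims(token):
    # A JWT is "header.payload.signature"; the payload is base64url-encoded
    # JSON with padding stripped, so restore the padding before decoding.
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))
```

Calling `jwt_claims(token)` on a token from the /admin/token endpoint returns a dict of its claims, which is handy for checking which role a token actually carries before using it.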

Entra ID (Real Azure AD)

Validate real Azure AD tokens for production-like testing:

var credential = new DefaultAzureCredential();
var searchClient = new SearchClient(endpoint, "my-index", credential);

Role-Based Access Control

The simulator enforces Azure AI Search RBAC:

| Role | Permissions |
|------|-------------|
| Search Service Contributor | Manage indexes, indexers, data sources, skillsets |
| Search Index Data Contributor | Upload, merge, delete documents |
| Search Index Data Reader | Search, suggest, autocomplete |

📚 See docs/AUTHENTICATION.md for the complete authentication guide.

Configuration

Edit appsettings.json to customize the simulator:

{
  "SimulatorSettings": {
    "ServiceName": "local-search-simulator",
    "DataDirectory": "./data",
    "AdminApiKey": "admin-key-12345",
    "QueryApiKey": "query-key-67890",
    "MaxIndexes": 50,
    "MaxDocumentsPerIndex": 100000
  }
}
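Any setting in appsettings.json can also be supplied as an environment variable, as the docker-compose example earlier does: the standard ASP.NET Core convention maps a double underscore in the variable name to the `:` configuration-section separator. A small illustration of the mapping:

```python
def env_to_config_key(env_name):
    # ASP.NET Core configuration convention: "__" in an environment-variable
    # name separates configuration sections, i.e. maps to ":".
    return env_name.replace("__", ":")

key = env_to_config_key("SimulatorSettings__AdminApiKey")
```

So `SimulatorSettings__AdminApiKey` overrides the `AdminApiKey` value inside the `SimulatorSettings` section shown above.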

Diagnostic Logging

Enable verbose logging to debug indexer and skill pipeline execution:

{
  "DiagnosticLogging": {
    "Enabled": true,
    "LogDocumentDetails": true,
    "LogSkillExecution": true,
    "LogSkillInputPayloads": true,
    "LogSkillOutputPayloads": true,
    "LogEnrichedDocumentState": false,
    "LogFieldMappings": true,
    "MaxStringLogLength": 500
  }
}

Logs are written to logs/simulator-{date}.log and console. Look for [DIAGNOSTIC] prefixed entries.
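A quick way to pull just the diagnostic entries out of a log file (sketch; the surrounding log-line shapes are illustrative, only the [DIAGNOSTIC] marker itself comes from the docs above):

```python
def diagnostic_entries(lines):
    # Keep only lines carrying the [DIAGNOSTIC] prefix described above.
    return [line for line in lines if "[DIAGNOSTIC]" in line]

# Hypothetical excerpt of logs/simulator-{date}.log
sample_log = [
    "[INF] Indexer my-indexer started",
    "[INF] [DIAGNOSTIC] Skill input payload: {...}",
    "[INF] Indexer my-indexer finished",
]
```

`diagnostic_entries(sample_log)` keeps only the middle line; point the same filter at the real log file to follow skill execution.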

Documentation

Samples

All .http sample files use environment variables via $dotenv. To get started:

  1. Copy .env.example to .env in the workspace root
  2. Fill in your values (API keys, storage credentials, etc.)
  3. The .env file is gitignored and will not be committed

| Sample | Description |
|--------|-------------|
| AzureSdkSample | C# console app demonstrating Azure.Search.Documents SDK compatibility |
| AzureSearchNotebook | Python Jupyter notebook with comprehensive search demos and skillset integration |
| IndexerTestNotebook | Python notebook for testing indexers with JSON metadata files |
| EmbeddingSkillNotebook | Python notebook demonstrating the Azure OpenAI Embedding skill, vector search, and hybrid search with RRF fusion |
| CustomSkillSample | ASP.NET Core API implementing custom Web API skills (text stats, keywords, sentiment, summarization) |
| sample-requests.http | REST Client file with comprehensive API examples |
| compare-requests.http | REST Client file that sends identical requests to the simulator and real Azure AI Search side by side |
| synonym-map-sample.http | REST Client file demonstrating synonym maps (CRUD + search expansion), with [SIM] / [AZURE] pairs |
| Compare-Results.ps1 | PowerShell script that automates comparison and shows a color-coded diff of responses |
| pull-mode-test.http | REST Client file for testing the indexer pull-mode workflow |
| local-embedding-sample.http | REST Client file demonstrating the local ONNX embedding skill (no Azure OpenAI required) |
| index-projection-sample.http | REST Client file demonstrating index projections (one-to-many chunking into a secondary index) |
| scoring-profile-sample.http | REST Client file demonstrating scoring profiles (text weights, magnitude, freshness, tag, combined) |
| similarity-sample.http | REST Client file demonstrating similarity algorithms (default BM25, custom BM25 k1/b, ClassicSimilarity TF-IDF) |
| Download-EmbeddingModel.ps1 | PowerShell script to download ONNX embedding models from HuggingFace |

Comparing Simulator vs Real Azure AI Search

You can run the same queries against both the local simulator and a real Azure AI Search service to verify behavioral parity:

  1. In .env, set BASE_URL / ADMIN_KEY / QUERY_KEY (simulator) and AZURE_BASE_URL / AZURE_ADMIN_KEY / AZURE_QUERY_KEY (real service).
  2. Interactive: open compare-requests.http and click "Send Request" on matched [SIM] / [AZURE] pairs. Use VS Code split tabs to view both responses.
  3. Automated: run the PowerShell comparison script:
# Run all scenarios (create index, upload docs, search, cleanup)
.\scripts\Compare-Results.ps1

# Run a single scenario
.\scripts\Compare-Results.ps1 -Scenario SimpleSearch

# Show full JSON even when responses match
.\scripts\Compare-Results.ps1 -ShowFullResponse

The script highlights MATCH (green) or DIFFERENCES (red) for each scenario, ignoring dynamic fields like @odata.etag and @odata.context.

Project Structure

AzureAISearchSimulator/
├── src/
│   ├── AzureAISearchSimulator.Api/        # REST API layer
│   ├── AzureAISearchSimulator.Core/       # Business logic
│   ├── AzureAISearchSimulator.Search/     # Lucene.NET search engine & skills
│   ├── AzureAISearchSimulator.DataSources/# Data source connectors
│   └── AzureAISearchSimulator.Storage/    # Persistence layer
├── tests/
├── samples/
└── docs/

Supported vs Azure AI Search

| Feature | Azure AI Search | Simulator |
|---------|-----------------|-----------|
| Full-text search | ✅ | ✅ |
| Filtering & facets | ✅ | ✅ |
| Vector search | ✅ | ✅ (cosine similarity) |
| Hybrid search | ✅ | ✅ |
| Highlighting | ✅ | ✅ |
| Autocomplete | ✅ | ✅ |
| Suggestions | ✅ | ✅ |
| Indexers | ✅ | ✅ (Blob, ADLS, filesystem) |
| Skillsets (utility) | ✅ | ✅ |
| Custom Web API Skill | ✅ | ✅ |
| Azure OpenAI Embedding | ✅ | ✅ |
| Document Cracking | ✅ | ✅ |
| Semantic search | ✅ | ❌ |
| AI skills (OCR, etc.) | ✅ | ❌ |
| Managed Identity | ✅ | ✅ (simulated) |
| Entra ID Authentication | ✅ | ✅ |
| Scoring Profiles | ✅ | ✅ |
| Similarity Algorithms | ✅ (BM25, Classic) | ✅ (BM25, Classic) |
| Scale (millions of docs) | ✅ | Limited |
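On the similarity row above: BM25 scores each query term with an IDF weight times a saturating term-frequency factor, where k1 controls how quickly repeated terms stop adding score and b controls the length-normalization penalty. A single-term sketch of the textbook formula, not Lucene.NET's exact implementation:

```python
import math

def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    # IDF: rarer terms (low doc_freq) weigh more.
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Saturated TF: higher k1 -> slower saturation; higher b -> stronger
    # penalty for documents longer than the average.
    norm_tf = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return idf * norm_tf

score = bm25_term_score(tf=3, doc_len=100, avg_doc_len=120, n_docs=1000, doc_freq=50)
```

Tuning k1 and b per index (as the similarity configuration feature allows) changes how aggressively term repetition and document length shape the ranking.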

Skills Support

| Skill Category | Azure AI Search | Simulator |
|----------------|-----------------|-----------|
| Utility Skills (Split, Merge, Shaper, Conditional) | ✅ | ✅ |
| Document Extraction Skill | ✅ | ✅ |
| Custom Web API Skill | ✅ | ✅ |
| Azure OpenAI Embedding Skill | ✅ | ✅ |
| AI Vision Skills (OCR, Image Analysis) | ✅ | ❌ |
| AI Language Skills (Entity Recognition, Sentiment, PII, etc.) | ✅ | ❌ |
| Translation Skill | ✅ | ❌ |
| GenAI Prompt Skill | ✅ | ❌ |

Tip: Use the Custom Web API Skill to implement your own versions of missing AI skills. See samples/CustomSkillSample for examples.

Contributing

Contributions are welcome! Please read our Contributing Guide for details.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

About

Azure AI Search Simulator provides a lightweight environment to emulate Azure AI Search in pull or push modes. It offers a compatible API surface and works seamlessly with the official SDK, enabling local development, testing, and debugging without requiring a live search service.
