A local simulator for Azure AI Search that allows developers to learn, experiment, and test Azure AI Search concepts without requiring an actual Azure subscription.
The Azure AI Search Simulator provides a local implementation of the Azure AI Search REST API, enabling you to:
- Learn Azure AI Search concepts in a safe, cost-free environment
- Test your search configurations before deploying to Azure
- Develop search-powered applications without Azure dependencies
- Experiment with indexing pipelines and skillsets
- Index Management: Create, update, delete, and list search indexes
- Document Operations: Upload, merge, mergeOrUpload, and delete documents (Push model)
- Full-Text Search: Simple and Lucene query syntax
- Filtering: Basic OData filter expressions (eq, ne, gt, lt, ge, le, search.in)
- Sorting & Paging: OrderBy, top, skip support
- Field Selection: $select parameter support
- Highlighting: Search result highlighting
- Faceted Navigation: Value facets and interval/range facets
- Autocomplete: Term-based autocomplete
- Suggestions: Prefix-based suggestions
- Vector Search: Cosine similarity with `Collection(Edm.Single)` fields
- Hybrid Search: Combined text and vector search scoring
- Authentication: API keys, simulated JWT tokens, and Entra ID (Azure AD) support
- Role-Based Access Control: Full RBAC with 6 Azure Search roles
- Managed Identity: Resource-level identity for data sources, indexers, and skills
- Storage: LiteDB for index metadata, Lucene.NET for document indexing
- Data Sources: Azure Blob Storage, ADLS Gen2, and file system connectors
- Indexers: Automated document ingestion with field mappings and status tracking (Pull Mode)
- Document Cracking: Extract text/metadata from PDF, Word, Excel, HTML, JSON, CSV, TXT
- Skillsets: Skill pipeline with text transformation and embedding skills
- Document Extraction Skill: Extract content from base64/URL `file_data` with content-type detection and `parsingMode` support
- Index Projections: One-to-many indexing that fans out chunks into a secondary index
- Azure OpenAI Embedding Skill: Generate vector embeddings via Azure OpenAI API
- Local Embedding Models: Generate embeddings locally via ONNX Runtime (no Azure OpenAI required)
- Custom Web API Skill: Call external REST APIs for custom processing
- Error Handling: OData-compliant error responses
- Docker Support: Containerized deployment with docker-compose
- Azure SDK Compatibility: Works with official Azure.Search.Documents SDK
- Search Debug: Query diagnostics with subscore breakdown for hybrid/vector searches (`debug` parameter)
- Synonym Maps: CRUD management, Solr format, query-time synonym expansion
- Scoring Profiles: Text weights, freshness, magnitude, distance, and tag functions with interpolation and aggregation modes
- Similarity Algorithms: Configurable BM25 (k1/b parameters) and ClassicSimilarity (TF-IDF). Per-index similarity with `@search.features` support
- Azure SQL / Cosmos DB connectors
- Admin UI dashboard
- .NET 10.0 SDK
- Visual Studio 2022 / VS Code / JetBrains Rider
# Clone the repository
git clone https://github.com/your-org/azure-ai-search-simulator.git
cd azure-ai-search-simulator
# Build the solution
dotnet build
# Run the simulator (HTTPS - recommended for Azure SDK compatibility)
dotnet run --project src/AzureAISearchSimulator.Api --urls "https://localhost:7250"
# API available at https://localhost:7250

A pre-built Docker image is published to GitHub Container Registry on every release:
# Pull the latest image
docker pull ghcr.io/Ellerbach/azure-ai-search-simulator:latest
# Or pull a specific version
docker pull ghcr.io/Ellerbach/azure-ai-search-simulator:1.0.0
# Run the container
docker run -d --name azure-ai-search-simulator \
-p 7250:8443 -p 5250:8080 \
-v search-data:/app/data \
-v lucene-indexes:/app/lucene-indexes \
-v ./logs:/app/logs \
-v ./files:/app/files \
-v ./src/AzureAISearchSimulator.Api/data/models:/app/models:ro \
ghcr.io/Ellerbach/azure-ai-search-simulator:latest
# API available at https://localhost:7250 (HTTPS) or http://localhost:5250 (HTTP)

You can also use it in a docker-compose.yml:
services:
  search-simulator:
    image: ghcr.io/Ellerbach/azure-ai-search-simulator:latest
    ports:
      - "7250:8443"
      - "5250:8080"
    environment:
      - SimulatorSettings__AdminApiKey=admin-key-12345
      - SimulatorSettings__QueryApiKey=query-key-67890
    volumes:
      - search-data:/app/data
      - lucene-indexes:/app/lucene-indexes
      - ./logs:/app/logs
      - ./files:/app/files
      # Mount ONNX models for local embedding (download first with scripts/Download-EmbeddingModel.ps1)
      - ./src/AzureAISearchSimulator.Api/data/models:/app/models:ro

volumes:
  search-data:
  lucene-indexes:

Available image tags:
| Tag | Description |
|---|---|
| `latest` | Latest release |
| `x.y.z` (e.g. `1.0.0`) | Specific version |
| `x.y` (e.g. `1.0`) | Latest patch for a minor version |
| `x` (e.g. `1`) | Latest minor/patch for a major version |
If you prefer to build the image yourself:
# Build and run with docker-compose
docker-compose up -d
# Or build the image manually
docker build -t azure-ai-search-simulator .
docker run -p 7250:8443 -p 5250:8080 \
-v search-data:/app/data \
-v lucene-indexes:/app/lucene-indexes \
-v ./logs:/app/logs \
-v ./files:/app/files \
-v ./src/AzureAISearchSimulator.Api/data/models:/app/models:ro \
azure-ai-search-simulator
# API available at https://localhost:7250 (HTTPS) or http://localhost:5250 (HTTP)

The container exposes five mount points for data persistence and file access:
| Mount Point | Purpose | Recommended Mount |
|---|---|---|
| `/app/data` | LiteDB database (index metadata, data sources, indexer state) | Named volume |
| `/app/lucene-indexes` | Lucene search index files | Named volume |
| `/app/logs` | Serilog log files (`simulator-{date}.log`) | Bind mount for easy host access |
| `/app/files` | Documents for indexer file processing (pull mode) | Bind mount to your documents folder |
| `/app/models` | ONNX embedding models for the local `local://` skill mode | Bind mount (read-only) |
Example: Mount a local documents folder for indexer processing
# Mount your documents folder so indexers can access them inside the container
docker run -p 7250:8443 -p 5250:8080 \
-v search-data:/app/data \
-v ./your-documents:/app/files \
  azure-ai-search-simulator

Example: Mount ONNX models for local embedding
# Download a model first, then mount the models directory
.\scripts\Download-EmbeddingModel.ps1 -ModelName all-MiniLM-L6-v2
docker run -p 7250:8443 -p 5250:8080 \
-v search-data:/app/data \
-v ./src/AzureAISearchSimulator.Api/data/models:/app/models:ro \
  azure-ai-search-simulator

Then create a filesystem data source pointing to /app/files:
PUT https://localhost:7250/datasources/my-files?api-version=2024-07-01
Content-Type: application/json
api-key: admin-key-12345
{
"name": "my-files",
"type": "filesystem",
"credentials": { "connectionString": "/app/files" },
"container": { "name": "subfolder" }
}

Note: The Docker image generates a self-signed certificate for HTTPS. You'll need to skip certificate validation in your client (see Azure SDK example below).
The simulator is compatible with the official Azure.Search.Documents SDK. Note that the SDK requires HTTPS.
using Azure;
using Azure.Search.Documents;
using Azure.Search.Documents.Indexes;
// Point to local simulator (HTTPS required)
var endpoint = new Uri("https://localhost:7250");
var credential = new AzureKeyCredential("admin-key-12345");
// Skip certificate validation for local development
var handler = new HttpClientHandler
{
ServerCertificateCustomValidationCallback = HttpClientHandler.DangerousAcceptAnyServerCertificateValidator
};
var options = new SearchClientOptions
{
Transport = new Azure.Core.Pipeline.HttpClientTransport(handler)
};
// Create clients
var indexClient = new SearchIndexClient(endpoint, credential, options);
var searchClient = new SearchClient(endpoint, "my-index", credential, options);
// Create an index
var index = new SearchIndex("my-index")
{
Fields = new[]
{
new SimpleField("id", SearchFieldDataType.String) { IsKey = true },
new SearchableField("title") { IsFilterable = true },
new SearchableField("content"),
new SimpleField("rating", SearchFieldDataType.Double) { IsFilterable = true, IsSortable = true }
}
};
await indexClient.CreateIndexAsync(index);
// Upload documents
var documents = new[]
{
new { id = "1", title = "Document One", content = "This is the first document", rating = 4.5 },
new { id = "2", title = "Document Two", content = "This is the second document", rating = 3.8 }
};
await searchClient.UploadDocumentsAsync(documents);
// Search
var results = await searchClient.SearchAsync<SearchDocument>("first");
await foreach (var result in results.Value.GetResultsAsync())
{
Console.WriteLine($"Found: {result.Document["title"]} (Score: {result.Score})");
}

The simulator also works with the official azure-search-documents Python SDK:
import requests
from azure.core.credentials import AzureKeyCredential
from azure.core.pipeline.transport import RequestsTransport
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
SearchIndex, SimpleField, SearchableField, SearchFieldDataType
)
# Point to local simulator (HTTPS required)
endpoint = "https://localhost:7250"
credential = AzureKeyCredential("admin-key-12345")
# Skip certificate validation for local development (self-signed cert)
session = requests.Session()
session.verify = False
transport = RequestsTransport(session=session, connection_verify=False)
# Create clients
index_client = SearchIndexClient(endpoint, credential, transport=transport, connection_verify=False)
search_client = SearchClient(endpoint, "my-index", credential, transport=transport, connection_verify=False)
# Create an index
index = SearchIndex(
name="my-index",
fields=[
SimpleField(name="id", type=SearchFieldDataType.String, key=True),
SearchableField(name="title", filterable=True),
SearchableField(name="content"),
SimpleField(name="rating", type=SearchFieldDataType.Double, filterable=True, sortable=True),
],
)
index_client.create_index(index)
# Upload documents
documents = [
{"id": "1", "title": "Document One", "content": "This is the first document", "rating": 4.5},
{"id": "2", "title": "Document Two", "content": "This is the second document", "rating": 3.8},
]
search_client.upload_documents(documents)
# Search
results = search_client.search("first")
for result in results:
    print(f"Found: {result['title']} (Score: {result['@search.score']})")

### Create an index
POST https://localhost:7250/indexes?api-version=2024-07-01
Content-Type: application/json
api-key: admin-key-12345
{
"name": "hotels",
"fields": [
{ "name": "hotelId", "type": "Edm.String", "key": true },
{ "name": "hotelName", "type": "Edm.String", "searchable": true },
{ "name": "description", "type": "Edm.String", "searchable": true },
{ "name": "rating", "type": "Edm.Double", "filterable": true, "sortable": true }
]
}
### Upload documents
POST https://localhost:7250/indexes/hotels/docs/index?api-version=2024-07-01
Content-Type: application/json
api-key: admin-key-12345
{
"value": [
{
"@search.action": "upload",
"hotelId": "1",
"hotelName": "Fancy Hotel",
"description": "A luxury hotel with great amenities",
"rating": 4.8
}
]
}
### Search
POST https://localhost:7250/indexes/hotels/docs/search?api-version=2024-07-01
Content-Type: application/json
api-key: query-key-67890
{
"search": "luxury",
"filter": "rating ge 4",
"orderby": "rating desc",
"top": 10
}
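Ignoring the full-text part of the query, the `filter`, `orderby`, and `top` parameters above compose like this (a plain-Python illustration of the semantics, not simulator code):

```python
def run_query(docs, min_rating=4.0, top=10):
    """Approximate 'filter: rating ge 4, orderby: rating desc, top: 10'."""
    hits = [d for d in docs if d.get("rating", 0.0) >= min_rating]  # filter
    hits.sort(key=lambda d: d["rating"], reverse=True)              # orderby desc
    return hits[:top]                                               # top / paging

hotels = [
    {"hotelName": "Fancy Hotel", "rating": 4.8},
    {"hotelName": "Budget Inn", "rating": 3.2},
    {"hotelName": "Sea View", "rating": 4.1},
]
print([h["hotelName"] for h in run_query(hotels)])
# → ['Fancy Hotel', 'Sea View']
```

The same pipeline order (filter, then sort, then page) is what makes `top`/`skip` results stable across requests.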
### Vector Search
POST https://localhost:7250/indexes/hotels/docs/search?api-version=2024-07-01
Content-Type: application/json
api-key: query-key-67890
{
"vectorQueries": [
{
"kind": "vector",
"vector": [0.01, 0.02, ...],
"fields": "descriptionVector",
"k": 10
}
]
}
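The simulator scores vector queries with cosine similarity. The sketch below shows the math; the `1 / (1 + distance)` score mapping is Azure AI Search's documented convention for cosine, and whether the simulator reproduces it exactly is an assumption here:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def search_score(a, b):
    # Map cosine distance (1 - similarity) into Azure's reported score range,
    # 0.333..1.0; the simulator's exact mapping may differ.
    return 1.0 / (1.0 + (1.0 - cosine_similarity(a, b)))

print(round(search_score([1.0, 0.0], [1.0, 0.0]), 3))  # identical vectors → 1.0
```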
### Hybrid Search (Text + Vector)
POST https://localhost:7250/indexes/hotels/docs/search?api-version=2024-07-01
Content-Type: application/json
api-key: query-key-67890
{
"search": "luxury hotel",
"vectorQueries": [
{
"kind": "vector",
"vector": [0.01, 0.02, ...],
"fields": "descriptionVector",
"k": 10
}
]
}
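Azure AI Search fuses the text and vector rankings of a hybrid query with Reciprocal Rank Fusion (RRF); the EmbeddingSkillNotebook sample demonstrates this. A minimal sketch of the algorithm, using the conventional constant k = 60 (the simulator's exact constant is an assumption):

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: score(doc) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranked_ids in rankings:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

text_hits = ["h2", "h1", "h3"]      # ranking from the BM25 text query
vector_hits = ["h1", "h4", "h2"]    # ranking from the vector query
print(rrf_fuse([text_hits, vector_hits]))
```

Documents that appear high in both rankings accumulate the largest fused score, which is why hybrid results favor agreement between the two retrievers.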
### Create Data Source (Pull Model)
PUT https://localhost:7250/datasources/my-files?api-version=2024-07-01
Content-Type: application/json
api-key: admin-key-12345
{
"name": "my-files",
"type": "filesystem",
"credentials": {
"connectionString": "c:\\data\\documents"
},
"container": {
"name": "pdfs"
}
}
### Create Indexer
PUT https://localhost:7250/indexers/my-indexer?api-version=2024-07-01
Content-Type: application/json
api-key: admin-key-12345
{
"name": "my-indexer",
"dataSourceName": "my-files",
"targetIndexName": "documents",
"fieldMappings": [
{
"sourceFieldName": "metadata_storage_path",
"targetFieldName": "id"
}
]
}
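Conceptually, a field mapping copies a value from the cracked document into a target index field. Azure indexers often pair `metadata_storage_path` with the `base64Encode` mapping function to produce a valid document key; the sketch below illustrates the idea in plain Python (simplified URL-safe base64 with padding stripped, which differs slightly from Azure's actual encoding, and is not simulator code):

```python
import base64

def apply_field_mappings(doc, mappings):
    """Apply indexer field mappings to a cracked document (illustration only)."""
    out = dict(doc)
    for m in mappings:
        value = doc.get(m["sourceFieldName"])
        if value is None:
            continue
        if m.get("mappingFunction", {}).get("name") == "base64Encode":
            # Simplified key-safe encoding; Azure's base64Encode handles
            # padding characters differently.
            value = base64.urlsafe_b64encode(value.encode()).decode().rstrip("=")
        out[m["targetFieldName"]] = value
    return out

cracked = {"metadata_storage_path": "/app/files/pdfs/report.pdf", "content": "..."}
mapped = apply_field_mappings(cracked, [
    {"sourceFieldName": "metadata_storage_path", "targetFieldName": "id",
     "mappingFunction": {"name": "base64Encode"}},
])
print(mapped["id"])
```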
### Run Indexer
POST https://localhost:7250/indexers/my-indexer/run?api-version=2024-07-01
api-key: admin-key-12345

The simulator supports three authentication modes that can be enabled simultaneously:
api-key: admin-key-12345

Generate JWT tokens locally for testing RBAC without Azure:
### Get a token with Search Index Data Contributor role
GET https://localhost:7250/admin/token/quick/data-contributor
api-key: admin-key-12345
### Use the token
GET https://localhost:7250/indexes
Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...

Validate real Azure AD tokens for production-like testing:
var credential = new DefaultAzureCredential();
var searchClient = new SearchClient(endpoint, "my-index", credential);

The simulator enforces Azure AI Search RBAC:
| Role | Permissions |
|---|---|
| Search Service Contributor | Manage indexes, indexers, data sources, skillsets |
| Search Index Data Contributor | Upload, merge, delete documents |
| Search Index Data Reader | Search, suggest, autocomplete |
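A JWT carries its granted roles as claims in the payload segment, which you can inspect without verifying the signature. The claim name `roles` below is an assumption for illustration, and the demo builds its own unsigned token rather than using a real simulator-issued one:

```python
import base64
import json

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode().rstrip("=")

# Construct a sample (unsigned) token purely for the demo.
header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
payload = b64url(json.dumps({"roles": ["Search Index Data Contributor"]}).encode())
token = f"{header}.{payload}.signature"

def jwt_payload(token: str) -> dict:
    """Decode the middle (payload) segment of a JWT without verifying it."""
    seg = token.split(".")[1]
    seg += "=" * (-len(seg) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(seg))

print(jwt_payload(token)["roles"])
```

This is handy for checking which role a token from `/admin/token/quick/...` actually grants before sending it in an `Authorization: Bearer` header.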
See docs/AUTHENTICATION.md for the complete authentication guide.
Edit appsettings.json to customize the simulator:
{
"SimulatorSettings": {
"ServiceName": "local-search-simulator",
"DataDirectory": "./data",
"AdminApiKey": "admin-key-12345",
"QueryApiKey": "query-key-67890",
"MaxIndexes": 50,
"MaxDocumentsPerIndex": 100000
}
}

Enable verbose logging to debug indexer and skill pipeline execution:
{
"DiagnosticLogging": {
"Enabled": true,
"LogDocumentDetails": true,
"LogSkillExecution": true,
"LogSkillInputPayloads": true,
"LogSkillOutputPayloads": true,
"LogEnrichedDocumentState": false,
"LogFieldMappings": true,
"MaxStringLogLength": 500
}
}

Logs are written to logs/simulator-{date}.log and the console. Look for [DIAGNOSTIC]-prefixed entries.
- Development Plan - Full project plan and architecture
- API Reference - Complete REST API documentation
- Configuration Guide - Detailed configuration options
- Authentication Guide - API keys, JWT tokens, Entra ID, and RBAC
- Limitations - Differences from Azure AI Search
All .http sample files use environment variables via $dotenv. To get started:
- Copy `.env.example` to `.env` in the workspace root
- Fill in your values (API keys, storage credentials, etc.)
- The `.env` file is gitignored and will not be committed
| Sample | Description |
|---|---|
| AzureSdkSample | C# console app demonstrating Azure.Search.Documents SDK compatibility |
| AzureSearchNotebook | Python Jupyter notebook with comprehensive search demos and skillset integration |
| IndexerTestNotebook | Python notebook for testing indexers with JSON metadata files |
| EmbeddingSkillNotebook | Python notebook demonstrating Azure OpenAI Embedding skill, vector search, and hybrid search with RRF fusion |
| CustomSkillSample | ASP.NET Core API implementing custom Web API skills (text stats, keywords, sentiment, summarization) |
| sample-requests.http | REST Client file with comprehensive API examples |
| compare-requests.http | REST Client file to send identical requests to the simulator and real Azure AI Search side by side |
| synonym-map-sample.http | REST Client file demonstrating synonym maps (CRUD + search expansion), with [SIM] / [AZURE] pairs |
| Compare-Results.ps1 | PowerShell script that automates comparison and shows a color-coded diff of responses |
| pull-mode-test.http | REST Client file for testing indexer pull mode workflow |
| local-embedding-sample.http | REST Client file demonstrating local ONNX embedding skill (no Azure OpenAI required) |
| index-projection-sample.http | REST Client file demonstrating index projections (one-to-many chunking into a secondary index) |
| scoring-profile-sample.http | REST Client file demonstrating scoring profiles (text weights, magnitude, freshness, tag, combined) |
| similarity-sample.http | REST Client file demonstrating similarity algorithms (default BM25, custom BM25 k1/b, ClassicSimilarity TF-IDF) |
| Download-EmbeddingModel.ps1 | PowerShell script to download ONNX embedding models from HuggingFace |
You can run the same queries against both the local simulator and a real Azure AI Search service to verify behavioral parity:
- In `.env`, set `BASE_URL` / `ADMIN_KEY` / `QUERY_KEY` (simulator) and `AZURE_BASE_URL` / `AZURE_ADMIN_KEY` / `AZURE_QUERY_KEY` (real service).
- Interactive: Open compare-requests.http and click "Send Request" on matched `[SIM]` / `[AZURE]` pairs. Use VS Code split tabs to view both responses.
- Automated: Run the PowerShell comparison script:
# Run all scenarios (create index, upload docs, search, cleanup)
.\scripts\Compare-Results.ps1
# Run a single scenario
.\scripts\Compare-Results.ps1 -Scenario SimpleSearch
# Show full JSON even when responses match
.\scripts\Compare-Results.ps1 -ShowFullResponse

The script highlights MATCH (green) or DIFFERENCES (red) for each scenario, ignoring dynamic fields like @odata.etag and @odata.context.
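The "ignore dynamic fields, then compare" idea the script uses can be sketched in a few lines of Python (an illustration of the approach, not the script itself):

```python
def strip_dynamic(obj, ignored=("@odata.etag", "@odata.context")):
    """Recursively drop fields expected to differ between simulator and Azure."""
    if isinstance(obj, dict):
        return {k: strip_dynamic(v, ignored) for k, v in obj.items() if k not in ignored}
    if isinstance(obj, list):
        return [strip_dynamic(v, ignored) for v in obj]
    return obj

def responses_match(sim_response, azure_response):
    return strip_dynamic(sim_response) == strip_dynamic(azure_response)

sim = {"@odata.context": "https://localhost:7250/$metadata", "value": [{"id": "1"}]}
azure = {"@odata.context": "https://contoso.search.windows.net/$metadata", "value": [{"id": "1"}]}
print(responses_match(sim, azure))  # → True
```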
AzureAISearchSimulator/
├── src/
│   ├── AzureAISearchSimulator.Api/         # REST API layer
│   ├── AzureAISearchSimulator.Core/        # Business logic
│   ├── AzureAISearchSimulator.Search/      # Lucene.NET search engine & skills
│   ├── AzureAISearchSimulator.DataSources/ # Data source connectors
│   └── AzureAISearchSimulator.Storage/     # Persistence layer
├── tests/
├── samples/
└── docs/
| Feature | Azure AI Search | Simulator |
|---|---|---|
| Full-text search | ✅ | ✅ |
| Filtering & facets | ✅ | ✅ |
| Vector search | ✅ | ✅ (cosine similarity) |
| Hybrid search | ✅ | ✅ |
| Highlighting | ✅ | ✅ |
| Autocomplete | ✅ | ✅ |
| Suggestions | ✅ | ✅ |
| Indexers | ✅ | ✅ (Blob, ADLS, filesystem) |
| Skillsets (utility) | ✅ | ✅ |
| Custom Web API Skill | ✅ | ✅ |
| Azure OpenAI Embedding | ✅ | ✅ |
| Document Cracking | ✅ | ✅ |
| Semantic search | ✅ | ❌ |
| AI skills (OCR, etc.) | ✅ | ❌ |
| Managed Identity | ✅ | ✅ (simulated) |
| Entra ID Authentication | ✅ | ✅ |
| Scoring Profiles | ✅ | ✅ |
| Similarity Algorithms | ✅ (BM25, Classic) | ✅ (BM25, Classic) |
| Scale (millions of docs) | ✅ | Limited |
| Skill Category | Azure AI Search | Simulator |
|---|---|---|
| Utility Skills (Split, Merge, Shaper, Conditional) | ✅ | ✅ |
| Document Extraction Skill | ✅ | ✅ |
| Custom Web API Skill | ✅ | ✅ |
| Azure OpenAI Embedding Skill | ✅ | ✅ |
| AI Vision Skills (OCR, Image Analysis) | ✅ | ❌ |
| AI Language Skills (Entity Recognition, Sentiment, PII, etc.) | ✅ | ❌ |
| Translation Skill | ✅ | ❌ |
| GenAI Prompt Skill | ✅ | ❌ |
Tip: Use the Custom Web API Skill to implement your own versions of missing AI skills. See samples/CustomSkillSample for examples.
Contributions are welcome! Please read our Contributing Guide for details.
This project is licensed under the MIT License - see the LICENSE file for details.
- Built with Lucene.NET
- Built with HNSW
- Using ONNX for local embeddings
- Inspired by Azure AI Search