From 487b023230b13edb5f5215787deb3b7dd5bf2df8 Mon Sep 17 00:00:00 2001 From: tomek-labuk Date: Tue, 9 Jun 2026 07:23:51 +0200 Subject: [PATCH 01/10] Align semantic similarity docs with aigw 2 --- app/ai-gateway/semantic-similarity.md | 76 +++++++++++---------------- 1 file changed, 31 insertions(+), 45 deletions(-) diff --git a/app/ai-gateway/semantic-similarity.md b/app/ai-gateway/semantic-similarity.md index b25cee3146..b7944b24d2 100644 --- a/app/ai-gateway/semantic-similarity.md +++ b/app/ai-gateway/semantic-similarity.md @@ -1,44 +1,31 @@ --- -title: "Embedding-based similarity matching in Kong AI gateway plugins" +title: "Embedding-based similarity matching in {{site.ai_gateway}} policies" layout: reference content_type: reference -description: This reference explains how {{site.ai_gateway}} plugins use embedding-based similarity to compare prompts with various inputs—such as cached entries, upstream targets, document chunks, or allow/deny lists. +description: This reference explains how {{site.ai_gateway}} policies use embedding-based similarity to compare prompts with various inputs—such as cached entries, routing targets, document chunks, or allow/deny lists. 
breadcrumbs: - /ai-gateway/ works_on: - - on-prem - konnect products: - - gateway - ai-gateway tags: - ai - load-balancing -plugins: - - ai-proxy-advanced - - ai-semantic-cache - - ai-rag-injector - - ai-semantic-prompt-guard - - ai-semantic-response-guard - min_version: - gateway: '3.10' + ai-gateway: '2.0.0' related_resources: - text: "{{site.ai_gateway}}" url: /ai-gateway/ - - text: "{{site.ai_gateway}} plugins" + - text: Policy entity + url: /ai-gateway/entities/ai-policy/ + - text: "{{site.ai_gateway}} policies" url: /plugins/?category=ai - - text: Use AI Semantic Prompt Guard plugin to govern your LLM traffic - url: /how-to/use-ai-semantic-prompt-guard-plugin/ - - text: Ensure chatbots adhere to compliance policies with the AI RAG Injector plugin - url: /how-to/use-ai-rag-injector-plugin/ - - text: Control prompt size with the AI Compressor plugin - url: /how-to/compress-llm-prompts/ - text: Semantic processing and vector similarity search with Kong and Redis url: https://konghq.com/blog/engineering/semantic-processing-and-vector-similarity-search-with-kong-and-redis - text: Vector embeddings @@ -62,59 +49,58 @@ Vector embeddings power a range of LLM workflows, including semantic search, doc ## Semantic similarity in {{site.ai_gateway}} -In {{site.ai_gateway}}, several plugins leverage embedding-based similarity: +In {{site.ai_gateway}} 2.0+, several policies leverage embedding-based similarity. These are implemented as [Policy entities](/ai-gateway/entities/ai-policy/) with a `type` field that specifies the policy implementation. Each policy can be attached to Models, Agents, MCPs, Consumers, or deployed globally. 
{% table %} columns: - - title: Plugin - key: plugin + - title: Policy Type + key: policy - title: Description key: description rows: - - plugin: "[AI Proxy Advanced](/plugins/ai-semantic-prompt-guard/)" - description: Performs semantic routing by embedding each upstream’s description at config time and storing the results in a selected vector database. At runtime, it embeds the prompt and queries vector database to route requests to the most semantically appropriate upstream. - - plugin: "[AI Semantic Cache](/plugins/ai-semantic-cache/)" + - policy: "[ai-proxy-advanced](/plugins/ai-proxy-advanced/)" + description: Performs semantic routing by embedding each target’s description at config time and storing the results in a selected vector database. At runtime, it embeds the prompt and queries the vector database to route requests to the most semantically appropriate target. + - policy: "[ai-semantic-cache](/plugins/ai-semantic-cache/)" description: Indexes previous prompts and responses as embeddings. On each request, it searches for semantically similar inputs and serves cached responses when possible to reduce redundant LLM calls. - - plugin: "[AI RAG Injector](/plugins/ai-rag-injector/)" + - policy: "[ai-rag-injector](/plugins/ai-rag-injector/)" description: Retrieves semantically relevant chunks from a vector database. It embeds the prompt, performs a similarity search, and injects the results into the prompt to enable retrieval-augmented generation. - - plugin: "[AI Semantic Prompt Guard](/plugins/ai-semantic-prompt-guard/)" + - policy: "[ai-semantic-prompt-guard](/plugins/ai-semantic-prompt-guard/)" description: Compares incoming prompts against allow/deny lists using embedding similarity to detect and block misuse patterns. 
- - plugin: | - [AI Semantic Response Guard](/plugins/ai-semantic-response-guard/) {% new_in 3.12 %} + - policy: "[ai-semantic-response-guard](/plugins/ai-semantic-response-guard/)" description: Filters LLM responses by comparing their semantic content against predefined allow and deny lists. It analyzes the full response body, generates embeddings, and enforces rules to block unsafe or unwanted outputs before returning them to the client. {% endtable %} ### Vector databases -To compare embeddings efficiently, {{site.ai_gateway}} semantic plugins rely on vector databases. These specialized data stores index high-dimensional embeddings and enable **fast similarity search** based on distance metrics like cosine similarity or Euclidean distance. +To compare embeddings efficiently, {{site.ai_gateway}} semantic policies rely on vector databases. These specialized data stores index high-dimensional embeddings and enable **fast similarity search** based on distance metrics like cosine similarity or Euclidean distance. -When a plugin needs to find semantically similar content—whether it’s a past prompt, an upstream description, or a document chunk—it sends a query to a vector database. The database returns the closest matches, allowing the plugin to make decisions like caching, routing, injecting, or blocking. +When a policy needs to find semantically similar content—whether it’s a past prompt, a routing target description, or a document chunk—it sends a query to a vector database. The database returns the closest matches, allowing the policy to make decisions like caching, routing, injecting, or blocking. {% include_cached /plugins/ai-vector-db.md name=page.name %} -The selected database stores the embeddings generated by the plugin (either at config time or runtime), and determines the accuracy and performance of semantic operations. 
+The selected database stores the embeddings generated by the policy (either at config time or runtime), and determines the accuracy and performance of semantic operations. ### What is compared for similarity? -Each plugin applies similarity search slightly differently depending on its goal. These comparisons determine whether the plugin routes, blocks, reuses, or enriches a prompt based on meaning rather than syntax. +Each policy applies similarity search slightly differently depending on its goal. These comparisons determine whether the policy routes, blocks, reuses, or enriches a prompt based on meaning rather than syntax. -The following table describes how each AI plugin compares embeddings: +The following table describes how each semantic policy compares embeddings: {% table %} columns: - - title: Plugin - key: plugin + - title: Policy Type + key: policy - title: Compared embeddings key: comparison rows: - - plugin: "AI Proxy Advanced" - comparison: "Prompt vs. `description` field of each upstream target" - - plugin: "AI Semantic Prompt Guard" + - policy: "[ai-proxy-advanced](/plugins/ai-proxy-advanced/)" + comparison: "Prompt vs. `description` field of each target" + - policy: "[ai-semantic-prompt-guard](/plugins/ai-semantic-prompt-guard/)" comparison: "Prompt vs. allowlist and denylist prompts" - - plugin: "AI Semantic Cache" + - policy: "[ai-semantic-cache](/plugins/ai-semantic-cache/)" comparison: "Prompt vs. cached prompt keys" - - plugin: "AI RAG Injector" + - policy: "[ai-rag-injector](/plugins/ai-rag-injector/)" comparison: "Prompt vs. vectorized document chunks" {% endtable %} @@ -219,7 +205,7 @@ rows: ### Cosine and Euclidean similarity -{{site.ai_gateway}} supports both cosine similarity and Euclidean distance for vector comparisons, allowing you to choose the method best suited for your use case. You can configure the method using `config.vectordb.distance_metric` setting in the respective plugin. 
+{{site.ai_gateway}} supports both cosine similarity and Euclidean distance for vector comparisons, allowing you to choose the method best suited for your use case. You can configure the method using `config.vectordb.distance_metric` setting in the respective policy. * Use `cosine` for nuanced semantic similarity (for example, document comparison, text clustering), especially when content length varies or dataset diversity is high. * Use `euclidean` when magnitude matters (for example, images, sensor data) or you're working with dense, well-aligned feature sets. @@ -274,7 +260,7 @@ rows: ## Similarity threshold -The `vectordb.threshold` parameter controls how strictly the vector database evaluates similarity during a query. It is passed directly to the vector engine—such as Redis or PGVector—and defines which results qualify as matches. In Redis, for example, this maps to the `distance_threshold` query parameter. By default, Redis sets this to `0.2`, but you can override it to suit your use case. +The `config.vectordb.threshold` parameter controls how strictly the vector database evaluates similarity during a query. It is passed directly to the vector engine—such as Redis or PGVector—and defines which results qualify as matches. In Redis, for example, this maps to the `distance_threshold` query parameter. By default, Redis sets this to `0.2`, but you can override it to suit your use case. The threshold defines how permissive the matching is. **Higher threshold values allow looser matches, while lower values enforce stricter matching.** The threshold range is 0 to 1. @@ -288,11 +274,11 @@ In both cases, if the [{{site.base_gateway}} logs](/gateway/logs/) indicate "no The optimal threshold depends on the selected distance metric, the embedding model's dimensionality, and the variation in your data. Tuning may be required for best results. {:.info} -> In Kong's AI semantic plugins, this threshold is **not** post-processed or filtered by the plugin itself. 
The plugin sends it directly to the vector database, which uses it to determine matching documents based on the configured **distance metric**. +> In {{site.ai_gateway}} semantic policies, this threshold is **not** post-processed or filtered by the policy itself. The policy sends it directly to the vector database, which uses it to determine matching documents based on the configured **distance metric**. ### Threshold sensitivity and cache hit effectiveness -The closer your similarity threshold is to `1`, the more likely you are to get **cache misses** when using plugins like **AI Semantic Cache**. This is because a higher threshold makes the similarity filter more strict, so only embeddings that are nearly identical to the query will qualify as a match. In practice, this means even small variations in phrasing, structure, or context can cause the system to miss otherwise semantically similar entries and fall back to calling the LLM again. +The closer your similarity threshold is to `1`, the more likely you are to get **cache misses** when using the **ai-semantic-cache** policy. This is because a higher threshold makes the similarity filter more strict, so only embeddings that are nearly identical to the query will qualify as a match. In practice, this means even small variations in phrasing, structure, or context can cause the system to miss otherwise semantically similar entries and fall back to calling the LLM again. This happens because vector embeddings are not perfectly robust to minor semantic shifts, especially for short or ambiguous prompts. Raising the threshold narrows the match window, so you're effectively demanding a near-exact match in a complex vector space, which is rare unless the input is repeated verbatim. 
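A minimal sketch of the difference between the `cosine` and `euclidean` options that this patch documents for `config.vectordb.distance_metric`. The helper names and the toy vectors are hypothetical and only illustrate the math; they are not part of any {{site.ai_gateway}} API.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity: depends on the angle between vectors, not their length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance: sensitive to vector magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical embeddings: same direction, but the second has twice the magnitude.
short_doc = [1.0, 2.0, 3.0]
long_doc = [2.0, 4.0, 6.0]

# Cosine distance is close to 0.0 because the vectors point the same way.
print(cosine_distance(short_doc, long_doc))
# Euclidean distance is clearly non-zero because the magnitudes differ.
print(euclidean_distance(short_doc, long_doc))
```

Under these assumptions, cosine treats the two vectors as nearly identical while Euclidean does not, which is consistent with the guidance to prefer `cosine` when content length varies and `euclidean` when magnitude matters.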
From da88ca1c7f1006ba159d3f515dc973debeb1b922 Mon Sep 17 00:00:00 2001 From: tomek-labuk Date: Tue, 9 Jun 2026 12:52:16 +0200 Subject: [PATCH 02/10] update semantic similarity reference --- app/ai-gateway/semantic-similarity.md | 62 ++++++++------------------- 1 file changed, 18 insertions(+), 44 deletions(-) diff --git a/app/ai-gateway/semantic-similarity.md b/app/ai-gateway/semantic-similarity.md index b7944b24d2..1937b5815e 100644 --- a/app/ai-gateway/semantic-similarity.md +++ b/app/ai-gateway/semantic-similarity.md @@ -49,61 +49,35 @@ Vector embeddings power a range of LLM workflows, including semantic search, doc ## Semantic similarity in {{site.ai_gateway}} -In {{site.ai_gateway}} 2.0+, several policies leverage embedding-based similarity. These are implemented as [Policy entities](/ai-gateway/entities/ai-policy/) with a `type` field that specifies the policy implementation. Each policy can be attached to Models, Agents, MCPs, Consumers, or deployed globally. +In {{site.ai_gateway}} {% new_in 2.0.0 %}, semantic similarity enables intelligent request routing, caching, and content filtering based on meaning rather than exact matches. A [Model](/ai-gateway/entities/ai-model/) can leverage semantic similarity in two ways: -{% table %} -columns: - - title: Policy Type - key: policy - - title: Description - key: description -rows: - - policy: "[ai-proxy-advanced](/plugins/ai-proxy-advanced/)" - description: Performs semantic routing by embedding each target’s description at config time and storing the results in a selected vector database. At runtime, it embeds the prompt and queries the vector database to route requests to the most semantically appropriate target. - - policy: "[ai-semantic-cache](/plugins/ai-semantic-cache/)" - description: Indexes previous prompts and responses as embeddings. On each request, it searches for semantically similar inputs and serves cached responses when possible to reduce redundant LLM calls. 
- - policy: "[ai-rag-injector](/plugins/ai-rag-injector/)" - description: Retrieves semantically relevant chunks from a vector database. It embeds the prompt, performs a similarity search, and injects the results into the prompt to enable retrieval-augmented generation. - - policy: "[ai-semantic-prompt-guard](/plugins/ai-semantic-prompt-guard/)" - description: Compares incoming prompts against allow/deny lists using embedding similarity to detect and block misuse patterns. - - policy: "[ai-semantic-response-guard](/plugins/ai-semantic-response-guard/)" - description: Filters LLM responses by comparing their semantic content against predefined allow and deny lists. It analyzes the full response body, generates embeddings, and enforces rules to block unsafe or unwanted outputs before returning them to the client. -{% endtable %} +1. **Semantic load balancing**: Route requests to upstream providers based on how semantically similar the prompt is to each provider's capabilities, using the `semantic` load balancing algorithm. +2. **Semantic policies**: Attach policies like AI Semantic Cache or AI Semantic Prompt Guard to add similarity-based caching, retrieval-augmented generation (RAG), and guardrailing. ### Vector databases -To compare embeddings efficiently, {{site.ai_gateway}} semantic policies rely on vector databases. These specialized data stores index high-dimensional embeddings and enable **fast similarity search** based on distance metrics like cosine similarity or Euclidean distance. +To compare embeddings efficiently, {{site.ai_gateway}} semantic features rely on vector databases. These specialized data stores index high-dimensional embeddings and enable **fast similarity search** based on distance metrics like cosine similarity or Euclidean distance. -When a policy needs to find semantically similar content—whether it’s a past prompt, a routing target description, or a document chunk—it sends a query to a vector database. 
The database returns the closest matches, allowing the policy to make decisions like caching, routing, injecting, or blocking. +When a Model’s semantic load balancer or an attached semantic policy needs to find semantically similar content—whether it’s a prompt, a routing target description, or a document chunk—it sends a query to a vector database. The database returns the closest matches, allowing the Model or policy to make decisions like routing, caching, injecting, or blocking. -{% include_cached /plugins/ai-vector-db.md name=page.name %} +A Model’s [semantic load balancer](/ai-gateway/entities/ai-model/#algorithms) stores vector representations of each target model’s semantic description at configuration time, and uses the vector database to compare incoming prompts against those stored vectors. Semantic policies use the same vector database to perform similarity searches at request time. -The selected database stores the embeddings generated by the policy (either at config time or runtime), and determines the accuracy and performance of semantic operations. +The selected database stores the embeddings generated by the Model or policies (either at config time or runtime), and determines the accuracy and performance of semantic operations. {{site.ai_gateway}} supports Redis with Vector Similarity Search (VSS) and PostgreSQL with the pgvector extension. -### What is compared for similarity? +### Similarity in Model load balancing and policies -Each policy applies similarity search slightly differently depending on its goal. These comparisons determine whether the policy routes, blocks, reuses, or enriches a prompt based on meaning rather than syntax. +Semantic similarity is applied differently depending on the feature: -The following table describes how each semantic policy compares embeddings: +**Model semantic load balancing** (`semantic` algorithm): +- Compares incoming prompt embeddings against stored embeddings of each target model's semantic description. 
+- Routes requests to the target whose description is most semantically similar to the prompt. +- Embeddings are computed at config time for targets and at request time for prompts. - -{% table %} -columns: - - title: Policy Type - key: policy - - title: Compared embeddings - key: comparison -rows: - - policy: "[ai-proxy-advanced](/plugins/ai-proxy-advanced/)" - comparison: "Prompt vs. `description` field of each target" - - policy: "[ai-semantic-prompt-guard](/plugins/ai-semantic-prompt-guard/)" - comparison: "Prompt vs. allowlist and denylist prompts" - - policy: "[ai-semantic-cache](/plugins/ai-semantic-cache/)" - comparison: "Prompt vs. cached prompt keys" - - policy: "[ai-rag-injector](/plugins/ai-rag-injector/)" - comparison: "Prompt vs. vectorized document chunks" -{% endtable %} - +**Semantic policies**: +- Each semantic policy applies similarity search slightly differently based on its goal. +- AI Semantic Cache compares prompts against cached prompt keys to find reusable responses. +- AI RAG Injector compares prompts against vectorized document chunks to retrieve relevant context. +- AI Semantic Prompt Guard and AI Semantic Response Guard compare content against allowlists and denylists to detect misuse patterns. 
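The semantic load balancing behavior added in this patch can be sketched roughly as follows. This is an illustration only: the target names, the three-dimensional embeddings, and the `pick_target` helper are hypothetical, and a real deployment would obtain embeddings from an embedding model and delegate the search to the vector database.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical description embeddings, assumed to be computed once at config time.
TARGET_EMBEDDINGS = {
    "code-model": [0.9, 0.1, 0.0],
    "chat-model": [0.1, 0.9, 0.2],
}


def pick_target(prompt_embedding, targets=TARGET_EMBEDDINGS):
    """Return the target whose description embedding is most similar to the prompt."""
    return max(
        targets,
        key=lambda name: cosine_similarity(prompt_embedding, targets[name]),
    )


# Hypothetical prompt embedding, assumed to be computed at request time.
print(pick_target([0.8, 0.2, 0.1]))
```

The sketch keeps the two phases separate on purpose: target embeddings are fixed data, while the prompt embedding is the only per-request input.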
From 1318526622b06029c5d7560bdef0700bbf5ab5ac Mon Sep 17 00:00:00 2001
From: tomek-labuk
Date: Wed, 10 Jun 2026 10:31:34 +0200
Subject: [PATCH 03/10] Multiple fixes

---
 app/ai-gateway/semantic-similarity.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/app/ai-gateway/semantic-similarity.md b/app/ai-gateway/semantic-similarity.md
index 1937b5815e..bd8410dcf5 100644
--- a/app/ai-gateway/semantic-similarity.md
+++ b/app/ai-gateway/semantic-similarity.md
@@ -24,8 +24,8 @@ related_resources:
     url: /ai-gateway/
   - text: Policy entity
     url: /ai-gateway/entities/ai-policy/
-  - text: "{{site.ai_gateway}} policies"
-    url: /plugins/?category=ai
+  - text: "{{site.ai_gateway}} Model entity"
+    url: /ai-gateway/entities/ai-model/
   - text: Semantic processing and vector similarity search with Kong and Redis
     url: https://konghq.com/blog/engineering/semantic-processing-and-vector-similarity-search-with-kong-and-redis
   - text: Vector embeddings
@@ -52,7 +52,7 @@ Vector embeddings power a range of LLM workflows, including semantic search, doc
 In {{site.ai_gateway}} {% new_in 2.0.0 %}, semantic similarity enables intelligent request routing, caching, and content filtering based on meaning rather than exact matches. A [Model](/ai-gateway/entities/ai-model/) can leverage semantic similarity in two ways:

 1. **Semantic load balancing**: Route requests to upstream providers based on how semantically similar the prompt is to each provider's capabilities, using the `semantic` load balancing algorithm.
-2. **Semantic policies**: Attach policies like AI Semantic Cache or AI Semantic Prompt Guard to add similarity-based caching, retrieval-augmented generation (RAG), and guardrailing.
+2. **Semantic policies**: Attach policies like AI Semantic Cache or AI Semantic Prompt Guard to add similarity-based caching, retrieval-augmented generation (RAG), and guardrails.
### Vector databases @@ -77,7 +77,7 @@ Semantic similarity is applied differently depending on the feature: - Each semantic policy applies similarity search slightly differently based on its goal. - AI Semantic Cache compares prompts against cached prompt keys to find reusable responses. - AI RAG Injector compares prompts against vectorized document chunks to retrieve relevant context. -- AI Semantic Prompt Guard and AI Semantic Response Guard compare content against allowlists and denylists to detect misuse patterns. +- AI Semantic Prompt Guard and AI Semantic Response Guard compare content against allow and deny lists to detect misuse patterns. @@ -87,7 +87,7 @@ Embedding models work by converting text into high-dimensional floating-point ar Dimensionality determines how many numerical features represent each piece of content—similar to how a detailed profile might have dimensions for age, interests, location, and preferences. Higher dimensions create more detailed "fingerprints" that capture nuanced relationships, with smaller distances between vectors indicating stronger conceptual similarity and larger distances showing weaker associations. -For example, this request to the OpenAI [/embeddings API](/plugins/ai-proxy/examples/embeddings-route-type/) via {{site.ai_gateway}}: +For example, this request to the OpenAI `/embeddings` API via {{site.ai_gateway}}: ```json { @@ -252,7 +252,7 @@ The optimal threshold depends on the selected distance metric, the embedding mod ### Threshold sensitivity and cache hit effectiveness -The closer your similarity threshold is to `1`, the more likely you are to get **cache misses** when using the **ai-semantic-cache** policy. This is because a higher threshold makes the similarity filter more strict, so only embeddings that are nearly identical to the query will qualify as a match. 
In practice, this means even small variations in phrasing, structure, or context can cause the system to miss otherwise semantically similar entries and fall back to calling the LLM again.
+The closer your similarity threshold is to `1`, the more likely you are to get **cache misses** when using the **AI Semantic Cache** policy. This is because a higher threshold makes the similarity filter more strict, so only embeddings that are nearly identical to the query will qualify as a match. In practice, this means even small variations in phrasing, structure, or context can cause the system to miss otherwise semantically similar entries and fall back to calling the LLM again.

 This happens because vector embeddings are not perfectly robust to minor semantic shifts, especially for short or ambiguous prompts. Raising the threshold narrows the match window, so you're effectively demanding a near-exact match in a complex vector space, which is rare unless the input is repeated verbatim.


From 1f3ab35dc660763f8ce7f92aaf28834170b67987 Mon Sep 17 00:00:00 2001
From: tomek-labuk
Date: Wed, 10 Jun 2026 11:31:23 +0200
Subject: [PATCH 04/10] Apply suggestions from code review

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
---
 app/ai-gateway/semantic-similarity.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/app/ai-gateway/semantic-similarity.md b/app/ai-gateway/semantic-similarity.md
index bd8410dcf5..33561bdbca 100644
--- a/app/ai-gateway/semantic-similarity.md
+++ b/app/ai-gateway/semantic-similarity.md
@@ -58,7 +58,7 @@ In {{site.ai_gateway}} {% new_in 2.0.0 %}, semantic similarity enables intellige

 To compare embeddings efficiently, {{site.ai_gateway}} semantic features rely on vector databases. These specialized data stores index high-dimensional embeddings and enable **fast similarity search** based on distance metrics like cosine similarity or Euclidean distance.
-When a Model’s semantic load balancer or an attached semantic policy needs to find semantically similar content—whether it’s a prompt, a routing target description, or a document chunk—it sends a query to a vector database. The database returns the closest matches, allowing the Model or policy to make decisions like routing, caching, injecting, or blocking. +When a Model’s semantic load balancer or an attached semantic policy needs to find semantically similar content—whether it’s a prompt, a target model description, or a document chunk—it sends a query to a vector database. The database returns the closest matches, allowing the Model or policy to make decisions like routing, caching, injecting, or blocking. A Model’s [semantic load balancer](/ai-gateway/entities/ai-model/#algorithms) stores vector representations of each target model’s semantic description at configuration time, and uses the vector database to compare incoming prompts against those stored vectors. Semantic policies use the same vector database to perform similarity searches at request time. @@ -179,7 +179,7 @@ rows: ### Cosine and Euclidean similarity -{{site.ai_gateway}} supports both cosine similarity and Euclidean distance for vector comparisons, allowing you to choose the method best suited for your use case. You can configure the method using `config.vectordb.distance_metric` setting in the respective policy. +{{site.ai_gateway}} supports both cosine similarity and Euclidean distance for vector comparisons, allowing you to choose the method best suited for your use case. You can configure the method using the `config.vectordb.distance_metric` setting in the respective policy. * Use `cosine` for nuanced semantic similarity (for example, document comparison, text clustering), especially when content length varies or dataset diversity is high. * Use `euclidean` when magnitude matters (for example, images, sensor data) or you're working with dense, well-aligned feature sets. 
@@ -234,7 +234,7 @@ rows: ## Similarity threshold -The `config.vectordb.threshold` parameter controls how strictly the vector database evaluates similarity during a query. It is passed directly to the vector engine—such as Redis or PGVector—and defines which results qualify as matches. In Redis, for example, this maps to the `distance_threshold` query parameter. By default, Redis sets this to `0.2`, but you can override it to suit your use case. +The `config.vectordb.threshold` parameter controls how strictly the vector database evaluates similarity during a query. It is passed directly to the vector engine—such as Redis or PostgreSQL (pgvector)—and defines which results qualify as matches. In Redis, for example, this maps to the `distance_threshold` query parameter. By default, Redis sets this to `0.2`, but you can override it to suit your use case. The threshold defines how permissive the matching is. **Higher threshold values allow looser matches, while lower values enforce stricter matching.** The threshold range is 0 to 1. 
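A minimal sketch of how a distance threshold can act as a match filter, assuming (as the Redis `distance_threshold` mapping described in this hunk suggests) that `config.vectordb.threshold` is treated as a maximum allowed distance. The `find_matches` helper, the stored keys, and the two-dimensional vectors are hypothetical; the actual filtering is performed by the vector database, not by code like this.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; 0.0 means the vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def find_matches(query, stored, threshold):
    """Return (key, distance) pairs whose distance is within the threshold, closest first."""
    scored = [(key, cosine_distance(query, vec)) for key, vec in stored.items()]
    return sorted((hit for hit in scored if hit[1] <= threshold), key=lambda hit: hit[1])


# Hypothetical stored embeddings for three earlier prompts.
stored = {
    "near-duplicate": [1.0, 0.0],
    "paraphrase": [0.8, 0.6],
    "unrelated": [0.0, 1.0],
}
query = [1.0, 0.0]

strict = find_matches(query, stored, threshold=0.1)  # lower value: stricter matching
loose = find_matches(query, stored, threshold=0.5)   # higher value: looser matching

print([key for key, _ in strict])
print([key for key, _ in loose])
```

Under this assumption, the stricter setting keeps only the near-duplicate entry, while the looser one also admits the paraphrase; the unrelated entry is excluded in both cases.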
From d0233cb4b7fcb6b875123907c7307f79115a0602 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Wed, 10 Jun 2026 11:11:07 +0000
Subject: [PATCH 05/10] Broaden semantic similarity page wording

---
 app/ai-gateway/semantic-similarity.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/app/ai-gateway/semantic-similarity.md b/app/ai-gateway/semantic-similarity.md
index 33561bdbca..3250453023 100644
--- a/app/ai-gateway/semantic-similarity.md
+++ b/app/ai-gateway/semantic-similarity.md
@@ -1,8 +1,8 @@
 ---
-title: "Embedding-based similarity matching in {{site.ai_gateway}} policies"
+title: "Embedding-based similarity matching in {{site.ai_gateway}}"
 layout: reference
 content_type: reference
-description: This reference explains how {{site.ai_gateway}} policies use embedding-based similarity to compare prompts with various inputs—such as cached entries, routing targets, document chunks, or allow/deny lists.
+description: This reference explains how {{site.ai_gateway}} uses embedding-based similarity to compare prompts with various inputs—such as cached entries, target model descriptions, document chunks, or allow/deny lists.
 breadcrumbs:
   - /ai-gateway/


From b332ab8c154965bfce39595afa10d79e4df192fa Mon Sep 17 00:00:00 2001
From: tomek-labuk
Date: Thu, 11 Jun 2026 06:49:40 +0200
Subject: [PATCH 06/10] Apply suggestions from code review

Co-authored-by: jbaross
---
 app/ai-gateway/semantic-similarity.md | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/app/ai-gateway/semantic-similarity.md b/app/ai-gateway/semantic-similarity.md
index 3250453023..c23787f82c 100644
--- a/app/ai-gateway/semantic-similarity.md
+++ b/app/ai-gateway/semantic-similarity.md
@@ -49,24 +49,23 @@ Vector embeddings power a range of LLM workflows, including semantic search, doc

 ## Semantic similarity in {{site.ai_gateway}}

-In {{site.ai_gateway}} {% new_in 2.0.0 %}, semantic similarity enables intelligent request routing, caching, and content filtering based on meaning rather than exact matches. A [Model](/ai-gateway/entities/ai-model/) can leverage semantic similarity in two ways:
+In {{site.ai_gateway}} {% new_in 2.0.0 %}, you can perform intelligent request routing, caching, and content filtering based on meaning rather than exact matches by using semantic similarity queries. A [Model](/ai-gateway/entities/ai-model/) can leverage semantic similarity in two ways:

 1. **Semantic load balancing**: Route requests to upstream providers based on how semantically similar the prompt is to each provider's capabilities, using the `semantic` load balancing algorithm.
 2. **Semantic policies**: Attach policies like AI Semantic Cache or AI Semantic Prompt Guard to add similarity-based caching, retrieval-augmented generation (RAG), and guardrails.

 ### Vector databases

-To compare embeddings efficiently, {{site.ai_gateway}} semantic features rely on vector databases. These specialized data stores index high-dimensional embeddings and enable **fast similarity search** based on distance metrics like cosine similarity or Euclidean distance.
+To store and compare embeddings efficiently, {{site.ai_gateway}} semantic features rely on vector databases. These specialized datastores index high-dimensional embeddings and enable **fast similarity search** based on distance metrics like cosine similarity or Euclidean distance. -When a Model’s semantic load balancer or an attached semantic policy needs to find semantically similar content—whether it’s a prompt, a target model description, or a document chunk—it sends a query to a vector database. The database returns the closest matches, allowing the Model or policy to make decisions like routing, caching, injecting, or blocking. -A Model’s [semantic load balancer](/ai-gateway/entities/ai-model/#algorithms) stores vector representations of each target model’s semantic description at configuration time, and uses the vector database to compare incoming prompts against those stored vectors. Semantic policies use the same vector database to perform similarity searches at request time. +A Model Entity’s [semantic load balancer](/ai-gateway/entities/ai-model/#algorithms) stores vector representations of each target model’s semantic description at configuration time, and uses the vector database to compare incoming prompts against those stored vectors. Semantic policies use the same vector database to perform similarity searches at request time. The selected database stores the embeddings generated by the Model or policies (either at config time or runtime), and determines the accuracy and performance of semantic operations. {{site.ai_gateway}} supports Redis with Vector Similarity Search (VSS) and PostgreSQL with the pgvector extension. 

 ### Similarity in Model load balancing and policies

-Semantic similarity is applied differently depending on the feature:
+Semantic similarity is used differently depending on the feature:

 **Model semantic load balancing** (`semantic` algorithm):
 - Compares incoming prompt embeddings against stored embeddings of each target model's semantic description.
@@ -74,10 +73,10 @@ Semantic similarity is applied differently depending on the feature:
 - Embeddings are computed at config time for targets and at request time for prompts.

 **Semantic policies**:
-- Each semantic policy applies similarity search slightly differently based on its goal.
+- Each semantic policy uses similarity search slightly differently based on its goal.
 - AI Semantic Cache compares prompts against cached prompt keys to find reusable responses.
 - AI RAG Injector compares prompts against vectorized document chunks to retrieve relevant context.
-- AI Semantic Prompt Guard and AI Semantic Response Guard compare content against allow and deny lists to detect misuse patterns.
+- AI Semantic Prompt Guard and AI Semantic Response Guard compare content against vectorised allow and deny lists to detect misuse patterns semantically.



@@ -85,7 +84,7 @@ Semantic similarity is applied differently depending on the feature:

 Embedding models work by converting text into high-dimensional floating-point arrays where mathematical distance reflects semantic relationship. In other words, ingested text data becomes points in a vector space, which enables similarity searches in vector databases, and the dimension of embeddings plays a critical role for this.

-Dimensionality determines how many numerical features represent each piece of content—similar to how a detailed profile might have dimensions for age, interests, location, and preferences. Higher dimensions create more detailed "fingerprints" that capture nuanced relationships, with smaller distances between vectors indicating stronger conceptual similarity and larger distances showing weaker associations.
+Dimensionality determines how many numerical features represent each piece of content—similar to how a detailed profile might have dimensions for age, interests, location, and preferences. A higher number of dimensions creates more detailed "fingerprints" that capture nuanced relationships. Smaller distances between vectors indicate stronger conceptual similarity and larger distances show weaker associations.

 For example, this request to the OpenAI `/embeddings` API via {{site.ai_gateway}}:

@@ -147,7 +146,7 @@ The `embedding` array contains 20 floating-point numbers—each one representing

 If you use embedding models that support defining the dimensionality of the embedding output, you should consider how to balance accuracy and performance based on your use case.

-However, dimensionality extremes at the far ends of the spectrum present significant drawbacks:
+However, extremes at the far ends of the spectrum present significant drawbacks:

 {% table %}
 columns:

From b5bb14a64a374d5f3b4eb1d8db83b6499355409b Mon Sep 17 00:00:00 2001
From: tomek-labuk
Date: Thu, 11 Jun 2026 07:16:09 +0200
Subject: [PATCH 07/10] apply feedback from review

---
 app/ai-gateway/semantic-similarity.md | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/app/ai-gateway/semantic-similarity.md b/app/ai-gateway/semantic-similarity.md
index c23787f82c..e1a39f7049 100644
--- a/app/ai-gateway/semantic-similarity.md
+++ b/app/ai-gateway/semantic-similarity.md
@@ -49,31 +49,33 @@ Vector embeddings power a range of LLM workflows, including semantic search, doc

 ## Semantic similarity in {{site.ai_gateway}}

-In {{site.ai_gateway}} {% new_in 2.0.0 %}, you can perform intelligent request routing, caching, and content filtering based on meaning rather than exact matches by using semantic similarity queries. A [Model](/ai-gateway/entities/ai-model/) can leverage semantic similarity in two ways:
+Based on meaning rather than exact matches, {{site.ai_gateway}} can perform intelligent request routing, caching, and content filtering using semantic similarity queries. A [Model](/ai-gateway/entities/ai-model/) can leverage semantic similarity in two ways:

 1. **Semantic load balancing**: Route requests to upstream providers based on how semantically similar the prompt is to each provider's capabilities, using the `semantic` load balancing algorithm.
-2. **Semantic policies**: Attach policies like AI Semantic Cache or AI Semantic Prompt Guard to add similarity-based caching, retrieval-augmented generation (RAG), and guardrails.
+2. **Semantic Policies**: Attach Policies like AI Semantic Cache or AI Semantic Prompt Guard to add similarity-based caching, retrieval-augmented generation (RAG), and guardrails.

 ### Vector databases

 To store and compare embeddings efficiently, {{site.ai_gateway}} semantic features rely on vector databases. These specialized datastores index high-dimensional embeddings and enable **fast similarity search** based on distance metrics like cosine similarity or Euclidean distance.

-
 A Model Entity’s [semantic load balancer](/ai-gateway/entities/ai-model/#algorithms) stores vector representations of each target model’s semantic description at configuration time, and uses the vector database to compare incoming prompts against those stored vectors. Semantic policies use the same vector database to perform similarity searches at request time.

-The selected database stores the embeddings generated by the Model or policies (either at config time or runtime), and determines the accuracy and performance of semantic operations. {{site.ai_gateway}} supports Redis with Vector Similarity Search (VSS) and PostgreSQL with the pgvector extension.
+The selected database stores the embeddings generated by the Model or Policies (either at config time or runtime), and determines the accuracy and performance of semantic operations.
+{% include plugins/ai-vector-db.md name="semantic features" %}

-### Similarity in Model load balancing and policies
+### How semantic similarity is applied

 Semantic similarity is used differently depending on the feature:

 **Model semantic load balancing** (`semantic` algorithm):
-- Compares incoming prompt embeddings against stored embeddings of each target model's semantic description.
-- Routes requests to the target whose description is most semantically similar to the prompt.
-- Embeddings are computed at config time for targets and at request time for prompts.
+- Generates embeddings for each target model's semantic description at configuration time and stores them in the vector database.
+- At request time, embeds the incoming prompt using the same embedding model and compares it against the stored target embeddings.
+- Routes requests to the target whose description is most semantically similar to the prompt, using the distance metric (cosine or Euclidean) configured for the Model.
+- The quality of routing depends on semantic description quality and consistent use of the same embedding model for both targets and prompts.

-**Semantic policies**:
-- Each semantic policy uses similarity search slightly differently based on its goal.
+**Semantic Policies**:
+- Each semantic Policy uses similarity search slightly differently based on its goal.
 - AI Semantic Cache compares prompts against cached prompt keys to find reusable responses.
 - AI RAG Injector compares prompts against vectorized document chunks to retrieve relevant context.
 - AI Semantic Prompt Guard and AI Semantic Response Guard compare content against vectorised allow and deny lists to detect misuse patterns semantically.

From adc70fa0a9b0666a88dd73a39b1a9dbd544a9799 Mon Sep 17 00:00:00 2001
From: tomek-labuk
Date: Thu, 11 Jun 2026 07:58:17 +0200
Subject: [PATCH 08/10] Fix introductory paragraph

---
 app/ai-gateway/semantic-similarity.md | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/app/ai-gateway/semantic-similarity.md b/app/ai-gateway/semantic-similarity.md
index e1a39f7049..7e03bb695e 100644
--- a/app/ai-gateway/semantic-similarity.md
+++ b/app/ai-gateway/semantic-similarity.md
@@ -36,16 +36,12 @@ related_resources:
     icon: /assets/icons/redis.svg
 ---

-In large language tasks, applications that interact with language models rely on semantic search—not by exact word matches, but by similarity in meaning. This is achieved using vector embeddings, which represent pieces of text as points in a high-dimensional space.
-
-These embeddings enable the concept of semantic similarity, where the “distance” between vectors reflects how closely related two pieces of text are. Similarity can be measured using techniques like cosine similarity or Euclidean distance, forming the quantitative basis for comparing meaning.
+Vector embeddings represent text as points in high-dimensional space, where the distance between vectors reflects semantic similarity. This enables semantic search—comparing meaning rather than exact words—powering LLM workflows like intelligent caching, retrieval, classification, and anomaly detection.

 ![Vector embeddings example](/assets/images/ai-gateway/vectors.svg)
 > _**Figure 1:** A simplified representation of vector text embeddings in a three-dimensional space._

-For example, in the image, "king" and "emperor" are semantically more similar than a "king" is to an "otter".
-
-Vector embeddings power a range of LLM workflows, including semantic search, document clustering, recommendation systems, anomaly detection, content similarity analysis, and classification via auto-labeling.
+For example, in the image, “king” and “emperor” are semantically more similar than “king” is to “otter”. Similarity is measured using techniques like cosine similarity or Euclidean distance, which quantify the relationship between vectors. ## Semantic similarity in {{site.ai_gateway}} From 118095503831b99e6462023cb89e79b8eccfe573 Mon Sep 17 00:00:00 2001 From: tomek-labuk Date: Thu, 11 Jun 2026 08:07:15 +0200 Subject: [PATCH 09/10] minor fixes --- app/ai-gateway/semantic-similarity.md | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/app/ai-gateway/semantic-similarity.md b/app/ai-gateway/semantic-similarity.md index 7e03bb695e..997dfa6b8b 100644 --- a/app/ai-gateway/semantic-similarity.md +++ b/app/ai-gateway/semantic-similarity.md @@ -76,8 +76,6 @@ Semantic similarity is used differently depending on the feature: - AI RAG Injector compares prompts against vectorized document chunks to retrieve relevant context. - AI Semantic Prompt Guard and AI Semantic Response Guard compare content against vectorised allow and deny lists to detect misuse patterns semantically. - - ## Dimensionality Embedding models work by converting text into high-dimensional floating-point arrays where mathematical distance reflects semantic relationship. In other words, ingested text data becomes points in a vector space, which enables similarity searches in vector databases, and the dimension of embeddings plays a critical role for this. @@ -188,7 +186,7 @@ Cosine similarity measures the angle between vectors, ignoring their magnitude. ![Cosine similarity example](/assets/images/ai-gateway/cosine-similarity.svg) > _**Figure 2:** Visualization of cosine similarity as the angle between vector directions._ -Cosine tends to perform well across both low and high dimensional space, especially in high-diversity datasets because it captures vector orientation rather than size. This can be useful, for example, when comparing texts about Microsoft, Apple, and {{ site.google}}. 
+Cosine tends to perform well across both low and high dimensional space, especially in high-diversity datasets because it captures vector orientation rather than size. This can be useful, for example, when comparing texts about Microsoft, Apple, and {{site.google}}. #### Euclidean distance @@ -231,7 +229,7 @@ rows: ## Similarity threshold -The `config.vectordb.threshold` parameter controls how strictly the vector database evaluates similarity during a query. It is passed directly to the vector engine—such as Redis or PostgreSQL (pgvector)—and defines which results qualify as matches. In Redis, for example, this maps to the `distance_threshold` query parameter. By default, Redis sets this to `0.2`, but you can override it to suit your use case. +The `config.vectordb.threshold` parameter controls how strictly the vector database evaluates similarity during a query. It is passed directly to the vector engine (such as Redis or PostgreSQL with pgvector) and defines which results qualify as matches. In Redis, for example, this maps to the `distance_threshold` query parameter. By default, Redis sets this to `0.2`, but you can override it to suit your use case. The threshold defines how permissive the matching is. **Higher threshold values allow looser matches, while lower values enforce stricter matching.** The threshold range is 0 to 1. @@ -253,7 +251,7 @@ The closer your similarity threshold is to `1`, the more likely you are to get * This happens because vector embeddings are not perfectly robust to minor semantic shifts, especially for short or ambiguous prompts. Raising the threshold narrows the match window, so you're effectively demanding a near-exact match in a complex vector space, which is rare unless the input is repeated verbatim. -The chart below illustrates this effect: as the similarity threshold increase (for example, becomes more strict), the cache hit rate typically falls. 
This reflects the broader acceptance of matches in the embedding space, which helps reduce redundant LLM calls at the cost of some semantic looseness. +The chart below illustrates this effect: as the similarity threshold increases (for example, becomes more strict), the cache hit rate typically falls. This reflects the broader acceptance of matches in the embedding space, which helps reduce redundant LLM calls at the cost of some semantic looseness. ![Similarity threshold and cache rate hits](/assets/images/ai-gateway/cache-hit-rate.svg) > _**Figure 5:** As the similarity threshold decreases (becomes more permissive), cache hit rate increases—illustrating the trade-off between strict semantic matching and LLM efficiency._ From 6c9e87e2b8bea67c322fc8936f099184f23db4e9 Mon Sep 17 00:00:00 2001 From: tomek-labuk Date: Thu, 11 Jun 2026 08:08:17 +0200 Subject: [PATCH 10/10] update min_version --- app/ai-gateway/semantic-similarity.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/app/ai-gateway/semantic-similarity.md b/app/ai-gateway/semantic-similarity.md index 997dfa6b8b..9fec7209c3 100644 --- a/app/ai-gateway/semantic-similarity.md +++ b/app/ai-gateway/semantic-similarity.md @@ -17,7 +17,7 @@ tags: - load-balancing min_version: - ai-gateway: '2.0.0' + ai-gateway: '2.0' related_resources: - text: "{{site.ai_gateway}}"