From b1c837b18f6eef18c15e65ccb6fa5c029ab62615 Mon Sep 17 00:00:00 2001
From: Shay Rojansky
Date: Sat, 25 Apr 2026 08:43:26 +0200
Subject: [PATCH 1/2] Document WithApproximate LINQ operator for SQL Server vector search

Document dotnet/efcore#38144

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 .../providers/sql-server/vector-search.md     | 127 +++++++++++-------
 .../core/what-is-new/ef-core-11.0/whatsnew.md |   9 +-
 2 files changed, 84 insertions(+), 52 deletions(-)

diff --git a/entity-framework/core/providers/sql-server/vector-search.md b/entity-framework/core/providers/sql-server/vector-search.md
index 77f510c409..9ffb12498c 100644
--- a/entity-framework/core/providers/sql-server/vector-search.md
+++ b/entity-framework/core/providers/sql-server/vector-search.md
@@ -86,78 +86,101 @@ This function computes the distance between the query vector and every row in th
 > [!NOTE]
 > The built-in support in EF 10 replaces the previous [EFCore.SqlServer.VectorSearch](https://github.com/efcore/EFCore.SqlServer.VectorSearch) extension, which allowed performing vector search before the `vector` data type was introduced. As part of upgrading to EF 10, remove the extension from your projects.
 
-## Approximate search with VECTOR_SEARCH()
+## Searching with VECTOR_SEARCH()
 
 > [!WARNING]
 > `VECTOR_SEARCH()` and vector indexes are currently experimental features in SQL Server and are subject to change. The APIs in EF Core for these features are also subject to change.
 
-For large datasets, computing exact distances for every row can be prohibitively slow. SQL Server 2025 introduces support for *approximate* search through a [vector index](/sql/t-sql/statements/create-vector-index-transact-sql), which provides much better performance at the expense of returning items that are approximately similar - rather than exactly similar - to the query.
+SQL Server's `VECTOR_SEARCH()` table-valued function retrieves rows based on vector similarity. Unlike `VECTOR_DISTANCE()`, which computes the distance between two specific vectors, `VECTOR_SEARCH()` searches an entire table for the vectors most similar to a given query vector.
 
-### Vector indexes
-
-To use `VECTOR_SEARCH()`, you must create a vector index on your vector column. Use the `HasVectorIndex()` method in your model configuration:
+Use the `VectorSearch()` extension method on your `DbSet`, and chain `OrderBy()`, `Take()`, and `WithApproximate()` to perform an approximate nearest neighbor (ANN) search that uses a [vector index](/sql/t-sql/statements/create-vector-index-transact-sql):
 
 ```csharp
-protected override void OnModelCreating(ModelBuilder modelBuilder)
+var blogs = await context.Blogs
+    .VectorSearch(b => b.Embedding, embedding, "cosine")
+    .OrderBy(r => r.Distance)
+    .Take(5)
+    .WithApproximate()
+    .ToListAsync();
+
+foreach (var result in blogs)
 {
-    modelBuilder.Entity<Blog>()
-        .HasVectorIndex(b => b.Embedding, "cosine");
+    Console.WriteLine($"Blog {result.Value.Id} with distance {result.Distance}");
 }
 ```
 
-This will generate the following SQL migration:
+This translates to the following SQL:
 
 ```sql
-CREATE VECTOR INDEX [IX_Blogs_Embedding]
-    ON [Blogs] ([Embedding])
-    WITH (METRIC = COSINE)
+SELECT TOP(@__p_1) WITH APPROXIMATE [b].[Id], [b].[Name], [v].[Distance]
+FROM VECTOR_SEARCH(
+    TABLE = [Blogs] AS [b],
+    COLUMN = [Embedding],
+    SIMILAR_TO = @__embedding_0,
+    METRIC = 'cosine'
+) AS [v]
+ORDER BY [v].[Distance]
 ```
 
-The following distance metrics are supported for vector indexes:
+`VectorSearch()` returns `VectorSearchResult`, which allows you to access both the entity and the computed distance:
 
-Metric | Description
------------ | -----------
-`cosine` | Cosine similarity (angular distance)
-`euclidean` | Euclidean distance (L2 norm)
-`dot` | Dot product (negative inner product)
+```csharp
+var searchResults = await context.Blogs
+    .VectorSearch(b => b.Embedding, embedding, "cosine")
+    .Where(r => r.Distance < 0.05)
+    .OrderBy(r => r.Distance)
+    .Select(r => new { Blog = r.Value, Distance = r.Distance })
+    .Take(3)
+    .WithApproximate()
+    .ToListAsync();
+```
 
-Choose the metric that best matches your embedding model and use case. Cosine similarity is commonly used for text embeddings, while euclidean distance is often used for image embeddings.
+This allows you to filter on the distance, present it to users, etc.
+
+### WithApproximate()
 
-### Searching with VECTOR_SEARCH()
+`WithApproximate()` instructs SQL Server to use the vector index for approximate nearest neighbor (ANN) search. This is significantly faster on large datasets, at the expense of returning rows that are approximately, rather than exactly, the most similar to the query. It adds `WITH APPROXIMATE` to the SQL `TOP` clause. `WithApproximate()` must be called after `Take()`, which specifies the number of results to return.
 
-Once you have a vector index, use the `VectorSearch()` extension method on your `DbSet`:
+Without `WithApproximate()`, the query performs an exact k-nearest neighbor (kNN) search that scans all rows, without using the vector index:
 
 ```csharp
+// Exact kNN search (no vector index used)
 var blogs = await context.Blogs
-    .VectorSearch(b => b.Embedding, embedding, "cosine", topN: 5)
+    .VectorSearch(b => b.Embedding, embedding, "cosine")
+    .OrderBy(r => r.Distance)
+    .Take(5)
     .ToListAsync();
+```
+
+### Vector indexes
+
+To use approximate search with `WithApproximate()`, you must create a vector index on your vector column. Use the `HasVectorIndex()` method in your model configuration:
 
-foreach (var (blog, score) in blogs)
+```csharp
+protected override void OnModelCreating(ModelBuilder modelBuilder)
 {
-    Console.WriteLine($"Blog {blog.Id} with score {score}");
+    modelBuilder.Entity<Blog>()
+        .HasVectorIndex(b => b.Embedding, "cosine");
 }
 ```
 
-This translates to the following SQL:
+This will generate the following SQL migration:
 
 ```sql
-SELECT [v].[Id], [v].[Name], [v].[Distance]
-FROM VECTOR_SEARCH([Blogs], 'Embedding', @__embedding, 'metric = cosine', @__topN)
+CREATE VECTOR INDEX [IX_Blogs_Embedding]
+    ON [Blogs] ([Embedding])
+    WITH (METRIC = COSINE)
 ```
 
-The `topN` parameter specifies the maximum number of results to return.
-
-`VectorSearch()` returns `VectorSearchResult`, which allows you to access both the entity and the computed distance:
+The following distance metrics are supported for vector indexes:
 
-```csharp
-var searchResults = await context.Blogs
-    .VectorSearch(b => b.Embedding, embedding, "cosine", topN: 5)
-    .Where(r => r.Distance < 0.05)
-    .Select(r => new { Blog = r.Value, Distance = r.Distance })
-    .ToListAsync();
-```
+Metric | Description
+----------- | -----------
+`cosine` | Cosine similarity (angular distance)
+`euclidean` | Euclidean distance (L2 norm)
+`dot` | Dot product (negative inner product)
 
-This allows you to filter on the similarity score, present it to users, etc.
+Choose the metric that best matches your embedding model and use case. Cosine similarity is commonly used for text embeddings, while euclidean distance is often used for image embeddings.
 
 ## Hybrid search
 
@@ -175,7 +198,10 @@ var results = await context.Articles
     .FreeTextTable(textualQuery, topN: k)
     // Perform vector (semantic) search, joining the results of both searches together
     .LeftJoin(
-        context.Articles.VectorSearch(b => b.Embedding, queryEmbedding, "cosine", topN: k),
+        context.Articles.VectorSearch(b => b.Embedding, queryEmbedding, "cosine")
+            .OrderBy(r => r.Distance)
+            .Take(k)
+            .WithApproximate(),
         fts => fts.Key,
         vs => vs.Value.Id,
         (fts, vs) => new
@@ -209,14 +235,17 @@ This query:
 The query produces the following SQL:
 
 ```sql
-SELECT TOP(@p3) [a0].[Id], [a0].[Content], [a0].[Title]
-FROM FREETEXTTABLE([Articles], *, @p, @p1) AS [f]
-LEFT JOIN VECTOR_SEARCH(
-    TABLE = [Articles] AS [a0],
-    COLUMN = [Embedding],
-    SIMILAR_TO = @p2,
-    METRIC = 'cosine',
-    TOP_N = @p3
-) AS [v] ON [f].[KEY] = [a0].[Id]
-ORDER BY 1.0E0 / CAST(10 + [f].[RANK] AS float) + ISNULL(1.0E0 / (10.0E0 + [v].[Distance]), 0.0E0) DESC
+SELECT TOP(@__p_4) [t].[Id], [t].[Content], [t].[Title]
+FROM FREETEXTTABLE([Articles], *, @__textualQuery_0, @__k_1) AS [f]
+LEFT JOIN (
+    SELECT TOP(@__k_1) WITH APPROXIMATE [a].[Id], [a].[Content], [a].[Title], [v].[Distance]
+    FROM VECTOR_SEARCH(
+        TABLE = [Articles] AS [a],
+        COLUMN = [Embedding],
+        SIMILAR_TO = @__queryEmbedding_2,
+        METRIC = 'cosine'
+    ) AS [v]
+    ORDER BY [v].[Distance]
+) AS [t] ON [f].[KEY] = [t].[Id]
+ORDER BY 1.0E0 / CAST(@__k_1 + [f].[RANK] AS float) + ISNULL(1.0E0 / (CAST(@__k_1 AS float) + [t].[Distance]), 0.0E0) DESC
 ```
diff --git a/entity-framework/core/what-is-new/ef-core-11.0/whatsnew.md b/entity-framework/core/what-is-new/ef-core-11.0/whatsnew.md
index 78900f1671..2bd1d5410e 100644
--- a/entity-framework/core/what-is-new/ef-core-11.0/whatsnew.md
+++ b/entity-framework/core/what-is-new/ef-core-11.0/whatsnew.md
@@ -234,15 +234,18 @@ protected override void OnModelCreating(ModelBuilder modelBuilder)
 }
 ```
 
-Once you have a vector index, you can use the `VectorSearch()` extension method on your `DbSet` to perform an approximate search:
+Once you have a vector index, you can use the `VectorSearch()` extension method on your `DbSet`, and chain `OrderBy()`, `Take()`, and `WithApproximate()` to perform an approximate search:
 
 ```csharp
 var blogs = await context.Blogs
-    .VectorSearch(b => b.Embedding, embedding, "cosine", topN: 5)
+    .VectorSearch(b => b.Embedding, embedding, "cosine")
+    .OrderBy(r => r.Distance)
+    .Take(5)
+    .WithApproximate()
     .ToListAsync();
 ```
 
-This translates to the SQL Server [`VECTOR_SEARCH()`](/sql/t-sql/functions/vector-search-transact-sql) table-valued function, which performs an approximate search over the vector index. The `topN` parameter specifies the number of results to return.
+This translates to the SQL Server [`VECTOR_SEARCH()`](/sql/t-sql/functions/vector-search-transact-sql) table-valued function. `Take()` specifies the number of results to return, and `WithApproximate()` instructs SQL Server to use the vector index for approximate nearest neighbor (ANN) search, adding `WITH APPROXIMATE` to the SQL `TOP` clause. Without `WithApproximate()`, an exact k-nearest neighbor (kNN) search is performed instead.
 
 `VectorSearch()` returns `VectorSearchResult`, allowing you to access the distance alongside the entity.
 
From aa313ae04727f2ca4af709269fa975b7a4cf0f30 Mon Sep 17 00:00:00 2001
From: Shay Rojansky
Date: Tue, 28 Apr 2026 10:38:36 +0200
Subject: [PATCH 2/2] Update entity-framework/core/providers/sql-server/vector-search.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---
 entity-framework/core/providers/sql-server/vector-search.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/entity-framework/core/providers/sql-server/vector-search.md b/entity-framework/core/providers/sql-server/vector-search.md
index 9ffb12498c..608ac0eb0b 100644
--- a/entity-framework/core/providers/sql-server/vector-search.md
+++ b/entity-framework/core/providers/sql-server/vector-search.md
@@ -96,14 +96,14 @@ SQL Server's `VECTOR_SEARCH()` table-valued function retrieves rows based on vec
 Use the `VectorSearch()` extension method on your `DbSet`, and chain `OrderBy()`, `Take()`, and `WithApproximate()` to perform an approximate nearest neighbor (ANN) search that uses a [vector index](/sql/t-sql/statements/create-vector-index-transact-sql):
 
 ```csharp
-var blogs = await context.Blogs
+var results = await context.Blogs
     .VectorSearch(b => b.Embedding, embedding, "cosine")
     .OrderBy(r => r.Distance)
     .Take(5)
     .WithApproximate()
     .ToListAsync();
 
-foreach (var result in blogs)
+foreach (var result in results)
 {
     Console.WriteLine($"Blog {result.Value.Id} with distance {result.Distance}");
 }