diff --git a/docs/reference/data-sources/mongodb.md b/docs/reference/data-sources/mongodb.md
index c1b6eed1bed..8902affd3c5 100644
--- a/docs/reference/data-sources/mongodb.md
+++ b/docs/reference/data-sources/mongodb.md
@@ -28,9 +28,7 @@ The full set of configuration options is available [here](https://rtd.feast.dev/
 
 ## Vector Search
 
-The MongoDB online store supports [Atlas Vector Search](https://www.mongodb.com/docs/atlas/atlas-vector-search/), enabling similarity search over feature embeddings stored in MongoDB Atlas. This is powered by the `$vectorSearch` aggregation stage and requires MongoDB Atlas (or the `mongodb/mongodb-atlas-local` Docker image for local development).
-
-See [PR #6344](https://github.com/feast-dev/feast/pull/6344) for full implementation details.
+The MongoDB online store supports [MongoDB Vector Search](https://www.mongodb.com/docs/atlas/atlas-vector-search/), enabling similarity search over feature embeddings stored in MongoDB. This is powered by the `$vectorSearch` aggregation stage and supports MongoDB Atlas, self-hosted MongoDB with Atlas Search indexes, and the `mongodb/mongodb-atlas-local` Docker image for local development.
 
 ### Configuration
 
@@ -41,7 +39,7 @@ project: my_project
 provider: local
 online_store:
   type: mongodb
-  connection_string: mongodb+srv://:@cluster.mongodb.net
+  connection_string: mongodb+srv://:@cluster.mongodb.net # pragma: allowlist secret
   vector_enabled: true
   similarity: cosine # cosine | euclidean | dotProduct
   vector_index_wait_timeout: 60 # seconds to wait for index to become queryable
@@ -76,32 +74,24 @@ item_embeddings = FeatureView(
 )
 ```
 
-When `feast apply` (or `store.update()`) runs with `vector_enabled=True`, Atlas vector search indexes are automatically created for any field with `vector_index=True`. Indexes are also automatically dropped when feature views are removed.
+When `feast apply` (or `store.update()`) runs with `vector_enabled=True`, MongoDB vector search indexes are automatically created for any field with `vector_index=True`. Indexes are also automatically dropped when feature views are removed.
 
 ### Retrieving Documents via Vector Search
 
 Use `retrieve_online_documents_v2()` to perform similarity search:
 
 ```python
-source = FeatureStore(repo_path=".")
+store = FeatureStore(repo_path=".")
 results = store.retrieve_online_documents_v2(
-    config=repo_config,
-    table=item_embeddings,
-    requested_features=["embedding", "title"],
-    embedding=[0.1, 0.2, ...], # query vector
+    features=["item_embeddings:embedding", "item_embeddings:title"],
+    query=[0.1, 0.2, ...], # query vector
     top_k=5,
 )
-
-# Each result is a (event_timestamp, entity_key_proto, feature_dict) tuple.
-# feature_dict includes a synthetic "distance" key with the vector search score.
-for ts, entity_key, features in results:
-    print(features["title"].string_val, features["distance"].float_val)
-```
 ```
 
 ### How It Works
 
-- **Index creation**: `update()` creates an Atlas vector search index named `____vs_index` for each vector-indexed field. It waits for the index to reach `READY` status before proceeding.
+- **Index creation**: `update()` creates a MongoDB vector search index named `____vs_index` for each vector-indexed field. It waits for the index to reach `READY` status before proceeding.
 - **Query execution**: `retrieve_online_documents_v2()` builds a `$vectorSearch` aggregation pipeline with `numCandidates = max(top_k * 10, 100)` and the specified `limit`.
 - **Score**: Results include a `distance` field populated from `$meta: "vectorSearchScore"`.
 - **BSON compatibility**: Query vectors are coerced to native Python floats to avoid numpy serialization issues.
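The query-execution, score, and BSON-compatibility points above can be illustrated with a minimal sketch. It is not taken from the Feast source: the helper name `build_vector_search_pipeline`, the index name `example_vs_index`, and the `embedding` field are hypothetical and chosen only for illustration. The sketch assumes the pipeline is a `$vectorSearch` stage followed by a `$project` stage that exposes the score as `distance`.

```python
from typing import Any, Sequence


def build_vector_search_pipeline(
    index_name: str,
    vector_field: str,
    query_vector: Sequence[Any],
    top_k: int,
) -> list[dict[str, Any]]:
    """Hypothetical helper: assemble a $vectorSearch aggregation pipeline."""
    # Coerce each element to a native Python float so numpy scalars
    # (for example numpy.float32) never reach the BSON encoder.
    native_vector = [float(x) for x in query_vector]

    # Candidate pool size as described in the docs: max(top_k * 10, 100).
    num_candidates = max(top_k * 10, 100)

    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": vector_field,
                "queryVector": native_vector,
                "numCandidates": num_candidates,
                "limit": top_k,
            }
        },
        {
            # Expose the similarity score under the "distance" key.
            "$project": {
                "_id": 0,
                vector_field: 1,
                "distance": {"$meta": "vectorSearchScore"},
            }
        },
    ]


# Illustrative values only; the index and field names are assumptions.
pipeline = build_vector_search_pipeline(
    "example_vs_index", "embedding", [0.1, 0.2, 0.3], top_k=5
)
```

With `top_k=5` the candidate pool stays at the floor of 100, while larger values of `top_k` scale it by a factor of ten. The resulting list could be passed to a collection's `aggregate()` call, assuming a matching vector search index already exists.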
diff --git a/docs/reference/offline-stores/mongodb.md b/docs/reference/offline-stores/mongodb.md
index 0e8d1786699..a41d43ca676 100644
--- a/docs/reference/offline-stores/mongodb.md
+++ b/docs/reference/offline-stores/mongodb.md
@@ -3,8 +3,6 @@
 ## Description
 
 The MongoDB offline store provides support for reading [MongoDBSource](../data-sources/mongodb.md).
-* Uses a single shared collection with a compound index for all FeatureViews, distinguished by a `feature_view` discriminator field.
-* Entity dataframes can be provided as a Pandas dataframe. The offline store converts entity identifiers into serialized entity keys for efficient lookup against the collection.
 
 ## Getting started