Confirmation of Request Source
Describe the feature you'd like
We built a design review tool for the X AI hackathon.
The tool found scaling bottlenecks in the python sdk for collections management. The tool relies on artifacts it had generated (attached here):
- workflows.json
- design-workflow-3-collections-management.md
Scaling Bottlenecks in the Third Workflow: Collections Management
The third workflow, as documented in `.exp/design-workflow-3-collections-management.md` and `workflows.json`, focuses on managing vector collections for document storage, embedding, indexing, searching, and retrieval. It involves operations like creating collections, uploading/indexing documents, and searching via gRPC calls to the management and documents services. Exploration of the codebase (`src/xai_sdk/collections.py`, `src/xai_sdk/sync/collections.py`, `src/xai_sdk/files.py`, `src/xai_sdk/poll_timer.py`, the proto files, and `examples/sync/collection.py`) reveals several client-side scaling bottlenecks, particularly for high-volume scenarios (e.g., ingesting thousands of documents or handling large files). These limit efficiency and increase latency, memory usage, and API call overhead:
1. Lack of Batch Operations for Document Addition (Major Bottleneck)
- The underlying protobuf API (in `proto/v5/collections_pb2_grpc.py` and v6) supports a `BatchAddDocumentToCollection(BatchAddDocumentToCollectionRequest)` RPC for adding multiple documents in a single call.
- However, the SDK client (`sync/collections.py`) only exposes single-document methods: `add_existing_document` (a single `AddDocumentToCollection` RPC) and `upload_document` (which internally calls a single add after the file upload).
- Impact for Scaling: For bulk ingestion (common in RAG/knowledge-base workflows), users must loop over individual calls, resulting in N gRPC RPCs for N documents. This amplifies network latency (each RPC has overhead), risks server-side rate limiting/throttling, and serializes processing. The examples show sequential small uploads, but real-world scale (e.g., 10k+ docs) would be prohibitively slow without user-implemented parallelism (threads/asyncio).
- Evidence: No `batch_add_documents` method in the client; a proto grep confirms the RPC exists but is unimplemented. The design doc mentions "batch operations," but the client only implements `batch_get_documents`.
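The RPC-count difference can be sketched with a hypothetical stub that stands in for the gRPC service (the class and counter below are illustrative, not SDK code; only the RPC method names mirror the proto):

```python
# Hypothetical stand-in for the collections gRPC stub, counting RPC round trips.
class FakeCollectionsStub:
    def __init__(self):
        self.rpc_calls = 0

    def AddDocumentToCollection(self, file_id):
        self.rpc_calls += 1  # one network round trip per document

    def BatchAddDocumentToCollection(self, file_ids):
        self.rpc_calls += 1  # one round trip regardless of batch size

file_ids = [f"file-{i}" for i in range(1000)]

# Current SDK pattern: users loop, paying N RPCs for N documents.
per_doc = FakeCollectionsStub()
for fid in file_ids:
    per_doc.AddDocumentToCollection(fid)

# With a batch_add_documents wrapper exposing the existing proto RPC: 1 call.
batched = FakeCollectionsStub()
batched.BatchAddDocumentToCollection(file_ids)

print(per_doc.rpc_calls, batched.rpc_calls)  # 1000 1
```

Even before server-side gains, collapsing N round trips into one removes the per-RPC latency floor that dominates bulk ingestion time.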
2. Memory Inefficiency in Document Upload for Large Files
- `upload_document(..., data: bytes)` in `sync/collections.py` requires loading the entire document into memory to compute `len(data)` and slice chunks via `_chunk_file_data` (in `files.py:93-129`), which yields fixed-size chunks (`_CHUNK_SIZE`, likely ~4MB) for the streaming gRPC upload.
- While the upload itself streams (good), the initial `bytes` load is a client-side burden.
- Impact for Scaling: Large documents (e.g., PDFs >100MB, datasets) cause high memory spikes per upload, risking OOM in batch loops or low-memory environments. There is no overload for `path: str` or file objects in `upload_document`; users must detour via `client.files.upload(path=...)` (which streams from disk via `open("rb").read(_CHUNK_SIZE)` in `files.py:130-164`), get the `file_id`, then call `add_existing_document`. This adds complexity and still requires single adds for batches.
- Evidence: The examples use small `b"""..."""` strings; `files.upload(path)` handles streaming well, but the collections workflow doesn't integrate it seamlessly.
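The streaming pattern that `files.upload(path)` already uses, and that `upload_document` could adopt for a `path` overload, is a simple chunk generator. A minimal sketch (the `_CHUNK_SIZE` value here is an assumption mirroring the SDK constant, and `iter_file_chunks` is a hypothetical helper, not SDK code):

```python
import os
import tempfile

_CHUNK_SIZE = 4 * 1024 * 1024  # assumed ~4 MiB, mirroring the SDK's constant

def iter_file_chunks(path, chunk_size=_CHUNK_SIZE):
    """Yield fixed-size chunks of a file without loading it all into memory."""
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk

# Demo: a 10 MiB file streams as three chunks; peak memory is ~one chunk,
# not the whole file, unlike the data: bytes path.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * (10 * 1024 * 1024))

chunks = list(iter_file_chunks(tmp.name))
print(len(chunks))  # 3
os.unlink(tmp.name)
```

Feeding such a generator into the streaming gRPC upload would keep the memory footprint flat per upload regardless of document size.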
3. Inefficient Per-Document Polling for Async Indexing Status
- Document indexing (chunking, embedding, HNSW indexing) is server-side async after `AddDocumentToCollection`.
- `upload_document(wait_for_indexing=True)` or manual `get_document` loops poll status via `PollTimer` (`poll_timer.py`), checking `DocumentStatus` (PROCESSING → PROCESSED/FAILED) with default 10s intervals (`DEFAULT_INDEXING_POLL_INTERVAL`).
- No batch status method (polling multiple documents via `batch_get_documents` is possible, but manual and still a per-batch loop).
- Impact for Scaling: In bulk workflows, waiting for many docs requires many polls (e.g., 10s intervals × N docs = excessive gRPC traffic, even with configurable intervals). Busy-waiting wastes resources; there are no pub/sub or webhook alternatives. Timeouts (default 2min) may fail under load if the server has a backlog.
- Evidence: `_wait_for_indexing_to_complete` in `sync/collections.py:319-356` is a single-doc loop with `time.sleep`; the design doc notes "optional polling," but there is no optimized batch/multi-wait.
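A multi-document wait built on a batch-get call could poll all still-pending documents per tick, rather than running a full sleep loop per document. A hedged sketch, using a hypothetical `FakeDocs` stand-in for the client (only the `batch_get_documents` name comes from the SDK; its return shape here is assumed):

```python
import time

class FakeDocs:
    """Hypothetical client: each document becomes PROCESSED after N polls."""
    def __init__(self, polls_until_ready):
        self.polls_until_ready = dict(polls_until_ready)
        self.rpc_calls = 0

    def batch_get_documents(self, file_ids):
        self.rpc_calls += 1
        statuses = {}
        for fid in file_ids:
            self.polls_until_ready[fid] -= 1
            statuses[fid] = "PROCESSED" if self.polls_until_ready[fid] <= 0 else "PROCESSING"
        return statuses

def batch_wait_for_indexing(client, file_ids, interval=0.01, timeout=5.0):
    """Poll all pending documents together; return whatever is still unindexed."""
    pending = set(file_ids)
    deadline = time.monotonic() + timeout
    while pending and time.monotonic() < deadline:
        statuses = client.batch_get_documents(sorted(pending))
        pending -= {fid for fid, s in statuses.items() if s == "PROCESSED"}
        if pending:
            time.sleep(interval)
    return pending

client = FakeDocs({"a": 1, "b": 2, "c": 3})
leftover = batch_wait_for_indexing(client, ["a", "b", "c"])
print(leftover, client.rpc_calls)  # set() 3
```

Here three documents complete in 3 batched RPCs; three independent per-document loops would have issued 1 + 2 + 3 = 6.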
4. Synchronous Blocking and Lack of Built-in Concurrency
- The sync client (`sync/collections.py`) blocks on each RPC (upload, add, poll, search), making loops inherently sequential.
- No high-level batch/parallel methods (e.g., `upload_documents(paths: list[str], parallel=True)` with progress reporting or threading).
- The async client (`aio/collections.py`) exists for concurrency but mirrors the same issues (single ops, no batch).
- Impact for Scaling: High-throughput ingestion (e.g., processing directories of files) requires user-side concurrency (e.g., `concurrent.futures`), increasing code complexity and the potential for race conditions (e.g., adding before an upload completes). Telemetry/interceptors add minor per-call overhead under load.
- Evidence: The examples (`examples/sync/collection.py`) run ops sequentially; `BaseClient` reuses channels (good), but there is no parallelism abstraction.
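The user-side workaround currently required looks roughly like this thread-pool sketch (`upload_one` is a stand-in for a blocking `upload_document` call; the timings are simulated):

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def upload_one(path):
    """Stand-in for a blocking SDK upload; simulates gRPC latency."""
    time.sleep(0.05)
    return f"file-id-for-{path}"

paths = [f"doc_{i}.pdf" for i in range(8)]

start = time.monotonic()
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = {pool.submit(upload_one, p): p for p in paths}
    file_ids = [f.result() for f in as_completed(futures)]
elapsed = time.monotonic() - start

# 8 uploads at ~50 ms each run in two waves of 4 (~0.1 s) instead of ~0.4 s
# sequentially.
print(len(file_ids))  # 8
```

A built-in `upload_documents(..., parallel=True)` would absorb this boilerplate and handle the upload-then-add ordering internally, avoiding the race conditions noted above.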
5. Minor Overhead in Repeated Validations and Conversions
- Every create/update involves Pydantic `TypeAdapter` validations (`FieldDefinitionValidator`, `ChunkConfigurationValidator`) and dict-to-pb conversions (e.g., `_field_definition_to_pb`, `_chunk_configuration_to_pb`).
- Impact for Scaling: Negligible for a few ops, but in loops over many collections/fields it adds CPU cycles. Proto versioning (v5/v6 stubs) requires correct channel selection, a potential misconfiguration under scale.
- Evidence: `collections.py` converters are called per-call; good type safety, but at a runtime cost.
Recommendations for Mitigation (Not Implemented Changes)
- Expose `batch_add_documents(file_ids: list[str], fields: list[dict])` wrapping the proto RPC.
- Add `upload_documents` overloads for paths/file objects with optional concurrency and a batch add.
- Implement `batch_wait_for_indexing(file_ids: list[str])` using `batch_get_documents` loops, or a proto batch-status RPC if available.
- Integrate files streaming directly in collections for path-based uploads.
- Add optional async wrappers or concurrency helpers in docs/examples for scale.
These issues primarily affect client-side efficiency for large-scale document management, aligning with the workflow's emphasis on "document storage, embedding, indexing" at volume. The server side (HNSW search, embedding) scales by design (e.g., approximate nearest-neighbor search), but the SDK gaps force suboptimal usage. For the current small scale (as in the examples) this is fine; for production RAG pipelines, these are significant hurdles.
Additional Context
No response