Add throughput bucket samples for Cosmos Spark connector#48734

Open
xinlian12 wants to merge 1 commit into Azure:main from xinlian12:addSampleForThroughputBucketInSpark-v2

Conversation

@xinlian12
Member

Summary

Add Python and Scala sample notebooks demonstrating server-side throughput bucket configuration for the Cosmos Spark connector (azure-cosmos-spark_3).

Changes

  • Samples/Python/NYC-Taxi-Data/04_ThroughputBucket.ipynb — PySpark notebook
  • Samples/Scala/NYC-Taxi-Data/04_ThroughputBucket.scala — Scala Databricks notebook

Both samples are modeled after the existing 01_Batch samples but replace the SDK-side global throughput control with the simpler server-side throughputBucket configuration:

| Config key | Description |
| --- | --- |
| `spark.cosmos.throughputControl.enabled` | `"true"` |
| `spark.cosmos.throughputControl.name` | Group name |
| `spark.cosmos.throughputControl.throughputBucket` | Integer between 1 and 5 |
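To make the table concrete, here is a minimal sketch of how these keys might be assembled into write options in the PySpark notebook. The account endpoint, key, database, container, and group-name values are illustrative placeholders, not taken from the PR:

```python
# Hypothetical placeholder values for illustration; only the three
# throughputControl keys below are documented in this PR.
write_config = {
    "spark.cosmos.accountEndpoint": "https://<your-account>.documents.azure.com:443/",
    "spark.cosmos.accountKey": "<your-account-key>",
    "spark.cosmos.database": "SampleDatabase",
    "spark.cosmos.container": "GreenTaxiRecords",
    # Server-side throughput bucket configuration:
    "spark.cosmos.throughputControl.enabled": "true",
    "spark.cosmos.throughputControl.name": "NYCGreenTaxiDataIngestion",
    "spark.cosmos.throughputControl.throughputBucket": "2",  # integer between 1 and 5
}

# In a Databricks notebook the dict would typically be passed to the writer, e.g.:
# df.write.format("cosmos.oltp").options(**write_config).mode("APPEND").save()
```

Note that no metadata container or separate throughput-control account is referenced; the bucket is enforced server-side.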

Key differences from 01_Batch

  • Removed the ThroughputControl metadata container creation (not needed for server-side buckets)
  • Removed separate throughput control account/catalog configuration
  • Replaced targetThroughputThreshold, globalControl.database, globalControl.container with throughputBucket
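The key swap described above can be sketched as two option fragments. The database and container names here are hypothetical, matching the style of the 01_Batch samples:

```python
# SDK-side global throughput control keys (as in 01_Batch) that the new
# samples remove -- values are illustrative placeholders:
sdk_side_only = {
    "spark.cosmos.throughputControl.targetThroughputThreshold": "0.95",
    "spark.cosmos.throughputControl.globalControl.database": "SampleDatabase",
    "spark.cosmos.throughputControl.globalControl.container": "ThroughputControl",
}

# The single server-side key that replaces them in the new samples:
server_side_only = {
    "spark.cosmos.throughputControl.throughputBucket": "2",
}

# Three client-side coordination keys collapse into one server-side key.
assert len(sdk_side_only) == 3 and len(server_side_only) == 1
```

The practical upshot is that no ThroughputControl metadata container has to be provisioned or cleaned up when buckets are used.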

Verification

These are Databricks notebook samples and do not have associated unit tests. The structure and configuration keys were verified against the Spark connector source code (CosmosConfig.scala, ThroughputControlHelper.scala).

Add Python (.ipynb) and Scala sample notebooks demonstrating
server-side throughput bucket configuration as an alternative
to SDK-based global throughput control.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@xinlian12 xinlian12 marked this pull request as ready for review April 8, 2026 19:21
@xinlian12 xinlian12 requested review from a team and kirankumarkolli as code owners April 8, 2026 19:21
Copilot AI review requested due to automatic review settings April 8, 2026 19:21
Member

@FabianMeiswinkel FabianMeiswinkel left a comment


LGTM

Contributor

Copilot AI left a comment


Pull request overview

Adds new Scala and Python Databricks sample notebooks under azure-cosmos-spark_3 demonstrating server-side throughput buckets (spark.cosmos.throughputControl.throughputBucket) for the NYC Taxi ingestion workflow, modeled after the existing 01_Batch samples but without SDK/global throughput-control metadata container setup.

Changes:

  • Add Scala Databricks notebook sample showing ingest, query, change feed validation, and delete flows using throughput buckets.
  • Add PySpark Databricks notebook sample showing the same flow using throughput buckets.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File Description
sdk/cosmos/azure-cosmos-spark_3/Samples/Scala/NYC-Taxi-Data/04_ThroughputBucket.scala New Scala Databricks sample demonstrating throughput bucket configuration for ingest and delete workloads.
sdk/cosmos/azure-cosmos-spark_3/Samples/Python/NYC-Taxi-Data/04_ThroughputBucket.ipynb New PySpark Databricks sample notebook demonstrating throughput bucket configuration for ingest and delete workloads.

Member

@tvaron3 tvaron3 left a comment


LGTM
