Search before asking
Description
DolphinScheduler currently provides an Amazon EMR on EC2 task type for managing EC2-based clusters and running computing tasks on them. However, it has no support for Amazon EMR Serverless, a serverless deployment option that lets users run Spark and Hive workloads without managing cluster infrastructure.
EMR Serverless automatically provisions and scales compute resources on demand, offering a simpler operational model and cost optimization through pay-per-use pricing. It has become the recommended way to run EMR workloads for many use cases.
This feature request proposes adding a new EMR_SERVERLESS task type that enables users to:
- Submit jobs — Submit Spark or Hive jobs to a pre-created EMR Serverless application via the StartJobRun API
- Monitor job status — Automatically poll job state (SUBMITTED → PENDING → SCHEDULED → RUNNING → SUCCESS/FAILED/CANCELLED) via the GetJobRun API
- Cancel jobs — Automatically cancel running jobs when a DolphinScheduler task is killed, via the CancelJobRun API
- Failover recovery — Recover tracking of running jobs when Worker nodes restart
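The submit, poll, and cancel flow above could look roughly like the following Python sketch. It is only an illustration of the intended lifecycle, not the proposed plugin code (which would be written in Java inside DolphinScheduler). The `client` argument is assumed to behave like a boto3 `emr-serverless` client. The function names `run_job` and `track_job`, the `should_cancel` callback, and the 10-second poll interval are hypothetical choices made for this example.

```python
import time

# Terminal states as listed in this issue; every other state means "keep polling".
TERMINAL_STATES = {"SUCCESS", "FAILED", "CANCELLED"}


def track_job(client, application_id, job_run_id,
              poll_seconds=10, should_cancel=lambda: False, sleep=time.sleep):
    """Poll GetJobRun until the job reaches a terminal state.

    Taking only the application ID and job run ID means the same function can
    be reused for failover recovery, assuming the job run ID was persisted
    before the Worker restarted.
    """
    cancel_sent = False
    while True:
        response = client.get_job_run(applicationId=application_id,
                                      jobRunId=job_run_id)
        state = response["jobRun"]["state"]
        if state in TERMINAL_STATES:
            return state
        # should_cancel is a hypothetical hook standing in for a task kill signal.
        if should_cancel() and not cancel_sent:
            client.cancel_job_run(applicationId=application_id,
                                  jobRunId=job_run_id)
            cancel_sent = True
        sleep(poll_seconds)


def run_job(client, application_id, execution_role_arn, job_driver,
            job_name=None, **track_options):
    """Submit a job via StartJobRun, then track it to completion."""
    kwargs = {
        "applicationId": application_id,
        "executionRoleArn": execution_role_arn,
        "jobDriver": job_driver,
    }
    if job_name:
        kwargs["name"] = job_name  # the job name is optional
    job_run_id = client.start_job_run(**kwargs)["jobRunId"]
    return track_job(client, application_id, job_run_id, **track_options)
```

With real credentials, `client` would typically be created with `boto3.client("emr-serverless")`. Passing the client in as a parameter keeps the sketch testable with a stub.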
Task Parameters
| Parameter | Description |
| --- | --- |
| Application Id | EMR Serverless application ID (e.g. `00fkht2eodujab09`) |
| Execution Role Arn | IAM role ARN for job execution |
| Job Name | Optional job name for identification |
| StartJobRunRequest JSON | JSON containing `JobDriver` and `ConfigurationOverrides` for the job |
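As a rough illustration of how the four parameters might be combined into a single StartJobRun call, here is a Python sketch. The helper name `build_start_job_run_kwargs` is hypothetical. The sketch also assumes the JSON field may use either the API's lower camel case keys (`jobDriver`, `configurationOverrides`) or the capitalized names used in the table above. The actual task plugin may validate and merge these fields differently.

```python
import json


def _pick(body, name):
    """Return body[name], accepting lowerCamel or Capitalized keys (an assumption)."""
    lower = name[0].lower() + name[1:]
    upper = name[0].upper() + name[1:]
    if lower in body:
        return body[lower]
    return body.get(upper)


def build_start_job_run_kwargs(application_id, execution_role_arn,
                               request_json, job_name=None):
    """Merge the task parameters into keyword arguments for StartJobRun."""
    body = json.loads(request_json)
    job_driver = _pick(body, "jobDriver")
    if job_driver is None:
        raise ValueError("StartJobRunRequest JSON must contain a job driver")

    kwargs = {
        "applicationId": application_id,
        "executionRoleArn": execution_role_arn,
        "jobDriver": job_driver,
    }
    overrides = _pick(body, "configurationOverrides")
    if overrides is not None:
        kwargs["configurationOverrides"] = overrides
    if job_name:
        kwargs["name"] = job_name  # optional, for identification only
    return kwargs


# Example values are placeholders, not a real application or role.
example_json = json.dumps({
    "jobDriver": {
        "sparkSubmit": {"entryPoint": "s3://example-bucket/scripts/etl.py"}
    }
})
print(build_start_job_run_kwargs(
    "00fkht2eodujab09",
    "arn:aws:iam::123456789012:role/example-job-role",
    example_json,
    job_name="nightly-etl",
))
```

The resulting dictionary matches the keyword arguments accepted by the boto3 `emr-serverless` client's `start_job_run` method, so it could be passed as `client.start_job_run(**kwargs)`.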
Use case
- Data engineering teams running scheduled Spark ETL pipelines without managing EMR clusters
- Ad-hoc Hive query workloads that benefit from serverless auto-scaling
- Cost-sensitive environments where pay-per-use is preferred over always-on clusters
- Organizations migrating from EMR on EC2 to EMR Serverless for operational simplicity
Related issues
No response
Are you willing to submit a PR?
Code of Conduct