
Commit adfbd87

refactor/pyspark4.1: Upgrade Spark Docker Image and subproject to 4.1.1 (#175)
* Updates Spark image to `apache/spark:4.1.1`
* Updates compose and config filenames to match version (4.1)
* Bumps pyspark dependency on `pyspark-pipeline` to 4.1.1 to match Spark Cluster
1 parent 5771665 commit adfbd87

5 files changed

Lines changed: 273 additions & 12 deletions


module5-batch-processing/compose.spark-4.0-standalone.yaml renamed to module5-batch-processing/compose.spark-4.1-standalone.yaml

Lines changed: 2 additions & 2 deletions
@@ -1,4 +1,4 @@
-x-spark-image: &spark-image spark:${SPARK_VERSION:-4.0.1-scala2.13-java21-python3-ubuntu}
+x-spark-image: &spark-image apache/spark:${SPARK_VERSION:-4.1.1-scala2.13-java21-python3-ubuntu}
 x-hive-image: &hive-image apache/hive:${HIVE_VERSION:-4.2.0}
 x-postgres-image: &postgres-image postgres:${POSTGRES_VERSION:-18.1-alpine}
 
@@ -13,7 +13,7 @@ x-spark-common:
 volumes:
 &spark-common-vol
 - ./logs/:/opt/spark/logs/
-- ./spark-4.0-standalone.conf:/opt/spark/conf/spark-standalone.conf
+- ./spark-4.1-standalone.conf:/opt/spark/conf/spark-standalone.conf
 - ~/.gcp/spark_credentials.json:/secrets/gcp_credentials.json
 - vol-spark-extra-jars:/opt/spark/extra-jars/
 depends_on:
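Compose resolves `${SPARK_VERSION:-...}` to the value of `SPARK_VERSION` when that variable is set and non-empty, and to the text after `:-` otherwise, so this change only moves the default tag. The repository part (`spark` to `apache/spark`) sits outside the variable, so an exported `SPARK_VERSION` is now resolved against `apache/spark`. The minimal Python sketch below mimics that single interpolation form; `interpolate` is a hypothetical helper, not part of Compose, and it deliberately ignores the other forms Compose supports (such as `${VAR-default}`, `${VAR:?err}` and `$$` escaping).

```python
import os
import re
from typing import Mapping, Optional

# Matches only the Compose-style ${NAME:-default} form (assumption: no nested braces).
_DEFAULT_REF = re.compile(r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*):-(?P<default>[^}]*)\}")


def interpolate(value: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Expand ${NAME:-default}: use env[NAME] if set and non-empty, else the default."""
    env = os.environ if env is None else env

    def _sub(match: re.Match) -> str:
        current = env.get(match.group("name"), "")
        # ":-" treats an empty value the same as an unset one.
        return current if current else match.group("default")

    return _DEFAULT_REF.sub(_sub, value)


# Image reference from the new x-spark-image anchor.
IMAGE = "apache/spark:${SPARK_VERSION:-4.1.1-scala2.13-java21-python3-ubuntu}"

if __name__ == "__main__":
    # SPARK_VERSION unset: the new default tag is used.
    print(interpolate(IMAGE, {}))
    # An override only replaces the tag; "my-custom-tag" is a hypothetical value.
    print(interpolate(IMAGE, {"SPARK_VERSION": "my-custom-tag"}))
```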

module5-batch-processing/pyspark-4.x/README.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 # Batch processing with PySpark 4.x
 
-![Python](https://img.shields.io/badge/Python-3.13_|_3.12-4B8BBE.svg?style=flat&logo=python&logoColor=FFD43B&labelColor=306998)
+![Python](https://img.shields.io/badge/Python-3.14_|_3.13_|_3.12-4B8BBE.svg?style=flat&logo=python&logoColor=FFD43B&labelColor=306998)
 [![PySpark](https://img.shields.io/badge/PySpark-4.x-262A38?style=flat-square&logo=apachespark&logoColor=E36B22&labelColor=262A38)](https://spark.apache.org/docs/4.0.2/api/python/user_guide)
 [![Hadoop](https://img.shields.io/badge/Hadoop-3.4.x-262A38?style=flat-square&logo=apachehadoop&logoColor=FDEE21&labelColor=262A38)](https://spark.apache.org/docs/4.0.2/api/python/user_guide)
 [![Scala](https://img.shields.io/badge/Scala-2.13-262A38?style=flat-square&logo=scala&logoColor=E03E3C&labelColor=262A38)](https://sdkman.io/usage/)

module5-batch-processing/pyspark-4.x/pyproject.toml

Lines changed: 2 additions & 2 deletions
@@ -3,10 +3,10 @@ name = "pyspark-4.0-pipeline"
 version = "2.0"
 description = "Batch processing with PySpark 4"
 readme = "README.md"
-requires-python = ">=3.12,<3.14"
+requires-python = ">=3.12,<3.15"
 
 dependencies = [
-"pyspark[connect]==4.0.1",
+"pyspark[connect]==4.1.1",
 "pyarrow>=23.0.0,<24.0",
 ]
 
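The commit keeps three values in step by hand: the Spark version in the default image tag, the exact `pyspark` pin (bumped "to match Spark Cluster"), and the `requires-python` range that the README badge now mirrors (3.14 | 3.13 | 3.12). A check along the lines below could catch drift between them. This is a minimal sketch that assumes the values are copied in as constants (the constant and function names are hypothetical); a fuller version might read `pyproject.toml` with the standard-library `tomllib` module and evaluate the range with `packaging.specifiers.SpecifierSet` instead of the hand-rolled parser here, which only understands the `>=X.Y,<X.Z` form.

```python
import re
import sys

# Assumption: values copied by hand from this commit's compose file and pyproject.toml.
SPARK_IMAGE_DEFAULT_TAG = "4.1.1-scala2.13-java21-python3-ubuntu"
PYSPARK_REQUIREMENT = "pyspark[connect]==4.1.1"
REQUIRES_PYTHON = ">=3.12,<3.15"


def spark_version_from_tag(tag: str) -> str:
    """Return the leading X.Y.Z Spark version from an image tag."""
    match = re.match(r"(\d+\.\d+\.\d+)", tag)
    if match is None:
        raise ValueError(f"no Spark version at start of tag {tag!r}")
    return match.group(1)


def pinned_version(requirement: str) -> str:
    """Return the version from an exact pin such as 'pkg[extra]==1.2.3'."""
    _, sep, version = requirement.partition("==")
    if not sep:
        raise ValueError(f"not an exact '==' pin: {requirement!r}")
    return version.strip()


def _major_minor(text: str) -> tuple[int, int]:
    major, minor = text.strip().split(".")[:2]
    return int(major), int(minor)


def python_bounds(spec: str) -> tuple[tuple[int, int], tuple[int, int]]:
    """Parse '>=A.B,<A.C' into an inclusive lower and exclusive upper bound."""
    lower = upper = None
    for clause in (c.strip() for c in spec.split(",")):
        if clause.startswith(">="):
            lower = _major_minor(clause[2:])
        elif clause.startswith("<") and not clause.startswith("<="):
            upper = _major_minor(clause[1:])
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    if lower is None or upper is None:
        raise ValueError(f"need both '>=' and '<' bounds: {spec!r}")
    return lower, upper


def supported_minors(spec: str) -> list[str]:
    """List the minor versions the range admits, e.g. to compare with the README badge."""
    (lo_major, lo_minor), (hi_major, hi_minor) = python_bounds(spec)
    if lo_major != hi_major:
        raise ValueError("ranges spanning major versions are not handled")
    return [f"{lo_major}.{minor}" for minor in range(lo_minor, hi_minor)]


if __name__ == "__main__":
    spark = spark_version_from_tag(SPARK_IMAGE_DEFAULT_TAG)
    client = pinned_version(PYSPARK_REQUIREMENT)
    if spark != client:
        sys.exit(f"pyspark {client} does not match Spark image {spark}")
    print("pyspark matches cluster image:", client)
    print("Python versions admitted:", supported_minors(REQUIRES_PYTHON))
```

With the old pin (`pyspark[connect]==4.0.1`) and the new image tag, the script would exit with a mismatch message instead of printing.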
