docs/using/specifics.md (2 changes: 1 addition & 1 deletion)

@@ -44,7 +44,7 @@ You can build a `pyspark-notebook` image with a different `Spark` version by ove

- Spark distribution is defined by the combination of Spark, Hadoop, and Scala versions,
see [Download Apache Spark](https://spark.apache.org/downloads.html) and the [archive repo](https://archive.apache.org/dist/spark/) for more information.
-- `openjdk_version`: The version of the OpenJDK (JRE headless) distribution (`17` by default).
+- `openjdk_version`: The version of the OpenJDK (JRE headless) distribution (`21` by default).
- This version needs to match the version supported by the Spark distribution used above.
- See [Spark Overview](https://spark.apache.org/docs/latest/#downloading) and [Ubuntu packages](https://packages.ubuntu.com/search?keywords=openjdk).
- `spark_version` (optional): The Spark version to install, for example `3.5.0`.
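These documented arguments are meant to be overridden at image build time. A minimal sketch of such a build, assuming it is run from the repository root; the image tag is a placeholder, and the version values simply echo the examples in the docs (only arguments visible in this diff are shown):

```bash
# Hypothetical invocation: tag and version values are illustrative, not mandated by this PR.
docker build \
    --build-arg openjdk_version=21 \
    --build-arg spark_version=3.5.0 \
    -t my-pyspark-notebook \
    ./images/pyspark-notebook
```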
images/pyspark-notebook/Dockerfile (13 changes: 12 additions & 1 deletion)

@@ -16,7 +16,7 @@ USER root
# Spark dependencies
# Default values can be overridden at build time
# (ARGS are in lowercase to distinguish them from ENV)
ARG openjdk_version="17"
ARG openjdk_version="21"

RUN apt-get update --yes && \
apt-get install --yes --no-install-recommends \
@@ -33,6 +33,7 @@ ARG scala_version
# You need to use https://archive.apache.org/dist/spark/ website if you want to download old Spark versions
# But it seems to be slower, that's why we use the recommended site for download
ARG spark_download_url="https://dlcdn.apache.org/spark/"
+ARG derby_version="10.17.1.0"

ENV SPARK_HOME=/usr/local/spark
ENV PATH="${PATH}:${SPARK_HOME}/bin"
@@ -47,6 +48,16 @@ RUN /opt/setup-scripts/setup_spark.py \
--scala-version="${scala_version}" \
--spark-download-url="${spark_download_url}"

+# Spark bundles Derby 10.16.1.1 by default; replace it with a fixed release.
+RUN set -eux; \
+    derby_jar="$(find /usr/local -type f -path '*/spark-*-bin-hadoop*/jars/derby-*.jar' | head -n 1)"; \
+    test -n "${derby_jar}"; \
+    derby_dir="$(dirname "${derby_jar}")"; \
+    curl -fsSL -o "/tmp/derby-${derby_version}.jar" \
+        "https://repo1.maven.org/maven2/org/apache/derby/derby/${derby_version}/derby-${derby_version}.jar"; \
+    rm -f "${derby_jar}"; \
+    mv "/tmp/derby-${derby_version}.jar" "${derby_dir}/"

# Configure IPython system-wide
COPY ipython_kernel_config.py "/etc/ipython/"
RUN fix-permissions "/etc/ipython/"
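To sanity-check the Derby swap after building, one could list the Derby jars that end up inside the image; this reuses the same path pattern as the `find` in the new RUN step, and the image tag is the placeholder from the build sketch above. Exactly one `derby-10.17.1.0.jar` is the expected result:

```bash
# Hypothetical check: the tag is a placeholder; only the pinned Derby jar should be printed.
docker run --rm my-pyspark-notebook \
    bash -c 'find /usr/local -type f -path "*/jars/derby-*.jar"'
```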