update pyspark-notebook to java 21 and derby 10.17.1.0 #2424

Open
nicholasmhughes wants to merge 2 commits into jupyter:main from nicholasmhughes:fix-pyspark-notebook-crit-cve

Conversation

@nicholasmhughes

Describe your changes

  • default to java 21
  • remove the affected 10.16.1.1 derby jar
  • download and install the updated 10.17.1.0 jar
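
A rough Dockerfile-style sketch of the three steps above, assuming docker-stacks' `openjdk_version` build argument and sourcing the patched jar from Maven Central (both the build argument and the URL are illustrative assumptions, not the actual commit):

```dockerfile
# Assumption: the Java version is injected as a build arg, per docker-stacks convention.
ARG openjdk_version="21"

# Swap the vulnerable Derby jar bundled with Spark for the patched release.
# The Maven Central URL is an assumed source for derby-10.17.1.0.jar.
RUN set -eux; \
    rm "${SPARK_HOME}/jars/derby-10.16.1.1.jar"; \
    wget -q -O "${SPARK_HOME}/jars/derby-10.17.1.0.jar" \
        "https://repo1.maven.org/maven2/org/apache/derby/derby/10.17.1.0/derby-10.17.1.0.jar"
```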

Issue ticket if applicable

fixes #2423

Checklist (especially for first-time contributors)

  • I have performed a self-review of my code
  • If it is a core feature, I have added thorough tests
  • I will try not to use force-push to make the review process easier for reviewers
  • I have updated the documentation for significant changes


# Spark bundles Derby 10.16.1.1 by default; replace it with a fixed release.
RUN set -eux; \
    derby_jar="$(find /usr/local -type f -path '*/spark-*-bin-hadoop*/jars/derby-*.jar' | head -n 1)"; \
Member


Can't we limit the find to /usr/local/spark?

Author


We probably could, and I'm happy to make the change if you're worried about build speed. I was mostly trying to make sure that I didn't miss something in case the Spark installation location changed at some point in the future, or potentially if multiple versions ever got installed in different directories. Just let me know if you'd like to see that change!

Member


We have a ${SPARK_HOME} env variable for that, so let's use it: it will be both fast and free of the hard-coded path.

Author


Sounds good! Stand by for a new commit.

@manics
Contributor

manics commented Mar 25, 2026

Is this a vulnerability in pyspark, or is there something specific to how docker-stacks installs pyspark?

@nicholasmhughes
Author

> Is this a vulnerability in pyspark, or is there something specific to how docker-stacks installs pyspark?

It's not a pyspark-specific vulnerability per se, and it's not really docker-stacks-only behavior either. This CVE is in Apache Derby (a Java dependency). docker-stacks installs the upstream Spark binary distribution, and that Spark bundle includes derby-10.16.1.1.jar under $SPARK_HOME/jars. So docker-stacks is inheriting what Spark ships.
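
Since the bundled jar lives under $SPARK_HOME/jars, its version can be read straight off the filename. A minimal self-contained sketch (the directory tree below is a mock stand-in for a real Spark install, not an actual one):

```shell
# Hedged sketch: locate a Spark-bundled Derby jar and parse its version from
# the filename. The mock directory below stands in for a real ${SPARK_HOME}.
set -eu

demo_home="$(mktemp -d)"                       # stand-in for ${SPARK_HOME}
mkdir -p "${demo_home}/jars"
touch "${demo_home}/jars/derby-10.16.1.1.jar"  # the version Spark currently ships

derby_jar="$(find "${demo_home}/jars" -maxdepth 1 -name 'derby-*.jar' | head -n 1)"
version="${derby_jar##*/derby-}"   # strip the path and the 'derby-' prefix
version="${version%.jar}"          # strip the '.jar' suffix
echo "bundled Derby version: ${version}"
```

Scoping the `find` to the jars directory is the same narrowing the review above suggests via ${SPARK_HOME}.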

@manics
Contributor

manics commented Mar 26, 2026

Do you think Spark would be willing to update their distribution? If your security scanner is picking up this vuln in docker-stacks, I presume other organisations using the Spark distribution are hitting the same problem in their own systems.

@nicholasmhughes
Author

> Do you think Spark would be willing to update their distribution? If your security scanner is picking up this vuln in docker-stacks, I presume other organisations using the Spark distribution are hitting the same problem in their own systems.

Ehhh... probably unlikely that they'll rip out the affected version until Java 17 drops off the support matrix. The fixed Derby jar is only supported on Java 21+.
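
Since the patched Derby line requires Java 21+, a swap like this could be gated on the runtime's major version. A sketch under the assumption that `java -version` emits its usual quoted version string (the sample string and helper name below are illustrative):

```shell
# Hedged sketch: parse the major version out of `java -version`-style output
# and decide whether the patched Derby jar is usable (it needs Java 21+).
java_major_from() {
    # Pull the quoted version (e.g. "21.0.3") and keep its leading component.
    printf '%s\n' "$1" | awk -F'"' '/version/ {split($2, v, "."); print v[1]; exit}'
}

sample='openjdk version "21.0.3" 2024-04-16'   # illustrative, not captured output
major="$(java_major_from "${sample}")"

if [ "${major}" -ge 21 ]; then
    echo "Java ${major}: Derby 10.17.1.0 is supported"
else
    echo "Java ${major}: stay on the Spark-bundled Derby"
fi
```

Note that pre-9 releases report versions like "1.8.0_392", which this parse reduces to a major of 1, still safely below the 21 threshold.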

@mathbunnyru
Member

To add more context, this is the upstream issue: apache/spark#54563

And Derby is retired: apache/spark#54563 (comment)

If there is a way to easily fix the issue for our images without negatively affecting anyone, we should.
When Spark releases a new version with fixed/removed dependency, it will automatically be used.



Development

Successfully merging this pull request may close these issues.

pyspark-notebook contains derby-10.16.1.1.jar - subject to cve-2022-46337
