refactor(toolchains): register runtimes using manifest #3812
Currently, all supported Python runtime versions and their platform-specific metadata (URLs, SHA256s, strip_prefix) must be hardcoded in `python/versions.bzl`. This makes it slow and difficult to adopt new Python versions or custom builds without updating `rules_python` itself.

This PR introduces the ability to dynamically fetch and register Python runtimes from a remote python-build-standalone (PBS) manifest file (e.g., `SHA256SUMS`). This is supported via two new attributes in `python.override`:

- `add_runtime_manifest_urls`: A list of URLs pointing to manifest files to parse and register.
- `runtime_manifest_sha`: The SHA256 hash of the manifest file.
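As a rough illustration of the manifest format, the sketch below parses `SHA256SUMS`-style lines (a SHA256 digest, whitespace, then a relative path or URL). It is plain Python rather than the PR's Starlark, and `parse_manifest`, the tuple layout, and the skipping of `#` comment lines are assumptions made for this example only.

```python
def parse_manifest(text):
    """Parse 'sha256  location' lines into (sha256, location) tuples."""
    entries = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines (assumed to start with '#').
        if not line or line.startswith("#"):
            continue
        # split(None, 1) splits on any run of whitespace, so both spaces
        # and tabs are accepted as the column separator.
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError("malformed manifest line: %r" % raw_line)
        sha256, location = parts
        # sha256sum prefixes binary-mode file names with '*'; drop it.
        location = location.strip().lstrip("*")
        entries.append((sha256, location))
    return entries


manifest = """\
# Standalone runtimes manifest catalog
00bb2d629f7eacbb5c6b44dc04af26d1f1da64cee3425b0d8eb5135a93830296  20250317/cpython-3.13.2+20250317-x86_64-unknown-linux-musl-install_only.tar.gz
"""
entries = parse_manifest(manifest)
```

Each resulting tuple pairs a digest with a location that still has to be resolved against a base URL when it is a relative path.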
…ce mode Workspace builds running under older Bazel versions do not support Bzlmod module extensions. This commit introduces Starlark helper macros to conditionally gate and register Bzlmod-specific unit tests, resolving test suite loading crashes in workspace CI jobs.
Removes the external test_helpers.bzl under tests/python/ and inlines the register_python_tests macro directly inside python_tests.bzl. This simplifies the test suite layout while keeping legacy workspace gating completely intact.
Enables local PBS manifest file resolution in Bzlmod mode. Implemented attr.label_list attribute on python.override and updated _populate_from_pbs_manifest to read local files via module_ctx.read.
Separates free-threading and archive type from the build flavor string in parse_filename. Also includes canonical documentation citation in the docstring and updates manifest parsing assertions.
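The split described in this commit can be sketched in plain Python as follows. This is not the PR's `parse_filename`; the returned dict layout, the `BUILD_TOKENS` vocabulary, and the assumption that build tokens sit between the target triple and the archive flavor are all hypothetical and may not cover every PBS asset name.

```python
ARCHIVE_EXTENSIONS = (".tar.gz", ".tar.zst")
# Assumed vocabulary of build-configuration tokens (hypothetical, possibly
# incomplete). "freethreaded" is reported separately from the build flavor.
BUILD_TOKENS = ("debug", "pgo", "lto", "noopt", "shared", "static", "freethreaded")


def parse_filename(filename):
    """Split a PBS-style asset name into its parts, or return None."""
    name = filename.rsplit("/", 1)[-1]
    archive_type = ""
    for ext in ARCHIVE_EXTENSIONS:
        if name.endswith(ext):
            archive_type = ext[1:]
            name = name[:-len(ext)]
            break
    if not archive_type:
        return None

    pieces = name.split("-")
    if len(pieces) < 4:
        return None
    version, _, build_date = pieces[1].partition("+")
    archive_flavor = pieces[-1]
    middle = pieces[2:-1]

    # Peel build tokens off the end; what remains is the target triple.
    build_tokens = []
    while middle and all(t in BUILD_TOKENS for t in middle[-1].split("+")):
        build_tokens = middle.pop().split("+") + build_tokens

    return {
        "implementation": pieces[0],
        "version": version,
        "build_date": build_date,
        "target": "-".join(middle),
        "freethreaded": "freethreaded" in build_tokens,
        "build_flavor": "+".join(t for t in build_tokens if t != "freethreaded"),
        "archive_flavor": archive_flavor,
        "archive_type": archive_type,
    }


parsed = parse_filename(
    "20250317/cpython-3.13.2+20250317-x86_64-unknown-linux-musl-install_only.tar.gz"
)
```

Keeping `freethreaded`, `build_flavor`, `archive_flavor`, and `archive_type` as separate fields lets later steps filter or sort on each one independently.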
Documents local manifest file loading alongside remote manifest URLs. Includes unified Starlark example using @// label prefix.
Flattens parsed manifest entries into a single list and sorts by archive flavor (install_only > install_only_stripped > full) so smaller standalone archives take precedence.
Restores explicit whitelist validation on entry.archive_flavor to prevent unsupported standalone release asset formats from polluting available toolchain mappings.
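A minimal Python sketch of the two steps above, filtering on an allowlist and then ordering by archive flavor, might look like this. The dict-based entries and the names `ARCHIVE_FLAVOR_RANK` and `select_entries` are assumptions for illustration; the PR itself works on Starlark structs.

```python
# Lower rank means higher preference, following the order stated in the
# commit message: install_only > install_only_stripped > full.
ARCHIVE_FLAVOR_RANK = {
    "install_only": 0,
    "install_only_stripped": 1,
    "full": 2,
}


def _flavor_rank(entry):
    """Sort key: the preference rank of the entry's archive flavor."""
    return ARCHIVE_FLAVOR_RANK[entry["archive_flavor"]]


def select_entries(entries):
    """Drop unsupported archive flavors, then sort by preference."""
    supported = [e for e in entries if e["archive_flavor"] in ARCHIVE_FLAVOR_RANK]
    # sorted() is stable, so entries with the same flavor keep manifest order.
    return sorted(supported, key=_flavor_rank)


candidates = [
    {"name": "a", "archive_flavor": "full"},
    {"name": "b", "archive_flavor": "some_unsupported_flavor"},
    {"name": "c", "archive_flavor": "install_only"},
    {"name": "d", "archive_flavor": "install_only_stripped"},
]
ordered = select_entries(candidates)
```

A named key function is used instead of a `lambda` because the review notes that lambda expressions are not supported in Starlark.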
# Conflicts: # tests/support/mocks/python_ext.bzl
Centralizes standalone Python runtime asset registration into python/runtimes_manifest.txt and parses it dynamically via _populate_from_pbs_manifest, replacing the static TOOL_VERSIONS table.
Code Review
This pull request introduces support for dynamically fetching and registering Python runtimes from a python-build-standalone manifest file, replacing the hardcoded TOOL_VERSIONS map with a dynamic parser. Feedback on the implementation highlights several critical Starlark compatibility issues, such as the unsupported use of lambda expressions and string methods like removeprefix and removesuffix on older Bazel versions. Additionally, suggestions are provided to improve robustness by handling empty base download URLs, parsing tabs in manifests, and fixing a potential type error in the test suite.
    if "://" in location:
        urls = [location]
    else:
        urls = ["{}/{}".format(b_url, location) for b_url in base_download_urls]
If base_download_urls is empty (which can happen if add_runtime_manifest_files is used with relative paths but no base_url or add_runtime_manifest_urls are provided), urls will be empty. This will lead to a silent failure during toolchain registration and a confusing error at download time. We should fail early with a clear error message.
Suggested change:

    if "://" in location:
        urls = [location]
    else:
        if not base_download_urls:
            _fail("Manifest entry '{}' is a relative path, but no base URL or manifest URLs were provided to resolve it.".format(location))
            return
        urls = ["{}/{}".format(b_url, location) for b_url in base_download_urls]
    add_target_settings = [],
    add_runtime_manifest_urls = [],
    add_runtime_manifest_files = [],
    runtime_manifest_sha = None):
In tests, runtime_manifest_sha defaults to None. If passed as None to _populate_from_pbs_manifest, it will override the default "" and cause mctx.download to fail with a type error because it expects a string. We should default it to "" instead of None.
Suggested change:

    add_runtime_manifest_urls = [],
    add_runtime_manifest_files = [],
    runtime_manifest_sha = ""):
I really like this listing of it. It makes it really obvious and apparent how many runtimes are being registered (600+!). It also just seems much easier to add/remove entries.

The paths in the manifest are a bit funny:

It looks like CI is failing on the workspace tests. I'm not keen to port all this functionality to workspace. I'm thinking, for workspace, we just replace some internals so it reads from the manifest file, but don't expose a

Q: Should we have one manifest file, or split it along some axis? e.g. python version, platform, build date, etc.

Q: Should we publish a runtime manifest as part of releases? The thought is it lets one easily "upgrade" the runtimes (treating rules_python as a trusted source) without having to upgrade rules_python itself. The counter argument is: one can always add a custom manifest.
Instead of generating complex nested Starlark dictionaries during repository setup, internal_config_repo now generates a straightforward list of parsed manifest structs (MANIFEST_ENTRIES) using render.struct() and render.list(). The versions.bzl module dynamically converts these entries into the legacy TOOL_VERSIONS dictionary format when loaded during macro evaluation.
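The conversion from a flat entry list to a nested, version-keyed dictionary could look roughly like the Python sketch below. The exact shape of `TOOL_VERSIONS`, the entry field names (`version`, `platform`, `url`, `sha256`), the `strip_prefix` value, and the example URLs and digests are assumptions for illustration, not the PR's actual structures.

```python
def entries_to_tool_versions(entries):
    """Group flat manifest entries into a dict keyed by Python version.

    Entries are assumed to be pre-sorted by preference, so the first entry
    seen for a (version, platform) pair wins.
    """
    tool_versions = {}
    for entry in entries:
        info = tool_versions.setdefault(
            entry["version"],
            {"url": {}, "sha256": {}, "strip_prefix": "python"},
        )
        platform = entry["platform"]
        if platform in info["sha256"]:
            # A more preferred archive was already registered.
            continue
        info["url"][platform] = entry["url"]
        info["sha256"][platform] = entry["sha256"]
    return tool_versions


# Placeholder data; the URLs and digests are made up for the example.
example_entries = [
    {"version": "3.13.2", "platform": "x86_64-unknown-linux-musl",
     "url": "https://example.invalid/a-install_only.tar.gz", "sha256": "aaa"},
    {"version": "3.13.2", "platform": "x86_64-unknown-linux-musl",
     "url": "https://example.invalid/a-full.tar.zst", "sha256": "bbb"},
    {"version": "3.12.9", "platform": "aarch64-unknown-linux-gnu",
     "url": "https://example.invalid/b-install_only.tar.gz", "sha256": "ccc"},
]
tool_versions = entries_to_tool_versions(example_entries)
```

Keeping the generated repository output as a flat list and doing this grouping at load time means the generated file stays easy to read and diff.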
…times-manifest # Conflicts: # tests/python/python_tests.bzl
Update internal_config_repo_bzl and python_register_toolchains_bzl in python/private/BUILD.bazel to include their respective loaded modules so that Stardoc extracts API documentation successfully.
…flags Mitigate transient 504 Gateway Timeout network errors during downloads by adding secondary mirror fallback rewrites for Stardoc to downloader_config.cfg and scaling HTTP timeouts and retries in .bazelrc files.
Combat transient 504 Gateway Timeout network dropouts on external artifact downloads by escalating HTTP timeout scaling to 10.0 and downloader retries to 10 across bazelrc files, and adding a mirror rewrite rule for rules_java.
Ensure primary GitHub release URLs are attempted prior to secondary mirror fallbacks by adding primary pass-through rewrite rules for rules_java and stardoc across downloader_config.cfg files.
Sets `python_version = "3.14"` directly on `sphinx_build_binary` to ensure the documentation builder executes with a modern Python runtime capable of parsing PEP 695 type aliases in recent Sphinx releases.
Replaced fragile substring search over PLATFORMS.keys() with direct synthesis of the PLATFORMS dictionary key from structured struct fields (arch, vendor, os, libc, freethreaded). Eliminated defensive getattr() access. Factored entry sorting into _manifest_entry_sort_key preferring install_only and the lowest microarchitecture level.
…ntimes-manifest # Conflicts: # python/private/pbs_manifest.bzl # python/private/python.bzl
Renamed flavor to build_flavor in parse_filename and parse_sha_manifest to clearly distinguish runtime build configurations (e.g., debug, pgo+lto) from distribution release bundling schemes (archive_flavor).
    @@ -57,1208 +57,7 @@ _ASTRAL_PREFIX = "https://releases.astral.sh/github/python-build-standalone/rele
    # It is possible to provide lists in "url". It is also possible to provide patches or patch_strip.
Please remove the supporting print_toolchain_checksum.bzl docs above.
    @@ -0,0 +1,617 @@
    # Standalone runtimes manifest catalog
Please add a comment on how this is generated.
…on releases Adds all prior release distribution entries (including historical Python 3.8-3.13 and freethreaded variant builds) to runtimes_manifest.txt to ensure dynamic runtime registration provides equivalent toolchain coverage to the previous static table.
a223107 to 72ac435
…ACE dictionary, and unit test alignment

- Populates runtimes_manifest.txt with the complete historical union of all Python release distributions (including legacy 3.10.11 and freethreaded variant builds) to provide 100% equivalent runtime coverage across WORKSPACE and Bzlmod modes.
- Dynamically binds TOOL_VERSIONS in versions.bzl while deferring default argument evaluation in legacy macros.
- Aligns manifest unit test assertions in parse_sha_manifest_tests.bzl with the build_flavor struct attribute renaming.
… manifest binding

- Sets workspace_mode = True in rules_python_internal_deps within internal_dev_deps.bzl to correctly populate manifest entries during top-level WORKSPACE initialization under Bazel 7.x.
- Refactors parse_filename in pbs_manifest.bzl to use universal Starlark slice notation instead of interpreter-dependent string methods.
- Updates internal_dev_deps docstrings to clarify WORKSPACE mode usage.
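The slice-based replacement for `removeprefix` and `removesuffix` can be sketched as below. The helper names are hypothetical and the code is plain Python, but it uses only `startswith`, `endswith`, `len`, and slicing, which is the kind of construct the commit describes.

```python
def remove_prefix(s, prefix):
    """Return s without a leading prefix, using only slicing."""
    if prefix and s.startswith(prefix):
        return s[len(prefix):]
    return s


def remove_suffix(s, suffix):
    """Return s without a trailing suffix, using only slicing."""
    # The emptiness check matters: s[:-0] is s[:0], which is the empty string.
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s


stem = remove_suffix(
    remove_prefix("cpython-3.13.2+20250317-x86_64-unknown-linux-musl-install_only.tar.gz", "cpython-"),
    ".tar.gz",
)
```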
dougthor42 left a comment
> Q: Should we have one manifest file, or split it along some axis? e.g. python version, platform, build date, etc.
I'd vote for one file as long as the sorting issue in my other comment can be addressed.
If not, then splitting along Python minor version (3.9 to 3.15; 7 files) makes sense to me. Then dropping support for a minor version is just deleting the file.
> Q: Should we publish a runtime manifest as part of releases? The thought is it lets one easily "upgrade" the runtimes (treating rules_python as a trusted source) without having to upgrade rules_python itself. The counter argument is: one can always add a custom manifest.
Eh. If it's trivial to do, sure, but at least for me I don't see much use for it. How often are people adding their own or customizing runtimes? I don't know.
    @@ -0,0 +1,572 @@
    # Standalone runtimes manifest catalog
    00bb2d629f7eacbb5c6b44dc04af26d1f1da64cee3425b0d8eb5135a93830296 20250317/cpython-3.13.2+20250317-x86_64-unknown-linux-musl-install_only.tar.gz
This sorting makes it difficult to know if a given python version is in the manifest.
Could there be another 1st column which is just the file name sans build date? And then things get sorted by that column instead of the hash.
cpython-3.11.5+20230826-x86_64-pc-windows-msvc-shared-install_only 00f00226... 20230826/cpython-3.11.5+20230826-x86_64-pc-windows-msvc-shared-install_only.tar.gz
cpython-3.12.9+20250317-aarch64-unknown-linux-gnu-install_only 00c6bf9a... 20250317/cpython-3.12.9+20250317-aarch64-unknown-linux-gnu-install_only.tar.gz
cpython-3.12.11+20250808-aarch64-pc-windows-msvc-install_only 00bf7d7... 20250808/cpython-3.12.11+20250808-aarch64-pc-windows-msvc-install_only.tar.gz
I guess it would remove the nice alignment that hash-as-first-col gives. And would need a natsort to handle .9 being sorted before .11 and whatnot.
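For reference, a natural sort key of the kind mentioned here can be written in a few lines of Python. This is only a sketch of the idea; `natural_key` is a hypothetical name, and a Starlark version would need a hand-written digit scanner because Starlark has no `re` module.

```python
import re


def natural_key(name):
    """Build a sort key that compares digit runs as integers."""
    # re.split with a capturing group alternates text and digit runs:
    # even indices are text (possibly empty), odd indices are digits.
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


names = [
    "cpython-3.12.11+20250808-aarch64-pc-windows-msvc-install_only",
    "cpython-3.12.9+20250317-aarch64-unknown-linux-gnu-install_only",
    "cpython-3.11.5+20230826-x86_64-pc-windows-msvc-shared-install_only",
]
plain_order = sorted(names)
natural_order = sorted(names, key=natural_key)
```

With a plain string sort, `3.12.11` lands before `3.12.9` because `"1"` compares less than `"9"`; the natural key places `3.12.9` first.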
    rewrite ^github\.com/bazelbuild/rules_java/(.*) mirror.bazel.build/github.com/bazelbuild/rules_java/$1
    rewrite ^github\.com/bazelbuild/stardoc/(.*) mirror.bazel.build/github.com/bazelbuild/stardoc/$1
    [trailing blank lines]
nit: extra newlines, here and in other downloader_config files.
All 3 downloader_config files are identical - can/should they be deduped? In a separate PR.
This changes the list of runtimes that are registered to come from a file instead of
being logic within Starlark code.