Adopt the compact index for gem commands #9606
Open
hsbt wants to merge 19 commits into
Pull request overview
This PR adopts the Compact Index protocol for RubyGems gem commands by introducing a RubyGems-native Gem::CompactIndexClient (with disk caching, ETag/ranged requests, and Repr-Digest verification) and routing Gem::Source#load_specs and gem install resolution through it, with fallback to Marshal indexes when compact index is unavailable.
Changes:
- Add Gem::CompactIndexClient (cache, parser, HTTP fetcher, updater, cache-file writer) and wire it into Gem::Source and Gem::Resolver::APISet.
- Extend resolver specs to carry created_at metadata from compact index v2 and adjust API spec behavior to avoid fetching Marshal gemspecs for install paths.
- Add/adjust tests and test helpers to stub compact-index endpoints and validate caching, redirect handling, and digest verification behavior.
Reviewed changes
Copilot reviewed 26 out of 26 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| lib/rubygems/source.rb | Prefer compact index for load_specs, add compact-index client + caching directory logic, keep Marshal fallback |
| lib/rubygems/resolver/specification.rb | Add created_at attribute to resolver specs |
| lib/rubygems/resolver/api_specification.rb | Populate created_at; build stub Gem::Specification from compact-index data |
| lib/rubygems/resolver/api_set.rb | Switch version fetching/parsing to use compact-index client fetch_info |
| lib/rubygems/compact_index_client.rb | New top-level client coordinating cache + parser |
| lib/rubygems/compact_index_client/cache.rb | New on-disk cache wrapper with per-process fetch deduping |
| lib/rubygems/compact_index_client/cache_file.rb | New safe cache-file writer with optional digest verification |
| lib/rubygems/compact_index_client/http_fetcher.rb | New fetcher built on Gem::RemoteFetcher with redirect handling |
| lib/rubygems/compact_index_client/parser.rb | New parser for names/versions/info files using existing gem-parser logic |
| lib/rubygems/compact_index_client/updater.rb | New updater implementing ETag + ranged requests + digest verification |
| Manifest.txt | Add new compact index client files to the manifest |
| test/rubygems/helper.rb | Add compact-index test helper endpoints and response builder |
| test/rubygems/test_gem_source.rb | Add compact-index coverage for Gem::Source#load_specs + caching behavior |
| test/rubygems/test_gem_resolver_best_set.rb | Update resolver tests to use compact-index style responses |
| test/rubygems/test_gem_resolver_api_set.rb | Update APISet tests to use compact-index style responses across scenarios |
| test/rubygems/test_gem_resolver_api_specification.rb | Add tests for created_at parsing behavior |
| test/rubygems/test_gem_dependency_installer.rb | Ensure install resolution uses APISet/compact index and avoids quick gemspec fetches |
| test/rubygems/test_gem_commands_update_command.rb | Ensure gem update exercises compact-index-backed load_specs |
| test/rubygems/test_gem_commands_outdated_command.rb | Ensure gem outdated exercises compact-index-backed load_specs |
| test/rubygems/test_gem_compact_index_stub.rb | Validate the stub helper speaks the compact-index protocol correctly |
| test/rubygems/test_gem_compact_index_client.rb | Unit tests for compact index client behaviors (names/versions/info/cache) |
| test/rubygems/test_gem_compact_index_client_updater.rb | Unit tests for updater ETag/range/digest verification logic |
| test/rubygems/test_gem_compact_index_client_parser.rb | Unit tests for parser (versions deletions, requirements/deps parsing, checksums) |
| test/rubygems/test_gem_compact_index_client_http_fetcher.rb | Unit tests for HTTP fetcher joining paths, headers, redirects, and errors |
| test/rubygems/test_gem_compact_index_client_cache.rb | Unit tests for cache directory creation, fetch-once semantics, and checksum gating |
| test/rubygems/test_gem_compact_index_client_cache_file.rb | Unit tests for cache file atomic writes/appends and digest mismatch behavior |
Comment on lines +26 to +29

    name, versions_string, checksum = line.split(" ", 3)
    @info_checksums[name] = checksum || ""
    versions_string.split(",") do |version|
      delete = version.delete_prefix!("-")
Comment on lines +81 to +86

    def parse_digests(response)
      return unless header = response["Repr-Digest"] || response["Digest"]
      digests = {}
      header.split(",") do |param|
        algorithm, value = param.split("=", 2)
        algorithm.strip!
Comment on lines +115 to +124

    def parse_created_at(value)
      value = value.first if value.is_a?(Array)
      return unless value.is_a?(String)

      begin
        Time.new(value)
      rescue ArgumentError
        nil
      end
    end
Comment on lines +43 to +46

      spec = Gem::Resolver::APISpecification.new set, data

      assert_equal Time.new("2026-06-05T10:30:45Z"), spec.created_at
    end
First piece of a RubyGems-side compact index client, ported from Bundler::CompactIndexClient. The two implementations are intentionally kept separate so that gem commands can adopt the compact index without touching Bundler. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Keeps cached compact index files in sync with the server using ETag conditional requests and ranged requests, verifying Repr-Digest checksums. Unlike the Bundler version, checksum and gzip failures raise Gem::CompactIndexClient errors instead of Bundler::HTTPError. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
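
The ranged-request and Repr-Digest part of this commit can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the helper names (`range_header`, `append_tail`, `digest_matches?`) are hypothetical, and it assumes the client re-requests the last cached byte so an unchanged file still yields a satisfiable range, and that the server sends a `sha-256=:<base64>:` digest.

```ruby
require "digest"

# Hypothetical helper: Range header value for an append request.
# Assumption: the request overlaps the last cached byte.
def range_header(local_size)
  "bytes=#{local_size - 1}-"
end

# Hypothetical helper: merge a partial (206) body into the cached content.
# Returns nil when the overlapping byte does not match, which signals that
# the cached file has diverged and a full download is needed.
def append_tail(local, partial_body)
  return nil unless partial_body.byteslice(0, 1) == local.byteslice(-1, 1)

  local + partial_body.byteslice(1..)
end

# Hypothetical helper: compare content against a sha-256 Repr-Digest value.
def digest_matches?(content, repr_digest)
  expected = repr_digest[/sha-256=:([^:]+):/, 1]
  return false unless expected

  Digest::SHA256.base64digest(content) == expected
end
```

A caller would only commit the merged content to the cache when `digest_matches?` returns true, and fall back to a full fetch otherwise.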
Manages the on-disk cache layout (versions, names, info/*, etags) and delegates fetching to the Updater, deduplicating endpoint fetches per process. Also adds the DEBUG_COMPACT_INDEX debug logger to the client. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Parses the versions index (including deletion lines and per-gem info checksums) and info files. Reuses Gem::Resolver::APISet::GemParser for info lines, which already preserves compact index v2 metadata such as created_at. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
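
As a rough illustration of the versions parsing described here, the sketch below handles the `---` header separator, deletion entries prefixed with `-`, and per-gem info checksums. The method name `parse_versions` and the returned data shapes are assumptions made for this example, not the parser's actual API.

```ruby
# Hypothetical sketch: parse a compact index "versions" file.
# Returns [versions, checksums], where versions maps a gem name to
# [version, platform] pairs and checksums maps a name to the most recent
# info checksum seen for it.
def parse_versions(text)
  lines = text.split("\n")
  separator = lines.index("---")
  lines = lines[(separator + 1)..] if separator # drop the header block

  versions = Hash.new { |hash, name| hash[name] = [] }
  checksums = {}

  lines.each do |line|
    name, versions_string, checksum = line.split(" ", 3)
    next unless versions_string

    checksums[name] = checksum || ""
    versions_string.split(",").each do |entry|
      yanked = entry.start_with?("-")         # "-1.0" marks a deletion
      version, platform = entry.delete_prefix("-").split("-", 2)
      pair = [version, platform || "ruby"]
      yanked ? versions[name].delete(pair) : versions[name] << pair
    end
  end

  [versions, checksums]
end
```

Later lines for the same gem extend or shrink the earlier list, so the file can be appended to on the server without rewriting existing lines.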
Completes the client facade: names, versions, info, dependencies, latest_version, available? and reset! on top of Cache and Parser. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Adapts Gem::RemoteFetcher to the fetcher interface expected by the compact index client. RemoteFetcher#fetch_path only supports If-Modified-Since, while the compact index needs ETag conditional requests and ranged requests, so this issues requests directly through RemoteFetcher#request. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
util_setup_compact_index serves versions, names and info/NAME on the FakeFetcher with consistent MD5/SHA-256 checksums, ETags and optional created_at v2 metadata, so functional tests can drive gem commands against a stubbed compact index. Verified against the real Gem::CompactIndexClient. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Fetches a single gem's info file with an ETag conditional request, without consulting the versions index. The versions index download only pays off when most gems are needed; for gem install, fetching just the required info files keeps the first-use cost at the current level while still benefiting from the disk cache. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
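
The conditional fetch of a single info file might look like the sketch below. It is an illustration under stated assumptions: `fetch_info` here is a stand-alone method, `cache` is a plain Hash standing in for the disk cache, and `fetcher` is a hypothetical callable returning `[status, body, etag]`; none of these mirror the PR's real interfaces.

```ruby
# Hypothetical sketch: fetch info/NAME with an ETag conditional request.
# cache:   Hash of name => { body:, etag: } (stand-in for the disk cache)
# fetcher: callable taking (path, headers) and returning [status, body, etag]
def fetch_info(name, cache, fetcher)
  cached = cache[name]
  headers = {}
  headers["If-None-Match"] = cached[:etag] if cached

  status, body, etag = fetcher.call("info/#{name}", headers)

  case status
  when 304
    # Not modified: reuse the cached copy without downloading the body.
    cached && cached[:body]
  when 200
    cache[name] = { body: body, etag: etag }
    body
  end
end
```

Only the info files a resolution actually touches are requested, which is the trade-off this commit describes for gem install.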
Info files are now cached on disk under Gem.spec_cache_dir and refreshed with ETag conditional requests instead of being downloaded in full on every resolution. APISet keeps fetching only the info files it needs (via fetch_info) rather than the whole versions index, so the first-use cost stays at the current level. When the user's home is not safely writable the cache falls back to a temporary directory. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The compact index v2 publishes per-version creation timestamps which the parser already preserves. Keep them on resolver specifications so features like cooldown can consult publish dates during resolution. Sources without timestamps leave created_at nil. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
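
The reviewed snippet parses the timestamp with `Time.new(value)`, which accepts a string argument only on Ruby 3.2 and later. A possible alternative, shown here purely as a sketch and not as what the PR does, is `Time.iso8601` from the standard `time` library, which also works on older Rubies. The method body otherwise follows the snippet under review.

```ruby
require "time"

# Sketch: parse a compact index v2 created_at value, returning nil when the
# value is missing or malformed. Uses Time.iso8601 instead of Time.new(String).
def parse_created_at(value)
  value = value.first if value.is_a?(Array)
  return unless value.is_a?(String)

  Time.iso8601(value)
rescue ArgumentError
  nil
end
```

Sources without timestamps simply yield nil, matching the behavior described in the commit message.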
The compact index info file carries everything needed to download and install a gem, so materializing the resolved specification no longer fetches the Marshal gemspec from /quick/. Development dependencies are not part of the info file; fetch_development_dependencies still fetches the full gemspec for --development installs. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Gem::Source#load_specs now builds the released, latest and prerelease name tuple lists from the compact index versions file, falling back to the Marshal spec indexes when the source does not provide a usable compact index. This moves gem update, outdated, list and search off Marshal data for compact index sources. The client construction moves to Gem::Source so APISet and load_specs share one client and disk cache per source. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
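
To make the three lists concrete, here is a small sketch that derives released, latest and prerelease name tuples from already-parsed versions data. The method name `build_name_tuples`, the input shape, and the rule that "latest" means the highest released version per name and platform are assumptions for illustration; `Gem::NameTuple` and `Gem::Version` are the existing RubyGems classes.

```ruby
require "rubygems"
require "rubygems/name_tuple"

# Hypothetical sketch: versions maps a gem name to [version, platform] pairs.
def build_name_tuples(versions)
  released = []
  prerelease = []

  versions.each do |name, entries|
    entries.each do |version, platform|
      tuple = Gem::NameTuple.new(name, Gem::Version.new(version), platform)
      (tuple.version.prerelease? ? prerelease : released) << tuple
    end
  end

  # Assumption: "latest" is the highest released version per name/platform.
  latest = released.group_by { |t| [t.name, t.platform] }
                   .map { |_key, tuples| tuples.max_by(&:version) }

  { released: released, latest: latest, prerelease: prerelease }
end
```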
End-to-end tests driving both commands against a stubbed compact index with no usable Marshal data, proving the SpecFetcher path works through Gem::Source#load_specs. The stubbed versions response now carries a response uri so the dependency_resolver_set probe and the compact index fetch share one endpoint, as on real servers. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
A Digest or Repr-Digest parameter without a value (no "=") made byte_sequence raise NoMethodError on nil, failing the whole fetch when a mirror or proxy sends a broken header. The same flaw exists in the Bundler implementation this was ported from. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
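
A tolerant version of the header parsing could look like this sketch. It takes the raw header string rather than a response object, and its handling of colon-wrapped values is an assumption about the two header formats; it is not the code from the PR.

```ruby
# Sketch: parse a Digest / Repr-Digest header value into { algorithm => value },
# skipping parameters that have no "=" instead of raising on nil.
def parse_digests(header)
  return {} unless header

  header.split(",").each_with_object({}) do |param, digests|
    algorithm, value = param.split("=", 2)
    next if algorithm.nil? || value.nil? # broken parameter, e.g. "md5"

    value = value.strip
    # Repr-Digest wraps the base64 value in colons; legacy Digest does not.
    if value.length >= 2 && value.start_with?(":") && value.end_with?(":")
      value = value[1..-2]
    end
    next if value.empty?

    digests[algorithm.strip.downcase] = value
  end
end
```

With this shape, a header such as `sha-256=:...:, md5` still yields the usable sha-256 entry instead of failing the whole fetch.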
Both methods computed the rubygems.org to index.rubygems.org rewrite independently; keep the logic in one place. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
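
Keeping the rewrite in one place could be as small as the helper below. The name `compact_index_uri` is hypothetical, and the sketch assumes only the host is swapped while scheme and path are left untouched.

```ruby
require "uri"

# Hypothetical helper: map rubygems.org to index.rubygems.org and leave
# every other source untouched. Returns a new URI; the input is not mutated.
def compact_index_uri(source_uri)
  uri = URI(source_uri.to_s)
  uri.host = "index.rubygems.org" if uri.host == "rubygems.org"
  uri
end
```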
Pathname is built into Ruby 4.0+, so only require the library when the constant is not already available. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Pathname#write uses text mode, so on Windows the LF in fixture data became CRLF, shifting the file size the Range header is computed from and breaking the MD5/SHA-256 checksums. The client itself is unaffected since CacheFile always writes in binary mode. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
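
The fixture fix amounts to writing test data in binary mode so the byte count matches on every platform. A minimal sketch, with a hypothetical helper name `write_fixture`:

```ruby
require "pathname"

# Hypothetical test helper: write fixture data in binary mode so that "\n"
# is never translated to "\r\n", keeping file size and checksums stable.
def write_fixture(dir, name, data)
  path = Pathname.new(dir).join(name)
  path.binwrite(data)
  path
end
```

Because the Range header and the MD5/SHA-256 checksums are both derived from the exact bytes on disk, the file size after writing should equal `data.bytesize`.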
What was the end-user or developer problem that led to this PR?

To build the cooldown option (#9113) for gem commands, RubyGems needs the compact index: the per-version created_at timestamps only exist there.

What is your fix for the problem, implemented in this PR?

This adds Gem::CompactIndexClient with a disk cache (ETag conditional and ranged requests, Repr-Digest verification) and moves gem install resolution and Gem::Source#load_specs (update/outdated/list/search) onto it, falling back to the Marshal indexes when a source does not provide a compact index.

The client is an independent implementation rather than a reuse of Bundler's. Keeping them separate means this change cannot affect Bundler's behavior, which makes regressions easy to isolate while this side stabilizes; the duplication is accepted for now.

Gem::CompactIndexClient itself is a near-verbatim port of Bundler::CompactIndexClient, differing only in dropping Bundler-specific dependencies and adding a couple of small APIs the gem commands need. It also fixes a few minor bugs that still exist in Bundler's copy, which are worth fixing there separately.

Two limitations are left in on purpose.

Gem::Source#fetch_spec (/quick Marshal gemspecs) remains for commands that need a full gemspec (gem specification -r, gem list -r -d, --development), since the compact index does not carry that data. Those paths are low-frequency and go through Gem::SafeMarshal.

[...] gem update and gem outdated quietly wrong; an explicit offline mode can be discussed separately.

Make sure the following tasks are checked