
Adopt the compact index for gem commands #9606

Open
hsbt wants to merge 19 commits into master from gem-compact-index


hsbt (Member) commented Jun 10, 2026

What was the end-user or developer problem that led to this PR?

To build the cooldown option (#9113) for gem commands, RubyGems needs the compact index: the per-version created_at timestamps only exist there.

What is your fix for the problem, implemented in this PR?

This adds Gem::CompactIndexClient, which keeps an on-disk cache of the compact index in sync using ETag conditional requests and ranged requests and verifies downloads against Repr-Digest checksums. gem install resolution and Gem::Source#load_specs (used by gem update, outdated, list and search) now go through it, falling back to the Marshal indexes when a source does not provide a compact index.
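The update cycle described above can be illustrated without any networking. This is a minimal sketch, not the PR's Updater: it assumes the cached file is append-only, that the stored ETag is sent back in If-None-Match, and that the Range request starts one byte before the end of the cached data so the requested range is never empty. `FakeResponse`, `update_headers` and `apply_response` are hypothetical names.

```ruby
# Hypothetical stand-in for an HTTP response (status code and body only).
FakeResponse = Struct.new(:code, :body)

# Headers for refreshing an append-only cached file. Starting the range one
# byte before the cached end means the requested range is never empty.
def update_headers(cached, etag)
  headers = {}
  headers["If-None-Match"] = %("#{etag}") if etag
  headers["Range"] = "bytes=#{cached.bytesize - 1}-" unless cached.empty?
  headers
end

# Return the new cached contents after applying a response.
def apply_response(cached, response)
  case response.code
  when 304 # Not Modified: the cache is already current
    cached
  when 206 # Partial Content: the body starts with the overlapping byte
    unless response.body.byteslice(0, 1) == cached.byteslice(-1, 1)
      raise "range mismatch: refetch the whole file"
    end
    cached + response.body.byteslice(1..)
  when 200 # server ignored Range (or nothing was cached): take the full body
    response.body
  else
    raise "unexpected status #{response.code}"
  end
end
```

Checking the overlapping byte is only a cheap sanity check; verifying a digest over the whole resulting file, as the PR describes, catches divergence more thoroughly.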

The client is an independent implementation rather than a reuse of Bundler's. Keeping them separate means this change cannot affect Bundler's behavior, which makes regressions easy to isolate while this side stabilizes; the duplication is accepted for now.

Gem::CompactIndexClient itself is a near-verbatim port of Bundler::CompactIndexClient, differing only in dropping Bundler-specific dependencies and adding a couple of small APIs the gem commands need. It also fixes a few minor bugs that still exist in Bundler's copy, which are worth fixing there separately.

Two limitations are left in on purpose.

  • Gem::Source#fetch_spec (/quick Marshal gemspecs) remains for commands that need a full gemspec (gem specification -r, gem list -r -d, --development), since the compact index does not carry that data. Those paths are low-frequency and go through Gem::SafeMarshal.
  • When the network is unreachable we fail exactly as before instead of silently serving a stale cached index, which would make gem update and gem outdated quietly wrong; an explicit offline mode can be discussed separately.

Copilot AI review requested due to automatic review settings June 10, 2026 07:42

Copilot AI left a comment

Pull request overview

This PR adopts the Compact Index protocol for RubyGems gem commands by introducing a RubyGems-native Gem::CompactIndexClient (with disk caching, ETag/ranged requests, and Repr-Digest verification) and routing Gem::Source#load_specs and gem install resolution through it, falling back to the Marshal indexes when the compact index is unavailable.

Changes:

  • Add Gem::CompactIndexClient (cache, parser, HTTP fetcher, updater, cache-file writer) and wire it into Gem::Source and Gem::Resolver::APISet.
  • Extend resolver specs to carry created_at metadata from compact index v2 and adjust API spec behavior to avoid fetching Marshal gemspecs for install paths.
  • Add/adjust tests and test helpers to stub compact-index endpoints and validate caching, redirect handling, and digest verification behavior.

Reviewed changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| lib/rubygems/source.rb | Prefer compact index for load_specs, add compact-index client + caching directory logic, keep Marshal fallback |
| lib/rubygems/resolver/specification.rb | Add created_at attribute to resolver specs |
| lib/rubygems/resolver/api_specification.rb | Populate created_at; build stub Gem::Specification from compact-index data |
| lib/rubygems/resolver/api_set.rb | Switch version fetching/parsing to use compact-index client fetch_info |
| lib/rubygems/compact_index_client.rb | New top-level client coordinating cache + parser |
| lib/rubygems/compact_index_client/cache.rb | New on-disk cache wrapper with per-process fetch deduping |
| lib/rubygems/compact_index_client/cache_file.rb | New safe cache-file writer with optional digest verification |
| lib/rubygems/compact_index_client/http_fetcher.rb | New fetcher built on Gem::RemoteFetcher with redirect handling |
| lib/rubygems/compact_index_client/parser.rb | New parser for names/versions/info files using existing gem-parser logic |
| lib/rubygems/compact_index_client/updater.rb | New updater implementing ETag + ranged requests + digest verification |
| Manifest.txt | Add new compact index client files to the manifest |
| test/rubygems/helper.rb | Add compact-index test helper endpoints and response builder |
| test/rubygems/test_gem_source.rb | Add compact-index coverage for Gem::Source#load_specs + caching behavior |
| test/rubygems/test_gem_resolver_best_set.rb | Update resolver tests to use compact-index style responses |
| test/rubygems/test_gem_resolver_api_set.rb | Update APISet tests to use compact-index style responses across scenarios |
| test/rubygems/test_gem_resolver_api_specification.rb | Add tests for created_at parsing behavior |
| test/rubygems/test_gem_dependency_installer.rb | Ensure install resolution uses APISet/compact index and avoids quick gemspec fetches |
| test/rubygems/test_gem_commands_update_command.rb | Ensure gem update exercises compact-index-backed load_specs |
| test/rubygems/test_gem_commands_outdated_command.rb | Ensure gem outdated exercises compact-index-backed load_specs |
| test/rubygems/test_gem_compact_index_stub.rb | Validate the stub helper speaks the compact-index protocol correctly |
| test/rubygems/test_gem_compact_index_client.rb | Unit tests for compact index client behaviors (names/versions/info/cache) |
| test/rubygems/test_gem_compact_index_client_updater.rb | Unit tests for updater ETag/range/digest verification logic |
| test/rubygems/test_gem_compact_index_client_parser.rb | Unit tests for parser (versions deletions, requirements/deps parsing, checksums) |
| test/rubygems/test_gem_compact_index_client_http_fetcher.rb | Unit tests for HTTP fetcher joining paths, headers, redirects, and errors |
| test/rubygems/test_gem_compact_index_client_cache.rb | Unit tests for cache directory creation, fetch-once semantics, and checksum gating |
| test/rubygems/test_gem_compact_index_client_cache_file.rb | Unit tests for cache file atomic writes/appends and digest mismatch behavior |

Comment on lines +26 to +29
```ruby
name, versions_string, checksum = line.split(" ", 3)
@info_checksums[name] = checksum || ""
versions_string.split(",") do |version|
  delete = version.delete_prefix!("-")
```
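The review excerpt above stops at the start of the version loop. A self-contained sketch of how such a loop can be completed, assuming the versions file layout is an optional preamble ending in a `---` line followed by `name v1,v2,-v3 checksum` lines, where a leading `-` marks a removed version and a `-platform` suffix marks a platform-specific one. `parse_versions` is a hypothetical name, not the PR's Parser API.

```ruby
# Hypothetical sketch: parse a compact index "versions" file into
# [name, version, platform] tuples plus the latest per-gem info checksum.
# Later lines override earlier ones, so deletions can remove versions.
def parse_versions(data)
  lines = data.split("\n")
  separator = lines.index("---")
  lines = lines[(separator + 1)..] if separator # skip the preamble

  versions = {}
  checksums = {}
  lines.each do |line|
    name, versions_string, checksum = line.split(" ", 3)
    checksums[name] = checksum || ""
    versions_string.split(",") do |version|
      delete = version.delete_prefix!("-")
      # "1.1.0-java" => ["1.1.0", "java"]; plain versions have no platform
      number, platform = version.split("-", 2)
      key = [name, number, platform || "ruby"]
      if delete
        versions.delete(key)
      else
        versions[key] = true
      end
    end
  end
  [versions.keys, checksums]
end
```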
Comment on lines +81 to +86
```ruby
def parse_digests(response)
  return unless header = response["Repr-Digest"] || response["Digest"]
  digests = {}
  header.split(",") do |param|
    algorithm, value = param.split("=", 2)
    algorithm.strip!
```
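The excerpt above is cut off before values are decoded. A self-contained sketch of a tolerant parser and the matching verification step, assuming only `sha-256` is supported, Repr-Digest values use the RFC 9530 byte-sequence form `:BASE64:`, and legacy Digest values are bare base64. It skips a parameter with no `=` instead of raising, which is the failure mode described in the commit log. All method and constant names here are hypothetical.

```ruby
require "digest"

# Hypothetical table of supported algorithms (sha-256 only in this sketch).
SUPPORTED = { "sha-256" => Digest::SHA256 }.freeze

# Decode ":BASE64:" (Repr-Digest) or bare "BASE64" (Digest) into raw bytes.
def byte_sequence(value)
  value = value.strip
  # An opening colon without a closing one is malformed.
  return if value.delete_prefix!(":") && !value.delete_suffix!(":")
  value.unpack1("m0") # strict base64; raises ArgumentError on bad input
rescue ArgumentError
  nil
end

# Parse a header into { "sha-256" => raw_digest }, skipping broken parameters.
def parse_digest_header(header)
  return unless header
  digests = {}
  header.split(",") do |param|
    algorithm, value = param.split("=", 2)
    next unless value # parameter without "=": ignore it instead of crashing
    algorithm = algorithm.strip.downcase
    next unless SUPPORTED.key?(algorithm)
    next unless (bytes = byte_sequence(value))
    digests[algorithm] = bytes
  end
  digests.empty? ? nil : digests
end

# True when every advertised digest matches the body.
def digests_match?(body, digests)
  digests.all? { |algorithm, expected| SUPPORTED.fetch(algorithm).digest(body) == expected }
end
```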
Comment on lines +115 to +124
```ruby
def parse_created_at(value)
  value = value.first if value.is_a?(Array)
  return unless value.is_a?(String)

  begin
    Time.new(value)
  rescue ArgumentError
    nil
  end
end
```
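`Time.new` parses a single timestamp string like this only on Ruby 3.2 and later. If a strict ISO 8601 (XML Schema) parser is preferred, the same helper can be written with `Time.iso8601` from the standard `time` library. This is an alternative sketch under that assumption, not the PR's code.

```ruby
require "time"

# Alternative sketch: parse a created_at value (a String, or an Array whose
# first element is the String) into a Time; return nil if it cannot be parsed.
def parse_created_at(value)
  value = value.first if value.is_a?(Array)
  return unless value.is_a?(String)

  Time.iso8601(value) # raises ArgumentError on anything but full ISO 8601
rescue ArgumentError
  nil
end
```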
Comment on lines +43 to +46
```ruby
  spec = Gem::Resolver::APISpecification.new set, data

  assert_equal Time.new("2026-06-05T10:30:45Z"), spec.created_at
end
```
hsbt and others added 19 commits June 11, 2026 20:14
First piece of a RubyGems-side compact index client, ported from
Bundler::CompactIndexClient. The two implementations are intentionally
kept separate so that gem commands can adopt the compact index without
touching Bundler.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Keeps cached compact index files in sync with the server using ETag
conditional requests and ranged requests, verifying Repr-Digest
checksums. Unlike the Bundler version, checksum and gzip failures
raise Gem::CompactIndexClient errors instead of Bundler::HTTPError.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Manages the on-disk cache layout (versions, names, info/*, etags) and
delegates fetching to the Updater, deduplicating endpoint fetches per
process. Also adds the DEBUG_COMPACT_INDEX debug logger to the client.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
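The per-process deduplication can be sketched as a small wrapper that remembers which endpoints it has already refreshed. This sketch assumes a Hash in place of the on-disk files and a block in place of the Updater; `FetchOnceCache` is hypothetical, and only the DEBUG_COMPACT_INDEX variable name comes from the commit (the log format is made up).

```ruby
require "set"

# Hypothetical sketch: refresh each remote path at most once per process,
# then always answer from the local store.
class FetchOnceCache
  def initialize(store, &updater)
    @store = store     # Hash standing in for the on-disk cache files
    @updater = updater # block standing in for the Updater: (path, store)
    @fetched = Set.new # endpoints already refreshed in this process
  end

  def fetch(remote_path)
    if @fetched.add?(remote_path) # add? returns nil when already present
      debug { "fetching #{remote_path}" }
      @updater.call(remote_path, @store)
    else
      debug { "already fetched #{remote_path}" }
    end
    @store[remote_path]
  end

  private

  # Log to stderr only when DEBUG_COMPACT_INDEX is set (format is illustrative).
  def debug
    warn("compact index: #{yield}") if ENV["DEBUG_COMPACT_INDEX"]
  end
end
```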
Parses the versions index (including deletion lines and per-gem info
checksums) and info files. Reuses Gem::Resolver::APISet::GemParser for
info lines, which already preserves compact index v2 metadata such as
created_at.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Completes the client facade: names, versions, info, dependencies,
latest_version, available? and reset! on top of Cache and Parser.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Adapts Gem::RemoteFetcher to the fetcher interface expected by the
compact index client. RemoteFetcher#fetch_path only supports
If-Modified-Since, while the compact index needs ETag conditional
requests and ranged requests, so this issues requests directly through
RemoteFetcher#request.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
util_setup_compact_index serves versions, names and info/NAME on the
FakeFetcher with consistent MD5/SHA-256 checksums, ETags and optional
created_at v2 metadata, so functional tests can drive gem commands
against a stubbed compact index. Verified against the real
Gem::CompactIndexClient.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
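A helper like this can be sketched as a function that derives every checksum from the bodies it serves, so the stub can never disagree with itself. Assumptions: the names and versions files start with a `---` separator (versions also with a `created_at:` preamble), the per-gem field in the versions file is the MD5 of that gem's info file, and the ETag is the MD5 of each body. `build_compact_index` is a hypothetical name, not the real signature of `util_setup_compact_index`.

```ruby
require "digest"

# Hypothetical sketch: build consistent compact index responses from info
# file bodies so a stubbed fetcher can serve versions, names and info/NAME.
def build_compact_index(infos)
  files = {}
  infos.each { |name, body| files["info/#{name}"] = body }

  files["names"] = "---\n" + infos.keys.sort.map { |n| "#{n}\n" }.join

  version_lines = infos.sort.map do |name, body|
    entries = body.each_line(chomp: true).reject { |l| l.empty? || l == "---" }
    numbers = entries.map { |line| line.split(" ", 2).first }
    # The trailing field is the MD5 of the info file, so a client can tell
    # which info files changed without downloading them.
    "#{name} #{numbers.join(",")} #{Digest::MD5.hexdigest(body)}\n"
  end
  files["versions"] = "created_at: 2026-06-01T00:00:00Z\n---\n" + version_lines.join

  # Every response also gets an ETag and a Repr-Digest derived from its bytes.
  files.transform_values do |body|
    {
      body: body,
      etag: Digest::MD5.hexdigest(body),
      repr_digest: "sha-256=:#{Digest::SHA256.base64digest(body)}:"
    }
  end
end
```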
Fetches a single gem's info file with an ETag conditional request,
without consulting the versions index. The versions index download
only pays off when most gems are needed; for gem install, fetching
just the required info files keeps the first-use cost at the current
level while still benefiting from the disk cache.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
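The trade-off is easiest to see in code: starting from the requested gems, only info files reachable through dependency lines are fetched, each once, and the versions index is never touched. This sketch ignores version requirements (a real resolver only follows the versions it considers) and assumes info lines of the form `VERSION DEP:REQ,DEP:REQ|METADATA`. `collect_infos` is hypothetical, and the `fetch_info` argument is a stand-in lambda, not the client method of the same name.

```ruby
# Hypothetical sketch: collect the info files needed for `roots`, fetching
# each gem's info file at most once.
def collect_infos(roots, fetch_info)
  infos = {}
  queue = roots.dup
  until queue.empty?
    name = queue.shift
    next if infos.key?(name)

    body = fetch_info.call(name)
    infos[name] = body
    body.each_line(chomp: true) do |line|
      next if line.empty? || line == "---"
      # "1.0.0 dep_a:>= 1.0,dep_b:~> 2.0|checksum:..." => dependency section
      deps = line.split(" ", 2)[1].to_s.split("|", 2).first.to_s
      deps.split(",").each { |dep| queue << dep.split(":", 2).first }
    end
  end
  infos
end
```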
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Info files are now cached on disk under Gem.spec_cache_dir and
refreshed with ETag conditional requests instead of being downloaded
in full on every resolution. APISet keeps fetching only the info files
it needs (via fetch_info) rather than the whole versions index, so the
first-use cost stays at the current level. When the user's home is not
safely writable the cache falls back to a temporary directory.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
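The fallback can be sketched as below. What counts as "safely writable" is an assumption of this sketch (the directory can be created, is writable, and is not world-writable), and `compact_index_cache_dir` is a hypothetical helper; in the PR the preferred path is derived from Gem.spec_cache_dir.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical sketch: use the preferred cache directory when it can be
# created and is safely writable, otherwise fall back to a fresh temp dir.
def compact_index_cache_dir(preferred)
  begin
    FileUtils.mkdir_p(preferred, mode: 0o700)
    usable = File.writable?(preferred) && !File.world_writable?(preferred)
  rescue SystemCallError
    usable = false # e.g. a parent path is a regular file or permission denied
  end
  usable ? preferred : Dir.mktmpdir("gem-compact-index")
end
```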
The compact index v2 publishes per-version creation timestamps which
the parser already preserves. Keep them on resolver specifications so
features like cooldown can consult publish dates during resolution.
Sources without timestamps leave created_at nil.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The compact index info file carries everything needed to download and
install a gem, so materializing the resolved specification no longer
fetches the Marshal gemspec from /quick/. Development dependencies are
not part of the info file; fetch_development_dependencies still
fetches the full gemspec for --development installs.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Gem::Source#load_specs now builds the released, latest and prerelease
name tuple lists from the compact index versions file, falling back to
the Marshal spec indexes when the source does not provide a usable
compact index. This moves gem update, outdated, list and search off
Marshal data for compact index sources. The client construction moves
to Gem::Source so APISet and load_specs share one client and disk
cache per source.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
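Building the three lists from parsed versions entries can be sketched as below. `Tuple` stands in for Gem::NameTuple, the input is assumed to be `[name, version, platform]` triples such as a versions parser produces, and "latest" is assumed to mean the highest released version for each name and platform pair.

```ruby
require "rubygems"

# Hypothetical stand-in for Gem::NameTuple.
Tuple = Struct.new(:name, :version, :platform)

# Split entries into released, prerelease, and latest released per
# [name, platform] pair.
def name_tuple_lists(entries)
  tuples = entries.map do |name, version, platform|
    Tuple.new(name, Gem::Version.new(version), platform)
  end
  prerelease, released = tuples.partition { |t| t.version.prerelease? }
  latest = released.group_by { |t| [t.name, t.platform] }
                   .values
                   .map { |group| group.max_by(&:version) }
  { released: released, prerelease: prerelease, latest: latest }
end
```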
End-to-end tests driving both commands against a stubbed compact index
with no usable Marshal data, proving the SpecFetcher path works through
Gem::Source#load_specs. The stubbed versions response now carries a
response uri so the dependency_resolver_set probe and the compact index
fetch share one endpoint, as on real servers.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
A Digest or Repr-Digest parameter without a value (no "=") made
byte_sequence raise NoMethodError on nil, failing the whole fetch when
a mirror or proxy sends a broken header. The same flaw exists in the
Bundler implementation this was ported from.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Both methods computed the rubygems.org to index.rubygems.org rewrite
independently; keep the logic in one place.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
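Keeping the rewrite in one place amounts to a single helper that each caller uses. A minimal sketch, with `compact_index_uri` as a hypothetical name, assuming only the host changes while scheme, port and path are kept.

```ruby
require "uri"

# Hypothetical sketch: send compact index requests for rubygems.org to
# index.rubygems.org, leaving every other source untouched.
def compact_index_uri(source_uri)
  uri = URI(source_uri)
  return uri unless uri.host == "rubygems.org"

  rewritten = uri.dup
  rewritten.host = "index.rubygems.org"
  rewritten
end
```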
Pathname is built into Ruby 4.0+, so only require the library when the
constant is not already available.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Pathname#write uses text mode, so on Windows the LF in fixture data
became CRLF, shifting the file size the Range header is computed from
and breaking the MD5/SHA-256 checksums. The client itself is
unaffected since CacheFile always writes in binary mode.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
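In test helpers, the fix amounts to writing fixtures through a binary-mode API such as the standard Pathname#binwrite. `write_fixture` and `range_header_for` are hypothetical helpers; the Range start assumes the client re-requests the last cached byte, so it is the file size minus one.

```ruby
require "pathname"
require "tmpdir"

# Write fixture bytes in binary mode so "\n" is never translated to "\r\n"
# on Windows; the on-disk size then matches the bytes checksums cover.
def write_fixture(path, data)
  Pathname(path).binwrite(data)
end

# The Range header a client would send to append to this cached file.
def range_header_for(path)
  "bytes=#{File.size(path) - 1}-"
end
```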
hsbt force-pushed the gem-compact-index branch from 343ee2c to a29a32f on June 11, 2026 11:16