From 8466c3e82b9ea63a71f7040ba8148c576960b56b Mon Sep 17 00:00:00 2001 From: Paul Mars Date: Thu, 23 Apr 2026 17:11:19 +0200 Subject: [PATCH 1/3] ai: prepare for agentic workflows --- AGENTS.md | 5 +++++ cmd/kb/AGENTS.md | 23 +++++++++++++++++++++++ internal/kb/AGENTS.md | 38 ++++++++++++++++++++++++++++++++++++++ kb/AGENTS.md | 41 +++++++++++++++++++++++++++++++++++++++++ public/kb/AGENTS.md | 12 ++++++++++++ snap/kb/AGENTS.md | 7 +++++++ tests/kb/AGENTS.md | 28 ++++++++++++++++++++++++++++ 7 files changed, 154 insertions(+) create mode 100644 AGENTS.md create mode 100644 cmd/kb/AGENTS.md create mode 100644 internal/kb/AGENTS.md create mode 100644 kb/AGENTS.md create mode 100644 public/kb/AGENTS.md create mode 100644 snap/kb/AGENTS.md create mode 100644 tests/kb/AGENTS.md diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 00000000..27f8ef62 --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,5 @@ +# Knowledge Base + +This project uses a decentralized knowledge base system stored in `kb/` directories across various project subdirectories. + +When navigating this codebase or modifying files, you MUST always start by reading the root `kb/AGENTS.md` index file and any local `kb/AGENTS.md` files related to the directories you are working in. All rules for maintaining and updating the knowledge base are documented in the root `kb/AGENTS.md` file. diff --git a/cmd/kb/AGENTS.md b/cmd/kb/AGENTS.md new file mode 100644 index 00000000..1ebcdbf2 --- /dev/null +++ b/cmd/kb/AGENTS.md @@ -0,0 +1,23 @@ +# Preface + +This document serves as the local knowledge base index for the `cmd/` directory. It outlines the scope and purpose of the entry points for the `chisel` binary. + +# Overview + +The `cmd/` directory contains the main binary entry points for the `chisel` application. Instead of housing core business logic, this directory is responsible for initializing the application, wiring up the CLI subsystem, and handling version information. 
Core logic is deferred to the packages in `internal/`. + +# Directory + +- `chisel/` - Contains the primary executable entry point for the `chisel` application. +- `chisel/main.go` - Application bootstrap and CLI initialization using `jessevdk/go-flags`. +- `chisel/cmd_cut.go` - Implements the `cut` command, the primary operation that extracts slices into a target root filesystem. +- `chisel/cmd_find.go` - Implements the `find` command for searching available slices by name or package. +- `chisel/cmd_info.go` - Implements the `info` command for displaying detailed metadata about specific slices. +- `chisel/cmd_debug.go` - Implements the `debug` subcommand group for internal diagnostics. +- `chisel/cmd_debug_check_release_archives.go` - Implements `debug check-release-archives`, which validates archive configurations in a chisel-releases tree. +- `chisel/cmd_version.go` - Implements the `version` command. +- `chisel/cmd_help.go` - Implements the `help` command. +- `chisel/helpers.go` - Shared CLI utilities used across commands. +- `chisel/log.go` - Logging setup for the CLI layer. +- `mkversion.sh` - Shell script used for generating version information during builds. +- `version.go` - Application version data structures and variables. diff --git a/internal/kb/AGENTS.md b/internal/kb/AGENTS.md new file mode 100644 index 00000000..72bd3e9d --- /dev/null +++ b/internal/kb/AGENTS.md @@ -0,0 +1,38 @@ +# Preface + +This is the index file for the `internal/` directory's knowledge base. It provides context about the core internal packages of Chisel, encompassing slice orchestration, package setup, extraction, archive fetching, caching, filesystem operations, manifest generation, and supporting utilities. + +# Overview + +The `internal/` directory houses the core business logic, components, and utilities of Chisel. It coordinates slice selection, dependency resolution, package fetching, file extraction, filesystem mutations, and manifest output. 
These packages are not part of Chisel's public API and must not be imported by external consumers. + +# Directory + +- `slicer/` - Main orchestrator for a Chisel run. Receives a slice selection, drives all other internal packages (setup, archive, cache, deb, fsutil, scripts, manifestutil) to completion, and writes the final filesystem and manifest. +- `setup/` - Parses chisel-releases YAML slice definitions, builds the `Release` data model (packages, slices, dependencies), and performs dependency resolution using Tarjan's topological sort algorithm. Also handles fetching package index metadata from remote archives. +- `deb/` - Extracts files from `.deb` archives (AR format with tar/gzip/xz/zstd inner layers). Handles multiple compression formats and preserves file permissions and ordering. +- `archive/` - Manages remote Ubuntu package archive sources over HTTP/HTTPS. Handles PGP signature verification of package indices, credential management for authenticated repositories (e.g. Ubuntu Pro), and HTTP-level caching. +- `cache/` - Content-addressable on-disk cache keyed by SHA256 digest. Stores extracted files and uses hardlinks to the target filesystem to avoid redundant copies. Respects `XDG_CACHE_HOME`. +- `fsutil/` - Core filesystem operations for writing files, directories, and symlinks into the target root filesystem, with correct ownership, permissions, and SHA256 generation during writes. +- `manifestutil/` - Generates the Chisel manifest: a ZSTD-compressed file in jsonwall format recording every installed package, slice, and file. The default filename is `manifest.wall`. +- `scripts/` - Executes Starlark mutation scripts defined in slice definitions. Scripts run after extraction and can transform or clean up files within the target filesystem. +- `control/` - Fast, memory-efficient parser for Debian control files (the metadata sections embedded in `.deb` archives). Uses a two-pass approach: index sections first, then retrieve fields directly on access. 
+- `strdist/` - String distance and glob matching utilities. Implements a configurable edit-distance algorithm (`Distance`) with pluggable cost functions, and a `GlobPath` function that uses that algorithm to match file paths against patterns supporting `?`, `*`, and `**` wildcards. +- `pgputil/` - Decodes and validates PGP signatures on package archive metadata. Wraps `golang.org/x/crypto/openpgp`. +- `testutil/` - Shared test helpers used across unit tests: mock archive builders, composable content checkers, file presence and permission validators, tree dumpers, and permutation utilities. +- `apacheutil/` - Shared slice-naming utilities (`SliceKey`, name-format regexps, `ParseSliceKey`). The "apache" prefix signals that this package carries an Apache-2.0 license, which is required because it is a transitive dependency of the `public/` packages; a CI script enforces Apache-2.0 on all internal packages reachable from `public/`. +- `apachetestutil/` - Test helpers for reading manifest contents (`DumpManifestContents`), carrying the same Apache-2.0 license requirement as `apacheutil/` because it is depended on by tests in the `public/` packages. + +# Architecture + +Chisel's internal packages form a directed dependency chain driven by `slicer/`: + +``` +slicer/ → setup/ (resolve slice deps) → archive/ (fetch package indices) + → cache/ + deb/ (extract .deb to cache by SHA256) + → fsutil/ (hardlink/copy from cache to target rootfs) + → scripts/ (apply Starlark mutations) + → manifestutil/ (write manifest.wall) +``` + +`control/` is used by `deb/` and `archive/` for parsing Debian metadata. `pgputil/` is used by `archive/` for signature verification. `strdist/` is used at the CLI layer and in `setup/` for error reporting. `testutil/` is test-only and has no production dependents. 
diff --git a/kb/AGENTS.md b/kb/AGENTS.md new file mode 100644 index 00000000..ac29b918 --- /dev/null +++ b/kb/AGENTS.md @@ -0,0 +1,41 @@ +# Preface + +This is the root index file of the dispersed knowledge base (`kb/`). It serves as the main entry point to understand system architecture, navigate different subsystems, and maintain AI-friendly documentation conventions. + +# Overview + +This is the root `kb/AGENTS.md` file for this repository, and every relevant directory has its own `kb/` directory and `AGENTS.md` file. Specific information is found in other `kb/*.md` files with lowercase, hyphen-separated names (e.g. `kb/special-relativity.md`). + +The design of this structure has the following key goals: + +- **Mechanical** - Agents are the main actors reading and writing the knowledge base. +- **Generic** - Benefits any agentic workflow, no matter the editor or platform. +- **Distilled** - Avoids the use of verbose task logs that pollute the context window. +- **Hierarchical** - Avoids excessive information in a single place that also pollutes the context window. +- **Human** - Information is readily available in a useful readable format. + +# Important + +- Read local `kb/AGENTS.md` files upon navigating directories. +- Keep the `kb/` files updated whenever there is something relevant to be documented. +- Follow the "Preface" and header conventions. The presence of headers other than "Preface" is optional and should be used only when needed. Omit optional empty or trivial headers everywhere instead of using placeholder text. + +# Headers + +Every header used across the `kb/*.md` files in this repository MUST be documented here to maintain semantic standardizations. + +* `Preface`: A brief introduction outlining the scope and relevance of a specific `.md` file, present precisely at the top of the file to aid quick AI parsing. **Required** in all `.md` files. +* `Overview`: High-level summary of the directory, subsystem or knowledge base layout at large.
Do not use this as an index of the directory contents (see `Index`). +* `Important`: Essential directives outlining critical constraints, behaviors, or rules. +* `Headers`: Global registry of header definitions, uniquely hosted at the root `kb/AGENTS.md`. Do NOT use this header in any other document. +* `Architecture`: Structural design details or boundary explanations for a given component. Only use this for software architecture concepts, NOT for defining filesystem layouts. +* `Directory`: Details describing the contents and structure of the current directory, and potentially nested directories. Use this instead of Architecture when discussing filesystem layouts. Format items as a list, starting with the file or directory name surrounded by backticks, a hyphen, and then its description (e.g. `- \`filename\` - Description`). +* `Index`: A list exclusively indexing local `.md` files or nested `kb/AGENTS.md` child files. Only to be used in `AGENTS.md` index files and MUST be placed at the very end of the file. Other references, local or otherwise, are okay but must be inlined where they were naturally mentioned. Format items as a list, starting with the file or directory name surrounded by backticks, a hyphen, and then a brief description (e.g. `- \`filename\` - Description`). + +# Index + +- `../cmd/kb/AGENTS.md` - CLI entry points +- `../internal/kb/AGENTS.md` - Core internal packages +- `../public/kb/AGENTS.md` - Public API packages +- `../tests/kb/AGENTS.md` - Integration testing +- `../snap/kb/AGENTS.md` - Snap packaging diff --git a/public/kb/AGENTS.md b/public/kb/AGENTS.md new file mode 100644 index 00000000..4e5e1c77 --- /dev/null +++ b/public/kb/AGENTS.md @@ -0,0 +1,12 @@ +# Preface + +This document describes the scope of the `public/` directory, which contains Chisel's public API packages intended for consumption by external tools. + +# Overview + +The `public/` directory houses the two packages that form Chisel's stable public contract. 
These packages define the on-disk format and data schema for the Chisel manifest, enabling third-party tools such as SBOM generators and vulnerability scanners to consume Chisel output without depending on internal packages. + +# Directory + +- `jsonwall/` - Defines and implements the "jsonwall" database format: a ZSTD-compressed text file with one JSON object per line, with fields sorted for efficient search and iteration. This is the storage format used for the Chisel manifest. +- `manifest/` - Defines the manifest entry schema (schema version 1.0), including the `Package`, `Slice`, and `File` record types with their fields (`Kind`, `Name`, `Version`, `Digest`, `Arch`, etc.). Integrates with `jsonwall` for serialization and deserialization. diff --git a/snap/kb/AGENTS.md b/snap/kb/AGENTS.md new file mode 100644 index 00000000..ade4b7f0 --- /dev/null +++ b/snap/kb/AGENTS.md @@ -0,0 +1,7 @@ +# Preface + +This document describes the scope of the `snap/` directory within the Chisel repository, providing context for automated agents navigating and modifying the snapcraft configuration. + +# Directory + +- `snapcraft.yaml` - The primary manifest file defining the snap package. It configures classic confinement, utilizes the Go plugin, and outlines the build steps which depend on `cmd/mkversion.sh` for version injection. diff --git a/tests/kb/AGENTS.md b/tests/kb/AGENTS.md new file mode 100644 index 00000000..927f9d10 --- /dev/null +++ b/tests/kb/AGENTS.md @@ -0,0 +1,28 @@ +# Preface + +This file serves as the local knowledge base index for the `tests/` directory. It outlines the scope and procedures for end-to-end integration testing within the Chisel repository. + +# Overview + +The `tests/` directory contains the integration test suite for the project. These tests exercise the actual `chisel` binary against real or mock Ubuntu package archives, verifying end-to-end behaviors such as slice extraction, manifest generation, and error handling. 
The suite is built on the [Spread](https://github.com/canonical/spread) framework for multi-system test execution. + +# Important + +- **Execution**: Integration tests are run with `spread`, not with `go test`. Each test scenario is a subdirectory containing a `task.yaml` with shell-based assertions. +- **No build tags**: Unlike Go-based integration test suites, these tests require no `//go:build` directives. They are entirely shell-driven. +- **Pre-built binary**: Spread compiles and provisions the `chisel` binary as part of the test environment setup defined in `spread.yaml` at the repository root. + +# Directory + +- `basic/` - Core slice extraction scenario verifying that files are correctly written to the target root filesystem and that mutation scripts are applied. +- `find/` - Tests for the `chisel find` command, covering search by slice name and package. +- `info/` - Tests for the `chisel info` command, verifying detailed slice metadata output. +- `debug-check-release-archives/` - Tests for the `chisel debug check-release-archives` command, validating archive configuration correctness. +- `pro-archives/` - Tests covering Ubuntu Pro subscription archive support (fips, fips-updates, esm-apps, esm-infra). +- `use-a-custom-chisel-release/` - Tests the ability to override the default chisel-releases with a custom release tree. +- `unmaintained/` - Edge-case tests for packages whose support window has ended. +- `unstable/` - Edge-case tests for packages from unstable or unsupported releases. + +# Architecture + +Each test scenario is a directory containing a `task.yaml` that defines the test steps as shell commands with assertions (typically `grep`-based). Spread handles multi-system provisioning, binary deployment, and test execution across different Ubuntu releases as configured in the root `spread.yaml`. 
From ae7adea6b720662c1ed8aa25cf90d1a0281db4ae Mon Sep 17 00:00:00 2001 From: Paul Mars Date: Fri, 24 Apr 2026 16:01:32 +0200 Subject: [PATCH 2/3] ci: properly look for go public packages --- .github/scripts/external-packages-license-check.go | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/scripts/external-packages-license-check.go b/.github/scripts/external-packages-license-check.go index 77745697..03c4b3d7 100644 --- a/.github/scripts/external-packages-license-check.go +++ b/.github/scripts/external-packages-license-check.go @@ -59,7 +59,7 @@ func run() error { } // Check the internal dependencies of the external packages. - output, err := exec.Command("sh", "-c", "go list -deps -test ./public/*").Output() + output, err := exec.Command("sh", "-c", "go list -deps -test ./public/...").Output() if err != nil { return err } From 534c231fb030540d676ea461e62f6cb0871caebc Mon Sep 17 00:00:00 2001 From: Paul Mars Date: Mon, 27 Apr 2026 15:59:45 +0200 Subject: [PATCH 3/3] ai: correct pkg descriptions --- internal/kb/AGENTS.md | 41 +++++++++++++++++++++++++++++++---------- 1 file changed, 31 insertions(+), 10 deletions(-) diff --git a/internal/kb/AGENTS.md b/internal/kb/AGENTS.md index 72bd3e9d..9733cd9e 100644 --- a/internal/kb/AGENTS.md +++ b/internal/kb/AGENTS.md @@ -9,10 +9,10 @@ The `internal/` directory houses the core business logic, components, and utilit # Directory - `slicer/` - Main orchestrator for a Chisel run. Receives a slice selection, drives all other internal packages (setup, archive, cache, deb, fsutil, scripts, manifestutil) to completion, and writes the final filesystem and manifest. -- `setup/` - Parses chisel-releases YAML slice definitions, builds the `Release` data model (packages, slices, dependencies), and performs dependency resolution using Tarjan's topological sort algorithm. Also handles fetching package index metadata from remote archives. 
+- `setup/` - Fetches and parses chisel-releases YAML slice definitions, building the `Release` data model (packages, slices, archives). Performs slice dependency resolution, uses Tarjan's topological sort to detect cycles, validates path conflicts across slices, and resolves same-path contention between packages using `prefer` relationships.
 - `deb/` - Extracts files from `.deb` archives (AR format with tar/gzip/xz/zstd inner layers). Handles multiple compression formats and preserves file permissions and ordering.
 - `archive/` - Manages remote Ubuntu package archive sources over HTTP/HTTPS. Handles PGP signature verification of package indices, credential management for authenticated repositories (e.g. Ubuntu Pro), and HTTP-level caching.
-- `cache/` - Content-addressable on-disk cache keyed by SHA256 digest. Stores extracted files and uses hardlinks to the target filesystem to avoid redundant copies. Respects `XDG_CACHE_HOME`.
+- `cache/` - Content-addressable on-disk store keyed by SHA256 digest, with time-based eviction. Resolves the cache directory from `XDG_CACHE_HOME`.
 - `fsutil/` - Core filesystem operations for writing files, directories, and symlinks into the target root filesystem, with correct ownership, permissions, and SHA256 generation during writes.
 - `manifestutil/` - Generates the Chisel manifest: a ZSTD-compressed file in jsonwall format recording every installed package, slice, and file. The default filename is `manifest.wall`.
 - `scripts/` - Executes Starlark mutation scripts defined in slice definitions. Scripts run after extraction and can transform or clean up files within the target filesystem.
@@ -27,12 +27,33 @@ The `internal/` directory houses the core business logic, components, and utilit
 
 Chisel's internal packages form a directed dependency chain driven by `slicer/`:
 
+```mermaid
+flowchart LR
+    slicer["slicer<br/>Orchestrator"]
+
+    subgraph logic["Core Logic"]
+        setup["setup<br/>Release parsing, dep resolution,<br/>conflict detection"]
+        archive["archive<br/>Ubuntu archive HTTP client"]
+        manifestutil["manifestutil<br/>Manifest writer"]
+        scripts["scripts<br/>Starlark mutations"]
+    end
+
+    subgraph base["Extraction & Storage"]
+        deb["deb<br/>.deb file extractor"]
+        fsutil["fsutil<br/>Filesystem writer"]
+        cache["cache<br/>Content-addressable store"]
+    end
+
+    subgraph util["Utilities"]
+        control["control<br/>Debian control parser"]
+        pgputil["pgputil<br/>PGP verification"]
+        strdist["strdist<br/>Glob & distance matching"]
+    end
+
+    slicer --> setup & archive & deb & fsutil & scripts & manifestutil
+    setup --> archive & deb & cache & strdist
+    manifestutil --> archive & setup
+    archive --> cache & control & pgputil & deb
+    deb --> fsutil & strdist
+    scripts --> fsutil
 ```
-slicer/ → setup/ (resolve slice deps) → archive/ (fetch package indices)
-         → cache/ + deb/ (extract .deb to cache by SHA256)
-         → fsutil/ (hardlink/copy from cache to target rootfs)
-         → scripts/ (apply Starlark mutations)
-         → manifestutil/ (write manifest.wall)
-```
-
-`control/` is used by `deb/` and `archive/` for parsing Debian metadata. `pgputil/` is used by `archive/` for signature verification. `strdist/` is used at the CLI layer and in `setup/` for error reporting. `testutil/` is test-only and has no production dependents.
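
Review note on the diagram in PATCH 3/3: the edges can be sanity-checked mechanically. The sketch below is an illustration, not Chisel's code. It copies the arrows from the Mermaid flowchart into a map and runs Tarjan's strongly connected components algorithm over them, the same family of algorithm the `setup/` description mentions for cycle detection. The names `importGraph` and `findCycles` are hypothetical and exist only for this example. The real `setup/` implementation is not shown in this series and may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// importGraph transcribes the arrows of the Mermaid diagram:
// package -> packages it depends on. (Hypothetical name, illustration only.)
var importGraph = map[string][]string{
	"slicer":       {"setup", "archive", "deb", "fsutil", "scripts", "manifestutil"},
	"setup":        {"archive", "deb", "cache", "strdist"},
	"manifestutil": {"archive", "setup"},
	"archive":      {"cache", "control", "pgputil", "deb"},
	"deb":          {"fsutil", "strdist"},
	"scripts":      {"fsutil"},
}

// findCycles returns every strongly connected component that forms a cycle
// (more than one node, or a single node with a self-edge), found with
// Tarjan's algorithm. Each component is returned sorted by name.
func findCycles(graph map[string][]string) [][]string {
	next := 0
	indices := map[string]int{}
	lowlink := map[string]int{}
	onStack := map[string]bool{}
	var stack []string
	var cycles [][]string

	var visit func(v string)
	visit = func(v string) {
		indices[v] = next
		lowlink[v] = next
		next++
		stack = append(stack, v)
		onStack[v] = true

		selfLoop := false
		for _, w := range graph[v] {
			if w == v {
				selfLoop = true
			}
			if _, seen := indices[w]; !seen {
				visit(w)
				if lowlink[w] < lowlink[v] {
					lowlink[v] = lowlink[w]
				}
			} else if onStack[w] && indices[w] < lowlink[v] {
				lowlink[v] = indices[w]
			}
		}

		// v is the root of a component: pop the component off the stack.
		if lowlink[v] == indices[v] {
			var scc []string
			for {
				w := stack[len(stack)-1]
				stack = stack[:len(stack)-1]
				onStack[w] = false
				scc = append(scc, w)
				if w == v {
					break
				}
			}
			if len(scc) > 1 || selfLoop {
				sort.Strings(scc)
				cycles = append(cycles, scc)
			}
		}
	}

	// Visit nodes in sorted order so the result is deterministic.
	nodes := make([]string, 0, len(graph))
	for v := range graph {
		nodes = append(nodes, v)
	}
	sort.Strings(nodes)
	for _, v := range nodes {
		if _, seen := indices[v]; !seen {
			visit(v)
		}
	}
	return cycles
}

func main() {
	fmt.Println("cycles:", findCycles(importGraph))
}
```

For the edges as drawn, the check should report no cycles, which is what Go's import rules require anyway. If a later edit to the diagram introduces a back edge (for example from `fsutil` to `slicer`), the same function would list the packages involved.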