2 changes: 1 addition & 1 deletion .github/scripts/external-packages-license-check.go
Original file line number Diff line number Diff line change
@@ -59,7 +59,7 @@ func run() error {
}

// Check the internal dependencies of the external packages.
- output, err := exec.Command("sh", "-c", "go list -deps -test ./public/*").Output()
+ output, err := exec.Command("sh", "-c", "go list -deps -test ./public/...").Output()
if err != nil {
return err
}
5 changes: 5 additions & 0 deletions AGENTS.md
@@ -0,0 +1,5 @@
# Knowledge Base

This project uses a decentralized knowledge base system stored in `kb/` directories across various project subdirectories.

When navigating this codebase or modifying files, you MUST always start by reading the root `kb/AGENTS.md` index file and any local `kb/AGENTS.md` files related to the directories you are working in. All rules for maintaining and updating the knowledge base are documented in the root `kb/AGENTS.md` file.
23 changes: 23 additions & 0 deletions cmd/kb/AGENTS.md
@@ -0,0 +1,23 @@
# Preface

This document serves as the local knowledge base index for the `cmd/` directory. It outlines the scope and purpose of the entry points for the `chisel` binary.

# Overview

The `cmd/` directory contains the main binary entry points for the `chisel` application. Instead of housing core business logic, this directory is responsible for initializing the application, wiring up the CLI subsystem, and handling version information. Core logic is delegated to the packages in `internal/`.

# Directory

- `chisel/` - Contains the primary executable entry point for the `chisel` application.
- `chisel/main.go` - Application bootstrap and CLI initialization using `jessevdk/go-flags`.
- `chisel/cmd_cut.go` - Implements the `cut` command, the primary operation that extracts slices into a target root filesystem.
- `chisel/cmd_find.go` - Implements the `find` command for searching available slices by name or package.
- `chisel/cmd_info.go` - Implements the `info` command for displaying detailed metadata about specific slices.
- `chisel/cmd_debug.go` - Implements the `debug` subcommand group for internal diagnostics.
- `chisel/cmd_debug_check_release_archives.go` - Implements `debug check-release-archives`, which validates archive configurations in a chisel-releases tree.
Collaborator

This is a bit imprecise. I am not sure what the expected level of detail is at this point; it is my understanding this is a rough high level overview per-file and the "knowledge database" will grow to contain more specific instructions later on.

Collaborator Author

> This is a bit imprecise.

I agree, but I understand the goal here is not to be super precise, but to help the agent know whether going deeper in a specific direction is relevant for whatever task is at hand.

> it is my understanding this is a rough high level overview per-file and the "knowledge database" will grow to contain more specific instructions later on.

This is also my understanding, but the proper level of detail is yet to be defined (or at least I still need to form a solid understanding of it). The goal of this PR is to bootstrap the infra with basic knowledge and then play with agents to see how well they work with it.

- `chisel/cmd_version.go` - Implements the `version` command.
- `chisel/cmd_help.go` - Implements the `help` command.
- `chisel/helpers.go` - Shared CLI utilities used across commands.
- `chisel/log.go` - Logging setup for the CLI layer.
- `mkversion.sh` - Shell script used for generating version information during builds.
- `version.go` - Application version data structures and variables.
59 changes: 59 additions & 0 deletions internal/kb/AGENTS.md
@@ -0,0 +1,59 @@
# Preface

This is the index file for the `internal/` directory's knowledge base. It provides context about the core internal packages of Chisel, encompassing slice orchestration, package setup, extraction, archive fetching, caching, filesystem operations, manifest generation, and supporting utilities.

# Overview

The `internal/` directory houses the core business logic, components, and utilities of Chisel. It coordinates slice selection, dependency resolution, package fetching, file extraction, filesystem mutations, and manifest output. These packages are not part of Chisel's public API and must not be imported by external consumers.

# Directory

- `slicer/` - Main orchestrator for a Chisel run. Receives a slice selection, drives all other internal packages (setup, archive, cache, deb, fsutil, scripts, manifestutil) to completion, and writes the final filesystem and manifest.
- `setup/` - Fetches and parses chisel-releases YAML slice definitions, building the `Release` data model (packages, slices, archives). Performs slice dependency resolution, uses Tarjan's topological sort to detect cycles, validates path conflicts across slices, and resolves same-path contention between packages using `prefer` relationships.
- `deb/` - Extracts files from `.deb` archives (AR format with tar/gzip/xz/zstd inner layers). Handles multiple compression formats and preserves file permissions and ordering.
- `archive/` - Manages remote Ubuntu package archive sources over HTTP/HTTPS. Handles PGP signature verification of package indices, credential management for authenticated repositories (e.g. Ubuntu Pro), and HTTP-level caching.
- `cache/` - Content-addressable on-disk store keyed by SHA256 digest, with time-based eviction. Resolves the cache directory from `XDG_CACHE_HOME`.
- `fsutil/` - Core filesystem operations for writing files, directories, and symlinks into the target root filesystem, with correct ownership, permissions, and SHA256 generation during writes.
- `manifestutil/` - Generates the Chisel manifest: a ZSTD-compressed file in jsonwall format recording every installed package, slice, and file. The default filename is `manifest.wall`.
- `scripts/` - Executes Starlark mutation scripts defined in slice definitions. Scripts run after extraction and can transform or clean up files within the target filesystem.
- `control/` - Fast, memory-efficient parser for Debian control files (the metadata sections embedded in `.deb` archives). Uses a two-pass approach: index sections first, then retrieve fields directly on access.
Comment on lines +17 to +19
Collaborator

Some of these are very good already. I am positively surprised.

Collaborator Author

It was a mixed bag. The agent was rather good on small packages doing well-known operations. However, it got confused by the names of the apache* packages and completely missed the point of these packages until I instructed it to look into the commit history.

- `strdist/` - String distance and glob matching utilities. Implements a configurable edit-distance algorithm (`Distance`) with pluggable cost functions, and a `GlobPath` function that uses that algorithm to match file paths against patterns supporting `?`, `*`, and `**` wildcards.
- `pgputil/` - Decodes and validates PGP signatures on package archive metadata. Wraps `golang.org/x/crypto/openpgp`.
- `testutil/` - Shared test helpers used across unit tests: mock archive builders, composable content checkers, file presence and permission validators, tree dumpers, and permutation utilities.
- `apacheutil/` - Shared slice-naming utilities (`SliceKey`, name-format regexps, `ParseSliceKey`). The "apache" prefix signals that this package carries an Apache-2.0 license, which is required because it is a transitive dependency of the `public/` packages; a CI script enforces Apache-2.0 on all internal packages reachable from `public/`.
- `apachetestutil/` - Test helpers for reading manifest contents (`DumpManifestContents`), carrying the same Apache-2.0 license requirement as `apacheutil/` because it is depended on by tests in the `public/` packages.

# Architecture

Chisel's internal packages form a directed dependency chain driven by `slicer/`:

Collaborator Author

[Note to reviewer]: I am really not sure about the value of this as it can be discovered again via tools or other sources (semantic index, LSP). This can also very quickly become out of date and thus we should probably make sure the agent is instructed to periodically confirm and update this knowledge.

```mermaid
flowchart LR
slicer["slicer<br/>Orchestrator"]

subgraph logic["Core Logic"]
setup["setup<br/>Release parsing, dep resolution,<br/>conflict detection"]
archive["archive<br/>Ubuntu archive HTTP client"]
manifestutil["manifestutil<br/>Manifest writer"]
scripts["scripts<br/>Starlark mutations"]
end

subgraph base["Extraction & Storage"]
deb["deb<br/>.deb file extractor"]
fsutil["fsutil<br/>Filesystem writer"]
cache["cache<br/>Content-addressable store"]
end

subgraph util["Utilities"]
control["control<br/>Debian control parser"]
pgputil["pgputil<br/>PGP verification"]
strdist["strdist<br/>Glob & distance matching"]
end

slicer --> setup & archive & deb & fsutil & scripts & manifestutil
setup --> archive & deb & cache & strdist
manifestutil --> archive & setup
archive --> cache & control & pgputil & deb
deb --> fsutil & strdist
scripts --> fsutil
```
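
One way to sanity-check a diagram like this is to transcribe its edges and confirm that they form an acyclic graph. The sketch below does that with a plain depth-first search; the `deps` map is copied by hand from the diagram, and `topoOrder` is a hypothetical helper, not code from the repository and not the Tarjan-based sort mentioned for `setup/`.

```go
package main

import (
	"fmt"
	"sort"
)

// deps transcribes the diagram edges: package -> packages it depends on.
var deps = map[string][]string{
	"slicer":       {"setup", "archive", "deb", "fsutil", "scripts", "manifestutil"},
	"setup":        {"archive", "deb", "cache", "strdist"},
	"manifestutil": {"archive", "setup"},
	"archive":      {"cache", "control", "pgputil", "deb"},
	"deb":          {"fsutil", "strdist"},
	"scripts":      {"fsutil"},
}

// topoOrder returns every package with its dependencies listed first,
// or an error if the graph contains a cycle.
func topoOrder(graph map[string][]string) ([]string, error) {
	const (
		unvisited = iota
		visiting
		done
	)
	state := map[string]int{}
	var order []string

	var visit func(name string) error
	visit = func(name string) error {
		switch state[name] {
		case visiting:
			return fmt.Errorf("dependency cycle through %q", name)
		case done:
			return nil
		}
		state[name] = visiting
		for _, dep := range graph[name] {
			if err := visit(dep); err != nil {
				return err
			}
		}
		state[name] = done
		order = append(order, name)
		return nil
	}

	// Visit roots in sorted order so the result is deterministic.
	names := make([]string, 0, len(graph))
	for name := range graph {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		if err := visit(name); err != nil {
			return nil, err
		}
	}
	return order, nil
}

func main() {
	order, err := topoOrder(deps)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(order)
}
```

A check of this kind only validates the transcription; keeping the map in sync with the real imports would still require a tool such as `go list`.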
41 changes: 41 additions & 0 deletions kb/AGENTS.md
@@ -0,0 +1,41 @@
# Preface

This is the root index file of the dispersed knowledge base (`kb/`). It serves as the main entry point to understand system architecture, navigate different subsystems, and maintain AI-friendly documentation conventions.

# Overview

This is the root `kb/AGENTS.md` file for this repository; every relevant directory has its own `kb/` directory containing an `AGENTS.md` file. More specific information lives in other `kb/*.md` files with lowercase, hyphenated names (e.g. `kb/special-relativity.md`).

The design of this structure has the following key goals:

- **Mechanical** - Agents are the main actors reading and writing the knowledge base.
- **Generic** - Benefits any agentic workflow, no matter the editor or platform.
- **Distilled** - Avoids the use of verbose task logs that pollute the context window.
- **Hierarchical** - Avoids excessive information in a single place that also pollutes the context window.
- **Human** - Information is readily available in a useful readable format.

# Important

- Read local `kb/AGENTS.md` files upon navigating directories.
- Keep the `kb/` files updated whenever there is something relevant to be documented.
- Follow the "Preface" and header conventions. The presence of headers other than "Preface" is optional and should be used only when needed. Omit optional empty or trivial headers everywhere instead of using placeholder text.

# Headers

Every header used across the `kb/*.md` files in this repository MUST be documented here so that its meaning stays consistent.

* `Preface`: A brief introduction outlining the scope and relevance of a specific `.md` file, present precisely at the top of the file to aid quick AI parsing. **Required** in all `.md` files.
* `Overview`: High-level summary of the directory, subsystem or knowledge base layout at large. Do not use this as an index of the directory contents (see `Index`).
* `Important`: Essential directives outlining critical constraints, behaviors, or rules.
* `Headers`: Global registry of header definitions, uniquely hosted at the root `kb/AGENTS.md`. Do NOT use this header in any other document.
* `Architecture`: Structural design details or boundary explanations for a given component. Only use this for software architecture concepts, NOT for defining filesystem layouts.
* `Directory`: Details describing the contents and structure of the current directory, and potentially nested directories. Use this instead of Architecture when discussing filesystem layouts. Format items as a list, starting with the file or directory name surrounded by backticks, a hyphen, and then its description (e.g. `- \`filename\` - Description`).
* `Index`: A list exclusively indexing local `.md` files or nested `kb/AGENTS.md` child files. Only to be used in `AGENTS.md` index files and MUST be placed at the very end of the file. Other references, local or otherwise, are okay but must be inlined where they were naturally mentioned. Format items as a list, starting with the file or directory name surrounded by backticks, a hyphen, and then a brief description (e.g. `- \`filename\` - Description`).
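
The header rules above are mechanical enough to be checked by a small program. The following is a minimal sketch under stated assumptions: `checkHeaders` and `knownHeaders` are hypothetical names, only top-level `# ` headers are considered, and only three rules are covered (Preface first, registered names only, Index last). It is not a tool that exists in this repository.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// knownHeaders mirrors the registry of header names listed above.
var knownHeaders = map[string]bool{
	"Preface": true, "Overview": true, "Important": true, "Headers": true,
	"Architecture": true, "Directory": true, "Index": true,
}

// checkHeaders is a hypothetical helper. It returns the convention
// violations found in one kb Markdown document.
func checkHeaders(doc string) []string {
	var headers []string
	inFence := false
	scanner := bufio.NewScanner(strings.NewReader(doc))
	for scanner.Scan() {
		line := scanner.Text()
		// Skip fenced code blocks so "# comment" lines are not read as headers.
		if strings.HasPrefix(strings.TrimSpace(line), "```") {
			inFence = !inFence
			continue
		}
		if !inFence && strings.HasPrefix(line, "# ") {
			headers = append(headers, strings.TrimSpace(strings.TrimPrefix(line, "# ")))
		}
	}

	var problems []string
	if len(headers) == 0 || headers[0] != "Preface" {
		problems = append(problems, "first header must be Preface")
	}
	for i, h := range headers {
		if !knownHeaders[h] {
			problems = append(problems, fmt.Sprintf("unregistered header %q", h))
		}
		if h == "Index" && i != len(headers)-1 {
			problems = append(problems, "Index must be the last header")
		}
	}
	return problems
}

func main() {
	doc := "# Preface\n\nScope.\n\n# Index\n\n- `a.md` - A\n\n# Notes\n"
	for _, p := range checkHeaders(doc) {
		fmt.Println(p)
	}
}
```

Rules that depend on the file's location, such as `Headers` being allowed only in the root `kb/AGENTS.md`, would need the file path as an extra input.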

# Index

- `../cmd/kb/AGENTS.md` - CLI entry points
- `../internal/kb/AGENTS.md` - Core internal packages
- `../public/kb/AGENTS.md` - Public API packages
- `../tests/kb/AGENTS.md` - Integration testing
- `../snap/kb/AGENTS.md` - Snap packaging
12 changes: 12 additions & 0 deletions public/kb/AGENTS.md
@@ -0,0 +1,12 @@
# Preface

This document describes the scope of the `public/` directory, which contains Chisel's public API packages intended for consumption by external tools.

# Overview

The `public/` directory houses the two packages that form Chisel's stable public contract. These packages define the on-disk format and data schema for the Chisel manifest, enabling third-party tools such as SBOM generators and vulnerability scanners to consume Chisel output without depending on internal packages.

# Directory

- `jsonwall/` - Defines and implements the "jsonwall" database format: a ZSTD-compressed text file with one JSON object per line, with fields sorted for efficient search and iteration. This is the storage format used for the Chisel manifest.
- `manifest/` - Defines the manifest entry schema (schema version 1.0), including the `Package`, `Slice`, and `File` record types with their fields (`Kind`, `Name`, `Version`, `Digest`, `Arch`, etc.). Integrates with `jsonwall` for serialization and deserialization.
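
To make the "one JSON object per line, sorted for efficient search" idea concrete, here is a rough sketch that uses only the Go standard library. It is not the `jsonwall` API: the `entry` type, `buildWall`, and `findPrefix` are hypothetical, the record fields are invented for illustration, and ZSTD compression is omitted because the standard library has no ZSTD package.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// entry is a hypothetical record; it is not the real manifest schema.
// encoding/json emits struct fields in declaration order, so every line
// starts with the "kind" field.
type entry struct {
	Kind string `json:"kind"`
	Name string `json:"name"`
}

// buildWall marshals each entry to a single JSON line and sorts the lines.
func buildWall(entries []entry) ([]string, error) {
	lines := make([]string, 0, len(entries))
	for _, e := range entries {
		b, err := json.Marshal(e)
		if err != nil {
			return nil, err
		}
		lines = append(lines, string(b))
	}
	sort.Strings(lines)
	return lines, nil
}

// findPrefix returns all lines that start with prefix. The first match is
// located by binary search, which works because the lines are sorted.
func findPrefix(lines []string, prefix string) []string {
	start := sort.SearchStrings(lines, prefix)
	end := start
	for end < len(lines) && strings.HasPrefix(lines[end], prefix) {
		end++
	}
	return lines[start:end]
}

func main() {
	lines, err := buildWall([]entry{
		{Kind: "package", Name: "libc6"},
		{Kind: "slice", Name: "libc6_libs"},
		{Kind: "package", Name: "bash"},
	})
	if err != nil {
		panic(err)
	}
	for _, line := range findPrefix(lines, `{"kind":"package",`) {
		fmt.Println(line)
	}
}
```

Placing the discriminating field first means that all records of one kind are contiguous after sorting, so a prefix lookup touches only the matching range.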
7 changes: 7 additions & 0 deletions snap/kb/AGENTS.md
@@ -0,0 +1,7 @@
# Preface

This document describes the scope of the `snap/` directory within the Chisel repository, providing context for automated agents navigating and modifying the snapcraft configuration.

# Directory

- `snapcraft.yaml` - The primary manifest file defining the snap package. It configures classic confinement, utilizes the Go plugin, and outlines the build steps which depend on `cmd/mkversion.sh` for version injection.
28 changes: 28 additions & 0 deletions tests/kb/AGENTS.md
@@ -0,0 +1,28 @@
# Preface

This file serves as the local knowledge base index for the `tests/` directory. It outlines the scope and procedures for end-to-end integration testing within the Chisel repository.

# Overview

The `tests/` directory contains the integration test suite for the project. These tests exercise the actual `chisel` binary against real or mock Ubuntu package archives, verifying end-to-end behaviors such as slice extraction, manifest generation, and error handling. The suite is built on the [Spread](https://github.com/canonical/spread) framework for multi-system test execution.

# Important

- **Execution**: Integration tests are run with `spread`, not with `go test`. Each test scenario is a subdirectory containing a `task.yaml` with shell-based assertions.
- **No build tags**: Unlike Go-based integration test suites, these tests require no `//go:build` directives. They are entirely shell-driven.
- **Pre-built binary**: Spread compiles and provisions the `chisel` binary as part of the test environment setup defined in `spread.yaml` at the repository root.

# Directory

- `basic/` - Core slice extraction scenario verifying that files are correctly written to the target root filesystem and that mutation scripts are applied.
- `find/` - Tests for the `chisel find` command, covering search by slice name and package.
- `info/` - Tests for the `chisel info` command, verifying detailed slice metadata output.
- `debug-check-release-archives/` - Tests for the `chisel debug check-release-archives` command, validating archive configuration correctness.
- `pro-archives/` - Tests covering Ubuntu Pro subscription archive support (fips, fips-updates, esm-apps, esm-infra).
- `use-a-custom-chisel-release/` - Tests the ability to override the default chisel-releases with a custom release tree.
- `unmaintained/` - Edge-case tests for packages whose support window has ended.
- `unstable/` - Edge-case tests for packages from unstable or unsupported releases.

# Architecture

Each test scenario is a directory containing a `task.yaml` that defines the test steps as shell commands with assertions (typically `grep`-based). Spread handles multi-system provisioning, binary deployment, and test execution across different Ubuntu releases as configured in the root `spread.yaml`.