
Tracking results storage and exposure #1241

@Ndpnt


Context

The engine periodically tracks whether online documents are accessible and extractable. For each service/terms pair, the outcome is either success (content fetched and version recorded) or failure (content inaccessible or extraction broken). When tracking fails, the Reporter module creates an issue on a third-party software forge (GitHub or GitLab). When tracking resumes, it closes the issue. These forge issues are the sole persistent record of tracking results.

Problem statement

Tracking results have no local persistence. Even when everything works, the persistence of tracking data depends entirely on an external forge. If the forge is unreachable or no API token is configured, failures are logged but not persisted, and the information is lost once the process ends. There is no way to query the current tracking status of a service without calling an external forge API.

This RFC proposes to introduce a local storage layer that persists tracking results independently of any external service, and to expose them through the collection API. This storage becomes the single source of truth for tracking status. External integrations (such as the current GitHub and GitLab issue managers) would then be extracted into separate modules that consume tracking results through this API. The active consumption of this data (creating issues, sending notifications) is out of scope.

Alternatives considered

Two existing standards and four alternative storage strategies were evaluated before arriving at the proposed design. None was retained. A dedicated Git repository with a custom JSON format was chosen instead.

Existing standards considered

TAP — Test Anything Protocol

TAP is a text-based protocol for reporting test results. The analogy with tracking is genuine: for each service/terms pair, the engine verifies that content is accessible and extractable, producing a pass or fail result.

TAP is not retained for two reasons. First, the standard only structures a small fraction of the data: TAP provides a binary ok/not ok outcome, but the engine needs to persist richer information (error reasons, source document metadata, snapshot references, timing data, transient errors). All of this must go into free-form YAML diagnostic blocks, outside the standard's structure:

```
TAP version 14
1..1
not ok 1 - Facebook · Terms of Service
  ---
  date: "2026-03-15T10:30:00Z"
  runId: "f47ac10b-58cc-4372-a567-0e02b2c3d479"
  reasons:
    - "CSS selector \".content\" has no match in the document"
  sourceDocuments:
    - id: legal-terms
      fetch: "https://www.facebook.com/legal/terms"
      select: ".content"
      snapshotId: abc123
      mimeType: text/html
  ...
```

Only the `not ok` line is structured by TAP. Everything in the `---`/`...` block is free-form YAML that no TAP tool can interpret. Second, existing TAP tools (`tap-parser`, `tap-mocha-reporter`) are designed for CI/CD terminal output, not for the consumers needed here (issue sync, API endpoints, federation aggregation). Since the standard neither structures the data nor provides usable tooling, adopting it would impose format constraints without any corresponding benefit.

EARL — Evaluation and Report Language

EARL is a W3C vocabulary for expressing test results. The semantic fit is the strongest: its concepts of Assertion, TestSubject, TestCriterion, and five-valued OutcomeValue map naturally to the tracking domain.

EARL is not retained for the same two reasons. The EARL vocabulary covers the outcome, the assertor (engine version), the test subject (URL), and the test criterion (terms type), but everything else must be added as custom properties outside the standard:

```json
{
  "@context": "http://www.w3.org/ns/earl#",
  "@type": "Assertion",
  "assertedBy": { "@type": "Software", "title": "Open Terms Archive Engine", "version": "11.0.0" },
  "subject": { "@type": "TestSubject", "source": "https://www.facebook.com/legal/terms" },
  "test": { "@type": "TestCriterion", "title": "Terms of Service" },
  "result": { "@type": "TestResult", "outcome": "earl:failed", "description": "CSS selector \".content\" has no match" },
  "date": "2026-03-15T10:30:00Z",
  "runId": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
  "sourceDocuments": [{ "id": "legal-terms", "fetch": "https://www.facebook.com/legal/terms", "select": ".content", "snapshotId": "abc123", "mimeType": "text/html" }]
}
```

The top half (EARL vocabulary) structures the outcome, assertor, subject, and criterion. The bottom half (`date`, `runId`, `sourceDocuments`) is custom, outside the standard. And existing EARL tooling is entirely limited to WCAG accessibility evaluation (axe, Pa11y, WCAG-EM Report Tool); no generic viewer, dashboard, or aggregator exists. The RDF foundation (`@context`, `@type`, nested objects) adds structural overhead without any ecosystem to leverage in return.

Alternative storage strategies considered

Storing in the declarations repository

The declarations repository defines what to track and how to track it. Tracking results describe whether tracking works. Colocating them might seem logical, and it would mirror the current practice of attaching GitHub/GitLab issues to the declarations repository. A `tracking-results/` folder could hold one file per service/terms pair alongside a `run.json`.

This approach is not retained because the declarations repository has a fundamentally different operational model. It is designed for human contributions via pull requests: contributors add or modify declarations, reviewers approve them, and the engine reads them. The engine currently has read-only access and never commits to this repository. Adding machine-generated commits every 12 hours would require giving the engine write access, pollute the commit history and notifications for contributors who watch the repository, and mix two fundamentally different types of data: human-authored declarations and machine-generated tracking results. The repository's role as a curated, human-managed collection of declarations would be blurred.

Storing as Git trailers in snapshot and version commits

The engine already stores metadata as Git trailers in snapshot and version commits (X-engine-version, X-fetcher, X-source-document-location). Tracking results could be added as additional trailers. For failures (where no content commit exists), empty commits could carry the tracking data. A run.json equivalent could be stored as a specially formatted empty commit at the end of each run.

This approach is not retained because it requires stacking workarounds that add complexity without benefit. Trailers are flat key-value pairs, but the tracking data includes nested structures (sourceDocuments is an array of objects with 8 fields each, transientError is an object); encoding these as flat trailers would be verbose and fragile. Empty commits for failures and run summaries would clutter the repository history alongside content commits, making both harder to read. Tracking results would be dispersed across two repositories (snapshots and versions), mixing tracking data with content data in both and requiring cross-repository queries to reconstruct the tracking state of a single service. In practice, this amounts to building an ad-hoc storage subsystem inside repositories that serve a different purpose, at which point a dedicated repository is simpler and cleaner.

Storing as files in the snapshots repository

Tracking result files could be committed to the snapshots repository alongside snapshot files, using the existing write infrastructure. A run.json at the root would not interfere with existing snapshot files, which are organized in service/terms subdirectories.

This approach is not retained because the snapshots repository is already the fastest-growing repository in the system. On large collections, it has already reached size and commit count limits that required operational workarounds. Adding tracking result commits on top of snapshot commits would further increase this pressure. Beyond size concerns, it mixes two types of data with different semantics and commit patterns. Consumers that process snapshots (dataset generation, version extraction) would encounter tracking result files alongside content files. The commit history would interleave high-frequency snapshot recordings with low-frequency tracking state transitions, making both harder to navigate. And if tracking results need to be published separately (for federation, for a public dashboard) without exposing raw snapshots, this is impossible when everything is in the same repository.

Storing as a local state file

A `./data/tracking-results.json` file, not tracked in Git, would be updated during each run. The file could preserve the full history of entries (not just the current state) and be queried using a lightweight embedded database like lowdb.

This approach is not retained because it sacrifices three important properties. First, publication and federation: a Git repository can be cloned, pushed to a remote, and mirrored where a local JSON file cannot. There is no stable URL to give to a federation aggregator, no way to replicate the data without building custom synchronization. Second, auditability and data integrity: the snapshots and versions repositories are public Git repositories that anyone can clone and independently verify, and no one can silently rewrite history. A local JSON file offers no such guarantee; its content can be modified at any time without detection. Third, durability: a local file exists only on the machine that runs the engine. A disk failure, a machine recreation, or an accidental deletion destroys the entire tracking history with no possibility of recovery. A Git repository pushed to a remote is inherently replicated, and any clone serves as a full backup. Rebuilding publication, replication, durability, and integrity guarantees on top of a local file would mean reconstructing a subset of what Git already provides, while the project already has the full infrastructure to manage Git repositories.

Proposed solution

Tracking results are stored in a new dedicated Git repository managed by the engine alongside the existing snapshots and versions repositories. This follows the established pattern where each repository has a clear, single purpose: declarations define what to track, snapshots store raw content, versions store extracted content, and tracking results store whether tracking works and why it fails.

The operational cost of a fourth repository (creation, deployment, CI configuration) is real but incremental. Operators already manage three repositories per instance; a fourth follows the same pattern.

Repository structure

The tracking results repository contains one JSON file per declared service/terms pair, reflecting its current tracking status. Every declared terms has a file, whether it is tracking successfully or failing. A run.json file at the root captures run-level information.

```
tracking-results/
├── run.json
├── README.md
├── Facebook/
│   └── Terms of Service.json              ← status: failed
├── Google/
│   ├── Terms of Service.json              ← status: ok
│   └── Privacy Policy.json                ← status: ok
└── …
```

This approach stores the complete tracking history in Git. Each file is updated only when its content changes: status transitions, failure reason changes, or transient error appearances and disappearances. The Git history of each file tells the full story: when tracking first succeeded, when it failed, why it failed, when it resumed. A git checkout at any past commit gives the exact state of the entire collection at that point in time.
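
The history walk described above can be sketched in Node. To stay self-contained, the snippet builds a throwaway repository with two commits to one terms file; in a real collection, the repository location and file path would come from the instance configuration.

```javascript
// Sketch: read a terms file's Git history back as a status timeline.
// The demo repository and its two commits are fixtures for illustration.
import { execSync } from 'node:child_process';
import { mkdtempSync, mkdirSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

const repo = mkdtempSync(join(tmpdir(), 'tracking-results-'));
const git = (cmd) => execSync(`git ${cmd}`, { cwd: repo }).toString();

git('init -q');
git('config user.email demo@example.com');
git('config user.name demo');

const file = 'Facebook/Terms of Service.json';
mkdirSync(join(repo, 'Facebook'));

// Run 1: tracking succeeds
writeFileSync(join(repo, file), JSON.stringify({ status: 'ok', date: '2026-01-10T10:30:00Z' }));
git('add .');
git('commit -q -m "Track Facebook Terms of Service"');

// Run 2: tracking breaks
writeFileSync(join(repo, file), JSON.stringify({
  status: 'failed',
  date: '2026-03-15T10:30:00Z',
  reasons: ['CSS selector ".content" has no match in the document'],
}));
git('add .');
git('commit -q -m "Stop tracking Facebook Terms of Service"');

// Walk the file's commits oldest-first and collect each recorded status
const shas = git(`log --reverse --format=%H -- "${file}"`).trim().split('\n');
const timeline = shas.map((sha) => JSON.parse(git(`show ${sha}:"${file}"`)).status);
console.log(timeline); // [ 'ok', 'failed' ]
```

The same walk over `git log` recovers the full timeline of any terms file without any storage beyond the repository itself.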

File format for tracking results

Each service/terms file contains a JSON object. The content depends on the current tracking status.

When tracking is successful:

```json
{
  "status": "ok",
  "date": "2026-01-10T10:30:00Z",
  "runId": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
  "serviceName": "Google",
  "sourceDocuments": [
    {
      "id": "terms",
      "fetch": "https://policies.google.com/terms",
      "select": ".content",
      "remove": ".banner",
      "filter": ["removeLinks"],
      "executeClientScripts": false,
      "snapshotId": "def456",
      "mimeType": "text/html"
    }
  ]
}
```

When tracking is successful but a transient error occurred and was resolved by retry:

```json
{
  "status": "ok",
  "date": "2026-01-10T10:30:00Z",
  "runId": "a1b2c3d4-58cc-4372-a567-0e02b2c3d479",
  "serviceName": "Google",
  "sourceDocuments": [
    {
      "id": "terms",
      "fetch": "https://policies.google.com/terms",
      "select": ".content",
      "remove": ".banner",
      "filter": ["removeLinks"],
      "executeClientScripts": false,
      "snapshotId": "def456",
      "mimeType": "text/html"
    }
  ],
  "transientError": {
    "date": "2026-04-07T10:30:00Z",
    "reasons": ["Fetch failed: HTTP code 503"]
  }
}
```

When tracking is failing:

```json
{
  "status": "failed",
  "date": "2026-03-15T10:30:00Z",
  "runId": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
  "serviceName": "Facebook",
  "reasons": [
    "CSS selector \".content\" has no match in the document"
  ],
  "sourceDocuments": [
    {
      "id": "legal-terms",
      "fetch": "https://www.facebook.com/legal/terms",
      "select": ".content",
      "remove": null,
      "filter": null,
      "executeClientScripts": false,
      "snapshotId": "abc123",
      "mimeType": "text/html"
    }
  ]
}
```

| Field | Type | Present when | Description |
| --- | --- | --- | --- |
| `status` | `"ok"` or `"failed"` | Always | Current tracking status |
| `date` | ISO 8601 string | Always | Datetime when the current status started. Preserved across runs as long as the status does not change |
| `runId` | UUID v4 string | Always | Identifier of the run that last updated this file. Matches the `runId` in `run.json` |
| `transientError` | object | `ok`, only when a transient error occurred during the last run | Transient error that was resolved by retry |
| `transientError.date` | ISO 8601 string | With `transientError` | Datetime when the transient error occurred |
| `transientError.reasons` | string[] | With `transientError` | Human-readable error reasons |
| `serviceName` | string | Always | Human-readable service name (e.g., "Facebook"), as distinct from the service ID used in file paths |
| `reasons` | string[] | `failed` | Human-readable error reasons |
| `sourceDocuments` | object[] | Always | Source documents of the tracked terms, with their declaration metadata and last snapshot information. Field names follow the declaration format |
| `sourceDocuments[].id` | string | Always | Identifier of the source document, generated from its URL |
| `sourceDocuments[].fetch` | string | Always | URL of the source document |
| `sourceDocuments[].select` | string, object, or array | Always | CSS selectors for content to include |
| `sourceDocuments[].remove` | string, object, array, or null | Always | CSS selectors for content to exclude |
| `sourceDocuments[].filter` | string[] or null | Always | Names of filters applied to the content |
| `sourceDocuments[].executeClientScripts` | boolean | Always | Whether fetching requires a headless browser |
| `sourceDocuments[].snapshotId` | string or null | Always | ID of the last recorded snapshot, if any |
| `sourceDocuments[].mimeType` | string or null | Always | MIME type of the last recorded snapshot (e.g., `text/html`, `application/pdf`) |

The runId field is a UUID v4 generated at the start of each tracking run. It is written both in run.json and in every individual file that is updated during the run. In individual files, it enables correlating file changes to runs without relying on commit timestamp proximity, and detecting interrupted runs where run.json carries a runId that some individual files never received, revealing which terms were not processed. In run.json, it enables efficient API polling: a consumer can compare the runId to determine whether new data is available since its last check.

The date field records when the current status started, not when the file was last updated. For failures, if a failure persists for 22 days but the reasons change on day 10, date is preserved from day 1 and the reasons are updated; the Git history shows when the reasons changed. For successes, date records when tracking resumed after a failure or when the terms was first successfully tracked. This allows consumers to know at a glance how long a service has been in its current state without traversing Git history.
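
The date-preservation rule above can be captured in a small pure function. This is a sketch under assumed names (`nextTrackingResult` and its argument shapes are illustrative, not engine API): `date` is carried over as long as `status` is unchanged, while the rest of the entry always reflects the last run.

```javascript
// Sketch: compute the next tracking result, preserving `date` across runs
// that do not change the status. Names are illustrative, not engine code.
function nextTrackingResult(previous, current) {
  const statusUnchanged = previous && previous.status === current.status;

  return {
    ...current,
    date: statusUnchanged ? previous.date : current.date,
  };
}

// Failure persists but the reasons change: `date` stays at the original failure date
const day1 = { status: 'failed', date: '2026-03-01T10:30:00Z', reasons: ['Fetch failed: HTTP code 404'] };
const day10 = { status: 'failed', date: '2026-03-10T10:30:00Z', reasons: ['CSS selector ".content" has no match'] };
console.log(nextTrackingResult(day1, day10).date); // 2026-03-01T10:30:00Z

// Recovery: `date` resets to when tracking resumed
const day22 = { status: 'ok', date: '2026-03-22T10:30:00Z' };
console.log(nextTrackingResult(day10, day22).date); // 2026-03-22T10:30:00Z
```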

The transientError field captures errors that were likely transient (HTTP 503, DNS temporary failure, timeout, etc.) and were resolved by the engine's automatic retry mechanism. This field is present only when the last run encountered such an error; if the next run succeeds without transient error, the field is removed. This provides visibility on "fragile" services that succeed but only after retry. The Git history of the file shows when transient errors occurred and disappeared, allowing pattern detection over time.
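
The lifecycle of the field can be sketched as follows; the helper name and shapes are assumptions for illustration. The field is attached when the last run needed a retry and dropped again on the next clean run.

```javascript
// Sketch: attach `transientError` after a retried run, drop it on a clean run.
// `withTransientError` is an illustrative helper, not engine API.
function withTransientError(result, retryError) {
  if (retryError) return { ...result, transientError: retryError };
  const { transientError, ...rest } = result; // clean run: remove any leftover field
  return rest;
}

// Run succeeded, but only after a retry: the field is present
const fragile = withTransientError(
  { status: 'ok', date: '2026-01-10T10:30:00Z' },
  { date: '2026-04-07T10:30:00Z', reasons: ['Fetch failed: HTTP code 503'] }
);
console.log('transientError' in fragile); // true

// Next run succeeded without retry: the field disappears
const recovered = withTransientError(fragile, null);
console.log('transientError' in recovered); // false
```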

The sourceDocuments array is present on all entries, regardless of tracking status. Each entry contains the full declaration metadata of the source document (using the field names from the declaration format: fetch, select, remove, filter, executeClientScripts) alongside the last snapshot information (snapshotId, mimeType). This ensures that any consumer can understand not only whether tracking works, but also what is being tracked and how it is configured, without needing to read the service declaration or query the snapshots repository.

Run file

A run.json file at the repository root is updated at the end of every run, providing run-level information that cannot be derived from individual tracking result files:

```json
{
  "runId": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
  "collectionId": "france",
  "schedule": "30 */12 * * *",
  "lastRun": {
    "startDate": "2026-04-06T10:30:00Z",
    "endDate": "2026-04-06T10:42:34Z",
    "engineVersion": "11.0.0"
  },
  "declared": {
    "services": 523,
    "terms": 1523
  },
  "tracked": {
    "ok": 1476,
    "failed": 47
  },
  "transitions": {
    "newFailures": [
      { "serviceId": "Facebook", "termsType": "Terms of Service" },
      { "serviceId": "Twitter", "termsType": "Privacy Policy" }
    ],
    "recoveries": [
      { "serviceId": "Google", "termsType": "Privacy Policy" }
    ],
    "reasonChanges": []
  },
  "transientErrors": 23
}
```

| Field | Description |
| --- | --- |
| `runId` | UUID v4 identifier for the current run, generated at the start of each run |
| `collectionId` | ID of the collection this instance tracks |
| `schedule` | Configured cron expression for tracking runs |
| `lastRun.startDate` | Datetime when the last run started |
| `lastRun.endDate` | Datetime when the last run ended |
| `lastRun.engineVersion` | Version of the engine that performed the last run |
| `declared.services` | Total number of declared services |
| `declared.terms` | Total number of declared terms across all services |
| `tracked.ok` | Number of terms currently tracking successfully |
| `tracked.failed` | Number of terms currently failing |
| `transitions.newFailures` | Array of `{ serviceId, termsType }` objects for terms that started failing in the last run |
| `transitions.recoveries` | Array of `{ serviceId, termsType }` objects for terms that recovered in the last run |
| `transitions.reasonChanges` | Array of `{ serviceId, termsType }` objects for terms whose failure reasons changed in the last run |
| `transientErrors` | Number of terms that encountered a transient error during the last run, whether or not it was resolved by retry |

This file serves several purposes. The Git history of run.json has one commit per run, so a gap in commits reveals that the tracking was not running. The tracked.ok/tracked.failed ratio at each run provides a health curve over time. The lastRun.engineVersion field allows correlating spikes in failures with engine upgrades. The collection and schedule fields allow a federation aggregator to identify and contextualize each instance.
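
The transitions summary can be derived by diffing the previous and current state of each terms entry. The following is a sketch under assumed shapes (keyed by `serviceId/termsType`); the diffing helper is illustrative, not engine code.

```javascript
// Sketch: derive run-level transitions by comparing previous and current state.
// Keys of the form "serviceId/termsType" are an assumption for illustration.
function computeTransitions(previousState, currentState) {
  const transitions = { newFailures: [], recoveries: [], reasonChanges: [] };

  for (const [key, now] of Object.entries(currentState)) {
    const before = previousState[key];
    const [serviceId, termsType] = key.split('/');
    const entry = { serviceId, termsType };

    if (before?.status === 'ok' && now.status === 'failed') {
      transitions.newFailures.push(entry);
    } else if (before?.status === 'failed' && now.status === 'ok') {
      transitions.recoveries.push(entry);
    } else if (before?.status === 'failed' && now.status === 'failed'
        && JSON.stringify(before.reasons) !== JSON.stringify(now.reasons)) {
      transitions.reasonChanges.push(entry);
    }
  }

  return transitions;
}

const previousState = {
  'Facebook/Terms of Service': { status: 'ok' },
  'Google/Privacy Policy': { status: 'failed', reasons: ['Fetch failed: HTTP code 404'] },
};
const currentState = {
  'Facebook/Terms of Service': { status: 'failed', reasons: ['CSS selector ".content" has no match'] },
  'Google/Privacy Policy': { status: 'ok' },
};
console.log(computeTransitions(previousState, currentState));
// newFailures: Facebook / Terms of Service; recoveries: Google / Privacy Policy
```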

Collection API extension

The existing collection API exposes service declarations, collection metadata, and version content over HTTP. It would be extended with the following endpoints to expose tracking results, using the existing naming conventions (plural for collections as /services, singular for specific resources as /service/:id):

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | `/tracking-results` | Returns the tracking status of all declared terms |
| GET | `/tracking-result/:serviceId` | Returns the tracking status of all terms for a given service |
| GET | `/tracking-result/:serviceId/:termsType` | Returns the tracking status of a specific service/terms pair |
| GET | `/tracking-results/run` | Returns the latest run information (`run.json` content) |

`GET /tracking-results` returns an array of all tracking result objects, each enriched with `serviceId` and `termsType` fields derived from the file path. It supports filtering by status (`GET /tracking-results?status=failed`). Pagination is not included in the initial design, as the `?status=failed` filter covers the most common current use case (opening issues on a forge). These endpoints always return the current state; querying tracking results at a specific date is not supported initially, but can be added later following the same approach used for the `/version` endpoint if the need arises.
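
The enrichment and filtering behaviour can be sketched as a pure function over the repository's files; the handler name and the in-memory `files` map are assumptions for illustration.

```javascript
// Sketch: enrich results with serviceId/termsType derived from the file path,
// and filter by status. Illustrative handler logic, not engine code.
function listTrackingResults(files, { status } = {}) {
  const results = Object.entries(files).map(([path, result]) => {
    // "Facebook/Terms of Service.json" → serviceId "Facebook", termsType "Terms of Service"
    const [serviceId, filename] = path.split('/');
    return { serviceId, termsType: filename.replace(/\.json$/, ''), ...result };
  });

  return status ? results.filter((result) => result.status === status) : results;
}

const files = {
  'Google/Terms of Service.json': { status: 'ok' },
  'Facebook/Terms of Service.json': {
    status: 'failed',
    reasons: ['CSS selector ".content" has no match in the document'],
  },
};
console.log(listTrackingResults(files, { status: 'failed' }));
// → one entry: Facebook / Terms of Service, with its failure reasons
```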

Response examples

`GET /tracking-results`:

```json
[
  {
    "serviceId": "Google",
    "termsType": "Terms of Service",
    "status": "ok",
    "date": "2026-01-10T10:30:00Z",
    "runId": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
    "serviceName": "Google",
    "sourceDocuments": [
      { "id": "terms", "fetch": "https://policies.google.com/terms", "select": ".content", "remove": ".banner", "filter": ["removeLinks"], "executeClientScripts": false, "snapshotId": "def456", "mimeType": "text/html" }
    ]
  },
  {
    "serviceId": "Facebook",
    "termsType": "Terms of Service",
    "status": "failed",
    "date": "2026-03-15T10:30:00Z",
    "runId": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
    "serviceName": "Facebook",
    "reasons": ["CSS selector \".content\" has no match in the document"],
    "sourceDocuments": [
      { "id": "legal-terms", "fetch": "https://www.facebook.com/legal/terms", "select": ".content", "remove": null, "filter": null, "executeClientScripts": false, "snapshotId": "abc123", "mimeType": "text/html" }
    ]
  }
]
```

`GET /tracking-result/Google`:

```json
[
  {
    "serviceId": "Google",
    "termsType": "Terms of Service",
    "status": "ok",
    "date": "2026-01-10T10:30:00Z",
    "runId": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
    "serviceName": "Google",
    "sourceDocuments": [
      { "id": "terms", "fetch": "https://policies.google.com/terms", "select": ".content", "remove": ".banner", "filter": ["removeLinks"], "executeClientScripts": false, "snapshotId": "def456", "mimeType": "text/html" }
    ]
  },
  {
    "serviceId": "Google",
    "termsType": "Privacy Policy",
    "status": "ok",
    "date": "2025-11-02T08:15:00Z",
    "runId": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
    "serviceName": "Google",
    "sourceDocuments": [
      { "id": "privacy", "fetch": "https://policies.google.com/privacy", "select": ".content", "remove": null, "filter": null, "executeClientScripts": false, "snapshotId": "ghi789", "mimeType": "text/html" }
    ]
  }
]
```

`GET /tracking-result/Facebook/Terms%20of%20Service`:

```json
{
  "serviceId": "Facebook",
  "termsType": "Terms of Service",
  "status": "failed",
  "date": "2026-03-15T10:30:00Z",
  "runId": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
  "serviceName": "Facebook",
  "reasons": ["CSS selector \".content\" has no match in the document"],
  "sourceDocuments": [
    { "id": "legal-terms", "fetch": "https://www.facebook.com/legal/terms", "select": ".content", "remove": null, "filter": null, "executeClientScripts": false, "snapshotId": "abc123", "mimeType": "text/html" }
  ]
}
```

`GET /tracking-results/run`:

```json
{
  "runId": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
  "collectionId": "france",
  "schedule": "30 */12 * * *",
  "lastRun": {
    "startDate": "2026-04-06T10:30:00Z",
    "endDate": "2026-04-06T10:42:34Z",
    "engineVersion": "11.0.0"
  },
  "declared": { "services": 523, "terms": 1523 },
  "tracked": { "ok": 1476, "failed": 47 },
  "transitions": {
    "newFailures": [{ "serviceId": "Facebook", "termsType": "Terms of Service" }],
    "recoveries": [{ "serviceId": "Google", "termsType": "Privacy Policy" }],
    "reasonChanges": []
  },
  "transientErrors": 23
}
```

The API always serves the state of the last completed run. It does this by resolving the commit SHA of the last run.json update and reading all files at that commit. While a new run is in progress and individual files are being updated with new commits, the API continues to serve the snapshot from the previous run's final commit. When the current run completes and run.json is committed, the API automatically starts serving the new state. This approach is resilient to crashes: if the engine crashes mid-run, the API keeps serving the last complete state.

Notification strategy

External connectors (GitHub issue manager, GitLab issue manager) need to know when a run completes and transitions have occurred. Push-based mechanisms were considered, including webhooks where the engine would send an HTTP POST at the end of each run, and Server-Sent Events where connectors would subscribe to a stream. Both add significant complexity in the form of payload signing, delivery retry logic, connection management, and additional configuration.

Since tracking runs occur on a fixed schedule (typically every 12 hours), connectors can simply poll GET /tracking-results/run at the same frequency. The cost of one lightweight request every 12 hours is negligible, and the runId field allows a connector to immediately determine whether new data is available since its last check. This polling approach requires no additional infrastructure in the engine.

Push-based notifications can be added later if the number of connectors grows or if lower latency is needed.
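
A connector's polling step can be sketched as follows, with the HTTP call injected so the `runId` comparison stays visible; `fetchRun` would wrap a real request to `GET /tracking-results/run`. All names here are illustrative, not engine API.

```javascript
// Sketch: act only when run.json carries a runId the connector has not seen yet.
// `fetchRun` and `onNewRun` are illustrative names, injected for testability.
async function pollOnce(fetchRun, lastSeenRunId, onNewRun) {
  const run = await fetchRun();
  if (run.runId === lastSeenRunId) return lastSeenRunId; // nothing new since last check
  await onNewRun(run); // e.g. sync forge issues from run.transitions
  return run.runId;
}

// Stubbed demonstration: the second poll sees the same runId and stays idle
const processed = [];
const run = { runId: 'f47ac10b-58cc-4372-a567-0e02b2c3d479', transitions: { newFailures: [] } };

let last = await pollOnce(async () => run, null, async (r) => processed.push(r.runId));
last = await pollOnce(async () => run, last, async (r) => processed.push(r.runId));
console.log(processed.length); // 1
```

Scheduling this once per tracking interval (e.g. with cron) is the only infrastructure a connector needs.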

Deployment

Each instance would create a new repository following the naming convention established for other repositories: {instance_name}-tracking-results. For example, the demo instance would have:

  • demo-declarations (existing)
  • demo-versions (existing)
  • demo-snapshots (existing)
  • demo-tracking-results (new)

The deployment configuration would be updated to include the new repository path and remote, following the same pattern as versions and snapshots.

Instances can adopt this incrementally: the tracking results repository can be created and configured at any time. On the first run after configuration, the engine populates it with the initial state of all declared terms and a first `run.json`. No migration of historical data from GitHub/GitLab issues is necessary; the repository starts recording from the first run onwards. Historical issue data remains accessible on the forges for reference.

Feedback expectations

We invite you to provide feedback to:

  • Point out any limitations or edge cases you see in the proposed implementation.
  • Suggest improvements or refinements to the current proposal.
  • Share any alternative approaches you believe could address the problem more effectively.

If you support the proposal as it stands, please react with a 👍 or leave a positive comment.

Please provide your feedback by April 29th.
