4 changes: 4 additions & 0 deletions AGENTS.md
@@ -114,3 +114,7 @@ Import order: stdlib, third-party packages, internal Cortex packages (separated
- Sign commits with DCO: `git commit -s -m "message"`
- Run `make doc` if config/flags changed
- Include CHANGELOG entry for user-facing changes

## Related Policies

This file (`AGENTS.md`) provides technical guidance **to** AI coding agents working in this repository (build commands, architecture, conventions). For the policy governing **human use** of AI tools when preparing contributions, see [GENAI_POLICY.md](GENAI_POLICY.md).
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -1,14 +1,22 @@
# Changelog

## master / unreleased
* [FEATURE] Distributor: Add experimental `-distributor.enable-start-timestamp` flag for Prometheus Remote Write 2.0. When enabled, `StartTimestamp (ST)` is ingested. #7371
* [FEATURE] Memberlist: Add `-memberlist.cluster-label` and `-memberlist.cluster-label-verification-disabled` to prevent accidental cross-cluster gossip joins and support rolling label rollout. #7385
* [ENHANCEMENT] Distributor: Introduce dynamic `Symbols` slice capacity pooling. #7398 #7401
* [ENHANCEMENT] Metrics Helper: Add native histogram support for aggregating and merging, including dual-format histogram handling that exposes both native and classic bucket formats. #7359
* [ENHANCEMENT] Cache: Add per-tenant TTL configuration for query results cache to control cache expiration on a per-tenant basis with separate TTLs for regular and out-of-order data. #7357
* [CHANGE] Querier: Make query time range configurations per-tenant: `query_ingesters_within`, `query_store_after`, and `shuffle_sharding_ingesters_lookback_period`. Uses `model.Duration` instead of `time.Duration` to support serialization but has minimum unit of 1ms (nanoseconds/microseconds not supported). #7160
* [ENHANCEMENT] Tenant Federation: Add a local cache to regex resolver. #7363
* [ENHANCEMENT] Query Scheduler: Add `cortex_query_scheduler_tracked_requests` metric to track the current number of requests held by the scheduler. #7355
* [ENHANCEMENT] Distributor: Optimize memory allocations by reusing the existing capacity of these pooled slices in the Prometheus Remote Write 2.0 path. #7392
* [BUGFIX] Alertmanager: Fix disappearing user config and state when ring is temporarily unreachable. #7372
* [BUGFIX] Fix nil when ingester_query_max_attempts > 1. #7369
* [BUGFIX] Querier: Fix queryWithRetry and labelsWithRetry returning (nil, nil) on cancelled context by propagating ctx.Err(). #7370
* [BUGFIX] Metrics Helper: Fix non-deterministic bucket order in merged histograms by sorting buckets after map iteration, matching Prometheus client library behavior. #7380
* [BUGFIX] Fix memory leak in `ReuseWriteRequestV2` by explicitly clearing the `Symbols` backing array string pointers before returning the object to `sync.Pool`. #7373
* [BUGFIX] Distributor: Return HTTP 401 Unauthorized when tenant ID resolution fails in the Prometheus Remote Write 2.0 path. #7389
* [BUGFIX] KV store: Fix false-positive `status_code="500"` metrics for HA tracker CAS operations when using memberlist. #7408

## 1.21.0 in progress

2 changes: 2 additions & 0 deletions CONTRIBUTING.md
@@ -1,3 +1,5 @@
# Contributing to Cortex

See [https://cortexmetrics.io/docs/contributing/](https://cortexmetrics.io/docs/contributing/).

If using generative AI tools, please also review our [Generative AI Contribution Policy](GENAI_POLICY.md).
62 changes: 62 additions & 0 deletions GENAI_POLICY.md
@@ -0,0 +1,62 @@
# Generative AI Contribution Policy

## Purpose

The Cortex project welcomes contributions that make use of generative AI (GenAI) tools. AI assistants can help contributors write code, explore the codebase, draft documentation, and improve productivity. However, **humans bear full responsibility** for every contribution they submit, regardless of how it was produced.

This policy applies to all repositories under the [cortexproject](https://github.com/cortexproject) GitHub organisation.

## Permitted Use of AI Tools

The following uses of AI tools are encouraged and permitted:

- **Coding assistants** - Using tools like GitHub Copilot, Claude Code, Cursor, or similar to help write, refactor, or debug code.
- **Codebase exploration** - Querying AI tools to understand project architecture, locate relevant code, or learn conventions.
- **Documentation drafting** - Generating initial drafts of documentation, comments, or commit messages.
- **PR review assistance** - Using AI to help review code, identify potential issues, or suggest improvements.
- **Maintainer-configured review bots** - Automated review bots configured by project maintainers.

## Contributor Responsibilities

When using AI tools to assist with contributions, you must:

1. **Understand every line you submit.** You must be able to independently explain any change in your contribution. "The AI wrote it" is not an acceptable justification during review.

2. **Review and validate AI output.** Never submit AI-generated content verbatim without careful review. Verify correctness, check for hallucinated APIs or dependencies, and ensure the output follows Cortex conventions.

3. **Disclose significant AI usage.** If AI generated the bulk of a contribution (e.g., an entire new feature, large refactors, or substantial documentation), note this in the PR description. Minor assistance (autocomplete, small suggestions) does not require disclosure.

4. **Honour the DCO.** Your `Signed-off-by` line on each commit certifies the [Developer Certificate of Origin](https://developercertificate.org/) for **all** content in that commit, including any AI-generated portions. You are attesting that you have the right to submit the work.

5. **Meet the same quality bar.** AI-assisted contributions are held to the same standards as any other contribution: tests, documentation, CHANGELOG entries, passing CI, and adherence to the project's [design patterns and conventions](docs/contributing/design-patterns-and-conventions.md).

## GitHub Communications

- **Issues, pull request reviews, and discussions** must be substantively human-authored. Do not submit bulk AI-generated comments, reviews, or issue reports.
- Sharing AI-generated analyses (e.g., "I asked an AI to summarise the failure modes and here is what it found") is acceptable when clearly attributed and verified by the contributor.
- Do not use AI tools to generate large volumes of low-quality issues or review comments.

## Maintainer Authority

Maintainers may:

- **Request disclosure** of AI tool usage for any contribution.
- **Close or request revision** of PRs or issues that appear to contain unreviewed AI-generated content.
- **Escalate persistent low-effort submissions** through the project's normal [Code of Conduct](code-of-conduct.md) enforcement process.

## Relationship to Other Policies

| Document | Purpose |
|----------|---------|
| [Contributing Guide](CONTRIBUTING.md) | General contribution workflow and requirements |
| [Code of Conduct](code-of-conduct.md) | Community behaviour standards |
| [Governance](GOVERNANCE.md) | Project governance and decision-making |
| [AGENTS.md](AGENTS.md) | Technical guidance **to** AI coding agents working in this repo |

**AGENTS.md vs GENAI_POLICY.md:** `AGENTS.md` provides instructions that AI coding agents consume when working with the codebase (build commands, architecture, conventions). This document (`GENAI_POLICY.md`) governs how **human contributors** use AI tools when preparing their contributions.

## References

- [OpenTelemetry GenAI Contribution Policy](https://github.com/open-telemetry/community/blob/main/policies/genai.md)
- [Linux Foundation AI Guidelines](https://www.linuxfoundation.org/legal/generative-ai)
- [Developer Certificate of Origin](https://developercertificate.org/)
2 changes: 1 addition & 1 deletion code-of-conduct.md
@@ -1,3 +1,3 @@
## Cortex Community Code of Conduct

-Cortex follows the [CNCF Code of Conduct](https://github.com/cncf/foundation/blob/master/code-of-conduct.md).
+Cortex follows the [CNCF Code of Conduct](https://github.com/cncf/foundation/blob/main/code-of-conduct.md).
23 changes: 0 additions & 23 deletions docs/blocks-storage/querier.md
@@ -117,11 +117,6 @@ querier:
# CLI flag: -querier.max-samples
[max_samples: <int> | default = 50000000]

# Maximum lookback beyond which queries are not sent to ingester. 0 means all
# queries are sent to ingester.
# CLI flag: -querier.query-ingesters-within
[query_ingesters_within: <duration> | default = 0s]

# Enable returning samples stats per steps in query response.
# CLI flag: -querier.per-step-stats-enabled
[per_step_stats_enabled: <boolean> | default = false]
@@ -131,14 +126,6 @@ querier:
# CLI flag: -querier.response-compression
[response_compression: <string> | default = "gzip"]

# The time after which a metric should be queried from storage and not just
# ingesters. 0 means all queries are sent to store. When running the blocks
# storage, if this option is enabled, the time range of the query sent to the
# store will be manipulated to ensure the query end is not more recent than
# 'now - query-store-after'.
# CLI flag: -querier.query-store-after
[query_store_after: <duration> | default = 0s]

# Maximum duration into the future you can query. 0 to disable.
# CLI flag: -querier.max-query-into-future
[max_query_into_future: <duration> | default = 10m]
@@ -247,16 +234,6 @@ querier:
# CLI flag: -querier.ingester-query-max-attempts
[ingester_query_max_attempts: <int> | default = 1]

# When distributor's sharding strategy is shuffle-sharding and this setting is
# > 0, queriers fetch in-memory series from the minimum set of required
# ingesters, selecting only ingesters which may have received series since
# 'now - lookback period'. The lookback period should be greater or equal than
# the configured 'query store after' and 'query ingesters within'. If this
# setting is 0, queriers always query all ingesters (ingesters shuffle
# sharding on read path is disabled).
# CLI flag: -querier.shuffle-sharding-ingesters-lookback-period
[shuffle_sharding_ingesters_lookback_period: <duration> | default = 0s]

thanos_engine:
# Experimental. Use Thanos promql engine
# https://github.com/thanos-io/promql-engine rather than the Prometheus
62 changes: 39 additions & 23 deletions docs/configuration/config-file-reference.md
@@ -4111,6 +4111,12 @@ The `limits_config` configures default and per-tenant limits imposed by Cortex s
# CLI flag: -distributor.enable-type-and-unit-labels
[enable_type_and_unit_labels: <boolean> | default = false]

# EXPERIMENTAL: If true, StartTimestampMs (ST) is handled for remote write v2
# samples and histograms. CreatedTimestamp (CT) is used as a fallback when ST is
# not set.
# CLI flag: -distributor.enable-start-timestamp
[enable_start_timestamp: <boolean> | default = false]

# The maximum number of active series per user, per ingester. 0 to disable.
# CLI flag: -ingester.max-series-per-user
[max_series_per_user: <int> | default = 5000000]
@@ -4286,6 +4292,25 @@ The `limits_config` configures default and per-tenant limits imposed by Cortex s
# zones are not available.
[query_partial_data: <boolean> | default = false]

# Maximum lookback duration for querying data from ingesters. Queries for data
# older than this will only query the long-term storage. This is a per-tenant
# limit that can be overridden in the runtime configuration. Should be less than
# or equal to close-idle-tsdb-timeout.
# CLI flag: -limits.query-ingesters-within
[query_ingesters_within: <duration> | default = 0s]

# Minimum age of data before querying the long-term storage. Queries for data
# younger than this will only query ingesters. This is a per-tenant limit that
# can be overridden in the runtime configuration.
# CLI flag: -limits.query-store-after
[query_store_after: <duration> | default = 0s]

# Lookback period for shuffle sharding of ingesters. This is a per-tenant limit
# that can be overridden in the runtime configuration. Should be greater than or
# equal to query-ingesters-within.
# CLI flag: -limits.shuffle-sharding-ingesters-lookback-period
[shuffle_sharding_ingesters_lookback_period: <duration> | default = 0s]

# The maximum number of rows that can be fetched when querying parquet storage.
# Each row maps to a series in a parquet file. This limit applies before
# materializing chunks. 0 to disable.
@@ -4564,6 +4589,20 @@ The `memberlist_config` configures the Gossip memberlist.
# CLI flag: -memberlist.advertise-port
[advertise_port: <int> | default = 7946]

# The cluster label is an optional string to include in outbound packets and
# gossip streams. Other members in the memberlist cluster will discard any
# message whose label doesn't match the configured one, unless the
# 'cluster-label-verification-disabled' configuration option is set to true.
# CLI flag: -memberlist.cluster-label
[cluster_label: <string> | default = ""]

# When true, memberlist doesn't verify that inbound packets and gossip streams
# have the cluster label matching the configured one. This verification should
# be disabled while rolling out the change to the configured cluster label in a
# live memberlist cluster.
# CLI flag: -memberlist.cluster-label-verification-disabled
[cluster_label_verification_disabled: <boolean> | default = false]

# Other cluster members to join. Can be specified multiple times. It can be an
# IP, hostname or an entry specified in the DNS Service Discovery format.
# CLI flag: -memberlist.join
@@ -4755,11 +4794,6 @@ The `querier_config` configures the Cortex querier.
# CLI flag: -querier.max-samples
[max_samples: <int> | default = 50000000]

# Maximum lookback beyond which queries are not sent to ingester. 0 means all
# queries are sent to ingester.
# CLI flag: -querier.query-ingesters-within
[query_ingesters_within: <duration> | default = 0s]

# Enable returning samples stats per steps in query response.
# CLI flag: -querier.per-step-stats-enabled
[per_step_stats_enabled: <boolean> | default = false]
@@ -4769,14 +4803,6 @@ The `querier_config` configures the Cortex querier.
# CLI flag: -querier.response-compression
[response_compression: <string> | default = "gzip"]

# The time after which a metric should be queried from storage and not just
# ingesters. 0 means all queries are sent to store. When running the blocks
# storage, if this option is enabled, the time range of the query sent to the
# store will be manipulated to ensure the query end is not more recent than 'now
# - query-store-after'.
# CLI flag: -querier.query-store-after
[query_store_after: <duration> | default = 0s]

# Maximum duration into the future you can query. 0 to disable.
# CLI flag: -querier.max-query-into-future
[max_query_into_future: <duration> | default = 10m]
@@ -4885,16 +4911,6 @@ store_gateway_client:
# CLI flag: -querier.ingester-query-max-attempts
[ingester_query_max_attempts: <int> | default = 1]

# When distributor's sharding strategy is shuffle-sharding and this setting is >
# 0, queriers fetch in-memory series from the minimum set of required ingesters,
# selecting only ingesters which may have received series since 'now - lookback
# period'. The lookback period should be greater or equal than the configured
# 'query store after' and 'query ingesters within'. If this setting is 0,
# queriers always query all ingesters (ingesters shuffle sharding on read path
# is disabled).
# CLI flag: -querier.shuffle-sharding-ingesters-lookback-period
[shuffle_sharding_ingesters_lookback_period: <duration> | default = 0s]

thanos_engine:
# Experimental. Use Thanos promql engine
# https://github.com/thanos-io/promql-engine rather than the Prometheus promql
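The three settings moved into `limits_config` above are described as per-tenant limits that can be overridden in the runtime configuration. A minimal sketch of such an override follows, assuming the usual `overrides:` layout of the Cortex runtime configuration file. The tenant ID `tenant-a` and the durations are hypothetical and only illustrate the documented constraint that the lookback period should be greater than or equal to `query_ingesters_within`.

```yaml
# Runtime configuration file (per-tenant overrides).
# "tenant-a" is a hypothetical tenant ID; durations are illustrative only.
overrides:
  tenant-a:
    # Queries for data older than this only go to long-term storage.
    query_ingesters_within: 12h
    # Queries for data younger than this only go to ingesters.
    query_store_after: 11h
    # Should be >= query_ingesters_within, per the option description.
    shuffle_sharding_ingesters_lookback_period: 12h
```

Tenants without an entry keep the defaults set through the `-limits.*` flags. Because the values are parsed as `model.Duration`, the smallest usable unit is 1ms.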
Original file line number Diff line number Diff line change
@@ -57,6 +57,7 @@ ingester:

memberlist:
bind_port: 7946
cluster_label: gossip-demo
join_members:
- localhost:7947
abort_if_cluster_join_fails: false
Original file line number Diff line number Diff line change
@@ -56,6 +56,7 @@ ingester:

memberlist:
bind_port: 7947
cluster_label: gossip-demo
join_members:
- localhost:7946
abort_if_cluster_join_fails: false
1 change: 1 addition & 0 deletions docs/configuration/v1-guarantees.md
@@ -115,6 +115,7 @@ Currently experimental features are:
- Distributor/Ingester: Stream push connection
- Enable stream push connection between distributor and ingester by setting `-distributor.use-stream-push=true` on Distributor.
- Add `__type__` and `__unit__` labels to OTLP and remote write v2 requests (`-distributor.enable-type-and-unit-labels`)
- Handle StartTimestampMs (ST) for remote write v2 samples and histograms, using CreatedTimestamp (CT) as a fallback when ST is not set (`-distributor.enable-start-timestamp`)
- Ingester: Series Queried Metric
- Enable on Ingester via `-ingester.active-queried-series-metrics-enabled=true`
- Set the time window to expose via metrics using `-ingester.active-queried-series-metrics-windows=2h`. At least 1 time window is required to expose the metric.
7 changes: 7 additions & 0 deletions docs/contributing/_index.md
@@ -21,6 +21,13 @@ a piece of work is finished it should:
* Include a CHANGELOG message if users of Cortex need to hear about what you did.
* If you have made any changes to flags or config, run `make doc` and commit the changed files to update the config file documentation.

## Use of AI Tools

Cortex permits the use of generative AI tools to assist with contributions. Contributors remain
fully responsible for all submitted content. If AI generated the bulk of a contribution, please
disclose this in the PR description. See the full `GENAI_POLICY.md`
for details.

## Formatting

Cortex projects use the `goimports` tool (`go get golang.org/x/tools/cmd/goimports` to install) to format Go files and sort imports. We run goimports with the `-local github.com/cortexproject/cortex` parameter to put Cortex internal imports into a separate group. We try to keep imports sorted into three groups: standard library imports, third-party package imports, and internal Cortex imports. Goimports will fix the order but keeps existing newlines between imports within a group, so we try to avoid such extra newlines.
4 changes: 3 additions & 1 deletion docs/guides/gossip-ring-getting-started.md
@@ -50,6 +50,7 @@ memberlist:
# defaults to hostname
node_name: "Ingester 1"
bind_port: 7946
cluster_label: "gossip-demo"
join_members:
- localhost:7947
abort_if_cluster_join_fails: false
@@ -127,9 +128,10 @@ We don't need to change or add `memberlist.join_members` list. This new instance
will discover other peers through it. When using Kubernetes, the suggested setup is to have a headless service pointing to all pods
that want to be part of the gossip cluster, and then point `join_members` to this headless service.

In production, set `memberlist.cluster_label` to the same value on every Cortex process that should share the same gossip cluster. This helps avoid accidentally merging rings with other Cortex, Mimir, or Loki deployments that can reach the same seed addresses.
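When the label is introduced on a gossip cluster that is already running, the configuration reference says verification should be disabled while the change rolls out. A minimal sketch of that transitional state follows, using only the `cluster_label` and `cluster_label_verification_disabled` options. Re-enabling verification once every member carries the label is an assumption about a sensible rollout order, not a documented procedure.

```yaml
memberlist:
  # Same value on every process that should share this gossip cluster.
  cluster_label: "gossip-demo"
  # Transitional setting: accept packets from members that do not yet send
  # the label. Assumed follow-up: set back to false after all members are updated.
  cluster_label_verification_disabled: true
```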

We also don't need to change `/tmp/cortex/storage` directory in the `blocks_storage.filesystem.dir` field. This is the directory where all ingesters will
"upload" finished blocks. This can also be an S3 or GCP storage, but for simplicity, we use the local filesystem in this example.

After these changes, we can start another Cortex instance using the modified configuration file. This instance will join the ring
and will start receiving samples after it enters the ACTIVE state.
