Antalya 26.1: Remote initiator improvements #1577

Open

ianton-ru wants to merge 10 commits into antalya-26.1 from feature/antalya-26.1/remote_initiator_improvements

Conversation

@ianton-ru

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Various improvements for the remote initiator

Documentation entry for user-facing changes

With the remote initiator feature, queries like

SELECT * FROM iceberg(...) SETTINGS object_storage_cluster='swarm', object_storage_remote_initiator=1

are rewritten as

SELECT * FROM remote('remote_host', icebergCluster('swarm', ...))

where 'remote_host' is a random host from the 'swarm' cluster.
See #756

This PR introduces the following improvements:

  1. Partially fixes object_storage_remote_initiator auth works incorrectly #1570: the username and password are used when access to the cluster requires them. An exception is thrown if the cluster uses a shared secret; this should be addressed in future PRs.
  2. Fixes object_storage_remote_initiator with different cluster name #1571: the new setting object_storage_remote_initiator_cluster allows choosing remote_host from a different cluster, not only from the swarm cluster.
  3. The remote query did not work with additional settings inside the function, e.g. remote('remote_host', iceberg(..., SETTINGS iceberg_metadata_file_path='path/to/metadata.json')). It now works correctly.
  4. In a query like remote('remote_host', icebergCluster('remote_cluster', ...)), the cluster remote_cluster may be defined only on remote_host and be unknown on the current initiator host. The early-stage cluster check has been removed, which allows executing such queries.
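As an illustrative sketch of improvement 2 (the cluster name 'initiators' here is hypothetical, not from the PR), the remote initiator can now be picked from a cluster other than the one executing the object storage work:

```sql
-- Pick the remote initiator from the hypothetical 'initiators' cluster,
-- while the object storage scan still runs on the 'swarm' cluster.
SELECT * FROM iceberg(...)
SETTINGS
    object_storage_cluster = 'swarm',
    object_storage_remote_initiator = 1,
    object_storage_remote_initiator_cluster = 'initiators'
```

Here the query is forwarded to a random host of 'initiators', which then runs icebergCluster('swarm', ...) across the swarm.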

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • S3 Export (2h)
  • Swarms (30m)
  • Tiered Storage (2h)

@github-actions

github-actions bot commented Mar 25, 2026

Workflow [PR], commit [a5eee1d]

@ianton-ru
Author

@codex review

@ianton-ru
Author

Audit: PR #1577 — Antalya 26.1: Remote initiator improvements

Source: Altinity/ClickHouse#1577
Base: antalya-26.1
Reviewed revision: branch including commit d7c4beebbdb (Remove unused header) and prior feature commits.


Confirmed defects

None. No confirmed defects in reviewed scope.

(Earlier revision briefly introduced an unused #include <Common/logger_useful.h> in QueryAnalyzer.cpp; commit d7c4beebbdb removes it.)


Coverage summary

  • Scope reviewed: FunctionNode SETTINGS handling (QueryTreeBuilder, FunctionNode hash / clone / equal / toASTImpl), QueryAnalyzer::resolveTableFunctionTableFunctionNode bridge, StorageDistributed::buildQueryTreeDistributed, IStorageCluster::{read, convertToRemote} (initiator cluster, secret guard, URI decode, user/password remote), ITableFunctionCluster cluster-presence check removal, Settings / SettingsChangesHistory, integration tests under tests/integration/test_s3_cluster/.
  • Categories failed: None.
  • Categories passed: SETTINGS propagation for nested table functions; initiator cluster selection (object_storage_remote_initiator_cluster); host decode and auth branches; fail-closed on cluster secret (NOT_IMPLEMENTED); deferred cluster resolution for remote-only definitions; recursion guard via object_storage_remote_initiator = false on copied context; SettingsChangesHistory updates; integration scenarios (secret error, split initiator vs object-storage cluster).
  • Assumptions / limits: Pre-existing static pcg64 RNG in convertToRemote may be a concurrency concern under parallel queries; not introduced by this PR. Poco::URI::decode on full Address::toString() not exhaustively validated for every exotic address encoding; aligned with escapeForFileName / dotted-host tests.

Audit metadata

  • Method: Static review per workspace audit-review standard (call graph, transitions, fault categories, concurrency and C++ hazard pass).
  • Output rule: Confirmed defects only; hypotheticals folded into assumptions/limits.

AI-generated audit note.


@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a0d1972000


Comment on lines +354 to +358
auto remote_initiator_cluster_name = settings[Setting::object_storage_remote_initiator_cluster].value;
if (remote_initiator_cluster_name.empty())
remote_initiator_cluster_name = cluster_name_from_settings;
auto remote_initiator_cluster = getClusterImpl(context, remote_initiator_cluster_name);
auto storage_and_context = convertToRemote(remote_initiator_cluster, context, remote_initiator_cluster_name, query_to_send);

P2 Badge Strip initiator-only cluster setting before forwarding query

When object_storage_remote_initiator_cluster is set, the query forwarded via remote(...) still carries that setting even though convertToRemote only clears object_storage_remote_initiator. This makes remote execution depend on remote nodes understanding a setting that is only needed on the initiator; in mixed-version/rolling-upgrade clusters, older remote hosts can fail with unknown-setting errors before execution. The forwarded AST settings should drop object_storage_remote_initiator_cluster together with object_storage_remote_initiator.


Collaborator


@ianton-ru does it make sense? it looks like it does

Author


`object_storage_remote_initiator_cluster` does nothing without `object_storage_remote_initiator`. It makes sense to remove it anyway, just to keep less garbage in the sub-query.

Prewhere filter
Prewhere filter column: less(multiply(2, b), 100)
Filter column: and(indexHint(greater(plus(i, 40), 0)), equals(a, 0)) (removed)
Filter column: and(equals(a, 0), indexHint(greater(plus(i, 40), 0))) (removed)
Author


Argument order depends on the hash, and the hash was changed (see FunctionNode::updateTreeHashImpl).

@svb-alt requested a review from arthurpassos March 26, 2026 12:33
if (settings[Setting::object_storage_remote_initiator])
{
auto storage_and_context = convertToRemote(cluster, context, cluster_name_from_settings, query_to_send);
auto remote_initiator_cluster_name = settings[Setting::object_storage_remote_initiator_cluster].value;
Collaborator


Please add a comment explaining what this code block does. It took me a while to understand it by just reading the code.

I suggest something like:

/// In case the current node is not supposed to initiate the clustered query
/// Sends this query to a remote initiator using the `remote` table function
if (settings[Setting::object_storage_remote_initiator])
{
      /// Re-writes queries in the form of:
      /// Input: SELECT * FROM iceberg(...) SETTINGS object_storage_cluster='swarm', object_storage_remote_initiator=1
     /// Output: SELECT * FROM remote('remote_host', icebergCluster('swarm', ...)
     /// Where `remote_host` is a random host from the cluster which will execute the query
     /// This means the initiator node belongs to the same cluster that will execute the query
     /// In case remote_initiator_cluster_name is set, the initiator might be set to a different cluster
}

if (shard_addresses.size() != 1)
throw Exception(ErrorCodes::LOGICAL_ERROR, "Size of shard {} in cluster {} is not equal 1", shard_num, cluster_name_from_settings);
auto host_name = shard_addresses[0].toString();
std::string host_name;
Collaborator


why?

Author


The address here is in encoded format, foo%2Ebar instead of foo.bar. This wasn't caught in tests before I added a host with a dot in its name.

auto remote_query = makeASTFunction(remote_function_name, make_intrusive<ASTLiteral>(host_name), table_expression->table_function);
boost::intrusive_ptr<ASTFunction> remote_query;

if (shard_addresses[0].user_specified)
Collaborator


A comment please

throw Exception(ErrorCodes::CLUSTER_DOESNT_EXIST, "Requested cluster '{}' not found", cluster_name);
/// Remove check cluster existing here
/// In query like
/// remote('remote_host', xxxCluster('remote_cluster', ...))
Collaborator


What if the query is not remote? Can't we check for that?

/// If cluster not exists, query falls later

Where and with which exception? It would be good to avoid any network calls before failing

Author


The cluster name is not a network node name, it's an internal ClickHouse name. The query fails later, when it tries to get hosts from the cluster. Network calls can't be made without hosts.

But it's hard to tell here whether the cluster function is inside 'remote' or not.


@arthurpassos
Collaborator

The changes look ok, but I think it needs more documentation. I also wonder if we can keep the 'cluster exists' check by wrapping it in an `if (!is_remote)`.

arthurpassos previously approved these changes Mar 27, 2026
Collaborator

@arthurpassos left a comment


LGTM

@alsugiliazova
Member

Audit Report: PR #1577 — Remote initiator improvements

Scope: Altinity/ClickHouse PR #1577

AI audit note: This review comment was generated by AI (gpt-5.3-codex).

Confirmed defects

Medium: Forwarded query does not strip initiator-only setting

  • Impact: In mixed-version/rolling-upgrade clusters, the rewritten remote query can fail on the remote initiator with an unknown-setting error before execution, breaking object_storage_remote_initiator flow.
  • Anchor: src/Storages/IStorageCluster.cpp / IStorageCluster::convertToRemote
  • Trigger: Query uses object_storage_remote_initiator=1 and object_storage_remote_initiator_cluster='...', and selected remote initiator runs a version that does not know object_storage_remote_initiator_cluster.
  • Why defect: convertToRemote removes object_storage_remote_initiator from AST query settings, but does not remove object_storage_remote_initiator_cluster; this setting is initiator-only and is still forwarded to remote SQL.
  • Fix direction (short): Remove object_storage_remote_initiator_cluster from settings_ast.changes together with object_storage_remote_initiator, then drop SETTINGS clause if empty.
  • Regression test direction (short): Add test asserting rewritten forwarded SQL does not contain either initiator-only setting after convertToRemote.

Coverage summary

  • Scope reviewed: IStorageCluster remote initiator rewrite path; analyzer propagation of table-function SETTINGS (FunctionNode, QueryTreeBuilder, QueryAnalyzer, StorageDistributed); ITableFunctionCluster cluster existence deferral; new settings/history wiring; integration tests under tests/integration/test_s3_cluster.
  • Categories failed: Setting-forwarding compatibility contract (initiator-only setting leak to remote SQL).
  • Categories passed: Initiator host selection and URI decode path; user/password remote auth propagation; fail-closed behavior for secret-based clusters (NOT_IMPLEMENTED); nested table-function SETTINGS preservation in analyzer/query-tree conversion; deferred cluster lookup behavior; settings metadata/history registration; added integration scenarios for auth/secret/initiator-cluster split.
  • Assumptions/limits: Static audit only; no runtime mixed-version cluster execution performed in this review.

@alsugiliazova
Member

PR #1577 CI Verification Report

CI Results Overview

  • Success: ~55
  • Failure: 8 (see analysis below)
  • Skipped: ~39 (excluded sanitizer suites)

PR's New Test Validation

The PR adds new integration tests for test_object_storage_remote_initiator in test_s3_cluster/test.py (+123 lines). These tests initially failed on the Mar 27 CI run but were fixed by subsequent commits (Fix test df2595a, Fix setting cleanup a5eee1d).

Latest run (Mar 30):

  • Integration tests (amd_binary, 5/5): test_object_storage_remote_initiator: OK
  • Integration tests (amd_asan, db disk, old analyzer, 2/6): test_object_storage_remote_initiator: OK
  • Integration tests (arm_binary, distributed plan, 2/4): test_object_storage_remote_initiator: OK
  • Integration tests (amd_asan, targeted): test_object_storage_remote_initiator[1-10] through [10-10] (10 parametrized): all OK

All 13 test executions passed on the latest CI run across amd_binary, amd_asan, and arm_binary configurations.

CI Failures

1. test_object_storage_remote_initiator (Mar 27 run) — Fixed by PR Commits

Jobs: Integration tests (amd_binary 2/5, amd_asan 4/6, arm_binary 2/4)

Failed on Mar 27 run, passed on all subsequent runs after fix commits df2595a and a5eee1d.

Related to PR: Yes — Development-stage failures, resolved in final commits

2. 01625_constraints_index_append (Fast test, Mar 25) — Fixed by PR Commits

Job: Fast test (initial run only)

The PR modifies the reference file for this test. Failed once on the earliest commit (Mar 25), then passed consistently in all 24+ subsequent runs across all stateless test configurations.

Related to PR: Yes — Reference file update, resolved in subsequent commits

3. BuzzHouse (amd_debug, arm_asan) — Known Flaky Fuzzer

Server crash during random SQL fuzzing. BuzzHouse is a known flaky fuzzer across the CI.

Related to PR: No — Known flaky fuzzer unrelated to remote initiator changes

4. test_storage_hudi (3 tests) — Pre-existing Branch Failure

Job: Integration tests (amd_binary, 4/5)

test_single_hudi_file, test_multiple_hudi_files, test_types — All 3 Hudi tests fail. Database analysis shows these fail on multiple PRs (#1577, #1581, #1594, #1568) and master (PR=0), confirming pre-existing breakage.

Related to PR: No — Pre-existing Hudi test failure across the branch

5. test_backup_to_s3_different_credentials[...-non_native_multipart] — Flaky

Job: Integration tests (amd_binary, 5/5)

1 failure out of 72 total runs for this test on PR #1577. Passed in all other configurations (amd_asan, arm_binary) and in prior runs.

Related to PR: No — Intermittent flaky test

6. 01171_mv_select_insert_isolation_long — Known Flaky

Job: Stateless tests (arm_asan, targeted) — 3 failures

Failed on all 3 targeted reruns. This is a long-running MV isolation test known to be unstable on arm_asan.

Related to PR: No — Pre-existing flaky test

7. test_move_after_processing[another_bucket-AzureQueue] — Unrelated

Job: Integration tests (arm_binary, distributed plan, 3/4)

Azure Queue storage processing test, completely unrelated to remote initiator or Iceberg changes.

Related to PR: No — Azure Queue test

8. Stateless tests (ParallelReplicas, s3 storage) — Intermittent Flaky

01038_dictionary_lifetime_min_zero_sec and 04003_cast_nullable_read_in_order_explain — 1 failure each in ParallelReplicas mode.

Related to PR: No — Intermittent ParallelReplicas flakiness

9. GrypeScan (-alpine) — CVE in Base Image

CVE in Alpine base image (altinityinfra/clickhouse-server:1577-26.1.6.20001.altinityantalya-alpine). Non-alpine image scan passed.

Related to PR: No — Base image vulnerability

Regression Test Results (PR's Internal CI)

  • Iceberg (1): x86_64: Fail (3h30m timeout); aarch64: Pass
  • Iceberg (2): x86_64: Pass; aarch64: Pass
  • Parquet: x86_64: Pass; aarch64: Pass
  • Parquet (aws_s3): x86_64: Pass; aarch64: Pass
  • Parquet (minio): x86_64: Pass; aarch64: Pass
  • S3 Export (part): x86_64: Pass; aarch64: Pass
  • S3 Export (partition): x86_64: Pass; aarch64: Pass
  • Swarms: x86_64: Fail; aarch64: Fail

Regression Failure: Iceberg (1) x86_64 — Pre-existing Timeout

The Iceberg 1 suite hit the 3h30m job timeout on x86_64. Database analysis (30-day window) shows this is a pre-existing issue:

  • 26.1.6.20001.altinityantalya x86_64: 2 Fail / 11 OK (~15% fail rate)
  • 26.1.4.20001.altinityantalya x86_64: 7 Fail / 51 OK (~12% fail rate)
  • 26.1.3.20001.altinityantalya x86_64: 66 Fail / 18 OK (~79% fail rate)
  • Iceberg 1 aarch64 passed. Iceberg 2 passed on both architectures.

Assessment: Flaky (pre-existing) — Intermittent timeout affecting the Iceberg 1 suite, not specific to this PR.

Regression Failure: Swarms — Pre-existing Node Failure Instability

Two failing scenarios on both x86_64 and aarch64:

/swarms/feature/node failure/initiator out of disk space (Fail)

Database check for: /swarms/feature/node failure/initiator out of disk space

- History: 52/113 fails in last 7 days, 202/423 fails in last 30 days
- Last fail: 2026-03-30
- Last pass: 2026-03-30
- Concentration: All versions affected — 26.1.6 (73%), 26.1.4 (28%), 26.1.3 (78%), 25.8.16 (5%), 25.8.14 (57%)
- Error signature: Consistent — UNKNOWN_DATABASE exception (Code: 81)

Assessment: Flaky (pre-existing)
Recommendation: Known unstable test, no action needed for PR verification

/swarms/feature/node failure/check restart clickhouse on swarm node (Error — 600s timeout)

Database check for: /swarms/feature/node failure/check restart clickhouse on swarm node

- History: 266 Fail + 60 Error / 97 OK in last 30 days (~77% failure rate)
- Last fail: 2026-03-26
- Last pass: 2026-03-30
- Concentration: All 26.1.x versions show 0% pass rate; 25.8.x has partial passes
- Error signature: Consistent — ExpectTimeoutError 600s

Assessment: Flaky (pre-existing)
Recommendation: Known unstable test, no action needed for PR verification

Both Swarms failures are confirmed as pre-existing instability by previous verification reports (PR #1575, PR #1583).

Verdict: Ready to merge after audit review — No unresolved PR-related failures.
