columnar: support bucket parallel read in region #10871
Signed-off-by: yongman <yming0221@gmail.com>
Walkthrough

Adds FFI callbacks and hub caches (region bucket keys; shared snap access), exposes them via FFI, and refactors the C++ disaggregated columnar reader to use a shared serialized context, bucket-aware planning, and lazy, slot-based proxy reader materialization; also gates DeltaMerge thread init when columnar mode is active.

Changes
- Columnar Storage Read Path Enhancement
- DeltaMerge Pool Conditional Initialization
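
As a rough illustration of what "bucket-aware planning" can mean, the sketch below cuts one key range at the bucket boundary keys that fall inside it. It is a minimal sketch, not the PR's code: `KeyRange` and this `splitRangeByBucketKeys` signature are hypothetical, keys are plain strings, and an empty `end` is assumed to mean "unbounded".

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical half-open key range [start, end); an empty `end` means unbounded.
struct KeyRange
{
    std::string start;
    std::string end;
};

// Split `range` at every bucket boundary key that falls strictly inside it.
// `bucket_keys` is assumed to be sorted in ascending order.
std::vector<KeyRange> splitRangeByBucketKeys(const KeyRange & range, const std::vector<std::string> & bucket_keys)
{
    assert(range.end.empty() || range.start < range.end);
    std::vector<KeyRange> units;
    std::string cursor = range.start;
    for (const auto & key : bucket_keys)
    {
        if (key <= cursor)
            continue; // boundary at or before the current start: nothing to cut
        if (!range.end.empty() && key >= range.end)
            break; // boundary at or past the end: later keys are out of range too
        units.push_back({cursor, key});
        cursor = key;
    }
    units.push_back({cursor, range.end}); // the tail after the last cut
    return units;
}
```

Each returned unit can then be read independently, which is what lifts the concurrency limit from one reader per region to one reader per bucket unit.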
Sequence Diagram(s)
sequenceDiagram
participant StorageDisaggregated as StorageDisaggregatedColumnar
participant RNTask as RNProxyReadTask
participant HubFFI as Hub ffi (fn_get_columnar_reader / fn_get_region_bucket_keys)
participant PD as PD (Bucket lookup)
StorageDisaggregated->>RNTask: buildProxyReadTask(reader_plans)
RNTask->>HubFFI: fn_get_region_bucket_keys(region_id, region_ver)
HubFFI->>PD: request bucket keys (PD API)
PD-->>HubFFI: bucket keys
RNTask->>HubFFI: fn_get_columnar_reader(serialized_payload)
HubFFI-->>RNTask: ColumnarReader handle / error
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 2
🧹 Nitpick comments (2)
dbms/src/Storages/StorageDisaggregatedColumnar.cpp (2)
95-107: ⚡ Quick win

Log the original cleanup exception.

This broad `catch (...)` drops the exception details from `clear_shared_snap_access_by_start_ts`, so failures here become almost impossible to diagnose. `tryLogCurrentException(log, "...")` is the project-standard way to preserve that context in cleanup paths.

♻️ Suggested change
 catch (...)
 {
-    try
-    {
-        LOG_WARNING(log, "clear shared snapaccess cache failed, start_ts={}", start_ts);
-    }
-    catch (...)
-    {
-    }
+    tryLogCurrentException(log, fmt::format("clear shared snapaccess cache failed, start_ts={}", start_ts));
 }

As per coding guidelines, Use `tryLogCurrentException(log, "context")` in broad `catch (...)` paths to avoid duplicated exception-formatting code.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp` around lines 95 - 107, The catch-all after calling clear_shared_snap_access_by_start_ts(start_ts, proxy_ptr) drops the original exception; replace the inner LOG_WARNING/empty catch with a call to tryLogCurrentException(log, "clear shared snapaccess cache failed, start_ts={}", start_ts) (or at minimum tryLogCurrentException(log, "clear shared snapaccess cache failed, start_ts=" + toString(start_ts))) so the original exception context is preserved; update the catch (...) block in StorageDisaggregatedColumnar.cpp accordingly to call tryLogCurrentException(log, ...) instead of swallowing the exception.
Source: Coding guidelines
675-675: ⚡ Quick win

Use the fmt-style `DB::Exception` constructor here.

This new branch is still using the legacy `(message, code)` form while the surrounding paths already use the repository-standard `Exception(ErrorCodes::..., "...")` style.

♻️ Suggested change
- throw Exception("lock error", ErrorCodes::COLUMNAR_SNAPSHOT_ERROR);
+ throw Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "lock error");

As per coding guidelines, Use `DB::Exception` for error handling with the fmt-style constructor: `throw Exception(ErrorCodes::SOME_CODE, "Message with {}", arg);`.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp` at line 675, Replace the legacy throw Exception("lock error", ErrorCodes::COLUMNAR_SNAPSHOT_ERROR) usage with the repository-standard fmt-style constructor: throw Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "lock error"); update the throw site in StorageDisaggregatedColumnar (the throw of Exception referencing ErrorCodes::COLUMNAR_SNAPSHOT_ERROR) so it uses the ErrorCodes-first signature and follow the same fmt-style pattern as surrounding paths.
Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@contrib/tiflash-columnar-hub/hub-runtime/src/cloud_helper.rs`:
- Around line 974-981: The group is being recreated by insert/get_loader after
remove_by_start_ts tears it down, so make evictions sticky: add a
tombstone/generation field to SharedSnapAccessGroup (e.g., evicted: AtomicBool
or generation: AtomicU64) and update remove_by_start_ts to mark the group's
tombstone/generation before removing it from groups; then change group
creation/lookup logic in insert and get_loader to consult that
tombstone/generation (if evicted or generation mismatches, refuse to recreate or
insert into the group and return an error/ignore) so any in-flight loader
holding the group's lock cannot resurrect it after clear (also ensure
request_snapshot_from_leader paths check the group's generation/tombstone before
calling insert).
In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp`:
- Around line 1036-1076: The block-stream path is currently allowed to create
one RNProxyInputStream per bucket unit because planned_reader_num (derived from
total_max_reader_num after splitRangesByBucketKeys) can exceed num_streams;
modify the logic so getInputStreams() does not instantiate more streams than
num_streams: compute an effective_reader_num = min(planned_reader_num,
num_streams) (or cap by num_streams where planned_reader_num is used) and change
distribution so region_reader_plans and their bucket_units are assigned/shared
among those effective_reader_num logical readers (i.e., allow a single
RNProxyInputStream to process multiple bucket units/region plans) instead of
creating a stream per bucket unit; ensure enable_bucket_parallel,
planned_reader_num, total_max_reader_num and splitRangesByBucketKeys usages stay
but the instantiation point that creates RNProxyInputStream honors the cap and
uses a shared/task-pool style assignment.
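
The cap requested in the second inline comment could be sketched as follows. This is only an illustration under stated assumptions: `BucketUnit` and `assignUnitsToReaders` are hypothetical names, and round-robin is just one possible assignment policy (the comment itself also mentions a shared task-pool style).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one bucket-sized unit of read work.
struct BucketUnit
{
    int region_id;
    int bucket_index;
};

// Assign every unit to one of at most `num_streams` logical readers, round-robin,
// so a single reader may process several bucket units.
std::vector<std::vector<BucketUnit>> assignUnitsToReaders(const std::vector<BucketUnit> & units, std::size_t num_streams)
{
    assert(num_streams > 0);
    // effective_reader_num = min(planned_reader_num, num_streams)
    const std::size_t effective_reader_num = std::min(units.size(), num_streams);
    std::vector<std::vector<BucketUnit>> readers(effective_reader_num);
    for (std::size_t i = 0; i < units.size(); ++i)
        readers[i % effective_reader_num].push_back(units[i]);
    return readers;
}
```

With five units and `num_streams = 2`, this yields two readers holding three and two units respectively, instead of five streams.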
📒 Files selected for processing (8)
- contrib/tiflash-columnar-hub/hub-runtime/ffi/src/RaftStoreProxyFFI/ProxyFFI.h
- contrib/tiflash-columnar-hub/hub-runtime/src/cloud_helper.rs
- contrib/tiflash-columnar-hub/hub-runtime/src/columnar_impls.rs
- contrib/tiflash-columnar-hub/hub-runtime/src/interfaces.rs
- contrib/tiflash-columnar-hub/hub-runtime/src/run.rs
- dbms/src/Server/Server.cpp
- dbms/src/Storages/StorageDisaggregatedColumnar.cpp
- dbms/src/Storages/StorageDisaggregatedColumnar.h
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
dbms/src/Storages/StorageDisaggregatedColumnar.cpp (1)
471-485: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Handle the empty-plan path before reading the first stream header.

Line 1136 now allows `buildProxyReadTask()` to return no tasks, but the block-stream path still assumes at least one source exists and unconditionally calls `pipeline.firstStream()` on Line 480. An empty-range scan will therefore crash here instead of returning an empty result.

Also applies to: 1136-1142

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp` around lines 471 - 485, The code assumes pipeline.firstStream() exists even when buildProxyReadTask()/read_proxy_tasks produced no streams; detect the empty-plan path by checking if pipeline.streams (or read_proxy_tasks) is empty before calling pipeline.firstStream(), and return or set analyzer appropriately to avoid dereferencing a non-existent stream. Specifically, after populating pipeline.streams (and after executeGeneratedColumnPlaceholder), if pipeline.streams.empty() then bypass constructing DAGExpressionAnalyzer from pipeline.firstStream() (e.g., create an empty NamesAndTypes for analyzer or leave analyzer null and ensure callers handle empty results) so functions like DAGExpressionAnalyzer construction and further processing do not crash when there are no proxy read tasks.
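
The guard this comment asks for might look roughly like the sketch below. The types here (`Header`, `Stream`, `Pipeline`, `headerIfAny`) are simplified, hypothetical stand-ins rather than the real TiFlash classes; the only point is that the emptiness check happens before the first stream is touched.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins for a block stream and its header (column names).
using Header = std::vector<std::string>;
struct Stream
{
    Header header;
};
using StreamPtr = std::shared_ptr<Stream>;

struct Pipeline
{
    std::vector<StreamPtr> streams;
    // Only valid when at least one stream exists.
    const StreamPtr & firstStream() const
    {
        assert(!streams.empty());
        return streams.front();
    }
};

// Return the header of the first stream, or std::nullopt for an empty plan.
std::optional<Header> headerIfAny(const Pipeline & pipeline)
{
    if (pipeline.streams.empty())
        return std::nullopt; // empty-range scan: the caller returns an empty result
    return pipeline.firstStream()->header;
}
```

A caller would build its analyzer only when a header comes back, and otherwise fall through to an empty result.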
🧹 Nitpick comments (1)
dbms/src/Storages/StorageDisaggregatedColumnar.cpp (1)
668-680: ⚡ Quick win

Use the repo-standard `DB::Exception` constructor here.

These throws reverse the expected `(error_code, format, ...)` order. Please switch them to the fmt-style form so they do not depend on a legacy overload.

Suggested cleanup
- throw Exception("lock error", ErrorCodes::COLUMNAR_SNAPSHOT_ERROR);
+ throw Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "lock error");
...
- throw Exception("read_block failed in tiflash-proxy", ErrorCodes::LOGICAL_ERROR);
+ throw Exception(ErrorCodes::LOGICAL_ERROR, "read_block failed in tiflash-proxy");

As per coding guidelines, `**/*.cpp`: Use `DB::Exception` for error handling with the fmt-style constructor: `throw Exception(ErrorCodes::SOME_CODE, "Message with {}", arg);`

Also applies to: 1250-1254

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp` around lines 668 - 680, Replace legacy Exception throws that pass message then code with the repo-standard DB::Exception fmt-style constructor; specifically in StorageDisaggregatedColumnar.cpp where the LockedError handling uses throw Exception("lock error", ErrorCodes::COLUMNAR_SNAPSHOT_ERROR) (the block that parses lock_info, calls cluster->lock_resolver->resolveLocks and logs before_expired) change it to throw Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "lock error") and apply the same conversion for the similar throw in the other location referenced around the 1250–1254 area so all Exception instantiations use the (ErrorCodes::..., "fmt {}", args...) form.
Source: Coding guidelines
📒 Files selected for processing (2)
- dbms/src/Storages/StorageDisaggregatedColumnar.cpp
- dbms/src/Storages/StorageDisaggregatedColumnar.h

🚧 Files skipped from review as they are similar to previous changes (1)
- dbms/src/Storages/StorageDisaggregatedColumnar.h
/hold
/test pull-integration-next-gen-columnar
Bug: EPOCH_NOT_MATCH Infinite Retry on Region 125 in Disaggregated Columnar Read

Summary

When running Test Case

Test Flow (simplified)

-- t_1 (int handle clustered): PASSED
create table test.t_1(a int primary key clustered, col int);
insert into test.t_1 values(1,2),(2,3);
alter table test.t_1 set tiflash replica 1;
-- wait_table → select → alter column → select → drop
-- t_2 (common handle clustered): PASSED
create table test.t_2(a varchar(10), b int, c int, primary key(a, b) clustered);
insert into test.t_2 values('1',2,3),('2',3,4);
alter table test.t_2 set tiflash replica 1;
-- wait_table → select → alter column → select → drop
-- t_3 (composite PRIMARY KEY clustered): FAILED
create table test.t_3 (A int, B varchar(20), C int, D int, PRIMARY KEY(A,C) CLUSTERED);
insert into test.t_3 values (1,'1',1,1),(2,'2',2,2);
alter table test.t_3 set tiflash replica 1;
-- wait_table → select → ERROR!

Error

Log Analysis

All log files are under

Timeline

Region 125: Before vs After Split

Before (epoch=68, t_2 exists):

After (epoch=69, t_2 dropped, t_3 created):

Key changes:

Root Cause

The bug is a missing epoch refresh in the EPOCH_NOT_MATCH handling path, spanning two layers:

Architecture

Two Contributing Issues

Issue 1 (C++ layer):

| File | Role |
|---|---|
| `dbms/src/Storages/StorageDisaggregatedColumnar.cpp:763-792` | C++ retry loop: never updates reader_plan |
| `dbms/src/Storages/StorageDisaggregatedColumnar.cpp:605-634` | C++ error handling: drops cache but doesn't update plan |
| `dbms/src/Storages/Columnar/RNProxyReaderPlan.h:30-36` | Plan struct: region_ver is a value type |
| `contrib/tiflash-columnar-hub/hub-runtime/src/cloud_helper.rs:715-722` | ★ PRIMARY BUG: EPOCH_NOT_MATCH not retried |
| `contrib/tiflash-columnar-hub/hub-runtime/src/cloud_helper.rs:707-713` | Reference: not_leader is retried correctly |

Log Files
- `tests/fullstack-test-next-gen-columnar/log/tiflash-cn0/tiflash.log`: C++ side (plan building, retry loop)
- `tests/fullstack-test-next-gen-columnar/log/tiflash-cn0/tiflash_error.log`: C++ warnings (repeated EPOCH_NOT_MATCH)
- `tests/fullstack-test-next-gen-columnar/log/tiflash-cn0/tiflash_tikv.log`: Rust proxy side (snapshot requests, epoch details)
How Bucket-Read Branch Introduced This Bug
Key Commits
| Commit | Author | Date | Description |
|---|---|---|---|
| `61d37c1750` | Ray Yan | May 22 | `*: refactor proxy to hub lib for columnar`: introduced cloud_helper.rs and EPOCH_NOT_MATCH non-retry (on both master and bucket-read) |
| `d20fc65bb4` | yongman | Jun 2 | `snapaccess cache`: wrapped request_snapshot_from_leader with get_or_request_shared_snapshot caching layer |
| `d0564d766b` | yongman | Jun 9 | `optimize snapaccess cache for buckets read`: refactored cache to DashMap groups by start_ts; added clear_shared_snap_access_by_start_ts FFI |
| `82f11a7bc9` | yongman | Jun 9 | `clear snap access in last owner drop`: changed cache clearing from eager (every context destruction) to lazy (only last owner) |
| `de6a182f67` | yongman | Jun 9 | `avoid inflight request leak snapaccess`: added terminal state to prevent inflight loader leaks after cache clear |
Latent Bug vs. Exposure
The EPOCH_NOT_MATCH non-retry in request_snapshot_from_leader has existed since 61d37c1750 (which is on both master and bucket-read):
// hub-runtime/src/cloud_helper.rs:715-722
// Same on master AND bucket-read:
if delegate_resp.get_region_error().has_epoch_not_match() {
// Return epoch not match error to TiDB to retry. ← comment is wrong
return Err(Error::RegionError(delegate_resp.take_region_error())); // never retries
}

Compare with `not_leader` handling in the same backoff loop:
// NotLeader is properly retried:
if delegate_resp.get_region_error().has_not_leader() {
pd_client.evict_region_cache(shard_id);
leader_changed = true;
tokio::time::sleep(next_delay).await;
continue; // retries ✅
}

Why master doesn't reproduce:
- No caching wrapper: master calls `request_snapshot_from_leader` directly. Bucket-read interposes `get_or_request_shared_snapshot`, which adds loader mutex serialization and SnapAccess caching by `(shard_id, shard_ver, start_ts, start_table_id, end_table_id)`.
- Different timing characteristics: the caching layer changes request serialization patterns. Combined with lazy cache clearing (`82f11a7bc9`), stale region cache entries survive longer, making the race window between region split and query execution wider.
- Test path not exercised: the `fullstack-test-next-gen-columnar` test infrastructure may not have been actively run against upstream/master.
What Master and Bucket-Read Share
| Component | Master | Bucket-Read | Same? |
|---|---|---|---|
| `request_snapshot_from_leader` EPOCH_NOT_MATCH handling | `return Err` | `return Err` | ✅ Identical |
| C++ `createColumnarReaderWithBackoff` retry loop | const reader_plan, same stale epoch | const reader_plan, same stale epoch | ✅ Identical |
| C++ `createProxyColumnarReader` epoch handling | drops region cache, throws | drops region cache, throws | ✅ Identical |
What Bucket-Read Added
| Component | Master | Bucket-Read |
|---|---|---|
| `make_columnar_reader` → snapshot request | Direct call to `request_snapshot_from_leader` | Via `get_or_request_shared_snapshot` caching wrapper |
| SnapAccess caching | None | SharedSnapAccessCache by (shard_id, shard_ver, start_ts, start_table_id, end_table_id) |
| Cache clearing | N/A | Lazy (only on last owner drop via StartTsClearRegistry) |
| Loader serialization | None | Per-key Mutex via get_loader() |
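
To make the caching layer in the table above more concrete, here is a minimal C++ sketch of a snapshot cache grouped by `start_ts` whose clear operation is terminal, so a late loader cannot recreate a cleared group. It is an assumption-laden illustration, not the hub's Rust implementation: `SharedSnapCache`, `SnapKey`, and the use of a plain string as the snapshot are all hypothetical, and the real cache uses DashMap groups and per-key loaders.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>
#include <set>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical key within one start_ts group:
// (shard_id, shard_ver, start_table_id, end_table_id).
using SnapKey = std::tuple<uint64_t, uint64_t, uint64_t, uint64_t>;

class SharedSnapCache
{
public:
    // Returns false if the start_ts group was already cleared, so an
    // in-flight loader finishing late cannot resurrect it.
    bool insert(uint64_t start_ts, const SnapKey & key, std::string snap)
    {
        std::lock_guard<std::mutex> lock(mu);
        if (cleared.count(start_ts) > 0)
            return false;
        groups[start_ts][key] = std::move(snap);
        return true;
    }

    bool contains(uint64_t start_ts, const SnapKey & key) const
    {
        std::lock_guard<std::mutex> lock(mu);
        auto it = groups.find(start_ts);
        return it != groups.end() && it->second.count(key) > 0;
    }

    // Drop the whole group and remember the start_ts as terminal.
    void clearByStartTs(uint64_t start_ts)
    {
        std::lock_guard<std::mutex> lock(mu);
        groups.erase(start_ts);
        cleared.insert(start_ts);
        assert(groups.count(start_ts) == 0);
    }

private:
    mutable std::mutex mu;
    std::map<uint64_t, std::map<SnapKey, std::string>> groups;
    std::set<uint64_t> cleared; // a real cache would need to bound or expire this set
};
```

Note that `shard_ver` is part of the key: a reader that keeps asking with a stale version never hits an entry stored under the newer one, which is consistent with the stale-plan behaviour analysed above.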
Conclusion
The root cause (EPOCH_NOT_MATCH non-retry in request_snapshot_from_leader) was introduced in 61d37c1750, which is on both branches. The bucket-read exposure was caused by d20fc65bb4 and d0564d766b which added the get_or_request_shared_snapshot caching layer, changing the timing characteristics enough to turn a latent race into a reproducible failure.
Reproduction
cd tests/fullstack-test-next-gen-columnar
./compose.sh up -d
# wait for cluster ready
./compose.sh exec -T tiflash-cn0 bash -c \
'cd /tests && ENABLE_NEXT_GEN=true verbose=true ./run-test.sh fullstack-test2/clustered_index/ddl.test'
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
contrib/tiflash-columnar-hub/hub-runtime/src/cloud_helper.rs (1)
715-723: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Retry `epoch_not_match` in the hub instead of returning it.

This still matches the failure described in the PR discussion: after a split, returning here can leave the caller recreating readers with the same stale `shard_ver` until timeout. Handle `epoch_not_match` like `not_leader`: evict the region cache, refresh the retry epoch from the error payload, back off, and continue inside `request_snapshot_from_leader` rather than surfacing the stale plan upstream.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@contrib/tiflash-columnar-hub/hub-runtime/src/cloud_helper.rs` around lines 715 - 723, The code in request_snapshot_from_leader currently treats delegate_resp.get_region_error().has_epoch_not_match() the same as other fatal errors and returns Error::RegionError, which leaves callers with stale shard_ver; instead handle epoch_not_match like the not_leader path: evict the region from the local region cache, extract and refresh the retry epoch/version from delegate_resp.get_region_error() (use the epoch info in the RegionError payload), apply a backoff, and loop to retry the request rather than returning Error::RegionError. Locate the epoch_not_match branch in request_snapshot_from_leader and modify it to call the same cache-eviction and retry logic used for not_leader (preserving logging) and continue the retry loop with the updated epoch/version.
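
The retry shape suggested here can be illustrated with a small, language-neutral sketch, written in C++ to match the reader side. Everything in it is hypothetical (`SnapshotResult`, `RequestFn`, `requestWithEpochRefresh`): the request is a plain callback, and the error is assumed to report the region's current version. In practice a split can also change which regions cover the key range, so refreshing only the version may not be enough.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <string>

// Hypothetical result of one snapshot request.
struct SnapshotResult
{
    bool ok = false;
    std::optional<uint64_t> current_ver; // set on an epoch-not-match style error
    std::string snapshot;
};

using RequestFn = std::function<SnapshotResult(uint64_t region_ver)>;

// Retry up to `max_attempts` times, adopting the newer version reported by an
// epoch-not-match error instead of resending the stale one.
std::optional<std::string> requestWithEpochRefresh(const RequestFn & request, uint64_t region_ver, int max_attempts)
{
    assert(max_attempts > 0);
    for (int attempt = 0; attempt < max_attempts; ++attempt)
    {
        SnapshotResult res = request(region_ver);
        if (res.ok)
            return res.snapshot;
        if (!res.current_ver.has_value())
            return std::nullopt; // some other error: give up in this sketch
        region_ver = *res.current_ver; // refresh before the next attempt
        // A real loop would also evict the region cache and back off here.
    }
    return std::nullopt;
}
```

The key difference from the failing behaviour is that `region_ver` is a mutable loop variable rather than a value fixed when the plan was built, so the second attempt no longer repeats the stale epoch.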
📒 Files selected for processing (3)
- contrib/tiflash-columnar-hub/hub-runtime/src/cloud_helper.rs
- dbms/src/Storages/StorageDisaggregatedColumnar.cpp
- dbms/src/Storages/StorageDisaggregatedColumnar.h

🚧 Files skipped from review as they are similar to previous changes (1)
- dbms/src/Storages/StorageDisaggregatedColumnar.cpp
/retest
/test pull-unit-next-gen
/retest
/retest-required
Signed-off-by: JaySon-Huang <tshent@qq.com>
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JaySon-Huang, Lloyd-Pottiger

/retest
/cherry-pick release-nextgen-202603

@JaySon-Huang: new pull request created to branch
What problem does this PR solve?
Issue Number: close #10844
Problem Summary:
Read concurrency is limited at the region level.
What is changed and how it works?
Check List
Tests
Side effects
Documentation
Release note
Summary by CodeRabbit
New Features
Refactor
Tests