[fix](catalog) Refresh remote OLAP partitions on replica relocation#63894
[fix](catalog) Refresh remote OLAP partitions on replica relocation#63894HonestManXin wants to merge 1 commit into
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
/review |
|
run buildall |
There was a problem hiding this comment.
I found one blocking correctness issue in the remote OLAP metadata refresh path.
Critical checkpoint conclusions:
- Goal and proof: The PR aims to refresh remote OLAP partition metadata when replica placement changes without visible-version changes. The checksum approach addresses the goal, but the new lock splitting can return inconsistent table/partition metadata; current tests only cover checksum ordering and do not cover RPC snapshot consistency or concurrent partition changes.
- Scope: The change is focused, but the lock/lifecycle change is larger than a local checksum optimization because it changes the consistency contract of getOlapTableMeta.
- Concurrency: This path does involve table metadata read/write concurrency. The previous implementation held the table read lock while serializing table meta and partition updates; the new implementation releases and reacquires it in batches, allowing DDL between phases and creating mixed snapshots.
- Lifecycle/static initialization: No special lifecycle or cross-TU static initialization concern found.
- Configuration: No new configuration items.
- Compatibility: Thrift fields are optional and backward-compatible at the wire level; no incompatible storage format change found.
- Parallel paths: The normal and temp partition paths were both updated.
- Special conditions: The checksum comparison itself is deterministic by sorted index/tablet/replica IDs, but response construction must remain snapshot-consistent.
- Test coverage: Added unit coverage validates checksum determinism, but it does not cover the end-to-end remote metadata RPC behavior or concurrent partition add/drop during refresh.
- Test result correctness: No .out result changes.
- Observability: Existing debug logging is sufficient for this change.
- Transaction/persistence/data writes: No transaction or persistence path changes.
- FE/BE variables: This is FE-to-FE thrift metadata; optional fields were added in the thrift definition.
- Performance: Batching reduces lock hold time, but not at the cost of returning an inconsistent metadata snapshot.
- User focus: No additional user-provided review focus was supplied.
Cloud UT Coverage ReportIncrement line coverage Increment coverage report
|
TPC-H: Total hot run time: 31518 ms |
TPC-DS: Total hot run time: 172969 ms |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
FE Regression Coverage ReportIncrement line coverage |
6f1eefd to
f22ebd7
Compare
|
/review |
|
run buildall |
There was a problem hiding this comment.
I found one blocking correctness gap in the new checksum protocol. The server-side mixed-snapshot concern already present in the existing review thread was not duplicated here.
Critical checkpoint conclusions:
- Goal/test: the PR aims to refresh remote OLAP partition metadata when replica placement changes without a visible-version change. The new unit test covers checksum determinism, but it does not cover the end-to-end bootstrap path for already-cached remote partitions.
- Scope: the change is focused and small, but the client request path misses a required fallback for caches created before checksums were stored.
- Concurrency/lifecycle: the current head holds the table read lock while building table metadata and partition payloads, so I did not raise the already-known stale-snapshot issue again. No additional lock-order issue found in the reviewed path.
- Compatibility: thrift optional fields are wire-compatible, but rolling upgrade / pre-existing cache behavior is incomplete because existing cached partitions have no stored remote checksum.
- Parallel paths: normal and temp partition request building share the same helper, so the issue affects both paths.
- Tests: missing coverage for refreshing a version-matched partition whose cached metadata has no remoteMetaChecksum but whose replica layout differs from the remote source.
- Observability/config/persistence/data writes: no new config, persistence, transaction, or data-write path issue found. Existing logs are sufficient for this metadata RPC.
User focus: no additional user-provided review focus was specified.
Cloud UT Coverage ReportIncrement line coverage Increment coverage report
|
TPC-H: Total hot run time: 31520 ms |
TPC-DS: Total hot run time: 172501 ms |
FE UT Coverage ReportIncrement line coverage |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
FE Regression Coverage ReportIncrement line coverage |
f22ebd7 to
5354b6e
Compare
|
/review |
|
OpenCode automated review failed and did not complete. Error: Error: Token refresh failed: 401 Please inspect the workflow logs and rerun the review after the underlying issue is resolved. |
|
run buildall |
Cloud UT Coverage ReportIncrement line coverage Increment coverage report
|
TPC-H: Total hot run time: 31636 ms |
TPC-DS: Total hot run time: 171952 ms |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
FE Regression Coverage ReportIncrement line coverage |
|
/review |
|
PR approved by at least one committer and no changes requested. |
|
/review |
There was a problem hiding this comment.
Review result: request changes.
Critical checkpoint conclusions:
- Goal and tests: The PR targets refreshing remote OLAP partition metadata when replica placement changes without visible-version changes. The checksum approach covers the main normal-partition relocation path and has a checksum unit test, but the client rebuild flow has a correctness bug when only normal partitions are updated.
- Scope and clarity: The change is mostly focused on remote Doris table metadata refresh; the identified issue is localized to list reuse in
FeServiceClient.getOlapTable. - Concurrency and lifecycle: Server-side table metadata and partition payloads are now produced under one table read lock in the current head, so I did not find an additional mixed-snapshot issue beyond the already-known thread. The new transient checksum cache has no persistence/lifecycle concern.
- Compatibility: New thrift fields are optional, and old/new FE interaction degrades to existing behavior. No additional protocol compatibility blocker found.
- Parallel paths: Normal and temp partition rebuild paths are parallel; the temp path currently reuses the normal updated list when no temp update list is present, which is the blocking issue below.
- Data correctness: The blocking issue can make a remote table expose normal partitions as temp partitions after a normal partition checksum refresh.
- Observability: Existing debug logging is sufficient for this metadata refresh path.
- Tests: The added unit test validates checksum stability/change detection, but there is no coverage for the client rebuild case where normal partitions update and temp partitions do not.
User focus: No additional user-provided review focus was specified.
5354b6e to
0087f8f
Compare
|
/review |
There was a problem hiding this comment.
I found one distinct blocking issue in the current head. The existing inline threads about mixed snapshots, checksum bootstrapping, and temp partition list reuse were reviewed and not duplicated here.
Critical checkpoint conclusions:
- Goal/test: the PR aims to refresh remote OLAP partition metadata when replica metadata changes without visible version/time changes. The added unit test covers ordering stability and backend-id changes, but not other query-affecting replica metadata changes.
- Scope: the change is focused, but the checksum definition is too narrow for the stated goal.
- Concurrency/lifecycle: the server now keeps table metadata and partition payload generation under one table read lock, which addresses the snapshot consistency concern already raised in an existing thread. No new lock-order issue found.
- Compatibility: thrift fields are optional, so mixed-version thrift parsing is structurally compatible; the existing bootstrap thread covers old cached partitions.
- Parallel paths: normal and temp partitions use the same checksum path.
- Data correctness: blocking issue below: unchanged checksum can suppress refreshes after query-relevant replica metadata changes, leaving stale remote scan locations/replica eligibility.
- Tests: current tests do not prove refresh for bad/failed/state/path metadata changes.
- Observability/config/persistence: no new config or persistence requirement found beyond existing metadata serialization.
- User focus: no additional user-provided review focus was specified.
0087f8f to
503a238
Compare
|
/review |
There was a problem hiding this comment.
I found one additional blocking correctness issue in the remote partition freshness check. The existing review threads already cover mixed snapshots, checksum bootstrap, temp partition mixing, and missing replica query fields; this comment is a distinct non-replica stale-partition case.
Critical checkpoint conclusions:
- Goal/test: the PR aims to refresh remote OLAP partition metadata when replica placement/status changes without a visible version change. It partially accomplishes this and adds checksum unit tests, but the freshness key is still incomplete for the full serialized Partition payload.
- Scope/clarity: the change is focused, but the checksum abstraction needs to match the data it gates.
- Concurrency/lifecycle: table metadata response generation now uses a single table read lock, which addresses the previously noted mixed snapshot risk; no new thread/lifecycle concern found beyond existing comments.
- Compatibility: thrift fields are optional, so wire compatibility is intended; rolling-upgrade checksum bootstrap was already raised in an existing thread.
- Parallel paths: normal and temp partitions both use the checksum path.
- Tests: tests cover ordering and several replica fields, but do not cover non-replica Partition fields such as partition rename.
- Observability/performance: no additional observability issue found; checksum work is bounded by serialized partition metadata size and only done during remote metadata refresh.
- User focus: no additional user-provided review focus was supplied.
503a238 to
cfdbbed
Compare
|
/review |
1 similar comment
|
/review |
|
run buildall |
There was a problem hiding this comment.
Code review completed for PR 63894 at cfdbbed.
No additional distinct inline findings were confirmed beyond the already-open review threads provided in .opencode-review.oz4N53/pr_review_threads.md. I avoided duplicating those existing comments.
Checkpoint conclusions:
- Goal: The PR aims to avoid stale remote OLAP partition metadata after replica/top-level metadata changes by adding partition metadata checksums to the FE metadata RPC. The current implementation addresses the main checksum exchange flow in the inspected paths, subject to the already-open review threads.
- Scope: The change is focused on
Partitionchecksum generation, remote FE client request/response handling, service-side partition filtering, thrift fields, and checksum tests. - Concurrency: The server-side metadata response is built under
table.readLock()in the inspected current diff, avoiding the mixed table/partition snapshot issue locally. No new lock ordering or deadlock issue was confirmed. - Lifecycle/static initialization: No new static initialization or lifecycle hazard was found.
- Configuration: No configuration items are added.
- Compatibility: The thrift fields are optional. Rolling-upgrade checksum bootstrapping was inspected; no new distinct compatibility issue was confirmed beyond the existing review context.
- Parallel code paths: Normal and temp partition paths were both inspected. No additional distinct path omission was confirmed.
- Special checks/error handling: RPC status handling and partition removal/update filtering were inspected; no new unchecked failure path was confirmed.
- Tests: Unit tests cover deterministic checksum ordering and changes to several partition/replica fields. I did not run tests in this review runner.
- Observability: No new observability requirement was identified for this metadata cache optimization.
- Transaction/persistence/data writes: No transaction, edit-log, or storage write path change is introduced.
- FE-BE variable passing: The new values are FE-to-FE thrift metadata fields, not BE scan variables; no BE propagation issue was found.
- Performance: The checksum adds sorting/hash work during remote metadata refresh. This is outside the normal query execution hot path, and no confirmed pathological behavior beyond existing cache correctness discussions was found.
User focus: No additional user-provided review focus was specified.
cfdbbed to
344a4ee
Compare
|
/review |
Cloud UT Coverage ReportIncrement line coverage Increment coverage report
|
There was a problem hiding this comment.
Automated review summary
No additional confirmed, distinct issues were found in this review pass beyond the existing inline review threads. I did not resubmit the already-known concerns from those threads.
Critical checkpoint conclusions:
- Goal and tests: The PR targets refreshing remote OLAP cached partitions when metadata changes without visible-version changes, especially replica relocation. The added checksum flow and unit tests cover the main intended checksum cases.
- Scope and clarity: The change is focused on the remote OLAP table metadata request/response path and partition checksum calculation.
- Concurrency and lifecycle: The current implementation keeps table metadata serialization and partition update/removal decisions under the table read lock, avoiding a mixed snapshot in this path. No additional distinct concurrency/lifecycle issue was confirmed.
- Configuration and compatibility: No new configuration items. Added Thrift fields are optional, so no additional protocol compatibility issue was confirmed in this pass.
- Parallel paths: Normal and temp partition refresh paths are both handled.
- Error handling and data correctness: No additional distinct unchecked failure or stale-cache path was confirmed beyond existing review threads.
- Test coverage: Unit tests were added for checksum determinism and checksum changes on top-level partition fields and query-affecting replica fields. I did not run tests in this review environment.
- Observability and performance: No new observability requirement or confirmed performance blocker was found.
User focus: No additional user-provided review focus was specified.
344a4ee to
db5061c
Compare
|
/review |
There was a problem hiding this comment.
Review completed against the GitHub PR diff and existing inline review context. I did not add new inline comments because I did not find a distinct, non-duplicative issue beyond the already-known threads.
Critical checkpoint conclusions:
- Goal/test: The PR aims to avoid stale remote Olap partition metadata by adding partition meta checksums and tests for checksum determinism and selected invalidation cases. The latest patch includes unit coverage for replica query fields and partition top-level fields.
- Scope: The actual PR scope is focused on remote Olap table metadata refresh, checksum plumbing, and thrift fields.
- Concurrency/lifecycle:
getOlapTableMetanow keeps table metadata serialization and partition change collection under one table read lock, addressing the mixed-snapshot risk already raised. No new concurrency issue found in the reviewed diff. - Compatibility: New thrift fields are optional and appended. Checksum-less requests still follow the old version-only behavior, while the updated client bootstraps a checksum from cached partitions.
- Parallel paths: Normal and temp partition refresh paths both carry checksums and separate updated/removed lists in the latest patch.
- Data correctness: Existing review threads cover the main stale-cache risks. I did not find an additional distinct stale-cache path that is not already represented by those threads.
- Tests: Unit tests cover checksum ordering stability, selected replica fields, rename/state/bucket/nextVersion changes. I did not run tests in this review.
- Observability/performance: No additional logging/metric requirement identified for this metadata refresh path; checksum cost is proportional to visible partition replica metadata during refresh.
User focus: No additional user-provided review focus was specified.
|
run buildall |
Cloud UT Coverage ReportIncrement line coverage Increment coverage report
|
Currently, the update of partitions only depends on the visible version and visible time. If a balance occurs, the version and time of the partition will not be updated, which means that the updated partition will not be retrieved from the remote FE. When executing a query, the tablet on the BE node may no longer exist, resulting in query errors.
To avoid this problem, a checksum will be calculated for the partition to determine whether the partition's metadata has changed.