fix(pipelines): align tidb CI handling and fix hardcoded usages by wuhuizuo · Pull Request #4626 · PingCAP-QE/ci

wuhuizuo · 2026-05-22T12:49:26Z

Summary

- Align pingcap-inc/tidb release-8.5 pull_build and pull_unit_test with the corresponding pingcap/tidb handling.
- Keep private-repo checkout compatible by using the repository credential flow with checkoutRefsWithCacheLock(..., withSubmodule = true).
- Remove outdated bigdisk scheduling settings from the affected pingcap-inc/tidb pods.
- Fix hardcoded repo and workspace usages in the related pingcap/tidb pipeline scripts and pod configs.

Testing

not run (PR metadata update only)

ti-chi-bot

I have already done a preliminary review for you, and I hope to help you do a better job.

Summary:
This PR removes the ee-bigdisk nodeSelector and corresponding tolerations from the pingcap-inc/tidb release-8.5 pod-pull build and unit test pipeline manifests. The intent is to align these manifests with the existing pingcap/tidb release-8.5 scheduling configuration by no longer restricting pods to nodes labeled with ee-bigdisk=true. The changes are straightforward YAML removals and appear to be limited to only the relevant pipeline manifests. Overall, the PR is clear and concise with no apparent errors.

Code Improvements

Consistency Verification:
- Files:
  pipelines/pingcap-inc/tidb/release-8.5/pod-pull_build.yaml (lines ~69-79)
  pipelines/pingcap-inc/tidb/release-8.5/pod-pull_unit_test.yaml (lines ~71-81)
- Issue:
  The PR description mentions matching the configuration to the existing pingcap/tidb release-8.5 scheduling, but there is no evidence in the diff or PR that the target manifests have been verified to confirm this exact alignment. It’s important to ensure that the removal here does not unintentionally schedule these pods on undesired nodes.
- Suggested Solution:
  Add a comment or note in the PR or codebase referencing the source configuration that these changes are being aligned with. Possibly include a link or a snippet of the pingcap/tidb manifests for reviewer clarity.
Error Handling / Validation:
- Files: Same as above.
- Issue:
  Removing nodeSelector and tolerations changes pod scheduling behavior, which could cause pods to land on unsuitable nodes if cluster labels or taints change in the future.
- Suggested Solution:
  Consider adding a validation step in the CI pipeline or a pre-submit hook that verifies pods are scheduled on appropriate nodes (e.g., nodes with sufficient disk or resources), or document the expected node labeling standards clearly.
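One way to act on the validation suggestion above is a small pre-submit check that flags any pod spec still pinned to a given node label. The sketch below is only an illustration: the function name `find_node_pinning`, the sample manifests, and the assumption that the toleration key equals the `ee-bigdisk` label are hypothetical and are not taken from the PR diff. It works on manifests that have already been parsed from YAML into Python dictionaries.

```python
from typing import Any


def find_node_pinning(pod: dict[str, Any], label: str = "ee-bigdisk") -> list[str]:
    """Return a finding for every place `label` pins the pod to special nodes."""
    spec = pod.get("spec") or {}
    findings: list[str] = []

    # nodeSelector is a plain mapping of node label -> required value.
    node_selector = spec.get("nodeSelector") or {}
    if label in node_selector:
        findings.append(f"nodeSelector {label}={node_selector[label]}")

    # Assumption: the matching taint uses the same key as the node label.
    for toleration in spec.get("tolerations") or []:
        if toleration.get("key") == label:
            effect = toleration.get("effect", "any effect")
            findings.append(f"toleration {label} ({effect})")

    return findings


# Hypothetical manifests before and after the removal.
before = {
    "spec": {
        "nodeSelector": {"ee-bigdisk": "true"},
        "tolerations": [
            {"key": "ee-bigdisk", "operator": "Exists", "effect": "NoSchedule"}
        ],
        "containers": [],
    }
}
after = {"spec": {"containers": []}}

print(find_node_pinning(before))
print(find_node_pinning(after))
```

An empty result means the manifest no longer references the label, which is the state this PR aims for in the two pingcap-inc/tidb release-8.5 pod manifests.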

Best Practices

Documentation:
- Files: Both YAML manifests, near the removed sections.
- Issue:
  The removed nodeSelector and tolerations are a critical part of pod scheduling; removing them can affect cluster resource usage. There are no comments explaining why these were removed or what the new expected scheduling behavior is.
- Suggested Solution:
  Add a comment block above the spec: section or near the scheduling configuration explaining the rationale for removal, for example:
```
# Removed ee-bigdisk nodeSelector and tolerations to align with pingcap/tidb release-8.5 scheduling.
# Pods will now be scheduled on any node matching the default selectors without requiring the ee-bigdisk label.
```
Testing Coverage:
- Files: N/A (pipeline YAMLs)
- Issue:
  Changes to scheduling constraints might affect pod placement and test results, but no mention is made of testing these changes.
- Suggested Solution:
  Ensure that test runs triggered by these updated pipelines are monitored for any unexpected pod scheduling failures or resource issues. If possible, add automated checks or alerts for pod evictions or scheduling delays.

No critical issues or security concerns are present given the limited scope of the changes. The main improvements focus on clarifying intent, ensuring alignment is verified, and documenting the impact for future maintainers.

gemini-code-assist

Code Review

This pull request removes the ee-bigdisk node selectors and tolerations from the build and unit test pod configurations for TiDB release 8.5. The reviewer expressed concerns that removing these constraints while maintaining high ephemeral storage limits (up to 300Gi) could lead to scheduling instability or pod evictions on standard nodes that cannot accommodate such limits. It is recommended to verify these changes through empirical testing to ensure CI stability.

pipelines/pingcap-inc/tidb/release-8.5/pod-pull_build.yaml (72-78)

Removing the ee-bigdisk node selector and tolerations while maintaining a high ephemeral-storage limit (150Gi) may lead to scheduling instability. While removing storage constraints can be an intentional performance improvement, stability is prioritized in CI. Please verify this change with empirical testing to ensure pods are not evicted when disk usage exceeds the 50Gi request on standard nodes.
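To make the request/limit concern concrete: the Kubernetes scheduler places a pod according to its resource requests, not its limits, so a pod requesting 50Gi can land on a node that cannot hold 150Gi. The following is a minimal sketch of that comparison, assuming an otherwise empty node. The node sizes and the names `parse_quantity` and `classify` are hypothetical; only the 50Gi request and 150Gi limit come from the review comment.

```python
# Binary and decimal suffixes used by Kubernetes resource quantities.
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_quantity(value: str) -> int:
    """Convert a quantity such as '150Gi' to bytes.

    Simplified: whole numbers only, no milli ('m') or exponent forms.
    """
    for suffix, factor in _SUFFIXES.items():
        if value.endswith(suffix):
            return int(value[: -len(suffix)]) * factor
    return int(value)


def classify(node_allocatable: str, request: str, limit: str) -> str:
    """Compare one pod's ephemeral-storage request and limit to an empty node."""
    alloc, req, lim = (parse_quantity(q) for q in (node_allocatable, request, limit))
    if alloc < req:
        # The scheduler filters out nodes that cannot satisfy the request.
        return "unschedulable"
    if alloc < lim:
        # Scheduling succeeds, but the node fills up before the limit is reached.
        return "schedulable, but the node cannot hold the full limit"
    return "fits"


# Hypothetical node sizes checked against the 50Gi request / 150Gi limit.
for node in ("30Gi", "100Gi", "375Gi"):
    print(node, "->", classify(node, "50Gi", "150Gi"))
```

The middle case is the one the review warns about: the pod is admitted, yet it may be evicted under node disk pressure once its usage grows past the request.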

References

Parallelism and resource settings in CI jobs should be determined by empirical testing for stability and performance rather than strictly adhering to theoretical limits.
Removing persistent volumes for caching can be an intentional change to improve build performance, even though it seems counterintuitive.
In CI configurations, stability and behavioral consistency may be prioritized over refactoring, especially when the change is perceived as risky.

pipelines/pingcap-inc/tidb/release-8.5/pod-pull_unit_test.yaml (74-80)

Removing the ee-bigdisk constraints is risky given the 300Gi ephemeral-storage limit. Although such removals can be intentional performance optimizations, they may lead to non-deterministic failures if pods are scheduled on nodes unable to accommodate the limit. Please verify this change through empirical testing to maintain CI stability.

wuhuizuo · 2026-05-22T13:09:06Z

/approve

wuhuizuo · 2026-05-22T13:12:13Z

/hold

wuhuizuo · 2026-05-22T13:13:07Z

/approve

ti-chi-bot · 2026-05-22T13:13:13Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wuhuizuo

Needs approval from an approver in each of these files:

~~pipelines/OWNERS~~ [wuhuizuo]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

- Increase CPU/memory resources and reduce ephemeral storage requests
- Replace emptyDir volumes with ephemeral PVCs using hyperdisk-rwo
- Use checkoutRefsWithCacheLock instead of manual cache + retry logic
- Add bazel-repository-cache volume mount for unit tests
- Remove ci-nvme-high-performance node selector from check2 and br_test
- Strip bazel cache mirrors and local repository cache references
- Increase workspace volume size to 300Gi for unit tests

replace hardcoded dir name with `REFS.repo`

wuhuizuo · 2026-05-25T09:54:40Z

/unhold

ti-chi-bot · 2026-05-25T09:55:58Z

@wuhuizuo: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-replay-jenkins-pipelines | `7959255` | link | false | `/test pull-replay-jenkins-pipelines` |

ci(pipelines): remove bigdisk nodeSelector from pingcap-inc/tidb release-8.5 pods

b170a05

github-project-automation Bot added this to EE - CI/CD system May 22, 2026

ti-chi-bot Bot added the area/jenkins-pipelines label May 22, 2026

ti-chi-bot Bot reviewed May 22, 2026

ti-chi-bot Bot added the size/S label May 22, 2026

gemini-code-assist Bot reviewed May 22, 2026

ti-chi-bot Bot added the approved label May 22, 2026

ti-chi-bot Bot added the do-not-merge/hold label May 22, 2026

wuhuizuo marked this pull request as draft May 22, 2026 13:12

ti-chi-bot Bot added the do-not-merge/work-in-progress label May 22, 2026

wuhuizuo added 2 commits May 25, 2026 17:46

fix(pipelines/pingcap/tidb): fix hardcoded usages

7959255

replace hardcoded dir name with `REFS.repo`

ti-chi-bot Bot added size/XL and removed size/S labels May 25, 2026

ti-chi-bot changed the title ~~ci(pipelines): remove bigdisk nodeSelector from pingcap-inc/tidb release-8.5 pods~~ fix(pipelines): align tidb CI handling and fix hardcoded usages May 25, 2026

wuhuizuo marked this pull request as ready for review May 25, 2026 09:54

ti-chi-bot Bot removed the do-not-merge/work-in-progress label May 25, 2026

ti-chi-bot Bot removed the do-not-merge/hold label May 25, 2026

ti-chi-bot Bot merged commit 5ce24f9 into main May 25, 2026
7 of 8 checks passed

ti-chi-bot Bot deleted the agent/coder/ecaf4137 branch May 25, 2026 10:01

github-project-automation Bot moved this to ✅ Done in EE - CI/CD system May 25, 2026

ti-chi-bot[bot] merged 3 commits into main from agent/coder/ecaf4137
