diff --git a/.github/pr-review-bot-README.md b/.github/pr-review-bot-README.md new file mode 100644 index 0000000000..934c0997fc --- /dev/null +++ b/.github/pr-review-bot-README.md @@ -0,0 +1,269 @@ +# PR Review Assignment Bot + +Automated GitHub Actions workflow that assigns reviewers to pull requests using weighted load balancing. + +**Workflow file:** [workflows/auto-assign-reviewers.yml](workflows/auto-assign-reviewers.yml) +**Branch:** `Usr/sogh/prreviewbot` (PR [#3326](https://github.com/Azure/data-api-builder/pull/3326)) +**Status:** Dry-run mode. Ready for team testing. + +--- + +## How It Works + +### Trigger Events + +| Event | When it fires | +|---|---| +| `pull_request_target: opened` | A new PR is created | +| `pull_request_target: reopened` | A closed PR is reopened | +| `pull_request_target: labeled` | A label is added to a PR | +| `workflow_dispatch` | Manual run from the Actions tab (with dry-run toggle) | + +> A cron schedule (`*/10 * * * *`) is available but currently commented out. + +### Opt-In Mechanism + +A PR must have the **`assign-for-review`** label to be considered. Draft PRs are automatically skipped. + +The label is added **manually** by the PR author when they consider the PR ready for review. Future improvements may automate this (e.g., auto-add on publish, or remove the label requirement entirely and trigger on all non-draft PRs). + +### Reviewer Pool + +| GitHub Handle | Name | Status | +|---|---|---| +| `souvikghosh04` | Souvik | Active | +| `Aniruddh25` | Aniruddh | Active (see note) | +| `aaronburtle` | Aaron | Active | +| `anushakolan` | Anusha | Active | +| `RubenCerna2079` | Ruben | Active | + +> Jerry (`JerryNixon`) has been removed from the pool — PM role, should not review code PRs. + +> **Decision needed:** Ruben raised that Aniruddh already reviews a disproportionate number of PRs. Options: (a) remove Ani from the bot pool so he's not auto-assigned, (b) keep him as equal, or (c) add a reduced-weight mechanism. Team leaning toward removing both Jerry and Ani to keep the pool to 4 engineers (Souvik, Aaron, Anusha, Ruben). + +Each PR gets **2 reviewers** assigned (configurable via `REQUIRED_REVIEWERS`). + +### Weighted Load Balancing + +The bot calculates a *weight* for each PR based on its labels, then distributes reviewers so the total weighted load is balanced across the pool. + +| Label | Weight | +|---|---| +| *(no size label)* | 1 | +| `size-medium` | 2 | +| `size-large` | 3 | +| `priority-high` | +1 (additive) | + +**Examples:** +- A default PR = weight 1 +- A `size-large` + `priority-high` PR = weight 4 + +Size labels are **manual and optional**. If no size label is set, the PR gets a default weight of 1 and the bot still assigns reviewers — it just won't do weighted load balancing. Auto-detection of PR size is a possible future enhancement. + +### Reviewer Lifecycle & Availability + +A reviewer goes through a defined lifecycle per PR assignment. The bot must respect this when deciding who is available. + +#### Active Review State (Blocked) + +A reviewer is considered **actively reviewing** a PR — and therefore **blocked** — from the moment they are assigned until they complete one of the following actions: +- **Approve** the PR +- **Request changes** on the PR +- **Comment** (submit a review with comments) +- **The PR is closed or merged** + +While a reviewer is in the active-review state for a PR, that PR counts toward their load. The bot **prefers unblocked reviewers** (those with no pending active reviews) when selecting candidates. If all reviewers in the pool have active reviews, the bot falls back to round-robin by least total load (see below). + +#### Round-Robin Fallback + +When **every reviewer** in the pool has active assignments, the bot does not stop — it continues assigning using the existing weighted load-balancing logic: +- Candidates are sorted by current weighted load (ascending). +- The reviewer(s) with the **fewest active assignments** are picked first. +- This naturally round-robins through the pool as load accumulates. + +This ensures PRs are never left without reviewers even when the team is busy. + +#### Freeing a Reviewer + +A reviewer's assignment on a PR is **released** (no longer counts toward their load) when any of these occur: +1. The reviewer **submits a review** (approve, request changes, or comment). +2. The PR is **merged**. +3. The PR is **closed**. + +Once freed, the reviewer becomes available for new assignments with a reduced load count. + +#### Re-Review Requests + +If a PR author **requests a re-review** (e.g., after addressing feedback and pushing new changes), the assignment lifecycle **starts from scratch**: +- The bot treats the PR as needing reviewers again. +- The same rules apply — load balancing, author exclusion, active-review checks. +- Previously assigned reviewers may or may not be re-assigned depending on current load. +- This can be triggered by removing and re-adding the `assign-for-review` label, or by a `reopened` event. + +### Assignment Algorithm + +The bot operates in two modes depending on the trigger: + +**Single-PR mode** (`pull_request_target`): Only processes the triggering PR. Early-exits if the PR is ineligible (draft, missing label, stale, or already fully assigned). Still builds the global load map for fair selection. + +**Full-scan mode** (`workflow_dispatch` / schedule): Scans all open, non-stale, eligible PRs and assigns reviewers to those that need them. Larger/higher-priority PRs are processed first. + +In both modes, the core assignment logic per PR is: +1. Exclude the PR author and already-assigned reviewers from candidates. +2. **Prefer unblocked reviewers** (no active reviews) over blocked ones. +3. Within the same availability tier, sort by weighted load (ascending), then alphabetical. +4. Pick the top N candidates needed. +5. Assign (or log in dry-run mode) and update the in-memory load map. + +### Optimizations + +| Optimization | Effect | +|---|---| +| **Single-PR fast path** | `pull_request_target` only assigns the triggering PR instead of scanning all PRs | +| **Early exit** | Skips immediately if trigger PR is ineligible or already fully assigned | +| **Stale PR filter** | PRs not updated in 90 days are excluded from eligibility and load calculation | +| **Sorted fetch** | PRs fetched sorted by `updated_at` desc for cache-friendly access | +| **Conditional `listReviews`** | Only calls `listReviews` for PRs that have pool assignees (skips zero-assignee PRs) | + +### Reviewer Counting + +The bot uses the **Assignees** field (`pr.assignees`) to track who has been bot-assigned to a PR. This avoids the CODEOWNERS problem where `requested_reviewers` is pre-populated with all code owners. + +To determine active vs. freed reviewers, the bot calls `pulls.listReviews` for each eligible PR: +- **Active reviewer:** Assigned to the PR but has **not** submitted a review yet → counts toward load. +- **Freed reviewer:** Has submitted a review (approve, request changes, or comment) → does **not** count toward load. + +Only reviewers from the configured pool are counted. CODEOWNERS or external reviewers are ignored. + +### Dry-Run Mode + +The bot is currently in **dry-run mode** (`DRY_RUN = true`). In this mode it logs `[DRY RUN] Would assign [...]` but does not actually add reviewers. + +To go live, set `DRY_RUN` to `false` in the workflow (or select `false` when triggering manually via `workflow_dispatch`). + +### Concurrency + +The workflow uses a concurrency group (`pr-review-assignment`) with `cancel-in-progress: false` to prevent overlapping runs from producing inconsistent assignments. + +--- + +## Configuration Reference + +| Setting | Location | Default | Description | +|---|---|---|---| +| `REVIEWERS` | Script constant | 5-person pool | GitHub usernames eligible to be assigned | +| `REQUIRED_REVIEWERS` | Script constant | `2` | Number of reviewers per PR | +| `STALE_DAYS` | Script constant | `90` | PRs not updated in this many days are skipped | +| `DRY_RUN` | Script constant / workflow input | `true` | When true, logs only — no actual assignments | +| `assign-for-review` | PR label | — | Label that opts a PR into auto-assignment (removed after assignment when live) | +| Size labels | PR labels | `size-medium`, `size-large` | Increase PR weight for load balancing | +| `priority-high` | PR label | — | Adds +1 to PR weight | + +--- + +## How to Validate (Dry-Run Testing) + +1. Open (or reopen) a PR and add the **`assign-for-review`** label. +2. Go to the **Actions** tab in the repo. +3. Find the latest **"PR Review Assignment Bot"** workflow run. +4. Open the **"Assign reviewers to eligible PRs"** step log. +5. Look for `[DRY RUN] Would assign [...]` lines — these show who would be assigned and to which PR. +6. Verify the assignments look correct (author excluded, load balanced, right reviewer count). + +You can also trigger a manual run from the **Actions** tab → **PR Review Assignment Bot** → **Run workflow** and choose dry-run `true` or `false`. + +--- + +## Changelog + +| Date | Change | +|---|---| +| March 26 | Initial workflow created with `pulls.requestReviewers`, dry-run mode | +| April 8 | **BUG FIX:** CODEOWNERS inflated `requested_reviewers` → switched to `pr.assignees` | +| April 8 | **BUG FIX:** `pulls.requestReviewers` wrote to Reviewers field → switched to `issues.addAssignees` | +| April 8 | **BUG FIX:** Dry-run toggle `\|\|` expression always true → fixed with `context.eventName` check | +| April 8 | Removed Jerry from pool (PM role) | +| April 8 | Added active-review blocking, reviewer-freed logic, round-robin fallback | +| April 8 | Added optimizations: single-PR fast path, early exit, stale PR filter (90 days), conditional `listReviews` | +| April 8 | Paginated `listReviews` to handle PRs with 30+ reviews (review comment fix) | +| April 8 | Auto-remove `assign-for-review` label after successful assignment (review comment fix) | + +--- + +## Open Items + +| # | Item | Owner | Status | +|---|---|---|---| +| 1 | **Decide on Aniruddh's inclusion** — remove to reduce his load, or keep as equal | Team | Pending | +| 2 | **Re-enable workflow & test dry-run for 1 week** | Souvik + Team | Next step | +| 3 | **Flip to live mode** after dry-run validation | Souvik | Blocked on #2 | +| 4 | Consider auto-detecting PR size from line count | Souvik | Future | +| 5 | Consider removing label requirement (auto-assign all non-draft PRs) | Team | Future | + +--- + +## Testing Plan + +1. **Re-enable the workflow** in dry-run mode. +2. **Team adds `assign-for-review` labels** to published PRs for 1 week. +3. **Check logs** — verify `[DRY RUN] Would assign [...]` output: + - Author excluded, 2 reviewers selected, load balanced. + - Stale PRs (>90 days) are skipped. + - Unblocked reviewers preferred over blocked ones. + - Round-robin fallback when everyone has active reviews. + - Re-review (label re-add) triggers fresh assignment. +4. **Flip to live** — set `DRY_RUN` to `false`. +5. **Monitor** for 1-2 weeks, refine as needed. + +--- + +## Team Communication + +### Original Communication (March 26) + +> Hi Team, +> +> We now have a GitHub Actions workflow that automatically assigns reviewers to pull requests. +> +> **How it works:** +> - Scans all open PRs for the `assign-for-review` label — this is the opt-in mechanism. +> - Assigns 2 reviewers per PR from the pool: Souvik, Aniruddh, Aaron, Anusha, Ruben, Jerry. +> - Uses weighted load balancing so reviewers are distributed fairly: +> - `size-medium` = 2x weight, `size-large` = 3x, default = 1x. +> - `priority-high` adds +1 to the weight. +> - Reviewers with the lightest current load are picked first. +> - Larger/higher-priority PRs are assigned first for best reviewer availability. +> - The PR author is automatically excluded from being assigned on their own PR. +> +> **Current state:** The bot is in dry-run mode and has been tested against draft PRs. It logs who would be assigned but doesn't actually add reviewers yet. Once the team is comfortable, we flip DRY_RUN to false to go live. +> +> Thanks, +> Souvik + +### Updated Communication (Draft) + +> **Subject: PR Review Assignment Bot — Ready for Testing** +> +> Hi Team, +> +> The PR review assignment bot has been updated with all the fixes from our demo discussion. Here's the summary: +> +> **What it does:** +> - Scans PRs for the `assign-for-review` label and assigns **2 reviewers** from the pool: Souvik, Aniruddh, Aaron, Anusha, Ruben. +> - Uses **weighted load balancing** — reviewers with the lightest active load are picked first. +> - **Prefers available reviewers** — those not currently reviewing any PR are prioritized. If everyone is busy, it round-robins by least load. +> - Reviewers are **freed** once they submit a review (approve, request changes, or comment). +> - PR author is automatically excluded. Stale PRs (>90 days) are skipped. +> +> **What changed:** +> - Fixed CODEOWNERS bug — assignments now use the **Assignees** field (not Reviewers), so CODEOWNERS no longer interferes. +> - Fixed API — bot now uses `issues.addAssignees` to write to the correct field. +> - Fixed dry-run toggle logic. +> - Added reviewer lifecycle: active-review blocking, freed-on-submit, round-robin fallback. +> - Optimized: single-PR mode for faster `pull_request_target` runs, stale PR filter. +> - Removed Jerry from pool (PM role). +> +> **Next step:** Add the `assign-for-review` label to your published PRs. Check the Actions tab logs to verify the `[DRY RUN]` output looks correct. Once we're confident, we flip to live. +> +> Thanks, +> Souvik diff --git a/.github/workflows/auto-assign-reviewers.yml b/.github/workflows/auto-assign-reviewers.yml new file mode 100644 index 0000000000..970f8587c8 --- /dev/null +++ b/.github/workflows/auto-assign-reviewers.yml @@ -0,0 +1,217 @@ +name: PR Review Assignment Bot + +on: + pull_request_target: + types: [opened, reopened, labeled] + + # TODO: Remove before merging — temporary trigger for testing from PR branch. + push: + branches: + - 'Usr/sogh/prreviewbot' + + workflow_dispatch: + inputs: + dry_run: + description: 'Run in dry-run mode (no actual assignments)' + required: false + default: 'true' + type: choice + options: + - 'true' + - 'false' + + # schedule: + # - cron: "*/10 * * * *" + +permissions: + pull-requests: write + issues: write + contents: read + +concurrency: + group: pr-review-assignment + cancel-in-progress: false + +jobs: + assign-reviewers: + runs-on: ubuntu-latest + steps: + - name: Assign reviewers to eligible PRs + uses: actions/github-script@v7 + with: + script: | + // ── Configuration ── + const REVIEWERS = ["souvikghosh04", "Aniruddh25", "aaronburtle", "anushakolan", "RubenCerna2079"]; + const REQUIRED_REVIEWERS = 2; + const STALE_DAYS = 90; + const DRY_RUN = context.eventName === "workflow_dispatch" + ? '${{ github.event.inputs.dry_run }}' === 'true' + : false; + + const { owner, repo } = context.repo; + const staleCutoff = new Date(); + staleCutoff.setDate(staleCutoff.getDate() - STALE_DAYS); + + // ── Helper functions ── + + function getPoolAssignees(pr) { + return (pr.assignees || []) + .map((a) => a.login) + .filter((a) => REVIEWERS.includes(a)); + } + + function isEligible(pr) { + if (pr.draft) return false; + if (!pr.labels.some((l) => l.name === "assign-for-review")) return false; + if (new Date(pr.updated_at) < staleCutoff) return false; + return true; + } + + function getWeight(pr) { + const labels = pr.labels.map((l) => l.name); + let w = 1; + if (labels.includes("size-medium")) w = 2; + else if (labels.includes("size-large")) w = 3; + if (labels.includes("priority-high")) w += 1; + return w; + } + + async function getActiveReviewers(pr) { + const assigned = getPoolAssignees(pr); + if (assigned.length === 0) return []; + const reviews = await github.paginate(github.rest.pulls.listReviews, { + owner, repo, pull_number: pr.number, per_page: 100, + }); + const submitted = new Set( + reviews.map((r) => r.user.login).filter((r) => REVIEWERS.includes(r)) + ); + return assigned.filter((r) => !submitted.has(r)); + } + + // Assigns reviewers to a single PR, mutating load/activeCount in place. + async function assignReviewers(pr, load, activeCount) { + const author = pr.user.login; + // Exclude the author from assigned count — they can't review their own PR. + const assigned = getPoolAssignees(pr).filter((a) => a !== author); + const needed = REQUIRED_REVIEWERS - assigned.length; + const weight = getWeight(pr); + + if (needed <= 0) { + core.info(`PR #${pr.number} — already has ${assigned.length} pool assignees, skipping.`); + return; + } + + const candidates = REVIEWERS.filter((r) => r !== author && !assigned.includes(r)); + if (candidates.length === 0) { + core.info(`PR #${pr.number} — no candidates after filtering.`); + return; + } + + // Sort: prefer unblocked → then lowest load → then alphabetical. + candidates.sort((a, b) => { + const aBlocked = activeCount[a] > 0 ? 1 : 0; + const bBlocked = activeCount[b] > 0 ? 1 : 0; + if (aBlocked !== bBlocked) return aBlocked - bBlocked; + if (load[a] !== load[b]) return load[a] - load[b]; + return a.localeCompare(b); + }); + + const selected = candidates.slice(0, needed); + core.info(`PR #${pr.number} — weight: ${weight}, assigned: [${assigned}], candidates: ${JSON.stringify(candidates.map((c) => `${c}(active:${activeCount[c]},load:${load[c]})`))}, selected: [${selected}]`); + + if (DRY_RUN) { + core.info(`[DRY RUN] Would assign [${selected}] to PR #${pr.number}`); + } else { + await github.rest.issues.addAssignees({ + owner, repo, issue_number: pr.number, assignees: selected, + }); + core.info(`Assigned [${selected}] to PR #${pr.number}`); + + // Remove the label once reviewers are fully assigned so the bot + // doesn't re-process this PR on subsequent runs. + const newAssignedCount = assigned.length + selected.length; + if (newAssignedCount >= REQUIRED_REVIEWERS) { + await github.rest.issues.removeLabel({ + owner, repo, issue_number: pr.number, name: "assign-for-review", + }).catch((e) => core.warning(`Could not remove label from PR #${pr.number}: ${e.message}`)); + core.info(`Removed 'assign-for-review' label from PR #${pr.number}`); + } + } + + for (const r of selected) { + load[r] += weight; + activeCount[r] += 1; + } + } + + // ── Main logic ── + + // For pull_request_target events, early-exit if the triggering PR isn't eligible. + if (context.eventName === "pull_request_target") { + const triggerPR = context.payload.pull_request; + if (!isEligible(triggerPR)) { + core.info(`Triggering PR #${triggerPR.number} is not eligible (draft, missing label, or stale). Exiting.`); + return; + } + const assigned = getPoolAssignees(triggerPR); + const author = triggerPR.user.login; + if (assigned.filter((a) => a !== author).length >= REQUIRED_REVIEWERS) { + core.info(`Triggering PR #${triggerPR.number} already has ${assigned.length} pool assignees (excl. author). Exiting.`); + return; + } + core.info(`Triggering PR #${triggerPR.number} needs reviewers. Proceeding with load calculation.`); + } + + // Fetch all open PRs (sorted by most recently updated, skip stale). + const allPRs = await github.paginate(github.rest.pulls.list, { + owner, repo, state: "open", sort: "updated", direction: "desc", per_page: 100, + }); + const freshPRs = allPRs.filter((pr) => new Date(pr.updated_at) >= staleCutoff); + const eligiblePRs = freshPRs.filter(isEligible); + + core.info(`Open PRs: ${allPRs.length}, fresh (≤${STALE_DAYS}d): ${freshPRs.length}, eligible: ${eligiblePRs.length}`); + + // Build load map from active reviews across all eligible PRs. + const load = {}; + const activeCount = {}; + for (const r of REVIEWERS) { load[r] = 0; activeCount[r] = 0; } + + for (const pr of eligiblePRs) { + const assigned = getPoolAssignees(pr); + // Only call listReviews if this PR has pool assignees (saves API calls). + if (assigned.length === 0) continue; + const active = await getActiveReviewers(pr); + const weight = getWeight(pr); + for (const r of active) { + load[r] += weight; + activeCount[r] += 1; + } + } + core.info(`Active load: ${JSON.stringify(load)}`); + core.info(`Active counts: ${JSON.stringify(activeCount)}`); + + // Determine which PRs to process. + if (context.eventName === "pull_request_target") { + // Single-PR mode: only assign the triggering PR. + const triggerPR = context.payload.pull_request; + // Re-fetch full PR object to get latest assignees. + const { data: freshPR } = await github.rest.pulls.get({ + owner, repo, pull_number: triggerPR.number, + }); + await assignReviewers(freshPR, load, activeCount); + } else { + // Full-scan mode (workflow_dispatch / schedule): process all eligible PRs that need reviewers. + const needsReviewers = eligiblePRs + .filter((pr) => { + const assigned = getPoolAssignees(pr).filter((a) => a !== pr.user.login); + return assigned.length < REQUIRED_REVIEWERS; + }) + .sort((a, b) => getWeight(b) - getWeight(a)); + + core.info(`PRs needing reviewers: ${needsReviewers.length}`); + for (const pr of needsReviewers) { + await assignReviewers(pr, load, activeCount); + } + } + + core.info("Review assignment complete.");