Skip to content
Merged
Show file tree
Hide file tree
Changes from 18 commits
Commits
Show all changes
25 commits
Select commit Hold shift + click to select a range
5ddd48f
PR review bot
souvikghosh04 Mar 26, 2026
5ff5538
updated reviewers
souvikghosh04 Mar 26, 2026
d64bb42
draft PR inclusion
souvikghosh04 Mar 26, 2026
12105c5
test draft PR for review bot
souvikghosh04 Mar 26, 2026
8c11874
copilot review fixes
souvikghosh04 Mar 27, 2026
7e2463c
Merge branch 'main' into Usr/sogh/prreviewbot
souvikghosh04 Mar 27, 2026
d085f1f
Merge branch 'main' into Usr/sogh/prreviewbot
Aniruddh25 Mar 30, 2026
9cfc781
added fixes and PR review suggesstions
souvikghosh04 Mar 31, 2026
0ec4dfc
Merge branch 'main' of https://github.com/Azure/data-api-builder into…
souvikghosh04 Apr 3, 2026
77733d6
Fixed review comments
souvikghosh04 Apr 3, 2026
194e274
Fixes and improvements
souvikghosh04 Apr 8, 2026
b169d89
Address review comments: paginate listReviews, strip label after assi…
souvikghosh04 Apr 8, 2026
d2fe2f3
Address review comments + add temp push trigger for testing
souvikghosh04 Apr 8, 2026
d74bcae
Merge branch 'main' into Usr/sogh/prreviewbot
souvikghosh04 Apr 8, 2026
9fb3069
Merge branch 'Usr/sogh/prreviewbot' of https://github.com/Azure/data-…
souvikghosh04 Apr 8, 2026
f9d9723
Fix: exclude PR author from assigned count — author cannot review the…
souvikghosh04 Apr 8, 2026
f700c93
Fix: remove duplicate author declaration
souvikghosh04 Apr 8, 2026
ac432e6
Turn off dry-run for live testing
souvikghosh04 Apr 8, 2026
97ec29a
Remove GHCP inst file and TODO section
souvikghosh04 Apr 13, 2026
1ca8743
remove GHCP inst file
souvikghosh04 Apr 13, 2026
7b206d6
Merge branch 'main' into Usr/sogh/prreviewbot
souvikghosh04 Apr 13, 2026
a43a208
Merge branch 'main' into Usr/sogh/prreviewbot
Aniruddh25 Apr 13, 2026
7973eef
Set DRY_RUN to false for pull_request_target events
souvikghosh04 Apr 14, 2026
2972316
Randomize assignees instead of fixed/alphabetical ordering
souvikghosh04 Apr 14, 2026
ca3d956
Merge branch 'main' into Usr/sogh/prreviewbot
aaronburtle Apr 15, 2026
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
269 changes: 269 additions & 0 deletions .github/pr-review-bot-README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,269 @@
# PR Review Assignment Bot

Automated GitHub Actions workflow that assigns reviewers to pull requests using weighted load balancing.

**Workflow file:** [workflows/auto-assign-reviewers.yml](workflows/auto-assign-reviewers.yml)
**Branch:** `Usr/sogh/prreviewbot` (PR [#3326](https://github.com/Azure/data-api-builder/pull/3326))
**Status:** Dry-run mode. Ready for team testing.

---

## How It Works

### Trigger Events

| Event | When it fires |
|---|---|
| `pull_request_target: opened` | A new PR is created |
| `pull_request_target: reopened` | A closed PR is reopened |
| `pull_request_target: labeled` | A label is added to a PR |
| `workflow_dispatch` | Manual run from the Actions tab (with dry-run toggle) |

> A cron schedule (`*/10 * * * *`) is available but currently commented out.

### Opt-In Mechanism

A PR must have the **`assign-for-review`** label to be considered. Draft PRs are automatically skipped.

The label is added **manually** by the PR author when they consider the PR ready for review. Future improvements may automate this (e.g., auto-add on publish, or remove the label requirement entirely and trigger on all non-draft PRs).

### Reviewer Pool

| GitHub Handle | Name | Status |
|---|---|---|
| `souvikghosh04` | Souvik | Active |
| `Aniruddh25` | Aniruddh | Active (see note) |
| `aaronburtle` | Aaron | Active |
| `anushakolan` | Anusha | Active |
| `RubenCerna2079` | Ruben | Active |

> Jerry (`JerryNixon`) has been removed from the pool — PM role, should not review code PRs.

> **Decision needed:** Ruben raised that Aniruddh already reviews a disproportionate number of PRs. Options: (a) remove Ani from the bot pool so he's not auto-assigned, (b) keep him as equal, or (c) add a reduced-weight mechanism. Team leaning toward removing both Jerry and Ani to keep the pool to 4 engineers (Souvik, Aaron, Anusha, Ruben).

Each PR gets **2 reviewers** assigned (configurable via `REQUIRED_REVIEWERS`).

### Weighted Load Balancing

The bot calculates a *weight* for each PR based on its labels, then distributes reviewers so the total weighted load is balanced across the pool.

| Label | Weight |
|---|---|
| *(no size label)* | 1 |
| `size-medium` | 2 |
| `size-large` | 3 |
| `priority-high` | +1 (additive) |

**Examples:**
- A default PR = weight 1
- A `size-large` + `priority-high` PR = weight 4

Size labels are **manual and optional**. If no size label is set, the PR gets a default weight of 1 and the bot still assigns reviewers — it just won't do weighted load balancing. Auto-detection of PR size is a possible future enhancement.

### Reviewer Lifecycle & Availability

A reviewer goes through a defined lifecycle per PR assignment. The bot must respect this when deciding who is available.

#### Active Review State (Blocked)

A reviewer is considered **actively reviewing** a PR — and therefore **blocked** — from the moment they are assigned until they complete one of the following actions:
- **Approve** the PR
- **Request changes** on the PR
- **Comment** (submit a review with comments)
- **The PR is closed or merged**

While a reviewer is in the active-review state for a PR, that PR counts toward their load. The bot **prefers unblocked reviewers** (those with no pending active reviews) when selecting candidates. If all reviewers in the pool have active reviews, the bot falls back to round-robin by least total load (see below).

#### Round-Robin Fallback

When **every reviewer** in the pool has active assignments, the bot does not stop — it continues assigning using the existing weighted load-balancing logic:
- Candidates are sorted by current weighted load (ascending).
- The reviewer(s) with the **fewest active assignments** are picked first.
- This naturally round-robins through the pool as load accumulates.

This ensures PRs are never left without reviewers even when the team is busy.

#### Freeing a Reviewer

A reviewer's assignment on a PR is **released** (no longer counts toward their load) when any of these occur:
1. The reviewer **submits a review** (approve, request changes, or comment).
2. The PR is **merged**.
3. The PR is **closed**.

Once freed, the reviewer becomes available for new assignments with a reduced load count.

#### Re-Review Requests

If a PR author **requests a re-review** (e.g., after addressing feedback and pushing new changes), the assignment lifecycle **starts from scratch**:
- The bot treats the PR as needing reviewers again.
- The same rules apply — load balancing, author exclusion, active-review checks.
- Previously assigned reviewers may or may not be re-assigned depending on current load.
- This can be triggered by removing and re-adding the `assign-for-review` label, or by a `reopened` event.

### Assignment Algorithm

The bot operates in two modes depending on the trigger:

**Single-PR mode** (`pull_request_target`): Only processes the triggering PR. Early-exits if the PR is ineligible (draft, missing label, stale, or already fully assigned). Still builds the global load map for fair selection.

**Full-scan mode** (`workflow_dispatch` / schedule): Scans all open, non-stale, eligible PRs and assigns reviewers to those that need them. Larger/higher-priority PRs are processed first.

In both modes, the core assignment logic per PR is:
1. Exclude the PR author and already-assigned reviewers from candidates.
2. **Prefer unblocked reviewers** (no active reviews) over blocked ones.
3. Within the same availability tier, sort by weighted load (ascending), then alphabetical.
4. Pick the top N candidates needed.
5. Assign (or log in dry-run mode) and update the in-memory load map.

### Optimizations

| Optimization | Effect |
|---|---|
| **Single-PR fast path** | `pull_request_target` only assigns the triggering PR instead of scanning all PRs |
| **Early exit** | Skips immediately if trigger PR is ineligible or already fully assigned |
| **Stale PR filter** | PRs not updated in 90 days are excluded from eligibility and load calculation |
| **Sorted fetch** | PRs fetched sorted by `updated_at` desc for cache-friendly access |
| **Conditional `listReviews`** | Only calls `listReviews` for PRs that have pool assignees (skips zero-assignee PRs) |

### Reviewer Counting

The bot uses the **Assignees** field (`pr.assignees`) to track who has been bot-assigned to a PR. This avoids the CODEOWNERS problem where `requested_reviewers` is pre-populated with all code owners.

To determine active vs. freed reviewers, the bot calls `pulls.listReviews` for each eligible PR:
- **Active reviewer:** Assigned to the PR but has **not** submitted a review yet → counts toward load.
- **Freed reviewer:** Has submitted a review (approve, request changes, or comment) → does **not** count toward load.

Only reviewers from the configured pool are counted. CODEOWNERS or external reviewers are ignored.

### Dry-Run Mode

The bot is currently in **dry-run mode** (`DRY_RUN = true`). In this mode it logs `[DRY RUN] Would assign [...]` but does not actually add reviewers.

To go live, set `DRY_RUN` to `false` in the workflow (or select `false` when triggering manually via `workflow_dispatch`).

### Concurrency

The workflow uses a concurrency group (`pr-review-assignment`) with `cancel-in-progress: false` to prevent overlapping runs from producing inconsistent assignments.

---

## Configuration Reference

| Setting | Location | Default | Description |
|---|---|---|---|
| `REVIEWERS` | Script constant | 5-person pool | GitHub usernames eligible to be assigned |
| `REQUIRED_REVIEWERS` | Script constant | `2` | Number of reviewers per PR |
| `STALE_DAYS` | Script constant | `90` | PRs not updated in this many days are skipped |
| `DRY_RUN` | Script constant / workflow input | `true` | When true, logs only — no actual assignments |
| `assign-for-review` | PR label | — | Label that opts a PR into auto-assignment (removed after assignment when live) |
| Size labels | PR labels | `size-medium`, `size-large` | Increase PR weight for load balancing |
| `priority-high` | PR label | — | Adds +1 to PR weight |

---

## How to Validate (Dry-Run Testing)

1. Open (or reopen) a PR and add the **`assign-for-review`** label.
2. Go to the **Actions** tab in the repo.
3. Find the latest **"PR Review Assignment Bot"** workflow run.
4. Open the **"Assign reviewers to eligible PRs"** step log.
5. Look for `[DRY RUN] Would assign [...]` lines — these show who would be assigned and to which PR.
6. Verify the assignments look correct (author excluded, load balanced, right reviewer count).

You can also trigger a manual run from the **Actions** tab → **PR Review Assignment Bot** → **Run workflow** and choose dry-run `true` or `false`.

---

## Changelog
Comment thread
souvikghosh04 marked this conversation as resolved.
Outdated

| Date | Change |
|---|---|
| March 26 | Initial workflow created with `pulls.requestReviewers`, dry-run mode |
| April 8 | **BUG FIX:** CODEOWNERS inflated `requested_reviewers` → switched to `pr.assignees` |
| April 8 | **BUG FIX:** `pulls.requestReviewers` wrote to Reviewers field → switched to `issues.addAssignees` |
| April 8 | **BUG FIX:** Dry-run toggle `\|\|` expression always true → fixed with `context.eventName` check |
| April 8 | Removed Jerry from pool (PM role) |
| April 8 | Added active-review blocking, reviewer-freed logic, round-robin fallback |
| April 8 | Added optimizations: single-PR fast path, early exit, stale PR filter (90 days), conditional `listReviews` |
| April 8 | Paginated `listReviews` to handle PRs with 30+ reviews (review comment fix) |
| April 8 | Auto-remove `assign-for-review` label after successful assignment (review comment fix) |

---

## Open Items
Comment thread
souvikghosh04 marked this conversation as resolved.
Outdated

| # | Item | Owner | Status |
|---|---|---|---|
| 1 | **Decide on Aniruddh's inclusion** — remove to reduce his load, or keep as equal | Team | Pending |
Comment thread
souvikghosh04 marked this conversation as resolved.
Outdated
| 2 | **Re-enable workflow & test dry-run for 1 week** | Souvik + Team | Next step |
| 3 | **Flip to live mode** after dry-run validation | Souvik | Blocked on #2 |
| 4 | Consider auto-detecting PR size from line count | Souvik | Future |
| 5 | Consider removing label requirement (auto-assign all non-draft PRs) | Team | Future |

---

## Testing Plan

1. **Re-enable the workflow** in dry-run mode.
2. **Team adds `assign-for-review` labels** to published PRs for 1 week.
3. **Check logs** — verify `[DRY RUN] Would assign [...]` output:
- Author excluded, 2 reviewers selected, load balanced.
- Stale PRs (>90 days) are skipped.
- Unblocked reviewers preferred over blocked ones.
- Round-robin fallback when everyone has active reviews.
- Re-review (label re-add) triggers fresh assignment.
4. **Flip to live** — set `DRY_RUN` to `false`.
5. **Monitor** for 1-2 weeks, refine as needed.

---

## Team Communication
Comment thread
souvikghosh04 marked this conversation as resolved.
Outdated

### Original Communication (March 26)

> Hi Team,
>
> We now have a GitHub Actions workflow that automatically assigns reviewers to pull requests.
>
> **How it works:**
> - Scans all open PRs for the `assign-for-review` label — this is the opt-in mechanism.
> - Assigns 2 reviewers per PR from the pool: Souvik, Aniruddh, Aaron, Anusha, Ruben, Jerry.
> - Uses weighted load balancing so reviewers are distributed fairly:
> - `size-medium` = 2x weight, `size-large` = 3x, default = 1x.
> - `priority-high` adds +1 to the weight.
> - Reviewers with the lightest current load are picked first.
> - Larger/higher-priority PRs are assigned first for best reviewer availability.
> - The PR author is automatically excluded from being assigned on their own PR.
>
> **Current state:** The bot is in dry-run mode and has been tested against draft PRs. It logs who would be assigned but doesn't actually add reviewers yet. Once the team is comfortable, we flip DRY_RUN to false to go live.
>
> Thanks,
> Souvik

### Updated Communication (Draft)

> **Subject: PR Review Assignment Bot — Ready for Testing**
>
> Hi Team,
>
> The PR review assignment bot has been updated with all the fixes from our demo discussion. Here's the summary:
>
> **What it does:**
> - Scans PRs for the `assign-for-review` label and assigns **2 reviewers** from the pool: Souvik, Aniruddh, Aaron, Anusha, Ruben.
> - Uses **weighted load balancing** — reviewers with the lightest active load are picked first.
> - **Prefers available reviewers** — those not currently reviewing any PR are prioritized. If everyone is busy, it round-robins by least load.
> - Reviewers are **freed** once they submit a review (approve, request changes, or comment).
> - PR author is automatically excluded. Stale PRs (>90 days) are skipped.
>
> **What changed:**
> - Fixed CODEOWNERS bug — assignments now use the **Assignees** field (not Reviewers), so CODEOWNERS no longer interferes.
> - Fixed API — bot now uses `issues.addAssignees` to write to the correct field.
> - Fixed dry-run toggle logic.
> - Added reviewer lifecycle: active-review blocking, freed-on-submit, round-robin fallback.
> - Optimized: single-PR mode for faster `pull_request_target` runs, stale PR filter.
> - Removed Jerry from pool (PM role).
>
> **Next step:** Add the `assign-for-review` label to your published PRs. Check the Actions tab logs to verify the `[DRY RUN]` output looks correct. Once we're confident, we flip to live.
>
> Thanks,
> Souvik
Loading
Loading