fix: incremental PR review, auto-approve, and bot operational improvements #263
sudsali wants to merge 2 commits into
Conversation
| head_sha = item.get("head", {}).get("sha", "") | ||
| pr_files = gh.get_pr_files(number) | ||
| full_sources = "" | ||
| for pf in pr_files: |
BUG: item.get("head", {}).get("sha", "") extracts head_sha from the PR object, but head_sha is used later (line ~240) in gh.get_ci_status(head_sha) and in the full-source-file fetch loop. However, the PR object is fetched via gh.get_pr(number) and stored in item. If the PR API response doesn't include a head key (e.g., the test test_missing_head_sha_falls_back_gracefully simulates this), head_sha will be "" (falsy). Then gh.get_ci_status(head_sha) is skipped (guarded by if head_sha), but the full_sources loop calls gh.get_file_content(fname) without a ref — fetching from the default branch, not the PR head. This means the <full_source_files> context sent to the model will contain default-branch content, not the PR's version of the files. While the test acknowledges this as a fallback, it's a silent data-correctness issue that could lead to false-positive or false-negative reviews. Consider logging a warning when head_sha is empty.
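A minimal sketch of the suggested guard, assuming `gh.get_file_content` accepts an optional `ref` keyword (if it does not, the `else` branch matches current behavior); `fetch_pr_sources` is an illustrative helper, not the module's actual function:

```python
import logging

logger = logging.getLogger(__name__)

def fetch_pr_sources(gh, item, pr_files):
    """Fetch full file contents at the PR head, warning when head.sha is
    missing so the default-branch fallback is visible in logs."""
    head_sha = item.get("head", {}).get("sha", "")
    if not head_sha:
        logger.warning(
            "PR object has no head.sha; full-source context will be "
            "fetched from the default branch, not the PR head"
        )
    sources = []
    for pf in pr_files:
        fname = pf.get("filename", "")
        # Pass the PR head ref when we have it; otherwise fall back.
        content = (gh.get_file_content(fname, ref=head_sha)
                   if head_sha else gh.get_file_content(fname))
        if content:
            sources.append(f"--- {fname} ---\n{content}")
    return "\n".join(sources)
```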
| @@ -172,14 +213,50 @@ def analyze(): | |||
| inline_comments = pr_result.get("comments", []) | |||
BUG: When _parse_file_review_multi(raw) is used as fallback (JSON decode fails), the returned dicts won't contain severity or evidence keys. The subsequent NIT filter (line ~225) checks c.get("severity", "").upper() != "NIT" which will pass all through (empty string != "NIT"), and the formatting loop (line ~232) will produce prefix = "" and suffix = "". This is acceptable degradation, but the incremental file filter (line ~219) relies on c.get("file", "") which _parse_file_review_multi does populate. So this path is safe but produces unstructured comments without severity labels — worth a comment or log warning.
| existing_feedback = _format_pr_feedback(comments_data, review_comments) | ||
| # Incremental review: on synchronize, compute what changed since last push | ||
| incremental_diff = "" |
EDGE_CASE: is_pr_update is used to determine whether this is a re-review (for NIT filtering), but it's not defined in the visible diff context. Based on the existing code pattern, it's likely derived from event_action == 'synchronize'. However, if a PR is opened with event_action == 'opened' but cfg.event_before and cfg.event_after are both set (e.g., from a misconfigured workflow), the incremental diff will be fetched and used for file filtering, but NITs won't be filtered because is_pr_update is False. This inconsistency could lead to stale comments being filtered by file but NITs still being posted. Verify that is_pr_update is correctly derived.
| passed: True (all green), False (something failed), None (pending/unknown).""" | ||
| status = self._get(f"/repos/{self._repo}/commits/{sha}/status") | ||
| if status is None: | ||
| return None, "CI status unavailable" |
NOTE: In get_ci_status, the check-runs _get can return None while the preceding commit-status _get succeeds. The code already guards this with check_data.get("check_runs", []) if check_data else [], so no AttributeError occurs, but the guard lives in a ternary that is easy to miss; a brief comment there would help.
| def _is_own_check(name): | ||
| lower = name.lower() | ||
| return "bot" in lower and ("analyze" in lower or "/ act" in lower) |
DESIGN: The _is_own_check heuristic filters checks where the name contains both 'bot' AND ('analyze' OR '/ act'). However, the check "robot-tests" in the test test_non_bot_check_with_bot_in_name_not_filtered passes because it contains 'bot' but not 'analyze' or '/ act'. But a check named e.g. "my-bot-analyzer" would be incorrectly filtered out. The heuristic is fragile — consider matching against the exact workflow names used by this repo.
| _sm_client = boto3.client("secretsmanager") | ||
| return _sm_client | ||
DESIGN: The _sm_client is a module-level global initialized lazily. If the AWS region is not configured or credentials are missing, boto3.client("secretsmanager") will succeed (it's lazy) but the subsequent get_secret_value call will fail. The error is caught and logged, returning empty string. However, this means if Secrets Manager is misconfigured, the bot will silently proceed with empty prompts, likely producing garbage or no output. Consider failing fast (or at least logging at ERROR level with a clear message) when the prompt is empty after SM lookup.
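A fail-fast sketch; `resolve_prompt`, `sm_lookup`, and `PromptConfigurationError` are illustrative names, not the module's actual API:

```python
class PromptConfigurationError(RuntimeError):
    """Raised when no prompt can be resolved from env or Secrets Manager."""

def resolve_prompt(env_value, sm_lookup, secret_id):
    """Prefer the env var, fall back to Secrets Manager via sm_lookup
    (a callable wrapping get_secret_value), and fail fast instead of
    silently proceeding with an empty prompt."""
    prompt = env_value or sm_lookup(secret_id)
    if not prompt:
        raise PromptConfigurationError(
            f"Empty prompt for {secret_id!r}: check the env var and the "
            "Secrets Manager secret (including region and credentials)"
        )
    return prompt
```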
| self.enable_repo_search = os.getenv("ENABLE_REPO_SEARCH", "true").lower() == "true" | ||
| self.upstream_repo = os.getenv("UPSTREAM_REPO", "awslabs/python-deequ") | ||
| self.upstream_repo = os.getenv("UPSTREAM_REPO", "awslabs/deequ") |
DESIGN: upstream_repo changed from "awslabs/python-deequ" to "awslabs/deequ". This means when the bot searches for upstream code references (via _read_requested_files), it will look in the Scala Deequ repo instead of the Python one. For a Python project bot, this seems intentional (to reference Deequ's Scala source), but the _read_requested_files function on line 684 of main.py now wraps content in ```scala code blocks. If the bot is reviewing Python code and references Python files from the upstream, they'll be incorrectly syntax-highlighted as Scala.
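If the upstream may contain mixed languages, the fence language could be derived from the file extension instead of hardcoded; a sketch with a hypothetical `fence_for` helper:

```python
import os

# Map file extensions to markdown code-fence languages; unknown
# extensions fall back to a plain fence.
_FENCE_LANGS = {
    ".py": "python",
    ".scala": "scala",
    ".java": "java",
    ".md": "markdown",
}

def fence_for(fname):
    _, ext = os.path.splitext(fname)
    return _FENCE_LANGS.get(ext.lower(), "")
```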
| def test_no_sm_env_var_returns_empty(self, monkeypatch): | ||
| monkeypatch.setenv("PR_FILE_REVIEW_PROMPT", "") | ||
| monkeypatch.setenv("SM_PR_FILE_REVIEW_PROMPT", "") |
NOTE: In test_empty_env_var_falls_through_to_sm, setting PR_FILE_REVIEW_PROMPT to "" via monkeypatch.setenv makes os.getenv return a falsy value, so _get_prompt correctly falls through to Secrets Manager; patching issue_bot.prompts._read_from_sm at module level also works, since _get_prompt resolves the name from that same module. The test is correct as written. The only fragility is import order: if a prior test in the same process already exercised get_pr_file_review_prompt, cached module state could affect behavior.
| jobs: | ||
| approve: | ||
| runs-on: ubuntu-latest | ||
| if: github.event.workflow_run.conclusion == 'success' && github.event.workflow_run.event == 'pull_request' |
BUG: The workflow_run trigger references workflows: [".github/workflows/base.yml"] but this should be the workflow name (the name: field in base.yml), not the file path. GitHub Actions matches on the workflow name string, not the filename. If base.yml has name: CI then this should be workflows: ["CI"]. As written, this will never trigger unless base.yml literally has name: .github/workflows/base.yml.
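A sketch of the corrected trigger, assuming base.yml declares `name: CI` (substitute whatever its actual `name:` value is):

```yaml
on:
  workflow_run:
    workflows: ["CI"]   # must match base.yml's name: field, not its file path
    types: [completed]
```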
Force-pushed 6e0fda0 to 909a89d
| head_sha = item.get("head", {}).get("sha", "") | ||
| pr_files = gh.get_pr_files(number) | ||
| full_sources = "" | ||
| for pf in pr_files: |
BUG: item.get("head", {}).get("sha", "") extracts head_sha from the PR object stored in item. However, looking at the code flow, item is the result of gh.get_pr(number). The PR object from GitHub's API always has head.sha, but if it doesn't (as tested in test_missing_head_sha_falls_back_gracefully), the get_ci_status call is skipped but get_file_content is called without a ref, fetching default-branch content instead of PR head content. This silently provides wrong context to the model. At minimum, log a warning when head_sha is empty.
| # Incremental review: on synchronize, compute what changed since last push | ||
| incremental_diff = "" | ||
| incremental_files = set() | ||
| if is_pr_update and cfg.event_before and cfg.event_after: |
BUG: The condition if is_pr_update and cfg.event_before and cfg.event_after: guards the incremental diff fetch. However, the NIT filter on line ~225 uses if is_pr_update and inline_comments: independently. If event_action == 'synchronize' but cfg.event_before is empty (e.g., workflow misconfiguration), is_pr_update is True but no incremental diff is fetched, so incremental_files stays empty and no file filtering happens. Yet NITs are still filtered. This means on a force-push where compare fails, NITs are dropped even though it's effectively a full review. The is_pr_update check for NIT filtering should be gated on whether an incremental review actually happened (i.e., if incremental_diff and inline_comments: or similar).
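A sketch of the suggested gating; the helper names are illustrative, but the variables mirror those in analyze():

```python
def should_filter_nits(is_pr_update, incremental_diff):
    """Only drop NIT-severity comments when an incremental diff was
    actually obtained, so a failed compare (e.g. after a force-push)
    falls back to full-review behavior."""
    return bool(is_pr_update and incremental_diff)

def filter_comments(comments, is_pr_update, incremental_diff):
    if should_filter_nits(is_pr_update, incremental_diff):
        # Missing severity keys yield "" and pass through, matching the
        # existing filter's behavior.
        return [c for c in comments
                if c.get("severity", "").upper() != "NIT"]
    return comments
```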
| @@ -134,31 +190,34 @@ def post_pr_review(self, number, summary, inline_comments): | |||
| else: | |||
BUG: When valid_comments is empty but invalid_comments is non-empty, the payload has "event": event (e.g., "COMMENT") but no "comments" key. The GitHub Reviews API requires that if you submit a review with event=COMMENT and no inline comments, the body must be non-empty. This works because body will contain the summary + invalid comments text. However, if summary is empty string and there are no invalid_comments either (all comments valid but empty list edge), the body could be empty, which GitHub rejects with 422. Consider adding a guard: if body is empty and no comments, skip the API call.
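A sketch of the suggested guard; `build_review_payload` is an illustrative helper, not the module's actual post_pr_review signature:

```python
def build_review_payload(event, summary, valid_comments, invalid_text):
    """Assemble a Reviews API payload, returning None when both the body
    and the inline comments are empty so callers can skip the POST
    (GitHub rejects an empty COMMENT review with 422)."""
    body = "\n\n".join(part for part in (summary, invalid_text) if part)
    if not body and not valid_comments:
        return None  # nothing to post
    payload = {"event": event, "body": body}
    if valid_comments:
        payload["comments"] = valid_comments
    return payload
```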
| @@ -172,14 +213,50 @@ def analyze(): | |||
| inline_comments = pr_result.get("comments", []) | |||
EDGE_CASE: When _parse_file_review_multi(raw) is used as fallback (JSON decode fails), the returned dicts won't have severity or evidence keys. The formatting loop (line ~232) will produce empty prefix/suffix which is fine, but the incremental file filter and NIT filter will silently pass everything through. This is acceptable degradation but should be logged as a warning so operators know the model returned non-JSON.
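A sketch of the suggested logging; `fallback_parser` stands in for the existing _parse_file_review_multi, and `parse_review` is an illustrative wrapper:

```python
import json
import logging

logger = logging.getLogger(__name__)

def parse_review(raw, fallback_parser):
    """Parse the model's review output, logging a warning when falling
    back to unstructured parsing so operators can see the model
    returned non-JSON."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        logger.warning(
            "model review output was not valid JSON; falling back to "
            "unstructured parsing (comments will lack severity/evidence)"
        )
        return {"comments": fallback_parser(raw)}
```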
| workflow_run: | ||
| workflows: [".github/workflows/base.yml"] | ||
| types: [completed] | ||
BUG: The workflows field in workflow_run trigger must match the workflow name (the name: key in the YAML), not the file path. ".github/workflows/base.yml" is a file path, not a workflow name. This workflow will never trigger unless base.yml has name: .github/workflows/base.yml. Change this to the actual name: value from base.yml (e.g., "CI" or whatever it's named).
| self.upstream_repo = os.getenv("UPSTREAM_REPO", "awslabs/python-deequ") | ||
| self.codebase_src_dir = os.getenv("CODEBASE_SRC_DIR", "pydeequ") | ||
| self.codebase_file_ext = os.getenv("CODEBASE_FILE_EXT", ".py") |
DESIGN: max_context_chars was increased from 200,000 to 800,000. Combined with the new full_sources section (up to 3MB budget on line 175 of main.py), the system prompt could easily exceed Bedrock/Claude's context window limits or cause excessive token costs. The 800K char limit (~200K tokens) may exceed model context for some Bedrock model IDs. Consider validating that len(system_prompt) <= max_context_chars before invoking Bedrock, or at least truncating full_sources to fit within the budget.
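A sketch of the suggested pre-invoke validation; `fit_context` is an illustrative helper using a character budget only (a token-based check would be tighter):

```python
def fit_context(system_prompt_base, full_sources, max_context_chars):
    """Truncate full_sources so the final prompt stays within
    max_context_chars instead of sending an oversized prompt to the
    model. Returns (prompt, was_truncated)."""
    budget = max_context_chars - len(system_prompt_base)
    if budget <= 0:
        # Base prompt alone exceeds the budget; hard-truncate it.
        return system_prompt_base[:max_context_chars], True
    if len(full_sources) > budget:
        marker = "\n[... full_sources truncated to fit context budget ...]"
        full_sources = full_sources[: max(0, budget - len(marker))] + marker
        return system_prompt_base + full_sources, True
    return system_prompt_base + full_sources, False
```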
| _sm_client = boto3.client("secretsmanager") | ||
| return _sm_client | ||
DESIGN: boto3.client("secretsmanager") is created without specifying a region. In CI environments (GitHub Actions), if AWS_DEFAULT_REGION or AWS_REGION is not set, this will use the default region from the SDK (us-east-1) or fail. Since the workflow doesn't explicitly set a region env var, this could silently read from the wrong region or fail. Consider making the region configurable or documenting the requirement.
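A sketch of an explicit, configurable region; `SM_REGION` is a hypothetical env var, and the lazy boto3 import assumes boto3 is already a dependency of the bot:

```python
import os

_sm_client = None

def get_sm_client(region=None):
    """Lazily create the Secrets Manager client with an explicit region
    so CI runs fail loudly (or hit the right region) instead of silently
    falling back to an SDK default."""
    global _sm_client
    if _sm_client is None:
        region = region or os.getenv("SM_REGION") or os.getenv("AWS_REGION")
        if not region:
            raise RuntimeError(
                "No AWS region configured: set SM_REGION or AWS_REGION"
            )
        import boto3  # deferred so the region check runs first
        _sm_client = boto3.client("secretsmanager", region_name=region)
    return _sm_client
```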
Summary
Replicates bot improvements from awslabs/deequ#710 to python-deequ.
Changes:
CODEBASE_SRC_DIR=pydeequ, CODEBASE_FILE_EXT=.py

Context
Motivated by python-deequ PR #248 where the bot posted 12 reviews, re-raised resolved issues, and exhausted the contributor.
Test plan
python -m pytest tests/test_bot.py --noconftest
workflow_dispatch on a test PR