[WIP] test: add gemini cli skill content preservation test by minzznguyen · Pull Request #401 · GoogleCloudPlatform/evalbench

minzznguyen · 2026-05-21T22:07:37Z

Add a new integration test to verify that skill contents are preserved when copied to the fake home sandbox.

TAG=agy
CONV=a151643e-4527-4bbe-9908-a1f4f33be7e3

Add a new integration test to verify that skill contents are preserved when copied to the fake home sandbox. TAG=agy CONV=a151643e-4527-4bbe-9908-a1f4f33be7e3

IsmailMehdi

Nice instinct adding test coverage for _setup_skills — there's not much for the generator boundary today, and the integration test in particular (test_skill_content_preserved) is a good idea since tmp_path lets you exercise real file ops safely. A few things should be fixed before this lands.

Blocking

Missing sys.path setup — the file likely doesn't import from the repo root. Every other test in evalbench/test/ (e.g. trajectory_matcher_test.py:7, tool_naming_test.py:9) prepends the parent dir to sys.path so from generators.models... resolves:
```
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
```
Without this, pytest evalbench/test/gemini_cli_test.py from the repo root will fail with ModuleNotFoundError: No module named 'generators'. Please add it, or run the test yourself from the repo root (uv run pytest evalbench/test/gemini_cli_test.py) to confirm it imports.
print() statements throughout. There are ~40 prints used as narration. pytest captures stdout by default, so they only appear on failure or with -s, and they make the test body 3× longer than it needs to be. Behavior-level asserts already say what's being checked. Please strip them — for the rare case where a future debugger needs them back, pytest -v plus a good assertion message does the same job.
try/except AssertionError: raise e around mock assertions. pytest's introspection already produces a clear diff for assert_called_once_with. Wrapping in try/except adds noise and raise e (vs bare raise) truncates the traceback. Replace with the bare assertion call.

Medium

Log-message coupling is brittle. mock_logging.info.assert_any_call("Syncing skill: my-single-skill") will break the next time someone tweaks that log line. The mock_copytree.assert_called_once_with(...) assertion above it is the behavioral check that matters — drop the log assertion.
Hardcoded /home/jamesamn in test_setup_single_skill_string. Looks like a leftover from local iteration. Use a generic value like /fake/real_home (or str(tmp_path)) — accidentally fine, but a reviewer notices.

Minor

Inconsistent env-mocking style. Test 1 uses patch.dict(os.environ, {"HOME": ...}), test 2 uses monkeypatch.setenv. Pick one — monkeypatch is the pytest-native idiom and is already in test 2.
test_skill_content_preserved is really a setup-integration test. It does exercise content preservation, but it relies on _setup_npm_auth and _setup not blowing up either — worth a docstring noting the broader scope, or split out a narrower variant that calls _setup_skills directly with a pre-built generator if you want a true unit test of the copy step.
Worth covering at least one of the dict skill_config branches (link/install/enable/disable/uninstall) — they're the lion's share of _setup_skills and currently untested. subprocess.run is already mocked, so adding one is cheap.

The integration approach in test 2 is good — once you trim the prints and add the sys.path fix, that's a solid sandbox for further skill-setup tests.

Cleaned-up version

Here's the file with blocking + medium fixes applied — drop in if it's easier than editing piecemeal:

import os
import sys

import pytest
from unittest.mock import patch, MagicMock

sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from generators.models.gemini_cli import GeminiCliGenerator


@patch('generators.models.gemini_cli.logging')
@patch('generators.models.gemini_cli.subprocess.run')
@patch('generators.models.gemini_cli.os.path.exists')
@patch('generators.models.gemini_cli.shutil.copytree')
@patch('generators.models.gemini_cli.shutil.rmtree')
def test_setup_single_skill_string(
    mock_rmtree,
    mock_copytree,
    mock_exists,
    mock_run,
    mock_logging,
):
    """A string entry in setup.skills triggers a copytree from real to fake home."""
    real_home = "/fake/real_home"
    real_skill_path = os.path.join(real_home, ".gemini", "skills", "my-single-skill")

    mock_exists.side_effect = lambda path: path == real_skill_path
    mock_run.return_value = MagicMock(returncode=0, stdout="fake-token")

    config = {"setup": {"skills": ["my-single-skill"]}}

    with patch('generators.models.gemini_cli.os.makedirs'), \
         patch('generators.models.gemini_cli.open', create=True), \
         patch.dict(os.environ, {"HOME": real_home}):
        GeminiCliGenerator(config)

    expected_fake_home = os.path.abspath(os.path.join(".venv", "fake_home"))
    expected_fake_skill_path = os.path.join(
        expected_fake_home, ".gemini", "skills", "my-single-skill"
    )
    mock_copytree.assert_called_once_with(real_skill_path, expected_fake_skill_path)


@patch('generators.models.gemini_cli.subprocess.run')
def test_skill_content_preserved(mock_run, tmp_path, monkeypatch):
    """End-to-end: a skill file's contents survive copy into the fake-home sandbox.

    Exercises the full GeminiCliGenerator.__init__ -> _setup -> _setup_skills
    path with real filesystem operations confined to tmp_path. Only subprocess
    (gcloud auth) is mocked.
    """
    real_home = tmp_path / "real_home"
    real_home.mkdir()

    skill_name = "my-content-skill"
    real_skill_dir = real_home / ".gemini" / "skills" / skill_name
    real_skill_dir.mkdir(parents=True)

    skill_file = real_skill_dir / "secret.txt"
    expected_content = "hello world"
    skill_file.write_text(expected_content)

    mock_run.return_value = MagicMock(returncode=0, stdout="fake-token")

    monkeypatch.chdir(tmp_path)
    monkeypatch.setenv("HOME", str(real_home))

    GeminiCliGenerator({"setup": {"skills": [skill_name]}})

    expected_fake_skill_file = (
        tmp_path / ".venv" / "fake_home" / ".gemini" / "skills" / skill_name / "secret.txt"
    )
    assert expected_fake_skill_file.exists(), (
        f"Copied skill file not found at {expected_fake_skill_file}"
    )
    assert expected_fake_skill_file.read_text() == expected_content

(Also swapped the "password is xyz" sample content for a benign "hello world" so the file doesn't read like a credential.)

IsmailMehdi · 2026-05-21T22:16:39Z

+import os
+import pytest
+from unittest.mock import patch, MagicMock
+from generators.models.gemini_cli import GeminiCliGenerator


Without sys.path.append(...) above this line, pytest from the repo root will fail with ModuleNotFoundError: No module named 'generators'. See other tests in this directory for the pattern.

Suggested change

from generators.models.gemini_cli import GeminiCliGenerator

from unittest.mock import patch, MagicMock

import sys

sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from generators.models.gemini_cli import GeminiCliGenerator

IsmailMehdi · 2026-05-21T22:16:39Z

+    """Test that a single skill defined as a string is copied to fake home."""
+    print("\n--- STARTING TEST: test_setup_single_skill_string ---")
+
+    real_home = "/home/jamesamn"


Looks like a leftover from local iteration — please use a generic value.

Suggested change

real_home = "/home/jamesamn"

real_home = "/fake/real_home"

IsmailMehdi · 2026-05-21T22:16:39Z

+    print("[Mock subprocess.run] Configured to simulate successful 'gcloud auth'")
+
+    # 3. Configure CWD and HOME env var to use temp directory
+    monkeypatch.chdir(tmp_path)


Minor: "password is xyz" reads like a credential when grep'd. Suggest a benign placeholder so future scanners don't flag the test file.

Suggested change

monkeypatch.chdir(tmp_path)

expected_content = "hello world"

IsmailMehdi · 2026-05-21T22:18:43Z

Also take a look at the style issues.

test: add gemini cli skill content preservation test

a7cc976

Add a new integration test to verify that skill contents are preserved when copied to the fake home sandbox. TAG=agy CONV=a151643e-4527-4bbe-9908-a1f4f33be7e3

minzznguyen requested a review from IsmailMehdi as a code owner May 21, 2026 22:07

IsmailMehdi requested changes May 21, 2026

View reviewed changes

minzznguyen changed the title ~~test: add gemini cli skill content preservation test~~ [WIP] test: add gemini cli skill content preservation test May 21, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] test: add gemini cli skill content preservation test#401

[WIP] test: add gemini cli skill content preservation test#401
minzznguyen wants to merge 1 commit into
GoogleCloudPlatform:mainfrom
minzznguyen:unit_test_gemini_cli

minzznguyen commented May 21, 2026

Uh oh!

IsmailMehdi left a comment

Uh oh!

IsmailMehdi May 21, 2026

Uh oh!

IsmailMehdi May 21, 2026

Uh oh!

IsmailMehdi May 21, 2026

Uh oh!

IsmailMehdi commented May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

minzznguyen commented May 21, 2026

Uh oh!

IsmailMehdi left a comment

Choose a reason for hiding this comment

Blocking

Medium

Minor

Cleaned-up version

Uh oh!

IsmailMehdi May 21, 2026

Choose a reason for hiding this comment

Uh oh!

IsmailMehdi May 21, 2026

Choose a reason for hiding this comment

Uh oh!

IsmailMehdi May 21, 2026

Choose a reason for hiding this comment

Uh oh!

IsmailMehdi commented May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants