Skip to content

SCANPY-248 Auto detect test code#318

Open
guillaume-dequenne wants to merge 4 commits into
masterfrom
test-code
Open

SCANPY-248 Auto detect test code#318
guillaume-dequenne wants to merge 4 commits into
masterfrom
test-code

Conversation

@guillaume-dequenne
Copy link
Copy Markdown
Contributor

No description provided.

@hashicorp-vault-sonar-prod hashicorp-vault-sonar-prod Bot changed the title Auto detect test code SCANPY-248 Auto detect test code May 8, 2026
@hashicorp-vault-sonar-prod
Copy link
Copy Markdown

hashicorp-vault-sonar-prod Bot commented May 8, 2026

SCANPY-248

@sonar-review-alpha
Copy link
Copy Markdown

sonar-review-alpha Bot commented May 8, 2026

Summary

This PR adds automatic test directory detection for Python projects. The feature infers sonar.tests configuration by checking multiple sources in order: pyproject.toml [tool.pytest.ini_options] testpaths, pytest.ini, tox.ini, setup.cfg, and—as a fallback—filesystem conventions (directories named "tests", "test", or "testing"). The detection only runs if sonar.tests hasn't been explicitly set by the user in higher-priority sources (CLI args, environment variables, or config files), avoiding unwanted overrides. This eliminates manual configuration for users following Python testing conventions.

What reviewers should know

Core logic: The python_project_loader.py module implements a chain of loaders that try each config source in order. Each loader returns None if the config file is absent, an empty string if testpaths is declared but empty or invalid, or a comma-separated path string if valid directories are found. Only the filesystem fallback runs if no config file declared testpaths.

Key decision: _existing_paths() filters out non-existent directories and handles absolute paths (including Windows drive-relative paths) carefully. This prevents silent failures where users declare invalid paths.

Configuration integration: In configuration_loader.py, the auto-detection runs last—after all higher-priority sources—and only if SONAR_TESTS is not already set. This respects user intent.

Test coverage: Comprehensive unit tests (test_python_project_loader.py) cover each config source, multiline/list formats, non-existent paths, malformed files, and edge cases. Integration tests (test_auto_detect_tests.py) verify end-to-end functionality with SonarQube, confirming test files are correctly classified.

Minor setup: .gitignore adds .sonar/, and pyproject.toml configures norecursedirs to prevent pytest from scanning test source directories during development.


  • Generate Walkthrough
  • Generate Diagram

🗣️ Give feedback

sonar-review-alpha[bot]

This comment was marked as resolved.

sonar-review-alpha[bot]

This comment was marked as outdated.

sonar-review-alpha[bot]

This comment was marked as resolved.

sonar-review-alpha[bot]

This comment was marked as outdated.

sonar-review-alpha[bot]

This comment was marked as outdated.

sonar-review-alpha[bot]

This comment was marked as outdated.

sonar-review-alpha[bot]

This comment was marked as resolved.

sonar-review-alpha[bot]

This comment was marked as resolved.

sonar-review-alpha[bot]

This comment was marked as resolved.

sonar-review-alpha[bot]

This comment was marked as outdated.

sonar-review-alpha[bot]

This comment was marked as outdated.

@sonarqube-next
Copy link
Copy Markdown

Copy link
Copy Markdown

@sonar-review-alpha sonar-review-alpha Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid implementation overall — the chain-of-responsibility pattern is clean, the test coverage is thorough, and the integration with the existing configuration loader is correct. One real bug needs fixing before merge.

The previously flagged get_project_test_files pagination issue is still open.

🗣️ Give feedback

Comment on lines +41 to +44
if result is None:
continue # file absent or no testpaths key — try next source
if result:
return {SONAR_TESTS: result}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The three-way return distinguishes None (file absent / no key) from "" (key declared, all paths invalid) from a non-empty string (success), but the caller in load() checks if result: which collapses None and "" into the same branch only on the truthy/falsy axis. The logic actually works today because of the preceding if result is None: continue, but the empty-string sentinel is a fragile contract.

More importantly: _existing_paths silently skips paths that are outside the project root (returns them filtered out), so if all declared paths are outside the root, paths is [] and the function falls into the warning + return "" branch — correct. But if _existing_paths raises an unexpected exception (e.g. PermissionError from is_dir()), the outer try/except in _load_from_pyproject_toml catches it and returns None instead of "", letting the chain fall through. This was flagged before and acknowledged, but the sentinel design amplifies the impact — the caller cannot distinguish between the two cases.

No code change required for the acknowledged filesystem I/O issue, but the return "" sentinel deserves a comment explaining why "" and not None (it's used to stop the chain when testpaths was declared but all paths were invalid).

  • Mark as noise

Comment on lines +97 to +100
logging.debug(f"Error reading pyproject.toml for pytest testpaths: {e}")
return None
testpaths = toml_dict.get("tool", {}).get("pytest", {}).get("ini_options", {}).get("testpaths")
if not testpaths:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The testpaths value from pyproject.toml can be a string (space-separated) in addition to a list, per pytest docs. The current code handles that via testpaths.split() when it's not a list. However, if testpaths is an integer or another scalar type in the TOML (e.g. testpaths = 1), str(p).strip() will coerce it silently rather than skipping it, and _existing_paths will then look for a directory named "1". This is a minor edge case but worth guarding:

if isinstance(testpaths, list):
    raw = [str(p) for p in testpaths if str(p).strip()]
elif isinstance(testpaths, str):
    raw = [p for p in testpaths.split() if p]
else:
    logging.debug(f"Unexpected type for testpaths in pyproject.toml: {type(testpaths)}")
    return None
  • Mark as noise

resp.raise_for_status()
return resp.json().get("task", {})

def wait_for_ce_task_by_id(self, task_id: str) -> None:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic duplication: wait_for_ce_task_by_id polls with MAX_RETRIES * 10 iterations and time.sleep(2) between them. The existing wait_for_analysis_completion method (not shown in the diff but referenced in the fallback path) almost certainly uses a similar polling loop. If the retry budget, sleep interval, or terminal-state set ever needs to change, both loops must be updated. Extract the shared polling logic into a private _poll_until helper, or at minimum define the sleep duration and terminal states as class-level constants so both methods share them.

  • Mark as noise

Copy link
Copy Markdown
Contributor

@Seppli11 Seppli11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, the cahnge looks good. It's really cool to see that we're finally parsing settings from the pyproject.toml file and co. (Hopefully the python version is next 🤞 )

One thing I'm not sure about is if it makes sense to also have a heuristic based on the folders in the scanner. It seems to create quite a few strange places where, IMO, the outcome isn't that straight forward.

The other, more nitpicky thing, python_project_loader doesn't seem like an appropriate name for this module, given how it is used and how closely tight it usage is to the present of the test property.

Let me know if you want to discuss anything

# Running python_project_loader unconditionally would emit confusing warnings about
# pytest config even when the result would be discarded.
if SONAR_TESTS not in resolved_properties:
resolved_properties.update(python_project_loader.load(base_dir))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a more holistic view, applying the properties from the python_project_loader only if sonar.tests is not set doesn't really make sense to me. If we for example extend the python_project_loader to also detect the python version, things become weird.

Maybe it could make sense renaming the module to python_project_test_source_loader or similar (maybe you find a catchier name 😅 )

return _load_from_ini_file(base_dir, "setup.cfg", _SETUP_CFG_PYTEST_SECTION)


def _load_from_filesystem(base_dir: pathlib.Path) -> Optional[str]:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to add this heuristic here as well, given that sonar-python itself has a more extensive set of heuristics.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the heuristic from sonar-python is a bit of a crutch in the sense that it will only disable rules and still report metrics based on the FileType, because we can't contradict the scanner for the file type classification.
This will reduce noise but ultimately still produce an "inconsistent" analysis.

Having it here makes the analysis more consistent because the rule execution is in sync with the metrics reporting. It's an argument to use pysonar instead of the generic sonar-scanner-cli because here, the manual setup of sonar.tests is truly optional (assuming what is inferred matches the user expectations).

self._wait_for_ce_completion(workdir)
return process

def _wait_for_ce_completion(self, workdir: pathlib.Path) -> None:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method does a lot of automatic checks and fallbacks. Is this really necessary for a test?

resp.raise_for_status()
return resp.json()

def get_latest_ce_task(self) -> Optional[dict]:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems to be never called. Same applies to search_projects and get_project_measures

testpaths = []
""",
)
result = python_project_loader.load(Path("."))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we keep the heuristic, this could lead to a weird behavior where the pyrpoject.tol loader returned an empty set, while the file heuristic loader found a folder. This would now override what the user set

self.fs.create_dir("tests")
# nonexistent/ is not on disk — only tests/ should be returned
result = python_project_loader.load(Path("."))
self.assertEqual(result[SONAR_TESTS], "tests")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A similar issue could appear here, where for whatever reason, all the user specified tests are not valid paths. If the heuristic picks them up, we're overriding what the user explicitly told us.

def test_load_from_pytest_ini(self):
self.fs.create_file(
"pytest.ini",
contents="[pytest]\ntestpaths = tests integration\n",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: multiline strings would be nicer

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants