feat(scripts): Add dependency version scanner tool by chalmerlowe · Pull Request #16867 · googleapis/google-cloud-python

chalmerlowe · 2026-04-29T12:30:39Z

This adds a utility that scans for common references to dependencies (Python runtimes and package dependencies), to make it easier to update code when runtimes and dependencies change.

It can be run against an entire repo OR against specific packages within a monorepo.
It is customizable with regex patterns and accompanying examples.
The test suite checks each regex against its examples to confirm that the patterns match what they are meant to match.
The current patterns account for edge cases such as finding < 3.8 when searching for references to 3.7, since the two are semantically equivalent even though they are syntactically different.
The scanner produces a CSV report with:

path/filename, package name, line number, matching pattern, full line for context, etc.
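As a rough illustration of the idea described above, the following sketch matches both the literal version and its equivalent upper bound, then writes the hits as CSV rows. The regex, the column names, and the function names are assumptions made for this example; they are not taken from the scanner's actual configuration or code.

```python
import csv
import io
import re

# Hypothetical pattern: the literal version 3.7, or the equivalent bound "< 3.8".
# The lookarounds keep "13.7" and "3.70" from being reported.
PY37_PATTERN = re.compile(r"(?<![\d.])3\.7(?!\d)|<\s*3\.8(?!\d)")

# Hypothetical column names, loosely following the report fields listed above.
FIELDNAMES = ["path", "package", "line_number", "matched_text", "context"]


def scan_lines(path, package, lines):
    """Yield one report row for every line that matches the pattern."""
    for line_number, line in enumerate(lines, 1):
        match = PY37_PATTERN.search(line)
        if match:
            yield {
                "path": path,
                "package": package,
                "line_number": line_number,
                "matched_text": match.group(0),
                "context": line.strip(),
            }


def write_report(rows, stream):
    """Write the rows as CSV, header first."""
    writer = csv.DictWriter(stream, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


sample = [
    'python_requires=">=3.7",',
    '"protobuf>=4.25.8",',
    "grpcio; python_version < 3.8",
    'version = "3.70"',
]
rows = list(scan_lines("setup.py", "example-package", sample))
buffer = io.StringIO()
write_report(rows, buffer)
print(buffer.getvalue())
```

Only the first and third sample lines are reported: one names 3.7 directly and the other expresses the same constraint as an upper bound.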

gemini-code-assist

Code Review

This pull request introduces a new dependency version scanner, including a configuration-driven regex scanner, a benchmarking tool, and comprehensive unit and integration tests. The review feedback highlights several areas for improvement: optimizing regex compilation in the scanner to avoid performance bottlenecks, using the tempfile module in the benchmark script to prevent race conditions, removing redundant code, improving test robustness by checking subprocess exit codes, and adhering to PEP 8 by moving imports to the top of files.
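Two of the suggestions in this review summary can be sketched briefly: compiling each regex once instead of inside the scanning loop, and using the tempfile module so that each benchmark run gets a unique path that is cleaned up afterwards. The function names and the rule name below are hypothetical and only illustrate the techniques; the PR's own code may be organized differently.

```python
import re
import tempfile
from pathlib import Path


def compile_patterns(raw_patterns):
    """Compile every pattern once, before any file is scanned."""
    return {
        name: re.compile(pattern, re.IGNORECASE)
        for name, pattern in raw_patterns.items()
    }


def count_matches(path, compiled):
    """Count matching lines per rule, reusing the precompiled patterns."""
    counts = {name: 0 for name in compiled}
    with open(path, "r", encoding="utf-8", errors="ignore") as handle:
        for line in handle:
            for name, pattern in compiled.items():
                if pattern.search(line):
                    counts[name] += 1
    return counts


def run_on_temp_file(text, compiled):
    """Scan text written to a uniquely named temporary directory."""
    # TemporaryDirectory picks a unique name and removes it on exit,
    # so concurrent runs do not collide on a shared file name.
    with tempfile.TemporaryDirectory() as tmp_dir:
        sample_path = Path(tmp_dir) / "sample.txt"
        sample_path.write_text(text, encoding="utf-8")
        return count_matches(sample_path, compiled)


compiled = compile_patterns({"example_rule": r"\b3\.7\b"})  # hypothetical rule
result = run_on_temp_file("python 3.7\npython 3.8\nPython 3.7.1\n", compiled)
print(result)
```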


daniel-sanche · 2026-05-08T22:01:21Z

@@ -0,0 +1,34 @@
+import csv


It looks like the copyright header is missing (applies to all code files).

chalmerlowe · 2026-05-19T12:01:34Z

Added a license header to code files.
Per convention, did not include a license header in:

config files such as .gitignore and requirements.txt

OR data files used for testing

daniel-sanche · 2026-05-08T22:05:29Z

+Run the script from the repository root:
+
+```bash
+python3 scripts/version_scanner/version_scanner.py -d <dependency> -v <version> [options]


When I ran this, I got a ModuleNotFoundError. Is there a requirements.txt or anything else that captures the dependencies?

chalmerlowe · 2026-05-19T12:02:04Z

Added requirements.txt.
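A later commit title in this PR mentions lazy optional dependencies for the Google API clients. One common way to do that, shown here only as a sketch under that assumption, is to import the optional module at the point of use and turn a missing module into an actionable message. The helper name and the wording of the error are hypothetical.

```python
import importlib


def load_optional(module_name, feature):
    """Import module_name on first use; explain what to install if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        raise RuntimeError(
            f"{feature} needs the optional module '{module_name}'. "
            "Install the packages listed in requirements.txt and retry."
        ) from exc


# A module that is always available imports normally.
json_module = load_optional("json", "report export")
print(json_module.dumps({"ok": True}))

# A missing module produces a readable error instead of a bare traceback.
try:
    load_optional("module_that_is_not_installed_xyz", "--upload")
except RuntimeError as error:
    message = str(error)
    print(message)
```

With this shape, a plain scan that never touches the optional feature does not need the extra packages at all.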

daniel-sanche · 2026-05-08T22:07:18Z

+This plan outlines the approach to update Python packages to drop support for end-of-life Python runtimes (3.7, 3.8, 3.9) OR for deprecated dependencies, and ensure the packages are configured for modern Python.
+
+#### High-Level Strategy
+- **One Branch Per Package**: To keep PRs manageable and isolated, we suggest a dedicated worktree and branch for each package (e.g., `feat/drop-<dependency>-<version>-<package-name>` i.e. `feat/drop-protobuf-4.25.8-google-cloud-bigquery`).


This is only for hand-written packages, right? I assume others would get their updates through the generator?

Should we recommend doing a generator update first, to clean up most of the packages?

chalmerlowe · 2026-05-19T12:05:16Z

There is a note to the effect that if the templates in the gapic-generator are updated, the changes will trickle down to generated packages. This note is in the README.md in the vicinity of lines 34-38.

daniel-sanche · 2026-05-08T22:12:22Z

@@ -0,0 +1,5 @@
+packages/google-cloud-access-context-manager


What is this?

chalmerlowe · 2026-05-19T12:09:08Z

I don't know if you are asking about google-cloud-access-context-manager or about the file.

The file small_package_list.txt is a way to pass a list of packages to the scanner, as opposed to scanning only one package OR scanning all packages. The use of this file is explained in the README.md near line 19.

The specific packages listed here were chosen at random for example purposes. I asked Gemini to grab a couple of package names to help me test the functionality.

daniel-sanche · 2026-05-08T22:14:41Z

+        self.variables = self._compute_variables()
+
+    def _compute_variables(self) -> Dict[str, str]:
+        """Compute variables for interpolation from version string."""


nit: more detailed comments/examples could be helpful for future maintainers. I'm not sure what a variable is, or what the expected version string format is.

chalmerlowe · 2026-05-19T12:10:55Z

Added additional comments.
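To make the question above concrete, here is one plausible reading of "variables for interpolation", written as a sketch. It assumes a dotted numeric version string such as "3.7" or "4.25.8" and assumes that pattern templates are filled in with str.format. The variable names (major, minor, next_minor, escaped_version) and the template are hypothetical and are not taken from the PR.

```python
import re
from typing import Dict


def compute_variables(version: str) -> Dict[str, str]:
    """Derive interpolation variables from a dotted version string."""
    parts = version.split(".")
    if len(parts) < 2 or not all(part.isdigit() for part in parts):
        raise ValueError(f"expected a version like '3.7' or '4.25.8', got {version!r}")
    major, minor = int(parts[0]), int(parts[1])
    return {
        "version": version,
        "major": str(major),
        "minor": str(minor),
        # Used for bounds such as "< 3.8" when the target version is 3.7.
        "next_minor": str(minor + 1),
        # Safe to embed in a regex: the dots are escaped.
        "escaped_version": re.escape(version),
    }


# Hypothetical template: an upper bound that is equivalent to the target version.
template = r"<\s*{major}\.{next_minor}\b"
variables = compute_variables("3.7")
pattern = re.compile(template.format(**variables))

print(variables)
print(bool(pattern.search("python_version < 3.8")))
```

Deriving next_minor up front is what lets a single template cover the "< 3.8 means 3.7" case mentioned in the PR description.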

daniel-sanche · 2026-05-08T22:17:33Z

+    try:
+        with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:
+            skip_next = False
+            for line_num, line in enumerate(f, 1):


Are there any issues with statements that span lines?

chalmerlowe · 2026-05-19T12:12:47Z

Added a limitation in the README.md clarifying that this is solely a single-line scanning engine. The scanner works this way mostly because references to version numbers tend to appear on a single line, and to keep complexity low. If we determine that the lack of multi-line matching is an issue, we can add it as a feature in a future update.
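The single-line behavior discussed here can be sketched as follows. The loop mirrors the quoted snippet (enumerate starting at 1, a skip_next flag), and the ignore marker is a placeholder: the PR mentions an ignore pragma, but its exact text is not shown on this page, so the string below is an assumption.

```python
import re
from typing import Iterable, Iterator, Pattern, Tuple

# Hypothetical marker text; the scanner's real pragma may differ.
IGNORE_NEXT_LINE = "scanner: ignore-next-line"


def scan(lines: Iterable[str], pattern: Pattern[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line number, line) for each line that matches on its own."""
    skip_next = False
    for line_num, line in enumerate(lines, 1):
        if skip_next:
            skip_next = False
            continue
        if IGNORE_NEXT_LINE in line:
            skip_next = True
            continue
        if pattern.search(line):
            yield line_num, line.rstrip("\n")


pattern = re.compile(r'python_requires\s*=\s*">=3\.7"')
lines = [
    'python_requires=">=3.7",\n',   # reported
    "# scanner: ignore-next-line\n",
    'python_requires=">=3.7",\n',   # suppressed by the marker above
    "python_requires=(\n",          # statement split across lines:
    '    ">=3.7"\n',                # no single line matches the pattern
    ")\n",
]
hits = list(scan(lines, pattern))
print(hits)
```

The split statement at the end is the limitation in question: because each line is tested independently, a pattern that needs text from two lines never matches.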

daniel-sanche · 2026-05-08T22:23:57Z

+def upload_to_drive(csv_path: str, matches: List[Dict[str, str]], github_repo: str = None, branch: str = "main") -> str:
+    """
+    Upload matches to a Google Sheet in Drive.
+    """


Is this necessary? It seems to add extra complexity, dependencies and test surface area, when Google Sheets already makes it pretty easy to import a CSV natively.

chalmerlowe · 2026-05-19T12:14:25Z

Great question! During dependency cleanup sessions, I typically ran the scanner multiple times:

comparing 'before and after' results

rescanning a library if I detected that a new regex pattern should be used

and doing manual QA cycles

Having an automated --upload feature to instantly publish a shared, readable Google Sheet saves significant toil compared to repeatedly exporting and manually importing CSVs into Sheets.

daniel-sanche · 2026-05-08T22:26:27Z

+        parts = rel_root.split(os.sep)
+
+        # Monorepo filtering
+        if target_packages and parts[0] == "packages":


There's talk of separating the packages directory into separate ones for generated and handwritten libraries. Will that be easy to address here?

chalmerlowe · 2026-05-19T12:17:17Z

We have the ability to handle any aggregating folder from this list:

packages/

handwritten/

generated/

third-party/

Other folder names can be added to the version_scanner easily if we end up using different naming conventions during a future monorepo upgrade.

This update can be found near version_scanner.py#L515.
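A minimal sketch of this kind of layout-tolerant filtering is shown below. It assumes POSIX-style paths relative to the repository root and treats the first directory under an aggregating folder as the package name. The set contents come from the list above; the function names are hypothetical and the real logic in version_scanner.py may differ.

```python
from pathlib import PurePosixPath
from typing import Optional, Set

# Folder names from the reply above; more can be added if the layout changes.
AGGREGATING_FOLDERS = {"packages", "handwritten", "generated", "third-party"}


def package_for(rel_path: str) -> Optional[str]:
    """Return the package directory name for a repo-relative path, if any."""
    parts = PurePosixPath(rel_path).parts
    # Require folder/package/file so a file directly under the folder is not a package.
    if len(parts) >= 3 and parts[0] in AGGREGATING_FOLDERS:
        return parts[1]
    return None


def should_scan(rel_path: str, target_packages: Set[str]) -> bool:
    """Scan everything when no targets are given, else only the targeted packages."""
    if not target_packages:
        return True
    return package_for(rel_path) in target_packages


targets = {"google-cloud-access-context-manager"}
print(should_scan("packages/google-cloud-access-context-manager/setup.py", targets))
print(should_scan("generated/google-cloud-access-context-manager/noxfile.py", targets))
print(should_scan("packages/google-cloud-bigquery/setup.py", targets))
```

Because the check only looks at membership in the set, splitting packages/ into handwritten/ and generated/ would not require any change beyond the folder list.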

daniel-sanche · 2026-05-08T22:28:13Z

+
+    package_group.add_argument(
+        "--package",
+        help="Specific subdirectory filter (useful for monorepos)"


Is this specific to the structure of the monorepo's package directory? Or is this more of a generic subdirectory filter?


chalmerlowe

@daniel-sanche

I responded to the comments with some code updates and with some explanations. Please take a look.

feat(scripts): Add dependency version scanner tool

f446ff7

chalmerlowe changed the title ~~feat(scripts): Add dependency version scanner tool~~ feat(scripts): [WIP] Add dependency version scanner tool Apr 29, 2026

chalmerlowe added the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Apr 29, 2026

gemini-code-assist Bot reviewed Apr 29, 2026


chalmerlowe added 26 commits April 29, 2026 08:40

perf(search): Apply bot suggestions for regex optimization and imports

256b048

refactor(benchmark): Use tempfile for unique names and safe cleanup

1010399

refactor(benchmark): Remove redundant directory check

68f61ee

test(integration): Check exit code of subprocess in integration test

cc960b4

test(unit): Remove redundant and brittle test_regex_patterns

a4ad9ce

test(unit): Move import yaml to top of file

2743957

refactor(benchmark): Remove redundant directory check in main

47450bb

test(unit): Remove duplicate import yaml from function

c777e44

feat(version_scanner): handle invalid format strings in config and add tests

8aab801

feat(version_scanner): handle PermissionError when reading config file and add tests

f63053c

feat(version_scanner): extract read_package_file and handle file errors

2af97b3

refactor(version_scanner): simplify target resolution and remove duplication

cb29438

feat(version_scanner): add format_match_for_csv helper and tests

ea0e8be

feat(version_scanner): integrate GitHub link generation into CSV report

a8824af

feat(version_scanner): default output to results directory

baafb74

feat(version_scanner): ignore version_scanner directory during scan

a1cc08e

feat(version_scanner): broaden version regex and add case insensitivity

3ceea9b

feat(version_scanner): strip newlines from matched strings

d756c07

feat(version_scanner): add word boundaries and truncate long context lines

075d04b

feat(version_scanner): add console summary table

85e9ff5

feat(version_scanner): add .scannerignore file support

5c8f673

feat(version_scanner): move ignore defaults to .scannerignore file

efb3331

docs(version_scanner): add README.md

bf39072

docs(version_scanner): update README options and CLI help strings

9d9ce22

feat(version_scanner): set default for --github-repo

14e4dcc

feat(version_scanner): default config path to script directory

7fc03ca

chalmerlowe added 9 commits April 30, 2026 09:29

feat(version_scanner): support case-insensitive file ignores and add changelog.md

f64eac4

feat(version_scanner): update small package list for demos

fc47dd6

Merge remote-tracking branch 'origin/main' into feat/add-version-scanner

95f6f19

Merge branch 'origin/main' into feat/add-version-scanner

761def6

feat(version_scanner): add combined_version_string rule and use word boundaries for explicit_version_string

9289c8c

feat(scanner): add ability to detect ignore pragma

d771258

feat(scanner): move .scannerignore to script directory and update lookup logic

bafae70

chore(scanner): ignore repositories.bzl in scanner

94174bb

feat(scanner): add filename scanning support

d652dbf

chalmerlowe marked this pull request as ready for review May 5, 2026 13:03

chalmerlowe requested a review from a team as a code owner May 5, 2026 13:03

chalmerlowe removed the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label May 5, 2026

chalmerlowe changed the title ~~feat(scripts): [WIP] Add dependency version scanner tool~~ feat(scripts): Add dependency version scanner tool May 5, 2026

docs(scanner): update README with known issues and add binary ignores to .scannerignore

a1188c8

chalmerlowe added this to the Drop support for 3.7-3.9 milestone May 5, 2026

parthea self-assigned this May 6, 2026

docs(version-scanner): merge migration guide into README.md

0a6ae92

daniel-sanche reviewed May 8, 2026


chalmerlowe added 6 commits May 11, 2026 12:16

Merge branch 'main' into feat/add-version-scanner

7cdbe72

chore(version_scanner): add Apache 2.0 copyright headers

303906d

feat(version_scanner): implement lazy optional dependencies for google api clients

919ae7e

docs(version_scanner): update README with setup, scope, limitations, and package-file details

a7907a9

feat(version_scanner): implement generic subdirectory filtering and layout-agnostic package naming

208aa74

docs(version_scanner): add disclaimer regarding prompt usage and LLM limits

4287c04

chalmerlowe commented May 19, 2026
