♻️ Refactor RL actions by SDK by flowerthrower · Pull Request #680 · munich-quantum-toolkit/predictor

flowerthrower · 2026-05-12T14:35:34Z

Description

Restructures RL actions into SDK-specific modules for Qiskit, TKET, and BQSKit. Also moves the corresponding wrapper logic closer to each SDK action implementation and keeps compatibility exports for existing imports.

This improves modularity and reduces the SDK-dependent patches scattered across the RL modules.
It also enables cleaner, more direct testing of pass invariants.

- Moved individual RL passes from actions.py to SDK-specific action modules.
- Moved shared action types and registration to rl/actions/__init__.py.
- Moved SDK-specific pass runners, layout handling, and masking logic out of PredictorEnv.
- Removed parsing.py; conversion/layout helpers now live with the owning SDK.
- Updated tests to verify SDK actions do what they promise to do (including invariant checks).

Fixes #668
Fixes #66

Assisted by: GPT5.5 via Codex

Checklist

The pull request only contains commits that are focused and relevant to this change.
I have added appropriate tests that cover the new/changed functionality.
I have updated the documentation to reflect these changes.
I have added entries to the changelog for any noteworthy additions, changes, fixes, or removals.
I have added migration instructions to the upgrade guide (if needed).
The changes follow the project's style guidelines and introduce no new warnings.
The changes are fully tested and pass the CI checks.
I have reviewed my own code changes.

If PR contains AI-assisted content:

I have disclosed the use of AI tools in the PR description as per our AI Usage Guidelines.
AI-assisted commits include an Assisted-by: [Model Name] via [Tool Name] footer.
I confirm that I have personally reviewed and understood all AI-generated content, and accept full responsibility for it.

## Description

This PR addresses critical bugs in the RL training process with the following key changes:

**Structure Improvements:**

- **Redesigned action validation logic** (`predictorenv.py`): Rewrote `determine_valid_actions_for_state()` with a more structured (but equivalent) state machine that explicitly tracks three circuit states (synthesized, laid_out, routed) and handles 6 different state combinations.
- Added helper methods `is_circuit_laid_out()` and `is_circuit_routed()` to replace the buggy `CheckMap` pass with more reliable state checking. The new logic supports both the original restricted MDP and a flexible general MDP mode.
- **Fixed type annotation** (`actions.py`): Corrected `do_while` parameter type from `dict[str, Circuit]` to `PropertySet` and added missing import for Qiskit's `PropertySet`.
- **Added reproducibility** (`predictor.py`): Set random seed for non-test training runs to ensure reproducible results.
- **Improved VF2Layout error handling** (`predictorenv.py`): Replaced assertion failures with warning logs when VF2Layout doesn't find a solution, preventing crashes during training.

**Test Updates:**

- Suppressed deprecation warnings in tket routing test

---------

Signed-off-by: Patrick Hopf <81010725+flowerthrower@users.noreply.github.com>
Co-authored-by: flowerthrower <flowerthrower@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

codecov · 2026-05-12T17:49:10Z

Codecov Report

❌ Patch coverage is 94.42379% with 15 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/mqt/predictor/rl/actions/qiskit_actions.py | 91.6% | 6 Missing ⚠️ |
| src/mqt/predictor/rl/predictorenv.py | 87.1% | 5 Missing ⚠️ |
| src/mqt/predictor/rl/actions/bqskit_actions.py | 93.1% | 4 Missing ⚠️ |

flowerthrower · 2026-06-05T13:48:36Z

@coderabbitai full review

coderabbitai · 2026-06-05T13:48:43Z

✅ Action performed

Full review finished.

coderabbitai · 2026-06-05T14:01:29Z

📝 Walkthrough

Summary by CodeRabbit

Release Notes

Refactor
- Reorganized RL compilation actions into modular SDK-level structures (Qiskit, TKET, BQSKit) for improved maintainability.
- Migrated action registry and helpers into dedicated modules; consolidated pass definitions and execution logic.
- Updated environment integration to dispatch actions through the new modular system while preserving existing compiler pass support.

Walkthrough

The monolithic src/mqt/predictor/rl/actions.py and rl/parsing.py files are deleted and replaced by a new rl/actions/ package. The package exposes a shared action model/registry in __init__.py and separates SDK-specific action factories and execution helpers into qiskit_actions.py, tket_actions.py, and bqskit_actions.py. PredictorEnv is updated to delegate dispatch and availability checking to these helpers, and the integration tests are rewritten as invariant-based checks over compilation states.

Changes

RL Actions Architecture Refactor

| Layer / File(s) | Summary |
|---|---|
| Action contract and registry `src/mqt/predictor/rl/actions/__init__.py` | Defines `CompilationOrigin`/`PassType` enums, `Action`/`DeviceIndependentAction`/`DeviceDependentAction` dataclasses, global `_ACTIONS` registry with `register_action`/`get_actions_by_pass_type`, and import-time registration of all SDK actions plus a built-in `terminate` action. |
| Qiskit SDK action implementation `src/mqt/predictor/rl/actions/qiskit_actions.py` | Implements optimization, O3, final-opt, layout, mapping, and synthesis action factories; handles VF2PostLayout metadata reconstruction; executes passes via `PassManager` with `DoWhileController` support; gates `VF2PostLayout` to IBM devices. |
| TKET SDK action implementation `src/mqt/predictor/rl/actions/tket_actions.py` | Implements `PreProcessTKETRoutingAfterQiskitLayout`, optimization/routing action factories, `final_layout_pytket_to_qiskit`, `run_tket_action` with Qiskit↔TKET circuit round-trip, and `is_tket_action_available` layout gate. |
| BQSKit SDK action implementation `src/mqt/predictor/rl/actions/bqskit_actions.py` | Implements `bqskit_to_qiskit` via OpenQASM 2 with `U1q`→`r` rewrite; defines optimization/mapping/synthesis action factories; implements cached `get_bqskit_native_gates`, `final_layout_bqskit_to_qiskit`, `run_bqskit_action`, and `is_bqskit_action_available`. |
| PredictorEnv delegation to SDK helpers `src/mqt/predictor/rl/predictorenv.py` | Replaces inline `_apply_*_action` methods with `run_*_action` dispatch; rewrites `action_masks` using SDK availability checks; wraps `apply_action` in `try/except` returning truncated episodes on error; removes deleted `parsing` imports. |
| Integration test refactor `tests/compilation/test_integration_further_SDKs.py` | Replaces BQSKit/TKET-specific tests with invariant-based tests using `_lay_out`/`_route`/`_synthesize` helpers and four tests validating synthesis, layout, routing, and optimization preservation contracts. |
| Unit test and helper test updates `tests/compilation/test_helper_rl.py`, `tests/compilation/test_predictor_rl.py` | Redirects `bqskit_to_qiskit`/`get_bqskit_native_gates`/`postprocess_vf2postlayout` imports to new SDK modules; patches `qiskit_actions.PassManager`; isolates registry state in `test_register_action`. |
| Changelog update `CHANGELOG.md` | Adds Unreleased → Changed entry and PR link reference for the RL-pass refactor. |
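
To make the "action contract and registry" row concrete, here is a minimal sketch of how such a registry could look. It is not the code from `rl/actions/__init__.py`: the names `CompilationOrigin`, `PassType`, `Action`, `_ACTIONS`, `register_action`, and `get_actions_by_pass_type` come from the summary above, while the `GENERAL` origin, the `OPT` member, the reduced field set, and the example action names are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class CompilationOrigin(Enum):
    QISKIT = "qiskit"
    TKET = "tket"
    BQSKIT = "bqskit"
    GENERAL = "general"  # assumption: origin used here for the built-in terminate action


class PassType(Enum):
    OPT = "optimization"  # assumption: member name chosen for illustration
    ROUTING = "routing"
    TERMINATE = "terminate"


@dataclass
class Action:
    """Simplified stand-in for the shared action model (reduced field set)."""

    name: str
    origin: CompilationOrigin
    pass_type: PassType
    preserves_layout: bool = True  # assumption: default value is illustrative


# Global registry keyed by action name.
_ACTIONS: dict[str, Action] = {}


def register_action(action: Action) -> Action:
    """Add an action to the global registry and return it."""
    if action.name in _ACTIONS:
        msg = f"Action with name {action.name} already registered."
        raise ValueError(msg)
    _ACTIONS[action.name] = action
    return action


def get_actions_by_pass_type() -> dict[PassType, list[Action]]:
    """Group all registered actions by their pass type."""
    grouped: dict[PassType, list[Action]] = {}
    for action in _ACTIONS.values():
        grouped.setdefault(action.pass_type, []).append(action)
    return grouped


# Import-time registration, as each SDK module might do for its own actions.
register_action(Action("terminate", CompilationOrigin.GENERAL, PassType.TERMINATE))
register_action(Action("tket_routing_example", CompilationOrigin.TKET, PassType.ROUTING))
```

Keeping the registry in one module while each SDK module only contributes `Action` objects is what lets the environment look up actions by pass type without importing SDK internals.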

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

munich-quantum-toolkit/predictor#573: Updated DeviceDependentAction.do_while to accept a Qiskit PropertySet and extended PredictorEnv state-based valid-action logic — both directly overlap with the new action model and masking changes in this PR.
munich-quantum-toolkit/predictor#677: Introduced determine_valid_actions_for_state() and the synthesized/laid-out/routed classification logic that this PR preserves and only updates documentation for.
munich-quantum-toolkit/predictor#679: Added IQM native r gate support in the old rl/parsing.py code that this PR moves and refactors into rl/actions/bqskit_actions.py.

Suggested labels

refactor

Suggested reviewers

burgholzer

Poem

🐇 Hop, hop, the modules split apart,
Each SDK gets its own fresh start.
Qiskit, TKET, BQSKit in a row,
The registry bundles them all just so.
No more one file to rule them all —
Clean little packages line the hall! 🎉

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title '♻️ Refactor RL actions by SDK' accurately summarizes the main refactoring objective: restructuring RL actions into SDK-specific modules (Qiskit, TKET, BQSKit). It is concise, clear, and reflects the primary change. |
| Description check | ✅ Passed | The PR description explains the restructuring, lists specific changes (moving passes to SDK modules, consolidating types in `__init__.py`, removing parsing.py, updating tests), and references fixed issues (`#668`, `#66`). However, most checklist items remain unchecked and AI assistance is disclosed but some sections are incomplete. |
| Linked Issues check | ✅ Passed | For `#668`: The PR restructures RL actions into SDK-specific modules (qiskit_actions.py, tket_actions.py, bqskit_actions.py) with wrapper logic co-located, consolidates types in rl/actions/__init__.py, and removes parsing.py, fully meeting the objective. For `#66`: The PR addresses logging structure adjustments, though specific logging improvements are not explicitly detailed in the changeset summary. |
| Out of Scope Changes check | ✅ Passed | The changes are tightly scoped to RL action refactoring (moving actions to SDK modules, consolidating types, updating imports) and corresponding test updates. The CHANGELOG update and passing AI disclosure are appropriate housekeeping. No unrelated refactoring, feature additions, or out-of-scope changes are evident. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 98.04% which is sufficient. The required threshold is 80.00%. |

coderabbitai

Actionable comments posted: 15

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@CHANGELOG.md`:
- Line 14: Update the changelog entry that currently reads "♻️ Restructure
existing RL passes into SDK-level action modules ([`#644`])" to reference the
correct PR number ([`#680`]) and add a corresponding link definition for [`#680`] in
the PR links section so the markdown link resolves; modify the entry text where
the phrase "Restructure existing RL passes into SDK-level action modules"
appears and add the matching link definition in the PR links block (the section
that contains other bracketed PR references).

In `@src/mqt/predictor/rl/actions/__init__.py`:
- Around line 86-95: The docstring for remove_action promises ValueError on
missing names but the function currently raises KeyError; change the raised
exception to ValueError (e.g., replace raise KeyError(msg) with raise
ValueError(msg)) so the implementation matches the documented API, keeping the
existing message and using the same _ACTIONS lookup logic in remove_action to
locate the correct place to modify.
- Around line 44-68: The public dataclasses Action and DeviceDependentAction are
missing Google-style docstrings for their public fields; update the docstrings
on these classes (Action, DeviceIndependentAction, DeviceDependentAction) to use
Google-style format and add an "Attributes:" section that documents each exposed
field (for Action: name, origin, pass_type, transpile_pass, preserves_layout,
preserves_routing, preserves_synthesis; for DeviceDependentAction also document
transpile_pass override and do_while) with a brief one-line description and type
for each attribute, keeping descriptions concise and matching the existing field
names and types (e.g., Callable[[PropertySet], bool] | None for do_while).
- Around line 73-103: The three functions register_action, remove_action, and
get_actions_by_pass_type lack Google-style docstring sections; update each
function's docstring to include "Args:" documenting parameters (e.g., action:
Action for register_action, name: str for remove_action), "Returns:" documenting
the return types (Action for register_action, None for remove_action,
dict[PassType, list[Action]] for get_actions_by_pass_type), and keep the
existing "Raises:" entries (ValueError/KeyError) as appropriate; reference the
Action and PassType types and the global _ACTIONS registry in the descriptions
to make the purpose clear.

In `@src/mqt/predictor/rl/actions/bqskit_actions.py`:
- Around line 135-145: Update the docstrings to Google style for
get_bqskit_native_gates and final_layout_bqskit_to_qiskit: replace "Arguments:"
with "Args:", add structured "Returns:" and "Raises:" sections (and for
final_layout_bqskit_to_qiskit also add an "Args:" describing the input
types/semantics and a "Returns:" describing the returned layout mapping and its
format), and ensure types (e.g., device: Target, return: list[Gate] or mapping
types) and any error conditions are documented exactly as in the functions
get_bqskit_native_gates and final_layout_bqskit_to_qiskit so they conform to the
repository Google-style docstring guidelines.

In `@src/mqt/predictor/rl/actions/qiskit_actions.py`:
- Around line 326-350: The VF2PostLayout branch currently discards the updated
ApplyLayout property set returned by postprocess_vf2postlayout and rebuilds
TranspileLayout from stale values; instead, after calling
postprocess_vf2postlayout(altered_qc, post_layout, layout) and updating
altered_qc, use the updated entries in property_set (e.g.
property_set["layout"], property_set["original_qubit_indices"],
property_set["final_layout"]) when constructing and returning TranspileLayout so
the returned layout metadata matches the rewritten circuit; keep using the
altered_qc and _input_qubit_count as before.

In `@src/mqt/predictor/rl/actions/tket_actions.py`:
- Around line 150-162: The runtime contract mismatch: is_tket_action_available
currently allows PassType.ROUTING when has_layout is false but run_tket_action
asserts layout is not None before writing final_layout; fix by making
run_tket_action tolerant of a None layout for routing: when action.pass_type ==
PassType.ROUTING and layout is None, construct a new TranspileLayout (or
equivalent empty layout object used elsewhere, e.g. the same type expected by
final_layout_pytket_to_qiskit) before assigning layout.final_layout =
final_layout_pytket_to_qiskit(tket_qc, altered_qc); keep the existing behavior
if layout is provided. Ensure you touch run_tket_action (the layout handling
path) and do not change is_tket_action_available unless you prefer the
alternative of enforcing has_layout for ROUTING.
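
The alternative mentioned at the end of this finding, enforcing `has_layout` for routing in the availability check, can be sketched as follows. The function signature here is hypothetical (the real `is_tket_action_available` may take different arguments), and `PassType` is reduced to two members for illustration.

```python
from enum import Enum


class PassType(Enum):
    OPT = "optimization"  # assumption: member name chosen for illustration
    ROUTING = "routing"


def is_tket_action_available(pass_type: PassType, has_layout: bool) -> bool:
    """Return whether a TKET action may run in the current state.

    Hypothetical signature: routing is only offered once a layout exists,
    so the runner never has to write final_layout into a missing layout.
    """
    if pass_type is PassType.ROUTING:
        return has_layout
    return True


# Routing is masked out until a layout has been set; optimization is not.
available_before_layout = is_tket_action_available(PassType.ROUTING, has_layout=False)
available_after_layout = is_tket_action_available(PassType.ROUTING, has_layout=True)
```

With this gate in place, the mask and the runner agree on the same precondition, which is the contract the finding asks for.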

In `@src/mqt/predictor/rl/predictorenv.py`:
- Around line 172-183: The broad `except Exception:  # noqa: BLE001` around the
apply_action call masks unexpected bugs; either narrow it to the specific error
classes `apply_action` can raise (e.g., ValueError, RuntimeError, ActionError)
and handle those, or keep a single broad handler but add a short rationale
comment explaining why catching all exceptions is necessary (e.g., to truncate
the environment on any action failure) and re-raise or log truly unexpected
exceptions if needed; update the block around self.used_actions.append /
altered_qc = self.apply_action(action) and set self.error_occurred as before,
but prefer enumerating concrete exception types or justify the BLE001
suppression in a comment next to the except line referencing apply_action and
error handling policy.
- Around line 325-360: apply_action currently only validates that action_index
exists in self.action_set, allowing masked-out actions to be executed; call the
environment's action mask (e.g., self.action_masks()) at the start of
apply_action, verify that the bit/entry for action_index is True/allowed, and if
not raise an informative ValueError (include action_index and action.origin);
apply this check before handling PassType.TERMINATE and before calling
run_qiskit_action/run_tket_action/run_bqskit_action so masked TERM and SDK
actions are rejected rather than executed.
- Around line 313-324: Update the docstring for the method apply_action to use
Google-style section headers: replace "Arguments:" with "Args:", keep "Returns:"
and "Raises:" as-is but ensure formatting matches Google style (indent param
lines under Args:, describe return under Returns:, and exceptions under
Raises:). Ensure the parameter name action_index, its type and description, the
returned QuantumCircuit, and the ValueError explanation appear under the correct
Google-style headers in the apply_action docstring.
- Around line 140-143: reset() currently clears self.layout causing
determine_valid_actions_for_state() to miscompute laid_out and step() allows
actions without checking valid masks; update reset() to preserve an incoming
circuit layout when present (do not unconditionally set self.layout = None) and
ensure determine_valid_actions_for_state() uses self.layout correctly to compute
laid_out; in step(), validate the requested action against self.valid_actions or
action_masks() before calling apply_action() (disallow TERMINATE or SDK actions
when not in self.valid_actions); narrow the broad "except Exception" to a
specific exception class (or add a comment explaining why a catch-all is
required) and update apply_action() docstring heading from "Arguments:" to
Google-style "Args:" so docs are consistent.

In `@tests/compilation/test_integration_further_SDKs.py`:
- Line 107: Replace the private access layout_before._input_qubit_count by
deriving the count from public metadata (e.g. compute it as
len(layout_before.initial_layout) or another public property that lists the
input qubits) and remove the "# noqa: SLF001" suppression; update the call site
that used _input_qubit_count to use the computed integer instead (reference
symbols: layout_before and _input_qubit_count).
- Around line 234-243: The preserves_layout check can be bypassed when
env.layout is None; change the conditional so the test fails if layout was
dropped: assert env.layout is not None, (f"{action.name} on
{env.device.description} VIOLATED INVARIANT preserves_layout: layout metadata
was removed"), then compute post_v2p =
dict(env.layout.initial_layout.get_virtual_bits()) and assert post_v2p ==
pre_v2p with the existing descriptive message; keep the existing
is_circuit_laid_out(compiled, layout) assertion but replace the optional if
block around env.layout with an explicit assertion that env.layout remains
present before comparing mappings (references: env.layout,
initial_layout.get_virtual_bits(), pre_v2p, preserves_layout).
- Around line 36-45: Both helper functions use short one-line docstrings—replace
them with Google-style docstrings: for _setup_env(PredictorEnv env,
QuantumCircuit circuit, TranspileLayout|None layout, int n_qubits) include an
Args section describing each parameter and a Returns section noting None and the
side-effect of resetting the env; for _is_available(PredictorEnv env, int idx)
include an Args section for env and idx and a Returns section describing that it
returns bool indicating whether the action is structurally and SDK-valid and
that it updates env.valid_actions via determine_valid_actions_for_state. Ensure
parameter types match the signatures and keep descriptions concise.

In `@tests/compilation/test_predictor_rl.py`:
- Around line 182-183: The test currently mutates env.action_set[0] which is
safe because PredictorEnv builds self.action_set per-instance, but it’s brittle
to hardcode index 0; change the test to pick a valid routing index from the
environment (e.g., use an index from env.actions_routing_indices or another
existing key in env.action_set) and assign the action to that index instead of 0
so the test does not depend on internal action ordering in PredictorEnv.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: a836606e-13dc-4e30-9f71-5b2118832d30

📥 Commits

Reviewing files that changed from the base of the PR and between 10b56d0 and 3ec22c7.

📒 Files selected for processing (12)

.gitignore
CHANGELOG.md
src/mqt/predictor/rl/actions.py
src/mqt/predictor/rl/actions/__init__.py
src/mqt/predictor/rl/actions/bqskit_actions.py
src/mqt/predictor/rl/actions/qiskit_actions.py
src/mqt/predictor/rl/actions/tket_actions.py
src/mqt/predictor/rl/parsing.py
src/mqt/predictor/rl/predictorenv.py
tests/compilation/test_helper_rl.py
tests/compilation/test_integration_further_SDKs.py
tests/compilation/test_predictor_rl.py

💤 Files with no reviewable changes (2)

src/mqt/predictor/rl/actions.py
src/mqt/predictor/rl/parsing.py

coderabbitai · 2026-06-05T14:01:34Z

        if action_index not in self.action_set:
            msg = f"Action {action_index} not supported."
            raise ValueError(msg)

        action = self.action_set[action_index]

-        if action.name == "terminate":
+        if action.pass_type == PassType.TERMINATE:
            return self.state
+
        if action.origin == CompilationOrigin.QISKIT:
-            return self._apply_qiskit_action(action, action_index)
-        if action.origin == CompilationOrigin.TKET:
-            return self._apply_tket_action(action, action_index)
-        if action.origin == CompilationOrigin.BQSKIT:
-            return self._apply_bqskit_action(action, action_index)
-        msg = f"Origin {action.origin} not supported."
-        raise ValueError(msg)
-
-    def _apply_qiskit_action(self, action: Action, action_index: int) -> QuantumCircuit:
-        if action.name == "QiskitO3" and isinstance(action, DeviceDependentAction):
-            factory = cast("Callable[[list[str], CouplingMap | None], list[Task]]", action.transpile_pass)
-            passes = factory(
-                self.device.operation_names,
-                CouplingMap(self.device.build_coupling_map()) if self.layout else None,
+            altered_qc, self.layout = run_qiskit_action(
+                action=action,
+                circuit=self.state,
+                device=self.device,
+                layout=self.layout,
+                input_qubit_count=self.num_qubits_uncompiled_circuit,
            )
-            assert action.do_while is not None
-            pm = PassManager([DoWhileController(passes, do_while=action.do_while)])
-        else:
-            if callable(action.transpile_pass):
-                factory = cast("Callable[[Target], list[Task]]", action.transpile_pass)
-                passes = factory(self.device)
-            else:
-                passes = cast("list[Task]", action.transpile_pass)
-            pm = PassManager(passes)
-
-        altered_qc = pm.run(self.state)
-
-        if action_index in (
-            self.actions_layout_indices + self.actions_mapping_indices + self.actions_final_optimization_indices
-        ):
-            altered_qc = self._handle_qiskit_layout_postprocessing(action, pm, altered_qc)
-
-        elif (
-            action_index in self.actions_routing_indices and self.layout and pm.property_set["final_layout"] is not None
-        ):
-            self.layout.final_layout = pm.property_set["final_layout"]
-
-        # BasisTranslator errors on unitary gates; decompose them immediately so
-        # the circuit is always in a consistent state after a Qiskit action.
-        if altered_qc.count_ops().get("unitary"):
-            altered_qc = altered_qc.decompose(gates_to_decompose="unitary")
-
-        return altered_qc
-
-    def _handle_qiskit_layout_postprocessing(
-        self, action: Action, pm: PassManager, altered_qc: QuantumCircuit
-    ) -> QuantumCircuit:
-        if action.name == "VF2PostLayout":
-            assert pm.property_set["VF2PostLayout_stop_reason"] is not None
-            post_layout = pm.property_set["post_layout"]
-            if post_layout:
-                assert self.layout is not None
-                altered_qc, _ = postprocess_vf2postlayout(altered_qc, post_layout, self.layout)
-        elif action.name == "VF2Layout":
-            if pm.property_set["VF2Layout_stop_reason"] != VF2LayoutStopReason.SOLUTION_FOUND:
-                logger.warning(
-                    "VF2Layout pass did not find a solution. Reason: %s",
-                    pm.property_set["VF2Layout_stop_reason"],
-                )
-            else:
-                assert pm.property_set["layout"]
-        else:
-            assert pm.property_set["layout"]
-
-        if pm.property_set["layout"]:
-            # Layout/mapping passes create the base logical-to-physical mapping;
-            # later routing actions only update final_layout.
-            self.layout = TranspileLayout(
-                initial_layout=pm.property_set["layout"],
-                input_qubit_mapping=pm.property_set["original_qubit_indices"],
-                final_layout=pm.property_set["final_layout"],
-                _output_qubit_list=altered_qc.qubits,
-                _input_qubit_count=self.num_qubits_uncompiled_circuit,
+        elif action.origin == CompilationOrigin.TKET:
+            altered_qc, self.layout = run_tket_action(
+                action=action,
+                circuit=self.state,
+                device=self.device,
+                layout=self.layout,
            )
-        return altered_qc
-
-    def _apply_tket_action(self, action: Action, action_index: int) -> QuantumCircuit:
-        tket_qc = qiskit_to_tk(self.state, preserve_param_uuid=True)
-        if callable(action.transpile_pass):
-            factory = cast("Callable[[Target], list[Task]]", action.transpile_pass)
-            passes = factory(self.device)
-        else:
-            passes = cast("list[Task]", action.transpile_pass)
-        for pass_ in passes:
-            assert isinstance(pass_, TketBasePass | PreProcessTKETRoutingAfterQiskitLayout)
-            pass_.apply(tket_qc)
-
-        qbs = tket_qc.qubits
-        tket_qc.rename_units({qbs[i]: Qubit("q", i) for i in range(len(qbs))})
-        altered_qc = tk_to_qiskit(tket_qc, replace_implicit_swaps=True)
-
-        if action_index in self.actions_routing_indices:
-            assert self.layout is not None
-            self.layout.final_layout = final_layout_pytket_to_qiskit(tket_qc, altered_qc)
-
-        return altered_qc
-
-    def _apply_bqskit_action(self, action: Action, action_index: int) -> QuantumCircuit:
-        """Applies the given BQSKit action to the current state and returns the altered state.
-
-        Arguments:
-            action: The BQSKit action to be applied.
-            action_index: The index of the action in the action set.
-
-        Returns:
-            The altered quantum circuit after applying the action.
-
-        Raises:
-            ValueError: If the action index is not in the action set or if the action origin is not supported.
-        """
-        bqskit_qc = qiskit_to_bqskit(self.state)
-        if action_index in self.actions_opt_indices:
-            transpile = cast("Callable[[Circuit], Circuit]", action.transpile_pass)
-            bqskit_compiled_qc = transpile(bqskit_qc)
-        elif action_index in self.actions_synthesis_indices:
-            factory = cast("Callable[[Target], Callable[[Circuit], Circuit]]", action.transpile_pass)
-            bqskit_compiled_qc = factory(self.device)(bqskit_qc)
-        elif action_index in self.actions_mapping_indices:
-            factory = cast(
-                "Callable[[Target], Callable[[Circuit], tuple[Circuit, tuple[int, ...], tuple[int, ...]]]]",
-                action.transpile_pass,
+        elif action.origin == CompilationOrigin.BQSKIT:
+            altered_qc, self.layout = run_bqskit_action(
+                action=action,
+                circuit=self.state,
+                device=self.device,
+                layout=self.layout,
            )
-            bqskit_compiled_qc, initial, final = factory(self.device)(bqskit_qc)
-            compiled_qiskit_qc = bqskit_to_qiskit(bqskit_compiled_qc)
-            self.layout = final_layout_bqskit_to_qiskit(initial, final, compiled_qiskit_qc, self.state)
-            return compiled_qiskit_qc
        else:
-            msg = f"Unhandled BQSKit action index: {action_index}"
+            msg = f"Origin {action.origin} not supported."
            raise ValueError(msg)

-        return bqskit_to_qiskit(bqskit_compiled_qc)
+        return altered_qc


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject actions that are masked out for the current state.

apply_action() now only checks that the index exists. That lets callers bypass action_masks() entirely: TERMINATE succeeds from the initial state, and SDK actions filtered out by availability are still dispatched until a helper fails. Enforce the current mask here before returning self.state or calling any run_*_action helper.

Suggested fix

         if action_index not in self.action_set:
             msg = f"Action {action_index} not supported."
             raise ValueError(msg)
+        if not self.action_masks()[action_index]:
+            msg = f"Action {action_index} is not valid for the current circuit state."
+            raise ValueError(msg)

         action = self.action_set[action_index]

as above, this should not happen (or do you see a path?)

@flowerthrower, you're right — in the normal RL training loop the sequence is:

policy → action_masks() → step() → apply_action()

By the time apply_action() is reached, the policy has already used the mask to select only structurally and SDK-valid actions, so an additional guard inside apply_action() would be redundant, for exactly the same reason as the existing guard-free design of step(). I don't see a reachable path in the current codebase where a masked-out action could arrive there. Withdrawing the suggestion.
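
The call order described above can be sketched with a toy environment. `ToyEnv` and `masked_policy` are hypothetical stand-ins, not `PredictorEnv` or the actual training policy; the sketch only assumes that the policy samples among the indices allowed by `action_masks()` before `step()` is called.

```python
import random


class ToyEnv:
    """Two actions: index 0 routes the circuit, index 1 terminates."""

    def __init__(self) -> None:
        self.routed = False
        self.done = False

    def action_masks(self) -> list[bool]:
        # Routing is valid only before routing; terminating only afterwards.
        return [not self.routed, self.routed]

    def apply_action(self, action_index: int) -> None:
        # No mask check here: the caller is trusted to pass a valid index.
        if action_index == 0:
            self.routed = True
        else:
            self.done = True

    def step(self, action_index: int) -> bool:
        self.apply_action(action_index)
        return self.done


def masked_policy(env: ToyEnv, rng: random.Random) -> int:
    """Sample uniformly among the actions the mask currently allows."""
    valid = [i for i, allowed in enumerate(env.action_masks()) if allowed]
    return rng.choice(valid)


env = ToyEnv()
rng = random.Random(0)
chosen = []
done = False
while not done:
    action = masked_policy(env, rng)  # policy consults the mask first
    chosen.append(action)
    done = env.step(action)  # step() then dispatches to apply_action()
```

The guarantee holds only while every caller goes through the mask; code that calls `step()` or `apply_action()` directly, for example in a test, bypasses it.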

✏️ Learnings added

Learnt from: flowerthrower
Repo: munich-quantum-toolkit/predictor
PR: 680
File: src/mqt/predictor/rl/predictorenv.py:325-360
Timestamp: 2026-06-08T13:25:02.753Z
Learning: In munich-quantum-toolkit/predictor, `PredictorEnv.apply_action()` in `src/mqt/predictor/rl/predictorenv.py` does not re-validate the requested action against `action_masks()` before dispatching to `run_*_action` helpers. This is intentional: `apply_action()` is only ever called from `step()`, which itself is only reached after the RL policy has already filtered actions through `action_masks()`, making an additional guard inside `apply_action()` redundant.

🧠 Learnings used

Learnt from: flowerthrower
Repo: munich-quantum-toolkit/predictor PR: 680
File: src/mqt/predictor/rl/predictorenv.py:140-143
Timestamp: 2026-06-08T13:11:33.062Z
Learning: In munich-quantum-toolkit/predictor, `PredictorEnv.step()` in `src/mqt/predictor/rl/predictorenv.py` does not re-validate the requested action against `self.valid_actions` or `action_masks()` before calling `apply_action()`. This is intentional: `step()` is only called in a tightly controlled RL training loop after actions have already been filtered by masking, so an additional guard would be redundant.

Learnt from: flowerthrower
Repo: munich-quantum-toolkit/predictor PR: 680
File: src/mqt/predictor/rl/predictorenv.py:140-143
Timestamp: 2026-06-08T13:11:33.062Z
Learning: In munich-quantum-toolkit/predictor, the broad `except Exception` catch (with `# noqa: BLE001`) in `PredictorEnv.step()` in `src/mqt/predictor/rl/predictorenv.py` is intentional. Qiskit, TKET, and BQSKit compilation passes can raise a wide variety of unpredictable exceptions, and catching them all to truncate the episode gracefully is the desired design.

Learnt from: denialhaag
Repo: munich-quantum-toolkit/predictor PR: 572
File: tests/compilation/test_integration_further_SDKs.py:20-20
Timestamp: 2026-01-21T00:18:21.993Z
Learning: In the munich-quantum-toolkit/predictor repository, Ruff is configured with preview mode enabled (`preview = true`). The PLC2701 rule (import from private modules) is active, so `# noqa: PLC2701` directives are necessary when importing from private modules like `pytket._tket.passes`, even if Ruff reports RUF100 warnings suggesting the directive is unused.

Learnt from: flowerthrower
Repo: munich-quantum-toolkit/predictor PR: 680
File: src/mqt/predictor/rl/predictorenv.py:140-143
Timestamp: 2026-06-08T13:11:33.062Z
Learning: In munich-quantum-toolkit/predictor, `PredictorEnv.reset()` in `src/mqt/predictor/rl/predictorenv.py` unconditionally sets `self.layout = None`. This is intentional: `reset()` is only called at the start of a new rollout from the fully uncompiled training circuit set, so no pre-existing layout should ever be assumed or preserved.

Learnt from: denialhaag
Repo: munich-quantum-toolkit/predictor PR: 572
File: src/mqt/predictor/rl/predictorenv.py:0-0
Timestamp: 2026-01-21T00:17:19.184Z
Learning: In the munich-quantum-toolkit/predictor repository, Ruff is configured with `preview = true` and the "PL" (pylint) rule category in `extend-select`. PLC2701 (import-private-name) is a preview rule that flags imports from private modules (names starting with underscore). The `# noqa: PLC2701` directive on the import `from pytket._tket.passes import BasePass as TketBasePass` in src/mqt/predictor/rl/predictorenv.py is necessary and appropriate, even if Ruff reports RUF100 warnings suggesting the directive is unused.

Learnt from: flowerthrower
Repo: munich-quantum-toolkit/predictor PR: 526
File: src/mqt/predictor/rl/predictorenv.py:238-243
Timestamp: 2026-02-09T13:15:50.387Z
Learning: In src/mqt/predictor/rl/predictorenv.py, when `reward_function` is 'estimated_hellinger_distance', compute the Hellinger-distance reward at every step rather than only at episode termination. Currently, non-terminal steps receive `no_effect_penalty`, which prevents intermediate shaping signals. Implement delta-based reward shaping by producing or accumulating a per-step shaping signal (analogous to how it's done for `expected_fidelity` and `estimated_success_probability`) so that intermediate rewards reflect progress toward the goal. Maintain the existing terminal reward behavior on episode end. Add or adjust tests to verify step-wise rewards for this case and ensure the shaping signal is correctly integrated into the total return.

Learnt from: burgholzer
Repo: munich-quantum-toolkit/predictor PR: 526
File: src/mqt/predictor/rl/predictorenv.py:271-271
Timestamp: 2025-12-25T13:28:19.850Z
Learning: In the munich-quantum-toolkit/predictor repository, Ruff has a broad rule set including flake8-self (`SLF`) in `extend-select`. For private member access (e.g., `self.state._layout = self.layout`) in src/mqt/predictor/rl/predictorenv.py, include a per-file noqa directive: `# noqa: SLF001`. This is appropriate even if Ruff reports RUF100 as unused, to acknowledge intentional private attribute access and to avoid false positives in this specific code path. Apply this directive only to the files where private-member access is intentionally used and where SLF001 is the correct rationale.

Learnt from: linus-hologram
Repo: munich-quantum-toolkit/predictor PR: 641
File: src/mqt/predictor/rl/tracer.py:172-186
Timestamp: 2026-04-13T21:24:06.882Z
Learning: In the munich-quantum-toolkit/predictor repository, Ruff’s flake8-boolean-trap integration (FBT) is not enabled (FBT is not in Ruff’s `extend-select` in `pyproject.toml`), and the specific rule `FBT001` is not active. Therefore, during code reviews for this repo, do not flag boolean-typed positional arguments as `FBT001` Ruff violations unless `FBT001` is re-enabled in `pyproject.toml`.
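One of the learnings above concerns the broad `except Exception` in `PredictorEnv.step()`. A minimal sketch of that pattern follows, assuming a Gymnasium-style five-tuple return `(observation, reward, terminated, truncated, info)`. The names `guarded_step` and `flaky_pass`, the zero reward, and the `error` info key are hypothetical and not taken from the repository.

```python
import logging
from collections.abc import Callable

logger = logging.getLogger(__name__)

Circuit = list[str]  # toy circuit representation (an assumption)


def flaky_pass(circuit: Circuit) -> Circuit:
    """Hypothetical compilation pass that always fails."""
    msg = "simulated compiler failure"
    raise RuntimeError(msg)


def guarded_step(
    circuit: Circuit, compile_pass: Callable[[Circuit], Circuit]
) -> tuple[Circuit, float, bool, bool, dict[str, str]]:
    """Apply a pass; on any exception, keep the old state and truncate."""
    try:
        new_circuit = compile_pass(circuit)
    except Exception as exc:  # noqa: BLE001 - third-party passes raise arbitrary errors
        logger.warning("Pass failed, truncating episode: %s", exc)
        # terminated=False, truncated=True: the episode ends without a goal state.
        return circuit, 0.0, False, True, {"error": str(exc)}
    return new_circuit, 0.0, False, False, {}
```

Calling `guarded_step(["h"], flaky_pass)` returns the unchanged circuit with `truncated` set, so a training loop can move on to the next episode instead of crashing.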

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Patrick Hopf <81010725+flowerthrower@users.noreply.github.com>

…ove-action-pass-imports-and-wrappers

flowerthrower · 2026-06-19T14:45:50Z

@coderabbitai review

coderabbitai · 2026-06-19T14:45:57Z

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/compilation/test_predictor_rl.py`:
- Around line 25-26: The import statement for qiskit_actions uses an alias that
matches the module name, which Ruff suggests simplifying per PLR0402. Replace
the import statement `import mqt.predictor.rl.actions.qiskit_actions as
qiskit_actions` with a from-import style: `from mqt.predictor.rl.actions import
qiskit_actions`. This eliminates the redundant alias and improves code clarity.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 73f17e94-e81a-4060-a17a-eb0063efc4fa

📥 Commits

Reviewing files that changed from the base of the PR and between 3ec22c7 and b3f9f06.

📒 Files selected for processing (8)

CHANGELOG.md
src/mqt/predictor/rl/actions/__init__.py
src/mqt/predictor/rl/actions/bqskit_actions.py
src/mqt/predictor/rl/actions/qiskit_actions.py
src/mqt/predictor/rl/predictorenv.py
tests/compilation/test_helper_rl.py
tests/compilation/test_integration_further_SDKs.py
tests/compilation/test_predictor_rl.py

coderabbitai · 2026-06-19T14:52:07Z

+import mqt.predictor.rl.actions as actions_module
+import mqt.predictor.rl.actions.qiskit_actions as qiskit_actions


🧹 Nitpick | 🔵 Trivial | 💤 Low value

Simplify import per Ruff PLR0402.

The alias `qiskit_actions` is identical to the module name. Ruff suggests using a `from` import for clarity.

♻️ Proposed simplification

 import mqt.predictor.rl.actions as actions_module
-import mqt.predictor.rl.actions.qiskit_actions as qiskit_actions
+from mqt.predictor.rl.actions import qiskit_actions

🧰 Tools

🪛 Ruff (0.15.17)

[warning] 26-26: Use `from mqt.predictor.rl.actions import qiskit_actions` in lieu of alias

Replace with `from mqt.predictor.rl.actions import qiskit_actions`

(PLR0402)


Source: Linters/SAST tools
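A small, self-contained illustration of the PLR0402 pattern, using the standard library's `os.path` instead of the project's modules (an assumption made so the snippet runs without `mqt.predictor` installed). It suggests that the from-import binds the same module object the aliased import would, so the change is purely stylistic.

```python
import importlib

# Form flagged by PLR0402 (kept as a comment so this snippet stays lint-clean):
#   import os.path as path
# Preferred form:
from os import path

# Resolve the dotted module name the way the aliased import would.
aliased = importlib.import_module("os.path")

# Both names refer to the same module object.
same_module = aliased is path
```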

flowerthrower and others added 17 commits March 11, 2026 12:24

Merge commit '90ec2cf' into fix-RL-training-bug

4a0ed8d

🎨 improve seed and training defaults

68eb338

🎨 adjust test step limits

f9de637

⏪ revert unrelated changes

55e5e08

🎨 pre-commit fixes

ba9042d

🎨 improve comments

dca4827

✅ fix synthesis size limit for bqskit passes

1e523e1

🎨 pre-commit fixes

6d6487a

🎨 reduce test training overhead

7a300a2

🎨 add comments

d64a97f

🎨 reduce number of training steps

9f2697e

🚚 move actions imports

f685fdd

🚚 move parsing logic

74a0f9a

🎨 add changelog entry

512d2b0

✅ add tests

c398324

🎨 add changelog entry

010fa68

flowerthrower linked an issue May 12, 2026 that may be closed by this pull request

🎨 improve action/pass imports and wrappers #668

Open

pre-commit-ci Bot and others added 4 commits May 12, 2026 14:35

🎨 pre-commit fixes

59bdf52

🔀 pull changes from #679

2dd86dd

Merge commit '59bdf52374e47aaf2ba6107453430adf37c458b9' into 668-improve-action-pass-imports-and-wrappers

ad83a28

🎨 improve docstring

86d078a

flowerthrower and others added 7 commits May 29, 2026 16:00

Merge remote-tracking branch 'origin/main' into fix-RL-training-bug

788ec25

🎨 improve error reporting

a474a8f

✅ improve coverage

51b20af

✅ fix test for qiskit<2

7e9a369

🎨 pre-commit fixes

0694749

Merge branch 'main' into fix-RL-training-bug

ce7491a

Merge branch 'fix-RL-training-bug' into 668-improve-action-pass-imports-and-wrappers

067c6de

flowerthrower added 5 commits June 5, 2026 10:09

Merge commit '3d600d14576b04dcc29fbaafb5667ae0b7467b6d' into 668-improve-action-pass-imports-and-wrappers

514acfb

🎨 move validity check into sdk specific modules

95c2f7b

🔥 remove parsing shell module

1815f5b

🎨 add invariants

7ddf267

✅ update tests with individual passes

3ec22c7

coderabbitai Bot requested changes Jun 5, 2026


flowerthrower and others added 11 commits June 5, 2026 16:17

🎨 fix changelog

32c1edd

🎨 improve docstrings

760fc57

Apply suggestions from code review

65527f3

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Patrick Hopf <81010725+flowerthrower@users.noreply.github.com>

Merge commit '65527f3d2121ffb6c58080c3d9e54e12b69072e0' into 668-improve-action-pass-imports-and-wrappers

4e090ab

🎨 update post-layout in qiskit pass

a7f5bc8

🎨 improve test

a02ecec

🎨 pre-commit fixes

9774e4d

🎨 comply with Google docstrings

06f6f61

🎨 calm down rabbit complaining about Arguments

aa24d03

🎨 improve tests

93a714d

Merge branch 'main' into 668-improve-action-pass-imports-and-wrappers

09389f3

flowerthrower marked this pull request as ready for review June 8, 2026 14:04

flowerthrower added 6 commits June 8, 2026 16:32

🎨 remove import forwarding

69f42b3

🎨 improve tests

846bb39

⏪ revert out-of-scope changes

e9eec21

⏪ revert out-of-scope changes

b8dce82

Merge commit 'be0d490fb5ef5731cf32ca2b2efde1f5b6c7ddf9' into 668-improve-action-pass-imports-and-wrappers

46414ea

🔥 remove dead helper

b3f9f06

coderabbitai Bot requested changes Jun 19, 2026

