Add EnvGraphSpec class & its yaml parser#690
Conversation
There was a problem hiding this comment.
🤖 Automated Code Review
Summary
This PR introduces a well-structured YAML parser for environment graph specifications using frozen dataclasses. The design enables typed, validated environment specs for graph-based simulation environments. The implementation is clean and the validation logic is thorough.
✅ Strengths
- Immutable design: Frozen dataclasses prevent accidental mutation of spec objects after creation
- Comprehensive validation: Reference integrity checks ensure task state_specs and constraint nodes exist
- Clean factory pattern:
from_dictandfrom_yamlmethods provide clear construction APIs - Convenient accessors:
nodes_by_id,tasks_by_id,state_specs_by_idproperties enable O(1) lookups - Good test coverage: Tests verify happy path parsing, duplicate IDs, and invalid references
🔧 Issues to Address
1. Bug in _optional_dict helper (High)
File: env_graph_spec.py lines 196-199
def _optional_dict(data: dict[str, Any], key: str) -> dict[str, Any]:
value = data.get(key, {}) # value defaults to {} when missing
assert value is None or isinstance(value, dict), ... # value can never be None here
return dict(value or {})The assertion cannot catch invalid types because data.get(key, {}) returns {} when the key is missing. If the key exists with a non-dict value (e.g., a string or list), the assertion works, but the None check is dead code.
Suggested fix:
def _optional_dict(data: dict[str, Any], key: str) -> dict[str, Any]:
value = data.get(key)
assert value is None or isinstance(value, dict), f"Optional field '{key}' must be a dict when set"
return dict(value or {})2. Assertions vs Exceptions for Input Validation (Medium)
Using assert for input validation is risky because assertions are disabled when Python runs with -O (optimized mode). In production, malformed YAML would silently pass validation.
Suggestion: Consider raising ValueError or defining a custom SpecValidationError:
if not isinstance(value, str) or not value:
raise ValueError(f"Missing required string field '{key}'")3. Node parent references not validated (Medium)
EnvGraphNodeSpec has a parent field, but _assert_references_exist does not verify that node parents reference valid node IDs. For example, a node with parent: "nonexistent_node" would be accepted.
Suggestion: Add validation in _assert_references_exist:
for node in nodes:
if node.parent is not None:
assert node.parent in node_ids, f"Node '{node.id}' references unknown parent '{node.parent}'"4. Task task_args references not validated (Low)
The task_args dictionary in tasks (e.g., object, destination, background) contains node ID references that are not validated against the node list. This is consistent with the stated "Next Step" to connect specs to ArenaEnvBuilder, but could catch errors earlier.
📋 Test Coverage Suggestions
- Add tests for malformed YAML (wrong types for required fields)
- Add tests for empty collections (no nodes, no tasks, no state_specs)
- Add a test for invalid node parent references
- Test
from_dictdirectly with edge cases, not just viafrom_yaml
💡 Minor Suggestions
- Copyright year inconsistency: Python files use "2025-2026", YAML/test files use "2026"
- Consider adding a module-level docstring to
env_graph_spec.py - The
*_by_idproperties rebuild dicts on each access — consider caching if accessed frequently
Overall this is a solid foundation for typed environment graph specs. The _optional_dict bug should be fixed before merge, and adding node parent validation would improve reference integrity.
Greptile SummaryThis PR introduces
Confidence Score: 4/5Do not merge until the The accidental deletion of
Important Files Changed
Class Diagram%%{init: {'theme': 'neutral'}}%%
classDiagram
class EnvGraphSpec {
+str name
+list~EnvGraphNodeSpec~ nodes
+list~EnvGraphTaskSpec~ tasks
+list~EnvGraphStateSpec~ state_specs
+from_yaml(path) EnvGraphSpec
+from_dict(data) EnvGraphSpec
+nodes_by_id() dict
+tasks_by_id() dict
+state_specs_by_id() dict
}
class EnvGraphNodeSpec {
+str id
+str name
+str type
+str|None parent
+str|None prim_path
+str|None object_type
+dict params
}
class EnvGraphTaskSpec {
+str id
+str name
+str type
+dict~str,str~ state_specs
+dict task_args
}
class EnvGraphStateSpec {
+str id
+str name
+EnvGraphEdgesSpec edges
}
class EnvGraphEdgesSpec {
+list~EnvGraphConstraintSpec~ spatial_constraints
+list~EnvGraphConstraintSpec~ task_constraints
}
class EnvGraphConstraintSpec {
+str id
+str type
+str|None parent
+str|None child
+dict params
}
EnvGraphSpec --> EnvGraphNodeSpec : nodes
EnvGraphSpec --> EnvGraphTaskSpec : tasks
EnvGraphSpec --> EnvGraphStateSpec : state_specs
EnvGraphStateSpec --> EnvGraphEdgesSpec : edges
EnvGraphEdgesSpec --> EnvGraphConstraintSpec : constraints
EnvGraphTaskSpec ..> EnvGraphStateSpec : references
EnvGraphConstraintSpec ..> EnvGraphNodeSpec : parent/child refs
|
| def _required_str(data: dict[str, Any], key: str) -> str: | ||
| value = data.get(key) | ||
| assert isinstance(value, str) and value, f"Missing required string field '{key}'" | ||
| return value | ||
|
|
||
|
|
||
| def _optional_str(data: dict[str, Any], key: str) -> str | None: | ||
| value = data.get(key) | ||
| assert value is None or isinstance(value, str), f"Optional field '{key}' must be a string when set" | ||
| return value | ||
|
|
||
|
|
||
| def _optional_dict(data: dict[str, Any], key: str) -> dict[str, Any]: | ||
| value = data.get(key, {}) | ||
| assert value is None or isinstance(value, dict), f"Optional field '{key}' must be a dict when set" | ||
| return dict(value or {}) |
There was a problem hiding this comment.
All three helper functions use
assert for input validation. Python strips assert statements when the interpreter is run with -O or -OO (the optimize flag), so every type check silently becomes a no-op. For example, with -O, _required_str will return None (or a non-string value) without raising, and the caller gets a frozen dataclass with id=None, name=None, or type=None, which propagates errors to a much later (and harder to diagnose) point.
| def _required_str(data: dict[str, Any], key: str) -> str: | |
| value = data.get(key) | |
| assert isinstance(value, str) and value, f"Missing required string field '{key}'" | |
| return value | |
| def _optional_str(data: dict[str, Any], key: str) -> str | None: | |
| value = data.get(key) | |
| assert value is None or isinstance(value, str), f"Optional field '{key}' must be a string when set" | |
| return value | |
| def _optional_dict(data: dict[str, Any], key: str) -> dict[str, Any]: | |
| value = data.get(key, {}) | |
| assert value is None or isinstance(value, dict), f"Optional field '{key}' must be a dict when set" | |
| return dict(value or {}) | |
| def _required_str(data: dict[str, Any], key: str) -> str: | |
| value = data.get(key) | |
| if not isinstance(value, str) or not value: | |
| raise ValueError(f"Missing required string field '{key}'") | |
| return value | |
| def _optional_str(data: dict[str, Any], key: str) -> str | None: | |
| value = data.get(key) | |
| if value is not None and not isinstance(value, str): | |
| raise TypeError(f"Optional field '{key}' must be a string when set, got {type(value).__name__}") | |
| return value | |
| def _optional_dict(data: dict[str, Any], key: str) -> dict[str, Any]: | |
| value = data.get(key, {}) | |
| if value is not None and not isinstance(value, dict): | |
| raise TypeError(f"Optional field '{key}' must be a dict when set, got {type(value).__name__}") | |
| return dict(value or {}) |
| @property | ||
| def nodes_by_id(self) -> dict[str, EnvGraphNodeSpec]: | ||
| """Return nodes keyed by id.""" | ||
| return {node.id: node for node in self.nodes} | ||
|
|
||
| @property | ||
| def tasks_by_id(self) -> dict[str, EnvGraphTaskSpec]: | ||
| """Return tasks keyed by id.""" | ||
| return {task.id: task for task in self.tasks} | ||
|
|
||
| @property | ||
| def state_specs_by_id(self) -> dict[str, EnvGraphStateSpec]: | ||
| """Return state specs keyed by id.""" | ||
| return {state_spec.id: state_spec for state_spec in self.state_specs} |
There was a problem hiding this comment.
Silent data loss on duplicate IDs
nodes_by_id, tasks_by_id, and state_specs_by_id are plain dict comprehensions — if two entries share the same id, the later entry silently overwrites the earlier one with no error. A YAML author who accidentally duplicates a node ID (or a programmatic builder that emits a duplicate) will get a graph that is missing an entry with no indication of the problem. Since these properties are the primary lookup API for downstream consumers, this is a correctness hazard worth catching at parse time in from_dict.
| def _optional_str(data: dict[str, Any], key: str) -> str | None: | ||
| value = data.get(key) | ||
| assert value is None or isinstance(value, str), f"Optional field '{key}' must be a string when set" | ||
| return value |
There was a problem hiding this comment.
_optional_str accepts an empty string "" as a valid return value (the assert only guards the type, not emptiness). _required_str, by contrast, explicitly rejects empty strings via and value. A YAML entry like parent: "" would propagate as a non-None but semantically meaningless reference rather than being treated the same as parent: null. Aligning the guard makes the behaviour consistent.
| def _optional_str(data: dict[str, Any], key: str) -> str | None: | |
| value = data.get(key) | |
| assert value is None or isinstance(value, str), f"Optional field '{key}' must be a string when set" | |
| return value | |
| def _optional_str(data: dict[str, Any], key: str) -> str | None: | |
| value = data.get(key) | |
| if value is not None and not isinstance(value, str): | |
| raise TypeError(f"Optional field '{key}' must be a string when set, got {type(value).__name__}") | |
| return value or None |
| cube_limits = initial_state.edges.spatial_constraints[2] | ||
| assert cube_limits.type == "position_limits" | ||
| assert cube_limits.child == "rubiks_cube_hot3d_robolab" | ||
| assert cube_limits.params == { | ||
| "x_min": 0.55, | ||
| "x_max": 0.70, | ||
| "y_min": -0.40, | ||
| "y_max": -0.10, | ||
| } | ||
|
|
||
| final_state = spec.state_specs_by_id["state_spec_1"] | ||
| assert isinstance(final_state, EnvGraphStateSpec) | ||
| in_constraint = final_state.edges.spatial_constraints[3] | ||
| assert in_constraint.type == "in" | ||
| assert in_constraint.parent == "bowl_ycb_robolab" | ||
| assert in_constraint.child == "rubiks_cube_hot3d_robolab" | ||
|
|
||
| reach_constraint = final_state.edges.task_constraints[0] |
There was a problem hiding this comment.
Fragile positional constraint lookups
spatial_constraints[2], spatial_constraints[3], and task_constraints[0] rely on the YAML declaration order being stable. If a future edit reorders constraints in the fixture YAML, these assertions silently test the wrong constraint (or raise an IndexError). Prefer looking up by id — a helper {c.id: c for c in initial_state.edges.spatial_constraints} mirrors the pattern already used for nodes and tasks.
a35fea4 to
790734f
Compare
| def _assert_references_exist( | ||
| nodes: list[EnvGraphNodeSpec], | ||
| tasks: list[EnvGraphTaskSpec], | ||
| state_specs: list[EnvGraphStateSpec], | ||
| ) -> None: | ||
| node_ids = {node.id for node in nodes} | ||
| state_spec_ids = {state_spec.id for state_spec in state_specs} | ||
|
|
||
| for task in tasks: | ||
| for label, state_spec_id in task.state_specs.items(): | ||
| assert state_spec_id in state_spec_ids, ( | ||
| f"Task '{task.id}' references unknown state spec '{state_spec_id}' for '{label}'" | ||
| ) | ||
|
|
||
| for state_spec in state_specs: | ||
| constraints = state_spec.edges.spatial_constraints + state_spec.edges.task_constraints | ||
| for constraint in constraints: | ||
| if constraint.parent is not None: | ||
| assert constraint.parent in node_ids, ( | ||
| f"Constraint '{constraint.id}' references unknown parent node '{constraint.parent}'" | ||
| ) | ||
| if constraint.child is not None: | ||
| assert constraint.child in node_ids, ( | ||
| f"Constraint '{constraint.id}' references unknown child node '{constraint.child}'" | ||
| ) |
There was a problem hiding this comment.
Node
parent reference not validated
_assert_references_exist validates every constraint.parent and constraint.child field against node_ids, but never validates EnvGraphNodeSpec.parent. In the fixture, maple_table_robolab_table has parent: maple_table_robolab — a typo here (e.g., maple_table_robolab_x) would pass validation and produce a spec object whose parent silently points to a non-existent node, yielding a KeyError or None only when downstream code eventually tries to resolve it via nodes_by_id.
790734f to
967b27c
Compare
|
🔄 Follow-up review (commit 967b27c) Reviewed the incremental changes from Changes in this push:
No changes to Previous feedback status:
|
Summary
Add EnvGraphSpec class & its yaml parser
Detailed description
EnvGraphSpec/EnvGraphStateSpec, parser test, and maple-table pick-and-place fixture.Next Step