Maze Generation and NL Interface Setup #1
NLU Interface
Purpose: nlu_benchmark evaluates models on language-guided navigation in a grid world loaded from the same maze JSON (maze + mechanisms) used by automatic maze generation. One run = one episode: reset the env, talk to the model, parse its actions, step the simulator, repeat until success, timeout, or repeated failure.
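A minimal sketch of that episode loop, assuming hypothetical helper names (build_user_turn, parse_actions, a config with system_messages/max_steps); the real flow lives in ExperimentRunner:

```python
# Sketch of one benchmark episode; helper and attribute names are illustrative,
# not the actual nlu_benchmark API.
def run_episode(env, agent, config):
    env.reset()
    messages = list(config.system_messages)      # task, valid actions, output format
    last_events = []
    for _ in range(config.max_steps):
        messages.append(build_user_turn(env, last_events))  # position, facing, inventory, ...
        reply = agent(messages)                   # any messages -> reply-text callable
        actions = parse_actions(reply)            # FINAL_OUTPUT / ACTIONS parsing
        last_events = [env.step(a) for a in actions]
        if env.success:                           # hypothetical success flag
            return True
    return False
```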
Simulation: GridWorldEnv implements TURN_LEFT / TURN_RIGHT / MOVE_FORWARD / PICKUP / TOGGLE / DONE, with walls, color-matched keys/doors, switches that control gates, and clear StepEvent outcomes (MOVED, BLOCKED, PICKUP, TOGGLED, DONE, WRONG_DONE, etc.). That logic is the ground truth; the model only sees what the prompt (and optional image) exposes.
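For reference, the action and event vocabularies could be expressed as plain enums (a sketch built only from the names listed above; the real StepEvent set includes more members):

```python
from enum import Enum, auto

class Action(Enum):
    TURN_LEFT = auto()
    TURN_RIGHT = auto()
    MOVE_FORWARD = auto()
    PICKUP = auto()
    TOGGLE = auto()
    DONE = auto()

class StepEvent(Enum):
    MOVED = auto()
    BLOCKED = auto()
    PICKUP = auto()
    TOGGLED = auto()
    DONE = auto()
    WRONG_DONE = auto()
```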
Experimental knobs (ExperimentConfig): input mode (text-only vs. image+text), querying mode (FINAL_OUTPUT vs. SUB_GOAL / ACTIONS), whether turn history is included, and the episode step budget.
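A rough sketch of what such a config could look like; the field names are assumptions based on the knobs described in this section, not the actual ExperimentConfig definition:

```python
from dataclasses import dataclass

@dataclass
class ExperimentConfig:                   # illustrative only; real fields may differ
    input_mode: str = "text"              # "text" or "image+text"
    querying_mode: str = "final_output"   # FINAL_OUTPUT vs. SUB_GOAL / ACTIONS
    include_history: bool = True          # append prior turns to each prompt
    max_steps: int = 50                   # episode step budget before timeout
```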
Prompting pipeline: The runner (ExperimentRunner / build_runner) assembles system instructions (task, valid actions, output format) and, for text/image+text modes, a fixed initial maze description once per episode. Each user turn adds the current situation (position, facing, inventory, live mechanism state), last step result, optional history, and optional base64 PNG renders. PNGs use the same drawing path as automatic_maze_generation/render_dataset.py so benchmark images match dataset style.
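Each user turn is an ordinary chat message; attaching the render as a base64 PNG could look roughly like this (a sketch that assumes an OpenAI-style image_url content part, which may not match the runner's exact message format):

```python
import base64

def user_turn(situation_text: str, png_bytes: bytes | None) -> dict:
    # Plain-text situation: position, facing, inventory, mechanism state, last step result.
    content = [{"type": "text", "text": situation_text}]
    if png_bytes is not None:
        b64 = base64.b64encode(png_bytes).decode("ascii")
        content.append({"type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{b64}"}})
    return {"role": "user", "content": content}
```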
Model interface: The “agent” is any messages -> reply text function; included helpers cover random, Hugging Face Router, and local Transformers. Replies are parsed via FINAL_OUTPUT: and/or SUB_GOAL / ACTIONS depending on querying mode. The run returns success, steps used, final state, and a step transcript plus the serialized config for reproducibility.
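Parsing the reply can be as simple as scanning for the marker and keeping known action tokens; a minimal sketch (the runner's actual parsing rules may be stricter):

```python
import re

VALID = {"TURN_LEFT", "TURN_RIGHT", "MOVE_FORWARD", "PICKUP", "TOGGLE", "DONE"}

def parse_final_output(reply: str) -> list[str]:
    # Take everything after the FINAL_OUTPUT: marker and keep recognized action tokens.
    match = re.search(r"FINAL_OUTPUT:\s*(.*)", reply, re.DOTALL)
    if not match:
        return []
    tokens = re.split(r"[,\s]+", match.group(1).strip())
    return [t for t in tokens if t in VALID]
```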
Smoke testing: Sample mazes are exercised with a BFS shortest-path solver over the full mechanism state, converted into actions, and stepped through GridWorldEnv (e.g. nlu_benchmark/smoke_tests/smoke_smart_manual.py) to sanity-check loading, rendering, and env semantics without an LLM.
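The smoke-test solver searches over the full mechanism state; ignoring mechanisms, its positional core is a standard BFS over grid cells (a simplified sketch, not the actual solver):

```python
from collections import deque

def bfs_path(start, goal, passable):
    """Shortest path over grid cells; passable(a, b) says whether the move a -> b is open."""
    queue, parent = deque([start]), {start: None}
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if nxt not in parent and passable(cell, nxt):
                parent[nxt] = cell
                queue.append(nxt)
    return None  # goal unreachable
```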
Maze generation
Generation is spec-driven: a MazeGenSpec picks a backbone topology (winding corridor, multi-route, side vault, sequential chain, dense maze), a logic chain (none, key–door, switch–gate, ordered multi-step chains, etc.), and optional distractors (wrong keys, dead ends, distractor chains). The pipeline is roughly sample spec → layout generation → mechanism placement → validation (orchestrator, generators, mechanisms, validator). Validation checks things like solvability, avoiding unintended shortcuts, and respecting prerequisite order before blockers.
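A spec along those lines could read roughly as follows (field names and values are illustrative; only the MazeGenSpec name comes from the code):

```python
from dataclasses import dataclass, field

@dataclass
class MazeGenSpec:                          # field names here are illustrative
    backbone: str = "winding_corridor"      # or multi_route, side_vault, sequential_chain, dense_maze
    logic_chain: str = "key_door"           # or none, switch_gate, ordered multi-step chains, ...
    distractors: list[str] = field(default_factory=list)  # e.g. ["wrong_key", "dead_end"]
```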
The artifact is a JSON payload: maze (grid dimensions, wall segments, start, goal), mechanisms (keys, doors, switches, gates, plus reserved slots for other types), and validation (validity, reasons, solver cost/path and interaction traces where applicable). mazegen/generate_dataset.py turns validated layouts into these files for dataset builds. render_dataset.py reads the same JSON and draws PNG layouts with matplotlib (including a row/col → display-space conversion helper).
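Consumers can treat the payload as three top-level keys; a sketch of reading one file (the path is illustrative, and sub-fields follow the description above rather than the exact schema):

```python
import json

with open("dataset/maze_000.json") as f:   # illustrative path
    payload = json.load(f)

maze = payload["maze"]               # grid dimensions, wall segments, start, goal
mechanisms = payload["mechanisms"]   # keys, doors, switches, gates, reserved slots
validation = payload["validation"]   # validity flag, reasons, solver cost/path, traces
```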