# Maze Quick Reference Card

## 🚨 Critical Rules (Never Violate)
**Grammar syntax:**

```
- ❌ ?start: function_body   # llguidance doesn't support inline rules
+ ✅ start: function_body    # Use standard Lark syntax
```
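For reference, a complete grammar in the accepted form — this is the same minimal grammar the test and curl examples below use:

```python
# A plain named "start:" rule (never "?start:"), lowercase rules,
# UPPERCASE terminals.
MINIMAL_GRAMMAR = 'start: simple\nsimple: "return " NUMBER\nNUMBER: /[0-9]+/'
```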
**Match the grammar to the task:**

```python
# ❌ WRONG - Full grammar for completion task
prompt = "def calculate_sum(a: int, b: int) -> int:"
grammar = PYTHON_FUNCTION  # Starts with "def" - will duplicate signature!

# ✅ CORRECT - Completion grammar for completion task
prompt = "def calculate_sum(a: int, b: int) -> int:"
grammar = PYTHON_FUNCTION_BODY  # Starts with body only
```
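The failure mode follows from the generated text being appended to the prompt verbatim. A simplified, hypothetical sketch of the two grammar families (the real `PYTHON_FUNCTION` and `PYTHON_FUNCTION_BODY` grammars are richer than this):

```python
# Hypothetical, stripped-down stand-ins for the two grammar families.
FULL_SKETCH = 'start: "def f(): return " NUMBER\nNUMBER: /[0-9]+/'  # re-emits "def"
BODY_SKETCH = 'start: " return " NUMBER\nNUMBER: /[0-9]+/'          # continues after ":"

prompt = "def calculate_sum(a: int, b: int) -> int:"
# FULL_SKETCH output begins "def f(): ..." -> the signature appears twice.
# BODY_SKETCH output continues the prompt, e.g. " return 42".
```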
**Assert what the grammar enforces, not just that output exists:**

```python
# ❌ WRONG - Meaningless test
assert result.code is not None

# ✅ CORRECT - Validate grammar enforcement
grammar = 'start: simple\nsimple: "return " NUMBER\nNUMBER: /[0-9]+/'
result = generate(prompt, grammar)
assert "return" in result.code
assert any(c.isdigit() for c in result.code)
assert "#" not in result.code  # Grammar forbids comments
```
**Use the V1 structured-outputs API:**

```python
# ❌ WRONG - Deprecated API
SamplingParams(guided_grammar=grammar)

# ✅ CORRECT - V1 API
from vllm.sampling_params import StructuredOutputsParams
SamplingParams(structured_outputs=StructuredOutputsParams(grammar=grammar))

# Initialize with guidance backend
LLM(model="...", structured_outputs_config={"backend": "guidance"})
```
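Putting the pieces together, a minimal end-to-end sketch (the model name is a placeholder, not Maze's actual configuration):

```python
from vllm import LLM, SamplingParams
from vllm.sampling_params import StructuredOutputsParams

# Same minimal grammar used throughout this card.
grammar = 'start: simple\nsimple: "return " NUMBER\nNUMBER: /[0-9]+/'

llm = LLM(
    model="Qwen/Qwen2.5-Coder-1.5B-Instruct",  # placeholder model name
    structured_outputs_config={"backend": "guidance"},
)

params = SamplingParams(
    temperature=0.0,  # low temperature recommended for completion (see Tips)
    max_tokens=16,
    structured_outputs=StructuredOutputsParams(grammar=grammar),
)

outputs = llm.generate(["def test():"], params)
print(outputs[0].outputs[0].text)  # constrained to 'return <digits>'
```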
## 📚 When to Use Which Grammar
| Scenario | Prompt Example | Grammar to Use |
| --- | --- | --- |
| Completion | `"def foo():"` | `PYTHON_FUNCTION_BODY` |
| Completion | `"function bar():"` | `TYPESCRIPT_FUNCTION_BODY` |
| Completion | `"fn baz() ->"` | `RUST_FUNCTION_BODY` |
| Full generation | "Write a function to calculate sum" | `PYTHON_FUNCTION` |
| Full generation | "Create a TypeScript validator" | `TYPESCRIPT_FUNCTION` |
**Detection heuristic** (in `pipeline.py`; see the sketch after this list):

- Python: prompt ends with `:`
- TypeScript: prompt contains `function` and `)`
- Rust: prompt contains `fn` and (`->` or `)`)
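A hypothetical rendering of that heuristic — the function name and check order are illustrative, not the actual `pipeline.py` code. Note the Python check must come last, since TypeScript prompts can also end with `:`:

```python
def detect_language(prompt: str) -> str:
    """Illustrative sketch of the detection heuristic described above."""
    if "fn" in prompt and ("->" in prompt or ")" in prompt):
        return "rust"
    if "function" in prompt and ")" in prompt:
        return "typescript"
    if prompt.rstrip().endswith(":"):
        return "python"
    return "unknown"
```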
## ⚡ Performance Expectations
| Metric | Value | Notes |
| --- | --- | --- |
| Cold start | 60-120s | Model loading on Modal |
| Warm latency (no grammar) | 0.4s | Unconstrained generation |
| Warm latency (with grammar) | 1-3s | Worth it for 100% validity |
| Throughput (no grammar) | 70-80 tok/s | Fast but ~70% valid |
| Throughput (with grammar) | 10-12 tok/s | Slow but 100% valid |
| Cache hit target | >70% | Required for performance |
## 🚀 Deployment

```bash
# Deploy
modal deploy deployment/modal/modal_app.py

# Endpoint
export MODAL_ENDPOINT_URL=https://rand--maze-inference-mazeinferenceserver-fastapi-app.modal.run

# Health check
curl $MODAL_ENDPOINT_URL/health

# Generate
curl -X POST $MODAL_ENDPOINT_URL/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "def test():", "grammar": "start: simple\nsimple: \"return \" NUMBER\nNUMBER: /[0-9]+/", "max_tokens": 16}'
```
## 🧪 Testing

```bash
# Validate grammar constraints work
uv run pytest tests/validation/test_constraint_enforcement.py -v

# Performance benchmarks
uv run pytest -m performance -v

# Full test suite
uv run pytest tests/unit -v

# With coverage
uv run pytest --cov=maze --cov-report=html
```
## 🐛 Common Errors

| Error | Cause | Fix |
| --- | --- | --- |
| "Failed to convert grammar from GBNF to Lark" | Used `?start:` | Change to `start:` |
| Signature duplication | Used full grammar for completion | Use a `*_FUNCTION_BODY` grammar |
| "expected an indented block" | Only comments generated | Lower temperature, use a directive prompt |
| Timeout (120s) | Cold start | Increase timeout to 120s+ |
| Unclosed paren/bracket | Hit token limit mid-expression | Increase `max_tokens` |
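The last two rows are truncation failures that a cheap syntax check catches before generated code ships. A minimal helper, assuming Python output and that the prompt plus completion should parse as a unit:

```python
import ast

def is_complete_python(prompt: str, completion: str) -> bool:
    """True if prompt + completion parses as valid Python.

    Assumes the completion carries its own newline/indentation, as
    body-only grammars produce. Catches "expected an indented block"
    and unclosed parens/brackets from mid-expression truncation.
    """
    try:
        ast.parse(prompt + completion)
        return True
    except SyntaxError:
        return False
```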
## 💡 Tips

- **Always test with the real Modal endpoint** - mocks hide grammar issues
- **Use temperature 0.0-0.1 for code completion** - more deterministic
- **Start with minimal grammars** - expand gradually
- **Monitor cache hit rates** - should stay above 70%
- **Read `docs/GRAMMAR_CONSTRAINTS.md`** - contains all the hard-won lessons
**Last Updated:** 2025-11-12
**Version:** Post completion-grammar implementation