CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Overview

DagShell is a virtual POSIX filesystem with content-addressable DAG structure. It provides four primary interfaces:

  1. Core Filesystem (dagshell.py) - Immutable, content-addressed nodes (FileNode, DirNode, DeviceNode, SymlinkNode)
  2. Fluent API (dagshell_fluent.py) - Chainable Python interface with method chaining and piping
  3. Terminal Emulator (terminal.py) - Full shell with interactive, command, and one-shot modes
  4. Scheme Interpreter (scheme_interpreter.py) - Embedded Scheme DSL for filesystem operations

Development Commands

Testing

# Run all tests
python -m pytest tests/ -v

# Run specific test file
python -m pytest tests/test_dagshell.py -v

# Run specific test class or method
python -m pytest tests/test_terminal.py::TestCommandParser -v
python -m pytest tests/test_terminal.py::TestCommandExecutor::test_execute_simple_command -xvs

# Run with coverage
python -m pytest tests/ --cov=dagshell --cov-report=html --cov-report=term

# Run without traceback for clean output
python -m pytest tests/test_terminal.py -v --tb=no

Code Quality

# Format code
black dagshell/ tests/

# Check formatting
black --check dagshell/ tests/

# Sort imports
isort dagshell/ tests/

# Lint
flake8 dagshell/ tests/

# Type checking
mypy dagshell/

Running the Terminal

# Interactive mode
dagshell
python -m dagshell.terminal

# One-shot mode (execute single command)
dagshell ls -la /home
dagshell pwd

# Command mode (execute command string)
dagshell -c "mkdir /a && touch /a/b"

# With filesystem persistence
dagshell --fs project.json ls /           # Load from file
dagshell --fs project.json --save mkdir /new  # Save changes back
dagshell -o output.json -c "mkdir /test"  # Save to specific file

# Environment variable for default filesystem
export DAGSHELL_FS=project.json
dagshell cat /data/file.txt

Architecture

Core Filesystem (dagshell.py)

  • Immutability: All nodes are frozen dataclasses. Changes create new nodes.
  • Content Addressing: SHA-256 hashing for deduplication. The hash covers all metadata (mode, uid, gid, mtime) as well as content, so nodes deduplicate only when both match.
  • DAG Structure: Directories reference nodes by hash, creating a directed acyclic graph.
  • Node Types:
    • FileNode: Regular files with byte content
    • DirNode: Directories with entries dict mapping names to hashes
    • DeviceNode: Virtual devices (/dev/null, /dev/random, /dev/zero)
    • SymlinkNode: Symbolic links with target path
  • FileSystem Class: Manages the DAG with nodes (hash→node) and paths (path→hash).
  • Type Checking: is_file(), is_dir(), is_device(), and is_symlink() all compare the POSIX file-type bits: (mode & 0o170000) == TYPE.
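
The immutability, content-addressing, and type-checking points above can be illustrated with a minimal sketch. SketchFileNode, its field names, and the way metadata is serialized before hashing are all assumptions made for illustration; the real FileNode in dagshell.py may differ.

```python
import hashlib
import json
import stat
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SketchFileNode:
    """Hypothetical stand-in for FileNode; field names are assumptions."""
    content: bytes
    mode: int = stat.S_IFREG | 0o644
    uid: int = 1000
    gid: int = 1000
    mtime: int = 0

    def compute_hash(self) -> str:
        # Hash metadata and content together, so changing either one
        # produces a different address.
        meta = json.dumps(
            {"mode": self.mode, "uid": self.uid, "gid": self.gid, "mtime": self.mtime},
            sort_keys=True,
        ).encode()
        return hashlib.sha256(meta + b"\0" + self.content).hexdigest()

    def is_file(self) -> bool:
        # stat.S_IFMT(mode) is (mode & 0o170000): only the type bits are compared.
        return stat.S_IFMT(self.mode) == stat.S_IFREG


original = SketchFileNode(b"hello")
same = SketchFileNode(b"hello")
# "Changing" a frozen node means building a new one; the original is untouched.
chmodded = replace(original, mode=stat.S_IFREG | 0o600)
```

Here original and same share one hash and would be stored once, while chmodded gets a new hash even though its content is identical.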

Fluent API (dagshell_fluent.py)

  • CommandResult: Wrapper enabling method chaining. Has .data, .text, .exit_code.
  • DagShell Class: Stateful shell with _cwd, _env, fs (FileSystem instance).
  • Dual Nature: Methods return a CommandResult (a Python object), which can either be used directly or redirected into the virtual filesystem via .out().
  • Method Chaining: shell.mkdir("/project").cd("/project").echo("text").out("file.txt")
  • Piping: The last result is stored in _last_result and is accessible via the _() method for pipe-like composition.
  • Directory Stack: pushd/popd for directory navigation with _dir_stack.
  • Command History: _history list tracks executed commands, accessible via history() method.
  • Import/Export: import_file() and export_file() for real filesystem interaction.
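
One way the chaining and redirection described above could work is sketched below. SketchResult and SketchShell are hypothetical stand-ins backed by plain dicts and sets, and forwarding unknown attributes from the result to the shell is an assumption; the actual CommandResult and DagShell may achieve chaining differently.

```python
import posixpath
from dataclasses import dataclass
from typing import Any, Dict, Optional, Set


@dataclass
class SketchResult:
    """Hypothetical stand-in for CommandResult."""
    data: Any
    text: str = ""
    exit_code: int = 0
    _shell: Optional["SketchShell"] = None

    def out(self, path: str) -> "SketchResult":
        # Redirect: store the text in the virtual filesystem.
        self._shell.files[self._shell.resolve(path)] = self.text.encode()
        return self

    def __getattr__(self, name: str) -> Any:
        # Only reached for attributes the result lacks: forward them to the
        # shell so that result.cd(...).echo(...) keeps chaining.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self._shell, name)


class SketchShell:
    """Hypothetical stand-in for DagShell."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}
        self.dirs: Set[str] = {"/"}
        self._cwd = "/"
        self._last_result: Optional[SketchResult] = None

    def resolve(self, path: str) -> str:
        # Relative paths are joined onto the current directory.
        return posixpath.normpath(posixpath.join(self._cwd, path))

    def _wrap(self, data: Any, text: str = "") -> SketchResult:
        self._last_result = SketchResult(data=data, text=text, _shell=self)
        return self._last_result

    def mkdir(self, path: str) -> SketchResult:
        self.dirs.add(self.resolve(path))
        return self._wrap(None)

    def cd(self, path: str) -> SketchResult:
        self._cwd = self.resolve(path)
        return self._wrap(self._cwd)

    def echo(self, text: str) -> SketchResult:
        return self._wrap(text, text)

    def _(self) -> Optional[SketchResult]:
        return self._last_result


shell = SketchShell()
shell.mkdir("/project").cd("/project").echo("text").out("file.txt")
```

After the chain runs, the sketch holds the echoed text under /project/file.txt, and shell._() still returns the echo result for further composition.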

Terminal Emulator (terminal.py)

  • TerminalSession: Main session with readline, tab completion, history expansion, and aliases.
  • HistoryManager: Persistent command history with expansion (!!, !n, !-n, !prefix).
  • AliasManager: Command aliases with JSON persistence.
  • TabCompleter: Readline-based completion for commands and paths.
  • CommandExecutor: Translates parsed Command objects to fluent API method calls.
  • CommandParser (command_parser.py): Parses shell syntax into structured Command/Pipeline/CommandGroup objects.
  • CLI Modes: Interactive (REPL), command (-c), and one-shot (positional args).
  • Persistence: --fs loads from JSON, --save writes back, -o specifies output file.
  • Data Flow: Raw command string → CommandParser → Command objects → CommandExecutor → Fluent API calls → CommandResult
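
The data flow above can be sketched end to end in a few lines. Everything here is illustrative: SketchCommand, parse_line, execute, and the HANDLERS table are hypothetical names, the handlers stand in for fluent API methods, and only whitespace-separated "&&" is recognized (no pipes, redirects, or quoted operators).

```python
import shlex
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SketchCommand:
    """Hypothetical stand-in for a parsed Command object."""
    name: str
    args: List[str] = field(default_factory=list)


def parse_line(line: str) -> List[SketchCommand]:
    """Tokenize with shlex, then split the token stream on '&&'."""
    commands: List[SketchCommand] = []
    current: List[str] = []
    # The trailing sentinel flushes the final command.
    for token in shlex.split(line) + ["&&"]:
        if token == "&&":
            if current:
                commands.append(SketchCommand(current[0], current[1:]))
            current = []
        else:
            current.append(token)
    return commands


def execute(commands: List[SketchCommand],
            handlers: Dict[str, Callable[..., str]]) -> List[str]:
    """Dispatch each command to its handler; stop at the first unknown one."""
    outputs: List[str] = []
    for cmd in commands:
        handler = handlers.get(cmd.name)
        if handler is None:
            outputs.append(f"{cmd.name}: command not found")
            break  # with '&&', later commands do not run after a failure
        outputs.append(handler(*cmd.args))
    return outputs


created: List[str] = []


def _record(path: str) -> str:
    created.append(path)
    return ""


# Handlers play the role of fluent API methods in this sketch.
HANDLERS: Dict[str, Callable[..., str]] = {
    "mkdir": _record,
    "touch": _record,
    "pwd": lambda: "/",
}

outputs = execute(parse_line("mkdir /a && touch /a/b && pwd"), HANDLERS)
```

Keeping parse_line free of side effects is what lets the parser be tested without an executor, in line with the separation the list describes.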

Scheme Interpreter (scheme_interpreter.py)

  • Purpose: Provides Scheme DSL for filesystem operations as an alternative interface.
  • Core Components: Tokenizer, parser, evaluator with Environment for lexical scoping.
  • Integration: SchemeREPL has FileSystem reference, exposes filesystem operations as Scheme primitives.
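
A toy version of the tokenizer, parser, and evaluator pipeline is sketched below to show how an Environment chain gives lexical scoping and how filesystem operations can be exposed as primitives. It is not the code in scheme_interpreter.py: the write-file and read-file primitives are hypothetical names backed by a plain dict, and string literals may not contain spaces.

```python
from typing import Any, Dict, List, Optional


class Symbol(str):
    """Marks a token as an identifier rather than a string literal."""


def tokenize(source: str) -> List[str]:
    return source.replace("(", " ( ").replace(")", " ) ").split()


def parse_tokens(tokens: List[str]) -> Any:
    token = tokens.pop(0)
    if token == "(":
        expr = []
        while tokens[0] != ")":
            expr.append(parse_tokens(tokens))
        tokens.pop(0)  # discard the closing ")"
        return expr
    if len(token) >= 2 and token.startswith('"') and token.endswith('"'):
        return token[1:-1]  # string literal (no spaces in this sketch)
    try:
        return int(token)
    except ValueError:
        return Symbol(token)


class Environment:
    def __init__(self, bindings: Optional[Dict[str, Any]] = None,
                 parent: Optional["Environment"] = None) -> None:
        self.bindings = dict(bindings or {})
        self.parent = parent

    def lookup(self, name: str) -> Any:
        env: Optional[Environment] = self
        while env is not None:  # walk outward through enclosing scopes
            if name in env.bindings:
                return env.bindings[name]
            env = env.parent
        raise NameError(name)


def evaluate(expr: Any, env: Environment) -> Any:
    if isinstance(expr, Symbol):
        return env.lookup(expr)
    if not isinstance(expr, list):
        return expr  # numbers and string literals evaluate to themselves
    head = expr[0]
    if head == "define":
        env.bindings[expr[1]] = evaluate(expr[2], env)
        return None
    if head == "lambda":
        params, body = expr[1], expr[2]
        # The closure captures env, the scope where the lambda was created.
        return lambda *args: evaluate(body, Environment(dict(zip(params, args)), env))
    func = evaluate(head, env)
    return func(*[evaluate(arg, env) for arg in expr[1:]])


files: Dict[str, str] = {}  # stands in for the FileSystem reference
global_env = Environment({
    "+": lambda a, b: a + b,
    "write-file": lambda path, text: files.__setitem__(path, text),
    "read-file": lambda path: files[path],
})


def run(source: str) -> Any:
    return evaluate(parse_tokens(tokenize(source)), global_env)


run('(write-file "/a.txt" "hi")')
run("(define make-adder (lambda (x) (lambda (y) (+ x y))))")
```

In this sketch, run('(read-file "/a.txt")') reads back what write-file stored, and ((make-adder 2) 3) resolves x through the parent Environment captured when the inner lambda was created.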

Key Design Principles

  1. Separation of Concerns:

    • Command parsing is separate from execution
    • Fluent API is separate from terminal emulation
    • Each layer can be used independently
  2. Composability:

    • Small, focused methods that chain together
    • Unix philosophy: do one thing well
    • CommandResult enables both Python object access and filesystem redirection
  3. Immutability:

    • Filesystem nodes are immutable
    • Operations return new filesystem states
    • History preserved in DAG structure
  4. Testability:

    • Pure functions where possible
    • Clear interfaces between components
    • Comprehensive test suite with 859 tests at 88% coverage

Testing Strategy

  • Comprehensive Test Coverage: Target is high coverage across all modules
  • Test Organization (26 test files):
    • Core filesystem: test_dagshell.py, test_dagshell_extended.py, test_core_filesystem_comprehensive.py
    • Fluent API: test_fluent.py, test_text_processing_comprehensive.py, test_import_export_comprehensive.py
    • Terminal: test_terminal.py, test_terminal_features.py, test_terminal_advanced.py, test_terminal_coverage.py
    • Command parser: test_command_parser_coverage.py
    • Scheme: test_scheme_interpreter.py, test_scheme_integration_comprehensive.py, test_scheme_extended.py
    • Integration: test_integration.py, test_persistence_comprehensive.py
    • Regression/review: test_regression_fixes.py, test_code_review_fixes.py
    • Edge cases: test_edge_cases_comprehensive.py, test_user_permissions.py
  • When adding features: Write tests first or alongside implementation
  • After changes: Run relevant test suite and verify coverage hasn't dropped

Common Patterns

Adding a New Command

  1. Add method to DagShell class in dagshell_fluent.py
  2. Add flag mappings in CommandParser.FLAG_MAPPINGS in command_parser.py if needed
  3. Add execution logic in CommandExecutor._execute_command() in terminal.py
  4. Add tests in appropriate test file
  5. Update help system if user-facing

Working with the Filesystem

# Access the FileSystem instance
fs = shell.fs

# Nodes are immutable - operations return new filesystem
new_fs = fs.write("/path/file.txt", b"content")

# Resolve paths
abs_path = shell._resolve_path("relative/path")

# Check permissions
if not fs.can_read(path, uid, gid):
    raise PermissionError(...)
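
For orientation, a Unix-style read check over mode bits might look like the sketch below. sketch_can_read is a hypothetical free function taking the node's mode and ownership directly; the real fs.can_read(path, uid, gid) looks these up from the path, and whether it treats uid 0 as always allowed is an assumption here.

```python
import stat


def sketch_can_read(mode: int, owner_uid: int, owner_gid: int,
                    uid: int, gid: int) -> bool:
    """Owner/group/other read check; exactly one permission class applies."""
    if uid == 0:
        return True  # assumption: root bypasses permission bits
    if uid == owner_uid:
        return bool(mode & stat.S_IRUSR)
    if gid == owner_gid:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)


# rw-r----- on a regular file owned by uid 1000, gid 1000
mode = stat.S_IFREG | 0o640
owner_ok = sketch_can_read(mode, 1000, 1000, uid=1000, gid=1000)
group_ok = sketch_can_read(mode, 1000, 1000, uid=2000, gid=1000)
other_ok = sketch_can_read(mode, 1000, 1000, uid=2000, gid=2000)
```

Note that the owner class is checked first and is final: an owner whose user bits deny read access is refused even if the group or other bits would allow it.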

Method Chaining Pattern

def new_command(self, arg1, arg2):
    # Do filesystem operation
    result_data = ...

    # Return CommandResult for chaining
    return CommandResult(
        data=result_data,
        text=str(result_data),
        exit_code=0,
        _shell=self
    )

Important Notes

  • Minimal Dependencies: The only third-party dependency is pyreadline3, and only on Windows (readline ships with Python on Linux/macOS)
  • Python 3.8+: Minimum supported version
  • Virtual Devices: /dev/null, /dev/random, /dev/zero are implemented as DeviceNode with special behavior
  • Permissions: Unix-style permissions (mode bits) are enforced via can_read(), can_write(), can_execute()
  • Path Resolution: Always use _resolve_path() to handle relative paths correctly
  • Content Encoding: Files store bytes, conversion to/from str handled at API boundaries
  • Path Authority: self.paths is the canonical path→hash mapping. Directory children dicts may be stale after mutations (write/chmod/chown only propagate one level up). This is a known limitation.
  • Serialization: from_json() uses .get() (not .pop()) to avoid mutating input data. Keep it idempotent.
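
The serialization note can be illustrated with a small sketch. node_from_json and its field names and defaults are hypothetical, not the project's from_json(); the point is only that reading with .get() leaves the caller's dict intact, so loading the same data twice gives the same result.

```python
from typing import Any, Dict


def node_from_json(data: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical loader: read fields with .get() so `data` is left untouched."""
    return {
        "type": data.get("type", "file"),
        "mode": data.get("mode", 0o100644),
        "content": data.get("content", "").encode(),
    }


payload = {"type": "file", "mode": 0o100600, "content": "hi"}
snapshot = dict(payload)

first = node_from_json(payload)
second = node_from_json(payload)  # safe: payload was not consumed by the first call
```

Had the loader used .pop(), the second call would silently fall back to the defaults because the first call would have emptied payload.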