Skip to content

Latest commit

 

History

History
268 lines (209 loc) · 11.4 KB

File metadata and controls

268 lines (209 loc) · 11.4 KB

Testing Infrastructure Architecture

Overview

Reflaxe.Elixir uses a dual-mode testing system with shared utilities to provide both sequential and parallel test execution with identical behavior and 100% reliability.

CI lanes (why “WAE” and “mix-0x” exist)

GitHub Actions runs a few higher-level “integration” checks in addition to snapshot tests:

  • Examples (Elixir WAE): compiles each example’s generated Elixir under mix compile --warnings-as-errors.
    • WAE = “warnings as errors”. This is a release-hygiene gate: warnings often indicate subtle codegen/stdlib issues (unused vars, undefined funcs, module conflicts) that would otherwise slip through.
    • Keep example dependencies compatible with the CI Elixir version. Newer Elixir releases can introduce stricter warnings that older deps still emit (which can slow compiles and break WAE gates).
  • mix-01, mix-02, … shards: the examples list is split into multiple matrix shards so the WAE job:
    • finishes faster (parallel execution),
    • avoids per-job timeouts,
    • and isolates failures (you get a single shard log instead of one huge run).

The same WAE philosophy is used by the todo-app QA sentinel when it runs mix compile --warnings-as-errors.

How shards map to the repo

The WAE job is implemented by:

  • CI workflow: .github/workflows/ci.yml (matrix shards mix-01mix-11)
  • Runner script: scripts/test-examples-elixir.sh (called by npm run test:examples-elixir)

Each shard sets EXAMPLES_ELIXIR_WAE_ONLY=<example-name> so we compile one example per job. This keeps wall-clock time predictable, and it makes the failure logs focused (CI uploads _tmp/examples-elixir-wae/ as an artifact on failure).

Why you sometimes see “cancelled” checks

This repo uses GitHub Actions concurrency cancellation (cancel-in-progress). If you push multiple commits quickly, older runs will often show cancelled even when nothing is wrong. Only the latest run for the current HEAD matters.

Quick no-auth sanity check:

  • Use the GitHub API to query the CI workflow run for the current HEAD SHA and ensure all jobs conclude success.

Regression Snapshots (Compiler Correctness)

When a bug is found in a real app (especially examples/todo-app/), add a focused regression snapshot under:

  • test/snapshot/regression/**

These tests are intentionally small and deterministic. They lock in compiler behavior at the AST/lowering level and prevent “it works in the todo-app but regresses in a later refactor” outcomes.

Case Study: reduce_while Accumulator Updates (Todo-App Presence)

The todo-app “online users” UI exposed a correctness bug in the Enum.reduce_while lowering pipeline.

  • Symptom: Presence state existed, but derived UI lists stayed empty (@online_user_count rendered 0, no avatars).
  • Root cause: Haxe for loops lower into Enum.reduce_while(...), and list “updates” like Array.push/Array.concat become list ++ [value] on the Elixir target. A bug in accumulator-threading meant updates inside nested control-flow (if branches in the reducer) did not escape the recursion, so the reducer returned the original accumulator.
  • Fix: implemented in src/reflaxe/elixir/ast/transformers/ReduceWhileAccumulatorTransform.hx and supporting passes to ensure nested-branch updates become explicit accumulator rebindings that survive try wrappers and branch lowering.
  • Regression snapshot: test/snapshot/regression/ReduceWhileAccumulatorBranchUpdates/ guards:
    • nested if updates via names = names.concat([k])
    • statement-position views.push({...}) lowering into explicit accumulator rebindings

This regression is intentionally generic: any code that builds lists inside loops benefits, not just Presence.

Architecture Components

┌─────────────────┐    ┌─────────────────┐
│   TestRunner    │    │ ParallelTestRunner │
│  (Sequential)   │    │   (Parallel)    │
└─────────┬───────┘    └─────────┬───────┘
          │                      │
          │        ┌─────────────────┐
          └────────┤   TestCommon    ├────────┘
                   │ (Shared Utils)  │
                   └─────────────────┘

TestRunner.hx (Sequential)

  • Purpose: Traditional sequential test execution
  • Usage: Development debugging, CI fallback
  • Behavior: Processes tests one at a time
  • Output: Detailed differences for debugging

ParallelTestRunner.hx (Parallel)

  • Purpose: High-performance parallel test execution
  • Usage: Default test mode, development workflow
  • Behavior: 16 workers with file-based locking
  • Output: Boolean success/failure for speed

TestCommon.hx (Shared Utilities)

  • Purpose: Eliminate code duplication and ensure consistency
  • Functions: File operations, content normalization, directory comparison
  • Benefit: Single source of truth for test logic

Shared Functions

getAllFiles(dir: String, prefix: String = ""): Array<String>

Purpose: Recursively collect all files from a directory

// Usage examples
TestCommon.getAllFiles("test/out")           // ["Main.ex", "User.ex"]
TestCommon.getAllFiles("test/out", "src/")   // ["src/Main.ex", "src/User.ex"]

Features:

  • Handles non-existent directories gracefully (returns [])
  • Supports optional prefix for relative path construction
  • Platform-agnostic file system operations

normalizeContent(content: String, fileName: String = ""): String

Purpose: Normalize file content for reliable comparison

// Standard normalization
var normalized = TestCommon.normalizeContent(fileContent);

// Special handling for generated files
var normalized = TestCommon.normalizeContent(jsonContent, "_GeneratedFiles.json");

Features:

  • Line ending normalization: \r\n\n, \r\n
  • Whitespace cleanup: Trim trailing spaces, remove trailing empty lines
  • Special file handling: Filters incremental ID fields from _GeneratedFiles.json
  • Error resilience: Graceful fallback on parsing failures

Directory Comparison Functions

compareDirectoriesDetailed(actualDir, intendedDir): Array<String>

Purpose: Detailed comparison for TestRunner debugging

var differences = TestCommon.compareDirectoriesDetailed("out/", "intended/");
// Returns: ["Missing file: Main.ex", "Content differs: User.ex", "Extra file: Debug.ex"]

Used by: TestRunner for detailed error reporting

compareDirectoriesSimple(actualDir, intendedDir): Bool

Purpose: Fast boolean comparison for ParallelTestRunner

var matches = TestCommon.compareDirectoriesSimple("out/", "intended/");
// Returns: true or false

Used by: ParallelTestRunner for performance

Key Design Patterns

1. Graceful Directory Handling

// Both functions handle missing directories correctly
TestCommon.getAllFiles("/nonexistent/path")  // Returns []
TestCommon.compareDirectoriesSimple("out/", "intended/")  // Handles missing 'out'

2. Special File Processing

// _GeneratedFiles.json contains incrementing ID field that must be ignored
{
  "filesGenerated": ["Main.ex", "User.ex"],
  "id": 123,  // ← This changes on each compilation
  "version": 1
}

// normalizeContent() filters out the "id" line for consistent comparison

3. Two-Pattern Comparison

  • Detailed Pattern: Returns difference descriptions for debugging
  • Simple Pattern: Returns boolean for performance
  • Same Logic: Both use identical file processing and content normalization

Critical Bug Fixes Resolved

Issue 1: _GeneratedFiles.json False Failures

Problem: Incremental ID field caused test failures

// Intended
{"id": 120, "files": ["Main.ex"]}
// Actual  
{"id": 121, "files": ["Main.ex"]}

Solution: normalizeContent() filters ID lines with regex /^\s*"id"\s*:\s*\d+,?$/

Issue 2: Empty Directory Handling Divergence

Problem: ParallelTestRunner failed when no output generated

// OLD ParallelTestRunner (FAILED)
if (!sys.FileSystem.exists(actualDir) || !sys.FileSystem.exists(intendedDir)) {
    return false;  // ← Failed for missing actualDir
}

// NEW TestCommon approach (WORKS)
if (!sys.FileSystem.exists(intendedDir)) return false;  // Only check intended
var intended = getAllFiles(intendedDir);  // []
var actual = getAllFiles(actualDir);      // [] (handles missing dir)
return intended.length == actual.length;  // 0 == 0 = true ✅

Issue 3: Code Duplication Issues

Problem: ~100 lines of duplicated logic between test runners Solution: Centralized shared functions in TestCommon.hx Benefit: Single point of maintenance, guaranteed consistency

Performance Impact

Before TestCommon

  • Test Success: 57/57 (100%) - All tests passing with new file naming conventions
  • Maintenance: Duplicated logic in both runners
  • Reliability: Inconsistent behavior between sequential and parallel

After TestCommon

  • Test Success: 57/57 (100%) - Zero failures
  • Maintenance: Single source of truth for test logic
  • Reliability: Identical behavior guaranteed
  • Performance: No impact on 87-90% speed improvement

Usage Guidelines

For Test Development

import test.TestCommon;

// Use shared functions for consistent behavior
var files = TestCommon.getAllFiles(outputDir);
var content = TestCommon.normalizeContent(fileContent, fileName);
var matches = TestCommon.compareDirectoriesSimple(actual, intended);

For Test Runner Implementation

// Sequential runner - get detailed differences
var differences = TestCommon.compareDirectoriesDetailed(outPath, intendedPath);
if (differences.length > 0) {
    // Show detailed error messages
}

// Parallel runner - get fast boolean result  
var success = TestCommon.compareDirectoriesSimple(outPath, intendedPath);
return success;

Lessons Learned

  1. Shared Utilities Prevent Divergence: Code duplication leads to inconsistent behavior
  2. Special Case Handling: Generated files need special normalization (ID filtering)
  3. Directory Existence Logic: Be careful about existence checks - intended vs actual
  4. Performance vs Detail Trade-off: Two comparison functions serve different needs
  5. Testing the Tests: Test infrastructure needs its own reliability validation

Future Enhancements

  1. Test Result Caching: Cache normalized content for unchanged files
  2. Parallel Directory Operations: Parallelize file reading within directory comparison
  3. Smart Content Filtering: Configurable filtering rules for other generated files
  4. Test Isolation: Enhanced directory sandboxing for even better isolation
  5. Metrics Collection: Track test timing and failure patterns

Conclusion

The TestCommon.hx architecture ensures that Reflaxe.Elixir's testing infrastructure is:

  • Reliable: 100% test success rate
  • Maintainable: Single source of truth for test logic
  • Performant: No impact on parallel execution speed
  • Consistent: Identical behavior between sequential and parallel modes
  • Robust: Handles edge cases like missing directories and special file formats

This foundation supports the project's commitment to reliable, fast development workflows.