38 commits
e4b2ea2
Add multi-threading/multi-core support to fraglets
claude Jan 10, 2026
55e70b2
Add .gitignore to exclude build artifacts
claude Jan 10, 2026
638d4c8
Add multi-threading benchmarks and graphviz stub for testing
claude Jan 10, 2026
974cb76
Update .gitignore to exclude test executables
claude Jan 10, 2026
a694fb3
Add comprehensive multi-threading benchmarks with visualization
claude Jan 10, 2026
c6444c8
Add partition and merge operations for parallel sorting
claude Jan 10, 2026
626cf08
Add comprehensive benchmarks showing 99% speedup with parallel sort
claude Jan 10, 2026
071cddf
Add large dataset benchmarks and threading tests
claude Jan 10, 2026
0a85731
Test 100K number dataset showing threading overhead still dominates
claude Jan 10, 2026
fd69b84
Add parallel-first fraglets redesign proposal
claude Jan 10, 2026
f771456
Implement spatial fraglets in Rust with near-linear speedup
claude Jan 10, 2026
8f30af6
Add comprehensive summary of Rust spatial fraglets success
claude Jan 10, 2026
d648743
Add performance visualizations showing near-linear speedup
claude Jan 10, 2026
1a477bc
Add full .fra file compatibility to Rust fraglets
claude Jan 10, 2026
5e2a2a9
Add migration guide showing Rust fraglets can replace C++
claude Jan 10, 2026
5021cd5
Add C++ compatibility status - Rust can fully replace C++
claude Jan 10, 2026
ca0e75e
Make Rust the primary implementation - remove C++
claude Jan 10, 2026
5d4e62e
Remove rust_impl subdirectory - everything now at root
claude Jan 10, 2026
d487a23
Add graphviz visualization support for molecule reaction networks
claude Jan 10, 2026
a911ab7
Add reaction tracking and fix critical bugs (sort still broken)
claude Jan 10, 2026
c736304
Fix critical bugs in split, empty, and length operations
claude Jan 11, 2026
2760935
Fix pop, pop2, and lt operations to match C++ behavior
claude Jan 11, 2026
6424530
Fix pop2 and exch operations - SORT NOW WORKS!
claude Jan 11, 2026
482e5cc
Implement pattern-based routing with persistent matchp rules
claude Jan 11, 2026
fd588dc
Add parallel execution benchmarks and design documentation
claude Jan 11, 2026
04fa32a
Add test scripts and analysis tools for parallel execution
claude Jan 11, 2026
650f5ef
Add MapReduce graphviz visualization tool
claude Jan 11, 2026
6620345
Add threading model comparison demonstrating need for regions
claude Jan 11, 2026
58fe2e1
Add demonstration of region limitations with sequential algorithms
claude Jan 11, 2026
d0d2d52
Add comprehensive automatic parallelism language design analysis
claude Jan 11, 2026
397ff8a
Add comprehensive NESL language examples and comparison
claude Jan 11, 2026
e2d4d5c
Add analysis of how chemistry achieves massive parallelism
claude Jan 11, 2026
928448d
Add analysis: Could we remove regions using shared pool instead?
claude Jan 11, 2026
22708eb
Add analysis: Could we parallelize only unimolecular operations?
claude Jan 11, 2026
a9c5bd1
Add analysis: Parallel unimol + sequential matchp architecture
claude Jan 11, 2026
e0575b8
Add honest assessment: What are regions good for if they break sort?
claude Jan 11, 2026
67fdb67
Add analysis: Can fraglets run on quantum computers?
claude Jan 11, 2026
247149a
Add analysis: What if we remove pattern matching from fraglets?
claude Jan 11, 2026
31 changes: 31 additions & 0 deletions .gitignore
@@ -0,0 +1,31 @@
# Rust build artifacts
/target/
Cargo.lock

# Test binaries
/check_sort_distribution
/test_distribution
/test_heavy_parallel
/test_parallel_routing
/test_sort_roundrobin
/test_super_heavy
/parallel_benchmark
/run_mapreduce
/test_simple_threading
/compare_threading_models
/demonstrate_region_limitation
/test_lockfree_queue

# IDE files
.vscode/
.idea/
*.swp
*.swo

# OS files
.DS_Store
Thumbs.db

# Visualization outputs
*.dot
*.png
320 changes: 320 additions & 0 deletions AUTOMATIC_PARALLELISM_DESIGN.md
@@ -0,0 +1,320 @@
# Designing Languages for Automatic Parallelism

## The Challenge: What Makes Parallelism Hard?

In fraglets, we saw that automatic parallelism is hard because:
1. **Dependencies are implicit** - we can't tell if `[remain ...]` molecules need to react together
2. **No scoping** - all molecules are global, making it unclear what's independent
3. **Dynamic behavior** - reactions create arbitrary new molecules at runtime

## Existing Languages with Automatic Parallelism

### 1. **Haskell** (Lazy Functional)
Pure functional languages are naturally parallelizable:

```haskell
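import Control.Parallel.Strategies (parMap, rseq)

-- Pure, so elements could be evaluated in any order or in parallel
-- (GHC does not parallelize this on its own)
doubled :: [Int]
doubled = map (*2) [1..1000000]

-- Explicit parallelism hint: evaluate each element as a separate spark
doubledPar :: [Int]
doubledPar = parMap rseq (*2) [1..1000000]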
```

**Why it works:**
- **Pure functions** - no side effects, so safe to evaluate anywhere
- **Immutability** - no shared mutable state
- **Lazy evaluation** - can reorder computation safely

**Limitation:** GHC does not parallelize pure code on its own; the programmer must add explicit strategies such as `parMap`, so it is not truly automatic

### 2. **Erlang/Elixir** (Actor Model)
Message-passing concurrency:

```elixir
# Each process is isolated
tasks = Enum.map(1..1000, fn i ->
  Task.async(fn -> expensive_computation(i) end)
end)

results = Enum.map(tasks, &Task.await/1)
```

**Why it works:**
- **Isolated processes** - no shared memory
- **Message passing** - explicit communication
- **Supervision trees** - fault tolerance

**Limitation:** Programmer must structure code as actors, not automatic

### 3. **Chapel** (Parallel Iterators)
Designed for HPC with parallel-by-default iterators:

```chapel
// Data parallelism built in
var A: [1..1000] real;

// Iterations may run in parallel
forall i in 1..1000 do
  A[i] = compute(i);

A = A * 2;  // Whole-array operation, executed in parallel
```

**Why it works:**
- **Parallel iterators** - default to parallel execution
- **Data locality** - explicit control over distribution
- **PGAS model** - partitioned global address space

**Getting closer:** But the programmer must still choose `forall` over `for`

### 4. **Cilk** (Work Stealing)
Fork-join parallelism with provably efficient scheduling:

```c
int fib(int n) {
    if (n < 2) return n;

    int x = cilk_spawn fib(n-1);  // May run in parallel with the caller
    int y = fib(n-2);
    cilk_sync;                    // Wait for the spawned call

    return x + y;
}
```

**Why it works:**
- **Work stealing** - automatic load balancing
- **Provable bounds** - near-optimal scheduling
- **Composability** - spawn/sync compose well

**Limitation:** Requires explicit spawn annotations

### 5. **NESL** (Nested Data Parallelism)
Automatically parallelizes nested data-parallel operations:

```nesl
{sum(a) : a in partition(data)} % Automatic parallelism!
```

**Why it works:**
- **Flattening** - the compiler converts nested parallelism into flat data-parallel operations
- **Cost model** - predictable performance
- **Purely functional** - no side effects

**Closest to automatic:** Operations are parallel by default

## Design Principles for Auto-Parallel Languages

### Principle 1: **Make Independence Explicit**
```
// Bad: Implicit dependencies
[process data] // Unknown if independent

// Good: Explicit independence
[parallel work 1] // Marked as independent
[parallel work 2]
[parallel work 3]
```

### Principle 2: **Immutability by Default**
```
// Bad: Mutable state
let mut x = 0;
threads.map(|t| x += compute(t)) // Race condition!

// Good: Immutable with explicit reduction
let results = threads.map(|t| compute(t));
let x = results.sum(); // Safe parallel + reduce
```
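The "good" variant above can be sketched in runnable Rust using only the standard library. This is a minimal illustration, not the fraglets implementation: `compute` and `parallel_sum` are hypothetical names, and squaring stands in for real work. Each thread returns its own partial sum, so nothing is shared and mutated.

```rust
use std::thread;

/// Hypothetical stand-in for an expensive pure computation.
fn compute(t: u64) -> u64 {
    t * t
}

/// Map `compute` over `inputs` on scoped threads, then reduce.
/// No thread writes to shared state; each returns its own partial sum.
fn parallel_sum(inputs: &[u64], workers: usize) -> u64 {
    let workers = workers.max(1);
    let chunk_len = ((inputs.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = inputs
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().map(|&t| compute(t)).sum::<u64>()))
            .collect();
        // Reduction step: combine the partial sums on the calling thread.
        handles.into_iter().map(|h| h.join().unwrap()).sum::<u64>()
    })
}

fn main() {
    let inputs: Vec<u64> = (1..=100).collect();
    println!("sum of squares = {}", parallel_sum(&inputs, 4));
}
```

Because addition is associative, the result does not depend on how the input is chunked or on the order in which threads finish.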

### Principle 3: **Explicit Communication**
```
// Bad: Shared memory
global_state.update() // Hidden communication

// Good: Message passing
send(result, destination) // Explicit communication
```
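As a rough sketch of the message-passing style, the following uses Rust's standard `mpsc` channel. The function name `collect_results` and the doubling "work" are assumptions made for illustration; the point is that every result crosses a thread boundary through an explicit `send`.

```rust
use std::sync::mpsc;
use std::thread;

/// Each worker sends its result over a channel instead of touching shared state.
fn collect_results(jobs: Vec<u32>) -> Vec<(u32, u32)> {
    let (tx, rx) = mpsc::channel();
    for job in jobs {
        let tx = tx.clone();
        thread::spawn(move || {
            // Hypothetical work: double the job id.
            tx.send((job, job * 2)).expect("receiver dropped");
        });
    }
    drop(tx); // the channel closes once every worker's clone is dropped
    let mut results: Vec<(u32, u32)> = rx.iter().collect();
    results.sort(); // arrival order is nondeterministic, so sort for a stable result
    results
}

fn main() {
    for (job, result) in collect_results(vec![3, 1, 2]) {
        println!("job {job} -> {result}");
    }
}
```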

### Principle 4: **Pure Computations**
```
// Bad: Side effects
fn compute(x) {
    write_to_db(x); // Hidden side effect
    return x * 2;
}

// Good: Pure function
fn compute(x) -> Result {
    return (x * 2, WriteDB(x)); // Explicit effects
}
```
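One way to make the "explicit effects" idea concrete is to return a description of the effect and interpret it elsewhere. The sketch below assumes a hypothetical `Effect` enum and an in-memory log standing in for the database; none of these names come from the fraglets code.

```rust
/// Hypothetical description of a side effect, returned instead of performed.
#[derive(Debug, PartialEq)]
enum Effect {
    WriteDb(i64),
}

/// Pure: the same input always yields the same value and the same effect list.
fn compute(x: i64) -> (i64, Vec<Effect>) {
    (x * 2, vec![Effect::WriteDb(x)])
}

/// The only place effects are carried out; here a Vec stands in for a database.
fn run_effects(effects: &[Effect], log: &mut Vec<String>) {
    for effect in effects {
        match effect {
            Effect::WriteDb(v) => log.push(format!("write {v}")),
        }
    }
}

fn main() {
    let (value, effects) = compute(21);
    let mut log = Vec::new();
    run_effects(&effects, &mut log);
    println!("{value} {log:?}");
}
```

Since `compute` performs no I/O, many calls can run in parallel and the collected effects can be applied afterwards in whatever order the application requires.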

## Designing Parallel Fraglets

### Option 1: **Scoped Regions (Static)**
Add explicit parallel/sequential annotations:

```fraglets
# Declare independent work items
[parallel work 1]
[parallel work 2]
[parallel work 3]

# Matchp rules can still be global
[matchp work dup result]

# System knows these are independent!
```

**How it works:**
- Parser detects `[parallel ...]` prefix
- Routes by work ID automatically
- The `parallel` prefix is the programmer's assertion that the items have no cross-dependencies
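The routing step could look roughly like the sketch below. It assumes a molecule is a slice of symbols shaped like `[parallel <tag> <id>]` with a numeric work ID; the function name `route` and the modulo placement are illustrative choices, not the existing runtime's behavior.

```rust
/// Return the region for a `[parallel <tag> <id>]` molecule, or `None` when the
/// molecule is not marked parallel and should stay in the default (sequential) pool.
fn route(molecule: &[&str], regions: usize) -> Option<usize> {
    match molecule {
        // Assumed shape: prefix, tag, numeric work ID.
        ["parallel", _tag, id] if regions > 0 => {
            id.parse::<usize>().ok().map(|n| n % regions)
        }
        _ => None,
    }
}

fn main() {
    let molecules: [&[&str]; 3] = [
        &["parallel", "work", "1"],
        &["parallel", "work", "6"],
        &["matchp", "work", "dup", "result"],
    ];
    for m in molecules {
        println!("{:?} -> {:?}", m, route(m, 4));
    }
}
```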

### Option 2: **Functional Fraglets**
Make molecules immutable:

```fraglets
# Old: Molecules consumed
[matchp work result] # Consumes 'work', creates 'result'

# New: Transform returns new molecules
work(1) -> result(1) # Functional transformation
```

**How it works:**
- No molecule mutation
- All reactions are pure transformations
- Can parallelize freely

### Option 3: **Type System for Locality**
Annotate molecule types with location:

```fraglets
# Local molecules (region-private)
[local work 1] : Local<Work>

# Global molecules (need synchronization)
[global counter] : Global<Counter>

# System routes Local automatically!
```

**How it works:**
- Type checker ensures Local doesn't escape
- Global requires explicit synchronization
- Compiler can parallelize Local
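Rust's own type system can approximate this split, which may help make the idea concrete. In the sketch below, `Local<T>` and `Global<T>` are hypothetical names: `Local` carries a raw-pointer marker so it is not `Send` and cannot be moved to another thread, while `Global` is an `Arc<Mutex<T>>` whose every access is synchronized.

```rust
use std::marker::PhantomData;
use std::sync::{Arc, Mutex};
use std::thread;

/// Region-private molecule. The raw-pointer marker makes `Local<T>` `!Send`,
/// so the compiler rejects any attempt to move it into another thread.
struct Local<T> {
    value: T,
    _not_send: PhantomData<*const ()>,
}

impl<T> Local<T> {
    fn new(value: T) -> Self {
        Local { value, _not_send: PhantomData }
    }
}

/// Global molecule: shared across regions, accessed only through a lock.
type Global<T> = Arc<Mutex<T>>;

fn count_in_regions(regions: usize) -> u32 {
    let counter: Global<u32> = Arc::new(Mutex::new(0));
    thread::scope(|s| {
        for _ in 0..regions {
            let counter = Arc::clone(&counter);
            s.spawn(move || {
                let work = Local::new(1u32); // created and consumed inside one region
                *counter.lock().unwrap() += work.value; // explicit synchronization
            });
        }
    });
    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("regions that reported: {}", count_in_regions(8));
}
```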

### Option 4: **Dataflow Dependencies**
Explicit dependency graph:

```fraglets
# Declare data dependencies
work1 : [] # No dependencies
work2 : [work1] # Depends on work1
work3 : [] # Independent

# System builds dependency graph!
```

**How it works:**
- Static analysis of dependencies
- Parallel execution of independent nodes
- Sequential execution of dependent chains
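A small sketch of how such a dependency graph might be turned into a schedule: nodes are grouped into "waves", where every node in a wave depends only on nodes in earlier waves, so each wave can run in parallel. The function name `schedule_waves` and the map-of-names representation are assumptions for illustration.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Group nodes into waves; all dependencies of a node lie in earlier waves.
/// Stops early if a cycle or an undeclared dependency blocks progress.
fn schedule_waves<'a>(deps: &BTreeMap<&'a str, Vec<&'a str>>) -> Vec<Vec<&'a str>> {
    let mut done: BTreeSet<&'a str> = BTreeSet::new();
    let mut waves = Vec::new();
    while done.len() < deps.len() {
        let ready: Vec<&'a str> = deps
            .iter()
            .filter(|(name, ds)| !done.contains(*name) && ds.iter().all(|d| done.contains(d)))
            .map(|(name, _)| *name)
            .collect();
        if ready.is_empty() {
            break; // cycle or missing dependency
        }
        done.extend(ready.iter().copied());
        waves.push(ready);
    }
    waves
}

fn main() {
    let mut deps = BTreeMap::new();
    deps.insert("work1", vec![]);
    deps.insert("work2", vec!["work1"]);
    deps.insert("work3", vec![]);
    for (i, wave) in schedule_waves(&deps).iter().enumerate() {
        println!("wave {i}: {wave:?}");
    }
}
```

For the three declarations above, `work1` and `work3` land in the first wave and `work2` in the second.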

## The Best Design: Hybrid Approach

Combine multiple strategies:

```fraglets-parallel
# 1. Scoped parallel blocks
parallel {
  [work 1]
  [work 2]
  [work 100]
}

# 2. Pure matchp rules (immutable)
matchp work(N) -> result(N) {
  dup -> step1
  step1 -> result
}

# 3. Explicit global state
global {
  [counter 0]
}

# 4. Barrier synchronization
parallel {
  [work 1]
  [work 2]
}
barrier  # Wait for all work to complete

sequential {
  [sort data]  # Sequential algorithm
}
```

**Advantages:**
- `parallel {}` blocks route by hash automatically
- Pure functions enable safe parallelism
- Global state explicit and limited
- Programmer control when needed
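The parallel / barrier / sequential shape can be sketched with scoped threads, where the end of the scope plays the role of the barrier. `run_hybrid` is a hypothetical name and multiplying by 10 is placeholder work; this only illustrates the control flow, not the proposed syntax.

```rust
use std::thread;

/// Run independent work items in parallel, wait for all of them, then run a
/// sequential phase (here: a sort) over the combined results.
fn run_hybrid(work: &[u32]) -> Vec<u32> {
    // parallel { ... }: one scoped thread per work item
    let mut results = thread::scope(|s| {
        let handles: Vec<_> = work.iter().map(|&w| s.spawn(move || w * 10)).collect();
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .collect::<Vec<u32>>()
    });
    // barrier: the scope returns only after every spawned thread has finished

    // sequential { ... }: an order-dependent algorithm runs on one thread
    results.sort();
    results
}

fn main() {
    println!("{:?}", run_hybrid(&[3, 1, 2]));
}
```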

## Real-World Example: TensorFlow

TensorFlow achieves automatic parallelism via:

```python
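import tensorflow as tf

# Ops are nodes in a dataflow graph (built explicitly in TF 1.x, via tf.function in TF 2.x)
a = tf.constant([1, 2, 3])
b = tf.constant([4, 5, 6])
c = a + b  # Independent ops can be scheduled in parallel by the runtime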
```

**How:**
1. **Dataflow graph** - operations are nodes, data flows on edges
2. **Static analysis** - no dependencies = parallel
3. **Device placement** - automatic GPU/CPU assignment
4. **Runtime scheduling** - dynamic load balancing

## Recommendation for Fraglets

Add minimal syntax for explicit parallelism:

```fraglets
# Mark independent work with @parallel
@parallel
[work 1]
[work 2]
[work 100]

# Regular molecules (sequential)
[sort 27 numbers]
```

**Implementation:**
- Parser detects `@parallel` prefix
- Routes by hash(molecule_id) automatically
- No changes to runtime needed
- Programmer makes informed choice
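Hash-based routing could be as small as the following sketch. It assumes the molecule ID is available as a string and uses the standard library's `DefaultHasher`; `region_for` is a hypothetical name, and the actual runtime may key on something else.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Map a molecule ID to one of `regions` regions by hashing it.
/// The same ID always lands in the same region within a single build.
fn region_for(molecule_id: &str, regions: usize) -> usize {
    assert!(regions > 0, "need at least one region");
    let mut hasher = DefaultHasher::new();
    molecule_id.hash(&mut hasher);
    (hasher.finish() % regions as u64) as usize
}

fn main() {
    for id in ["work 1", "work 2", "work 100"] {
        println!("[{id}] -> region {}", region_for(id, 4));
    }
}
```

Hashing spreads IDs across regions without the parser needing to understand their structure; the trade-off is that related molecules end up together only if they share the same ID.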

This is **honest**: parallelism requires programmer understanding, but syntax makes it easy.

## Summary

**Existing languages and frameworks that come closest to automatic parallelism:**
1. Haskell (purity + laziness)
2. Chapel (parallel iterators)
3. NESL (nested data parallelism)
4. TensorFlow (dataflow graphs)

**Key insight:** True automatic parallelism requires either:
- **Purity** (no side effects) - Haskell
- **Explicit structure** (parallel blocks) - Chapel
- **Dataflow** (explicit dependencies) - TensorFlow

**For fraglets:** Add `@parallel` annotations to mark independent work, maintaining simplicity while enabling programmer control.