Skip to content

jamone-de/jamone-nano-CAGS

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

18 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

JamOne Nano: Compressibility-Aware Gradient Surgery (CAGS)

Python PyTorch License Status Efficiency Architecture Precision CAGS

Important

Click above to verify the CAGS-Operator effectiveness via the interactive training trace (5,000 Step Sprint).


Abstract: Standard compression optimizes for sparsity; JamOne Nano optimizes for entropy-conformance. By implementing a custom gradient operator (CAGS), we actively sculpt neural weights during backpropagation to favor spatial coherence and ternary alignment. This allows a high-density, 31-pass recursive transformer (DIM 512) to converge within a strict 16MB zlib budget without sacrificing structural intelligence.

Official implementation for the OpenAI 16MB Efficiency Challenge.


Technical Innovation: CAGS (Gradient Surgery)

Standard compression techniques like L1 regularization or BitNet optimize for weight sparsity indirectly via the loss function. This serves as a proxy and does not directly correlate with bit-stream compressibility in high-entropy environments.

CAGS instrumentizes the backpropagation pass by scaling the gradient of each parameter proportional to its local zlib-compressibility gain when approaching ternary anchors ${-1, 0, +1}$.

Mathematical Foundation

Instead of a standard gradient $g = \nabla L(w)$, we apply the CAGS-Operator:

$$g_{cags} = g \cdot (1.0 + \alpha \cdot \Psi(w) \cdot \Phi(w))$$

Definitions:

  • $\Psi(w)$ (Run-Signal): A local density proxy favoring spatial coherence (zero-runs) to maximize zlib run-length encoding efficiency.
  • $\Phi(w)$ (Snap-Gain): Measurement of immediate distance to the nearest ternary anchor point.
  • $\alpha$ (Surgery-Scaling): A dynamically scheduled hyperparameter used to balance convergence stability and compression density.

Note: Specific windowing functions for $\Psi(w)$ and $\alpha$-scheduling trajectories are proprietary optimizations of the JamOne-Nano framework.


Architecture: JamOne-31-Recursive

To maximize architectural depth within the strict 16MB ZLIB budget, we utilize a high-density recursive transformer architecture with weight-sharing across all blocks.

Parameter Specification
Model Type Recursive Transformer (Weight-Sharing)
Block Count 31-Pass Sequential
Embedding Dimension 512
Vocabulary Size 1024
Precision Strategy Ternary-Pruned (CAGS-v6.4 optimized)

📊 Empirical Validation: The 50k Sprint

Verified via check_grant_potential.py on local CPU-trained checkpoints (v6.4) to demonstrate mathematical entropy conformance.

  • Model Parameters: 4.72 M (High-density Recursive)
  • Standard Entropy (zlib-9): 17.44 MB (Raw FP32 baseline)
  • Ternary Base-3 Compression: 0.93 MB (Projected Weight Volume)
  • Information Density: 1.5850 BPB (Bits Per Byte)
  • Shannon Efficiency: ~99.9% (Approaching theoretical limit)

Tip

View Raw Training Logs (testlog.txt) > Full convergence trace including CAGS-Surgery-Gain and Entropy-Metrics verified on ASH-Developer host (i5-13400).


Performance & Optimization (v6.4 Update)

  • I/O Strategy: Optimized via NumPy mmap (mode='r') for low-latency F-Drive data streaming.
  • UI Bridge: Lock-free Slint-Rust telemetry integration for real-time loss tracking.
  • Deployment: Standalone local PC installation (No-Docker requirement).

🛡️ Founder-Security-Policy (FSP)

This implementation of the CAGS-Operator and the JamOne-31-Recursive architecture is subject to the JamOne Founder-Security-Policy.

  • Integrity: All training logs are cryptographically traceable to the ASH-Developer host.
  • Usage: Commercial use or redistribution of the CAGS-weight-shaping logic requires explicit authorization.

Contact: hallo@jamone.de
Location: Germany

Packages

 
 
 

Contributors

Languages