File: AGENTS.md (+1 -1: 1 addition & 1 deletion)
@@ -56,5 +56,5 @@ This file helps AI agents discover and understand how to work with this reposito
- Expanded `python/t81/__init__.py` so the higher-level `t81` package re-exports the compiled binding helpers (`t81lib`, `BigInt`, `Limb`, `gemm_ternary`, etc.) while staying import-safe when the extension is unavailable.
- Added `scripts/ternary_quantization_benchmark.py` plus `BENCHMARKS.md` so contributors can reproduce a Fashion-MNIST FP32/PTQ/QAT benchmark and log accuracy/latency/storage for each mode; README now links the benchmark doc.
- Rewrote `pyproject.toml` with valid TOML sections so editable installs (and `pip install -e '.[torch]'`) can parse the metadata cleanly before building the extension.
-
- Restructured `README.md` into a onboarding-focused front door and added companion docs (`docs/use-cases.md`, `docs/hardware.md`, `docs/api-overview.md`, `docs/python-install.md`, `docs/torch.md`, `examples/README.md`) so heavy reference material lives outside the visitor-facing overview.
+
- Restructured `README.md` into a onboarding-focused front door and added companion docs (`docs/use-cases.md`, `docs/hardware.md`, `docs/api-overview.md`, `docs/python-install.md`, `docs/torch.md`, `docs/gpu.md`, `examples/README.md`) so heavy reference material lives outside the visitor-facing overview.
- Added optional CUDA/ROCm toggles plus a GPU dispatcher sketch (`include/t81/linalg/gemm_gpu.hpp`, `src/linalg/{gemm_cuda.cu,gemm_dispatch.cpp,gemm_rocm.cpp}`) so future teams can wire the new `where`/`clamp`/`lerp`/`addcmul` helpers into GPU kernels, introduced `t81::TensorMetadata` + Python helpers (`python/bindings.cpp`) that extract metadata from NumPy/Torch tensors, and expanded `tests/python/test_gpu_ops.py` to cover the metadata-backed bindings on both CPU and GPU paths.
`pip install .[torch]` unlocks the `t81lib`/`t81` namespace, NumPy quantization helpers, and the `t81.torch`/`t81.nn` layers that mix ternary weights with FP32/BF16 biases. Jump deeper via [docs/python-api.md](docs/python-api.md), [docs/python-cookbook.md](docs/python-cookbook.md), and [docs/torch.md](docs/torch.md).
-
## GPU backends & tensor metadata
+
## GPU backends
-
Enable CUDA/ROCm through the optional `-DUSE_CUDA=ON` and `-DUSE_ROCM=ON` flags during CMake configuration so the Python bindings link against the new GPU kernels (`python/CMakeLists.txt`). Once enabled, `t81lib.where`, `t81lib.clamp`, `t81lib.lerp`, and `t81lib.addcmul` accept either NumPy buffers or PyTorch tensors and route the work directly to CUDA/HIP kernels via the lightweight [`t81::TensorMetadata`](include/t81/tensor_metadata.hpp) ABI. The metadata struct carries device/dtype/shape/stride info plus raw `data_ptr`, letting the dispatcher avoid host copies and keep outputs on-device. When torch is installed, `t81lib` automatically wraps GPU tensors; when only NumPy is available it falls back to CPU buffers. Consult [docs/torch.md](docs/torch.md) and `python/bindings.cpp` for the extraction helpers and lifetime semantics.
+
Optional CUDA/ROCm backends can be enabled with `-DUSE_CUDA=ON`/`-DUSE_ROCM=ON` so the Python bindings link against the GPU kernels. `t81lib` exposes a compact `TensorMetadata` ABI that carries device, dtype, shape, and stride info, allowing `where`, `clamp`, `lerp`, and `addcmul` to work directly on NumPy arrays or Torch tensors. See [docs/gpu.md](docs/gpu.md) for build flags, device routing, and tensor metadata details.
## CLI helpers
@@ -130,7 +131,7 @@ See [docs/api-overview.md](docs/api-overview.md) for the full surface described
## Stability & compatibility
-
- Supported toolchains: recent Clang/GCC/MSVC or `pip install`’s compatible CPython builds; CMake config defaults to host SIMD if available (AVX2/AVX-512, NEON) while falling back to portable kernels.
+
- Supported toolchains: C++20-capable Clang/GCC/MSVC (or `pip install`’s compatible CPython builds) with CMake ≥ 3.22; the build auto-detects AVX2/AVX-512/NEON and falls back to portable kernels when those SIMD targets are unavailable.
- We track the ABI/API surface via `include/t81/t81lib.hpp`; expect the core headers to evolve until we reach a stable v1 release and consult [CHANGELOG.md](CHANGELOG.md) for migration notes.
CUDA/ROCm kernels can be built when you configure with `-DUSE_CUDA=ON` or `-DUSE_ROCM=ON` (see `python/CMakeLists.txt`). The bindings expose `t81lib.where`, `t81lib.clamp`, `t81lib.lerp`, and `t81lib.addcmul`, which accept either NumPy buffers or PyTorch tensors and dispatch directly to the GPU kernels.
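The diff names the four helpers but not their signatures. As a rough reference for what such ops usually compute, here is a minimal pure-Python sketch that assumes they mirror the PyTorch ops of the same name (`torch.where`, `torch.clamp`, `torch.lerp`, `torch.addcmul`). The `*_ref` function names are hypothetical and are not part of `t81lib`.

```python
# Elementwise reference semantics, ASSUMING the t81lib helpers follow the
# PyTorch ops of the same name. These *_ref functions are illustrative only.

def where_ref(cond, x, y):
    """Pick x[i] where cond[i] is true, otherwise y[i]."""
    return [xi if c else yi for c, xi, yi in zip(cond, x, y)]

def clamp_ref(x, lo, hi):
    """Limit every element to the closed interval [lo, hi]."""
    return [min(max(xi, lo), hi) for xi in x]

def lerp_ref(start, end, weight):
    """Linear interpolation: start + weight * (end - start)."""
    return [s + weight * (e - s) for s, e in zip(start, end)]

def addcmul_ref(base, t1, t2, value=1.0):
    """Fused multiply-add: base + value * t1 * t2."""
    return [b + value * u * v for b, u, v in zip(base, t1, t2)]

a = [-2.0, 0.5, 3.0]
b = [1.0, 1.0, 1.0]
mask = [ai > 0 for ai in a]

print(where_ref(mask, a, b))            # [1.0, 0.5, 3.0]
print(clamp_ref(a, -1.0, 1.0))          # [-1.0, 0.5, 1.0]
print(lerp_ref(a, b, 0.5))              # [-0.5, 0.75, 2.0]
print(addcmul_ref(a, b, b, value=2.0))  # [0.0, 2.5, 5.0]
```

A reference like this can serve as a CPU oracle when comparing GPU kernel output, provided the assumed semantics match the actual bindings.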
+
Dispatch relies on `t81::TensorMetadata` (`include/t81/tensor_metadata.hpp`): a lightweight struct that carries device tags, dtype codes, shape, strides, and `data_ptr` so the dispatcher can call the right CUDA/HIP kernel without copies. When torch is available, `t81lib` automatically wraps tensors; without torch it gracefully falls back to CPU buffers. Review `python/bindings.cpp` for the extraction helpers and lifetime management.
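To make the metadata idea concrete, the sketch below gathers the same kinds of fields (device tag, dtype code, shape, strides, `data_ptr`) from any writable Python buffer without copying it. It is an illustration using only the standard library, not the contents of `python/bindings.cpp`. `TensorMetadataSketch` and `extract_metadata` are hypothetical names, and the real struct's field types and device encoding may differ.

```python
import ctypes
from array import array
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class TensorMetadataSketch:
    """Hypothetical Python mirror of the fields the text lists."""
    device: str                 # assumed tag, e.g. "cpu" or "cuda:0"
    dtype: str                  # struct-style format code, e.g. "f" = float32
    shape: Tuple[int, ...]
    strides: Tuple[int, ...]    # in bytes, as the buffer protocol reports them
    data_ptr: int               # raw address of the first element

def extract_metadata(buf, device: str = "cpu") -> TensorMetadataSketch:
    """Describe a writable, C-contiguous buffer without copying its data."""
    view = memoryview(buf)
    if view.readonly:
        raise ValueError("a writable buffer is required to take its address")
    # Map a ctypes array onto the same memory only to read its address.
    raw = (ctypes.c_char * view.nbytes).from_buffer(view)
    return TensorMetadataSketch(
        device=device,
        dtype=view.format,
        shape=tuple(view.shape),
        strides=tuple(view.strides),
        data_ptr=ctypes.addressof(raw),
    )

values = array("f", [1.0, 2.0, 3.0, 4.0])
meta = extract_metadata(values)
print(meta.shape, meta.strides, meta.dtype)
```

The address in `data_ptr` is only valid while the source buffer (`values` here) stays alive and is not resized, which is presumably the kind of lifetime concern the extraction helpers have to manage.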
File: docs/index.md (+1: 1 addition & 0 deletions)
@@ -38,6 +38,7 @@ to understand the balanced ternary engine without digging through specs immediat
from command-line workflows.
- **Use cases & demos** — [`docs/use-cases.md`](use-cases.md) and [`examples/README.md`](../examples/README.md) capture the canonical scripts, notebooks, and research stories.
- **Hardware simulation** — [`docs/hardware.md`](hardware.md) details `t81.hardware.TernaryEmulator`, fuzzy helpers, and the visualizer notebook.
+
- **GPU backends** — [`docs/gpu.md`](gpu.md) explains the CUDA/ROCm build flags and tensor metadata routing.
- **API overview** — [`docs/api-overview.md`](api-overview.md) summarizes the numeric containers and helpers exposed via `<t81/t81lib.hpp>`.
- **Tests & benchmarks** — [`tests/`](../tests/) documents the unit/property coverage while [`bench/`](../bench/) shows throughput patterns.